Description
First of all, thanks for this great crate! I've been enjoying using it in a side project I'm working o (grimp), which builds a graph of a python package's internal structure and import statements and then uses pathfinding to find paths between different subpackages and enforce constraints/contracts around the architecture (e.g. constraints like "subpackage foo should not import subpackage bar"). Enforcing these contsraints boils down to doing pathfinding calculation on the graph of imports. For these pathfinding calculations I've been using the BFS algorithms of this crate, since the imports form an unweighted, directed graph.
Problem
The details of grimp probably aren't super interesting to you (sorry for the rambling), but one thing I noticed while doing this work was that it would be really helpful (for me at least!) if the pathfinding algorithms could accept multiple start nodes. For example, in the case of "subpackage foo should not import subpackage bar" we want to find the shortest path between foo and bar. However, foo and bar both packages i.e. they contain many modules i.e. they represent many nodes of the graph. So we want to find the shortest path between many start nodes and many end nodes (shortest path between any of the start nodes to any of the end nodes).
One way to implement this query with the current crate is as a for loop over all the start nodes in the foo package.
let mut paths = vec![];
for foo_module in foo_modules {
paths.push(bfs(&foo_module, ..., |m| bar_modules.contains(m)));
}
// Now look at all the paths in `paths` and pick the shortest.
This typically has poor performance since many of these paths will overlap. As a workaround, to still allow using this crate I implemented a pathfinding node enum:
enum PathfindingNode {
Initial, // `Initial` will expand to all the start nodes.
Module(Module)
}
The shortest path can then be found in a single BFS, which is much more performant:
let path = bfs(
&PathfindingNode::Initial,
|m| {
match m {
PathfindingNode::Initial => foo_modules,
PathfindingNode::Module(module) => ...
}
},
|m| match m {
PathfindingNode::Initial => false,
PathfindingNode::Module(module) => bar_modules.contains(module),
},
);
// Post-process `path` to skip the first node (`PathfindingNode::Initial`) and
// unwrap the rest (unwrap the `PathfindingNode::Module`s)
However, it's not really nice to do this workaround. It's verbose and fiddly. It would be much better IMO if the pathfinding crate would support multiple start nodes directly (for the algorithms where this makes sense). I'm especially interested in the BFS algorithms.
Suggested solution
For algorithms where it makes sense - e.g. starting with BFS - allow passing a set of start nodes rather than just a single start node.
I think it should be possible to do this in a backwards compatible way via something like:
// Introduce a new `NodeSet<N>` type with an implementation of `From<N>`:
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeSet<N: Eq + Hash + Clone>(FxHashSet<N>);
impl<N: Eq + Hash + Clone> NodeSet<N> {
pub fn new(nodes: impl IntoIterator<Item = N>) -> Self {
Self(nodes.into_iter().collect())
}
}
impl<N: Eq + Hash + Clone> Deref for NodeSet<N> {
type Target = FxHashSet<N>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl<N: Eq + Hash + Clone> From<N> for NodeSet<N> {
fn from(value: N) -> Self {
NodeSet::new([value])
}
}
impl<N: Eq + Hash + Clone> From<&N> for NodeSet<N> {
fn from(value: &N) -> Self {
NodeSet::new([value.clone()])
}
}
// The changes to BFS then look something like this:
pub fn bfs<N, S, FN, IN, FS>(starts: S, successors: FN, success: FS) -> Option<Vec<N>>
where
N: Eq + Hash + Clone,
S: Into<NodeSet<N>>,
FN: FnMut(&N) -> IN,
IN: IntoIterator<Item = N>,
FS: FnMut(&N) -> bool,
{
let starts: NodeSet<N> = starts.into();
bfs_core(starts, successors, success, true)
}
// Since `S: Into<NodeSet<N>>` and both `N` and `&N` implement
// `Into<NodeSet<N>>` then it should still be possible to pass a single start node into `bfs`.
fn bfs_core<N, S, FN, IN, FS>(
starts: S,
mut successors: FN,
mut success: FS,
check_first: bool,
) -> Option<Vec<N>>
where
N: Eq + Hash + Clone,
S: Into<NodeSet<N>>,
FN: FnMut(&N) -> IN,
IN: IntoIterator<Item = N>,
FS: FnMut(&N) -> bool,
{
let starts = starts.into().0;
if check_first {
let successful_starts = starts
.clone()
.into_iter()
.filter(|start| success(start))
.collect::<Vec<_>>();
if !successful_starts.is_empty() {
return Some(vec![successful_starts.first().unwrap().clone()]);
}
}
let mut i = 0;
let mut parents: FxIndexMap<N, usize> = FxIndexMap::default();
for start in starts {
parents.insert(start.clone(), usize::MAX);
}
while let Some((node, _)) = parents.get_index(i) {
for successor in successors(node) {
if success(&successor) {
let mut path = reverse_path(&parents, |&p| p, i);
path.push(successor);
return Some(path);
}
if let Vacant(e) = parents.entry(successor) {
e.insert(i);
}
}
i += 1;
}
None
}
If we like this idea I would probably be able to help implement the changes, at least for BFS where I feel confident of my understanding.
P.S. One other pain point I had while using this crate for grimp was the lack of a bidirectional BFS. I ended up implementing my own, and saw that there was an issue open here for that so I've offered my implementation to pathfinding in this PR. I actually want to combine these two ideas - for grimp I really want a many-many, bidirectional BFS (this is really fast for the problems grimp is solving!).
FYI @seddonym tagging you here since you might find this interesting.