Cypher: Calculating Shortest Weighted Path

June 24th, 2013  |  Published in Cypher, Neo4j by Ian Robinson

This example shows a simple solution for calculating the shortest weighted path between two nodes in a reasonably dense graph (say 30 relationships per node) using Cypher.

Consider the following graph:

shortest-weighted-path

Let’s say we want to calculate the shortest weighted path between A and I. All we have to do is sum the weights along each path. Doing this, we find that (A)->(B)->(C)->(I) is weight 15, (A)->(D)->(E)->(I) is weight 12, and (A)->(F)->(G)->(H)->(I) is weight 8. Therefore, (A)->(F)->(G)->(H)->(I) is the shortest weighted path, even though it has one more relationship than either of the other two paths.

Implementation

The Cypher to accomplish this is as follows:

START  startNode=node:node_auto_index(name={startNode}),
       endNode=node:node_auto_index(name={endNode})
MATCH  p=(startNode)-[:CONNECTED_TO*1..4]->(endNode)
RETURN p AS shortestPath, 
       reduce(weight=0, r in relationships(p) : weight+r.weight) AS totalWeight
       ORDER BY totalWeight ASC
       LIMIT 1

The summing of weights is accomplished using Cypher’s reduce function.

For modestly sized graphs with up to 30 relationships per node, this query performs reasonably well. For very large, extremely densely connected graphs, however, it’s not going to cut it: those types of graph require an implementation of a shortest weighted path algorithm such as Dijkstra or A*. Running on a MacBook Air against a graph comprising 1 million nodes, 12.5 million relationships, with between 4 and 20 outgoing CONNECTED_TO relationships per node (average 12), average execution time is 120 ms. Increasing the depth of the search and/or the density of the graph will increase query times.

This example has been tested against Neo4j 1.9. The code for the example can be downloaded from here.

Alternative implementation

Instead of using r in relationships(p) to iterate the relationships along each path, we could bind a path’s CONNECTED_TO relationships to an identifier instead, like this:

MATCH   p=(startNode)-[rels:CONNECTED_TO*1..4]->(endNode)
RETURN  p AS shortestPath, 
        reduce(weight=0, r in rels : weight+r.weight) AS totalWeight

My tests, however, show that using r in relationships(p) is faster than using identifiers.

Atom   RSS 2.0   Email

Comments are closed.