## Cypher: Calculating Shortest Weighted Path

June 24th, 2013 | Published in Cypher, Neo4j by Ian Robinson

This example shows a simple solution for calculating the shortest weighted path between two nodes in a reasonably dense graph (say 30 relationships per node) using Cypher.

Consider the following graph:

Let’s say we want to calculate the shortest weighted path between `A`

and `I`

. All we have to do is sum the `weight`

s along each path. Doing this, we find that `(A)->(B)->(C)->(I)`

is weight 15, `(A)->(D)->(E)->(I)`

is weight 12, and `(A)->(F)->(G)->(H)->(I)`

is weight 8. Therefore, `(A)->(F)->(G)->(H)->(I)`

is the shortest weighted path, even though it has one more relationship than either of the other two paths.

### Implementation

The Cypher to accomplish this is as follows:

```
START startNode=node:node_auto_index(name={startNode}),
endNode=node:node_auto_index(name={endNode})
MATCH p=(startNode)-[:CONNECTED_TO*1..4]->(endNode)
RETURN p AS shortestPath,
reduce(weight=0, r in relationships(p) : weight+r.weight) AS totalWeight
ORDER BY totalWeight ASC
LIMIT 1
```

The summing of `weight`

s is accomplished using Cypher’s `reduce`

function.

For modestly sized graphs with up to 30 relationships per node, this query performs reasonably well. For very large, extremely densely connected graphs, however, it’s not going to cut it: those types of graph require an implementation of a shortest weighted path algorithm such as Dijkstra or A*. Running on a MacBook Air against a graph comprising 1 million nodes, 12.5 million relationships, with between 4 and 20 outgoing `CONNECTED_TO`

relationships per node (average 12), average execution time is 120 ms. Increasing the depth of the search and/or the density of the graph will increase query times.

This example has been tested against Neo4j 1.9. The code for the example can be downloaded from here.

### Alternative implementation

Instead of using `r in relationships(p)`

to iterate the relationships along each path, we could bind a path’s `CONNECTED_TO`

relationships to an identifier instead, like this:

```
MATCH p=(startNode)-[rels:CONNECTED_TO*1..4]->(endNode)
RETURN p AS shortestPath,
reduce(weight=0, r in rels : weight+r.weight) AS totalWeight
```

My tests, however, show that using `r in relationships(p)`

is faster than using identifiers.