Archive for June, 2013

Cypher: Calculating Shortest Weighted Path

June 24th, 2013  |  Published in Cypher, Neo4j by Ian Robinson

This example shows a simple solution for calculating the shortest weighted path between two nodes in a reasonably dense graph (say 30 relationships per node) using Cypher.

Consider the following graph:

shortest-weighted-path

Let’s say we want to calculate the shortest weighted path between A and I. All we have to do is sum the weights along each path. Doing this, we find that (A)->(B)->(C)->(I) is weight 15, (A)->(D)->(E)->(I) is weight 12, and (A)->(F)->(G)->(H)->(I) is weight 8. Therefore, (A)->(F)->(G)->(H)->(I) is the shortest weighted path, even though it has one more relationship than either of the other two paths.

Implementation

The Cypher to accomplish this is as follows:

START  startNode=node:node_auto_index(name={startNode}),
       endNode=node:node_auto_index(name={endNode})
MATCH  p=(startNode)-[:CONNECTED_TO*1..4]->(endNode)
RETURN p AS shortestPath, 
       reduce(weight=0, r in relationships(p) : weight+r.weight) AS totalWeight
       ORDER BY totalWeight ASC
       LIMIT 1

The summing of weights is accomplished using Cypher’s reduce function.

For modestly sized graphs with up to 30 relationships per node, this query performs reasonably well. For very large, extremely densely connected graphs, however, it’s not going to cut it: those types of graph require an implementation of a shortest weighted path algorithm such as Dijkstra or A*. Running on a MacBook Air against a graph comprising 1 million nodes, 12.5 million relationships, with between 4 and 20 outgoing CONNECTED_TO relationships per node (average 12), average execution time is 120 ms. Increasing the depth of the search and/or the density of the graph will increase query times.

This example has been tested against Neo4j 1.9. The code for the example can be downloaded from here.

Alternative implementation

Instead of using r in relationships(p) to iterate the relationships along each path, we could bind a path’s CONNECTED_TO relationships to an identifier instead, like this:

MATCH   p=(startNode)-[rels:CONNECTED_TO*1..4]->(endNode)
RETURN  p AS shortestPath, 
        reduce(weight=0, r in rels : weight+r.weight) AS totalWeight

My tests, however, show that using r in relationships(p) is faster than using identifiers.

Atom   RSS 2.0   Email

Cypher: Adding a Node to a Linked List

June 20th, 2013  |  Published in Cypher, Neo4j by Ian Robinson

This example shows how to add a node to the head of a linked list using Cypher.

Let’s assume we have a graph containing an indexed node that can act as the list’s owner. What we want to do is build up a linked list, with the head of the list attached to the owner node using a HEAD relationship. When a new node is added to the list, it becomes the head node. The HEAD relationship is detached from the old head node and attached to this new head node. At the same time, the new head node is linked to the old head node using a PREV relationship.

The diagram below shows a linked list comprising three nodes. The list is attached to an owner node named ‘my-list’.

linked-list

Here’s the Cypher for adding a node to the head of a linked list:

START owner=node:node_auto_index(name={listName})
CREATE UNIQUE (owner)-[:HEAD]->(newHead{newElement})
WITH owner, newHead
MATCH (newHead)<-[:HEAD]-(owner)-[oldRel:HEAD]->(oldHead)
DELETE oldRel
CREATE (newHead)-[:PREV]->(oldHead)

To build the linked list shown above, we would execute this Cypher statement three times.

How it works

The START clause looks up the node representing the list’s owner in an index. The CREATE UNIQUE clause then creates a new node and links it to the owner using a HEAD relationship. The {newElement} parameter here allows us to pass a map of the new node’s properties to the query.

list-1

Having created the new head node, we then pipe it and the owner node to a second query using WITH.

In this second query we try to match the list owner, the new head node and the old head node (if there is one):

list-2

If we find an old head node, we delete its HEAD relationship, because it is no longer valid. We then create a PREV relationship that links the new head node to this old head node:

list-3

The MATCH clause here is similar to a pattern matching clause in a functional language: if Cypher can match the (newHead)<-[:HEAD]-(owner)-[oldRel:HEAD]->(oldHead) pattern in the graph, it will invoke the subsequent DELETE and CREATE clauses; otherwise it will terminate without deleting or creating any further relationships.

This example has been tested against Neo4j 1.8.1 and 1.9. The code for the example can be downloaded from here.

Atom   RSS 2.0   Email

Running the Graph Databases Book Examples

June 11th, 2013  |  Published in Graph Databases, Neo4j by Ian Robinson

The current version of Neo4j has two different types of index: named indexes and automatic indexes. The Cypher query examples throughout the Graph Databases book use named indexes. In this query, for example, venue, city and author are named node indexes:

START theater=node:venue(name='Theatre Royal'), 
      newcastle=node:city(name='Newcastle'), 
      bard=node:author(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater) 
      <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]-> 
      (play)<-[:WROTE_PLAY]-(bard)
RETURN DISTINCT play.title AS play

There’s a problem, however: data created using a Cypher CREATE statement won’t be indexed in a named index. This has led to some confusion for anyone wanting to code along with the examples: if you use the CREATE statements as published, the query examples won’t work.

(You can find out more about Neo4j indexes in the documentation.)

If you want to code along with the examples, the easiest thing to do is create the data using Cypher and automatic indexes. You’ll need to modify the queries to use the automatic indexes, but the changes aren’t complicated.

Configure Automatic Indexes

The examples below show how to run the Shakespeare examples (pp.47-50) using automatic indexes.

Before you create any data, you’ll need to configure the automatic indexes. The approach you take will depend on whether you are running Neo4j in embedded or server mode.

Embedded mode

If running in embedded mode, configure the database at startup:

GraphDatabaseService graphDb = new GraphDatabaseFactory().
    newEmbeddedDatabaseBuilder( storeDirectory ).
    setConfig( GraphDatabaseSettings.node_keys_indexable, "name,lastname" ).
    setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
    newGraphDatabase();

Server mode

If running in server mode and executing Cypher statements through the console or using the REST API, add the following to the conf/neo4j.properties file.

node_auto_indexing=true
node_keys_indexable=name,lastname

Modify the Queries

The Shakespeare queries should then be modified as follows. In each case we’ve replaced a named node index (e.g. node:author) with an automatic node index (node:node_auto_index).

p.47:

START theater=node:node_auto_index(name='Theatre Royal'), 
      newcastle=node:node_auto_index(name='Newcastle'), 
      bard=node:node_auto_index(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater)
      <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]-> 
      (play)<-[:WROTE_PLAY]-(bard)
RETURN DISTINCT play.title AS play

p.48:

START theater=node:node_auto_index(name='Theatre Royal'),
      newcastle=node:node_auto_index(name='Newcastle'), 
      bard=node:node_auto_index(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater)
      <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]-> 
      (play)<-[w:WROTE_PLAY]-(bard)
WHERE w.year > 1608
RETURN DISTINCT play.title AS play

p.49:

START theater=node:node_auto_index(name='Theatre Royal'), 
      newcastle=node:node_auto_index(name='Newcastle'), 
      bard=node:node_auto_index(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater)
      <-[:VENUE]-()-[p:PERFORMANCE_OF]->()-[:PRODUCTION_OF]-> 
      (play)<-[:WROTE_PLAY]-(bard)
RETURN play.title AS play, count(p) AS performance_count 
ORDER BY performance_count DESC

p.50:

START bard=node:node_auto_index(lastname='Shakespeare') 
MATCH (bard)-[w:WROTE_PLAY]->(play)
WITH play
ORDER BY w.year DESC
RETURN collect(play.title) AS plays

Download Working Examples

The code that we used to generate most of the examples for the book, including the use cases, can be found on my GitHub account.

Atom   RSS 2.0   Email