Cypher: Calculating Shortest Weighted Path

June 24th, 2013  |  Published in Cypher, Neo4j by Ian Robinson

This example shows a simple solution for calculating the shortest weighted path between two nodes in a reasonably dense graph (say 30 relationships per node) using Cypher.

Consider the following graph:

shortest-weighted-path

Let’s say we want to calculate the shortest weighted path between A and I. All we have to do is sum the weights along each path. Doing this, we find that (A)->(B)->(C)->(I) is weight 15, (A)->(D)->(E)->(I) is weight 12, and (A)->(F)->(G)->(H)->(I) is weight 8. Therefore, (A)->(F)->(G)->(H)->(I) is the shortest weighted path, even though it has one more relationship than either of the other two paths.

Implementation

The Cypher to accomplish this is as follows:

START  startNode=node:node_auto_index(name={startNode}),
       endNode=node:node_auto_index(name={endNode})
MATCH  p=(startNode)-[:CONNECTED_TO*1..4]->(endNode)
RETURN p AS shortestPath, 
       reduce(weight=0, r in relationships(p) : weight+r.weight) AS totalWeight
       ORDER BY totalWeight ASC
       LIMIT 1

The summing of weights is accomplished using Cypher’s reduce function.

For modestly sized graphs with up to 30 relationships per node, this query performs reasonably well. For very large, extremely densely connected graphs, however, it’s not going to cut it: those types of graph require an implementation of a shortest weighted path algorithm such as Dijkstra or A*. Running on a MacBook Air against a graph comprising 1 million nodes, 12.5 million relationships, with between 4 and 20 outgoing CONNECTED_TO relationships per node (average 12), average execution time is 120 ms. Increasing the depth of the search and/or the density of the graph will increase query times.

This example has been tested against Neo4j 1.9. The code for the example can be downloaded from here.

Alternative implementation

Instead of using r in relationships(p) to iterate the relationships along each path, we could bind a path’s CONNECTED_TO relationships to an identifier instead, like this:

MATCH   p=(startNode)-[rels:CONNECTED_TO*1..4]->(endNode)
RETURN  p AS shortestPath, 
        reduce(weight=0, r in rels : weight+r.weight) AS totalWeight

My tests, however, show that using r in relationships(p) is faster than using identifiers.

Atom   RSS 2.0   Email

Cypher: Adding a Node to a Linked List

June 20th, 2013  |  Published in Cypher, Neo4j by Ian Robinson

This example shows how to add a node to the head of a linked list using Cypher.

Let’s assume we have a graph containing an indexed node that can act as the list’s owner. What we want to do is build up a linked list, with the head of the list attached to the owner node using a HEAD relationship. When a new node is added to the list, it becomes the head node. The HEAD relationship is detached from the old head node and attached to this new head node. At the same time, the new head node is linked to the old head node using a PREV relationship.

The diagram below shows a linked list comprising three nodes. The list is attached to an owner node named ‘my-list’.

linked-list

Here’s the Cypher for adding a node to the head of a linked list:

START owner=node:node_auto_index(name={listName})
CREATE UNIQUE (owner)-[:HEAD]->(newHead{newElement})
WITH owner, newHead
MATCH (newHead)<-[:HEAD]-(owner)-[oldRel:HEAD]->(oldHead)
DELETE oldRel
CREATE (newHead)-[:PREV]->(oldHead)

To build the linked list shown above, we would execute this Cypher statement three times.

How it works

The START clause looks up the node representing the list’s owner in an index. The CREATE UNIQUE clause then creates a new node and links it to the owner using a HEAD relationship. The {newElement} parameter here allows us to pass a map of the new node’s properties to the query.

list-1

Having created the new head node, we then pipe it and the owner node to a second query using WITH.

In this second query we try to match the list owner, the new head node and the old head node (if there is one):

list-2

If we find an old head node, we delete its HEAD relationship, because it is no longer valid. We then create a PREV relationship that links the new head node to this old head node:

list-3

The MATCH clause here is similar to a pattern matching clause in a functional language: if Cypher can match the (newHead)<-[:HEAD]-(owner)-[oldRel:HEAD]->(oldHead) pattern in the graph, it will invoke the subsequent DELETE and CREATE clauses; otherwise it will terminate without deleting or creating any further relationships.

This example has been tested against Neo4j 1.8.1 and 1.9. The code for the example can be downloaded from here.

Atom   RSS 2.0   Email

Running the Graph Databases Book Examples

June 11th, 2013  |  Published in Graph Databases, Neo4j by Ian Robinson

The current version of Neo4j has two different types of index: named indexes and automatic indexes. The Cypher query examples throughout the Graph Databases book use named indexes. In this query, for example, venue, city and author are named node indexes:

START theater=node:venue(name='Theatre Royal'), 
      newcastle=node:city(name='Newcastle'), 
      bard=node:author(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater) 
      <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]-> 
      (play)<-[:WROTE_PLAY]-(bard)
RETURN DISTINCT play.title AS play

There’s a problem, however: data created using a Cypher CREATE statement won’t be indexed in a named index. This has led to some confusion for anyone wanting to code along with the examples: if you use the CREATE statements as published, the query examples won’t work.

(You can find out more about Neo4j indexes in the documentation.)

If you want to code along with the examples, the easiest thing to do is create the data using Cypher and automatic indexes. You’ll need to modify the queries to use the automatic indexes, but the changes aren’t complicated.

Configure Automatic Indexes

The examples below show how to run the Shakespeare examples (pp.47-50) using automatic indexes.

Before you create any data, you’ll need to configure the automatic indexes. The approach you take will depend on whether you are running Neo4j in embedded or server mode.

Embedded mode

If running in embedded mode, configure the database at startup:

GraphDatabaseService graphDb = new GraphDatabaseFactory().
    newEmbeddedDatabaseBuilder( storeDirectory ).
    setConfig( GraphDatabaseSettings.node_keys_indexable, "name,lastname" ).
    setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
    newGraphDatabase();

Server mode

If running in server mode and executing Cypher statements through the console or using the REST API, add the following to the conf/neo4j.properties file.

node_auto_indexing=true
node_keys_indexable=name,lastname

Modify the Queries

The Shakespeare queries should then be modified as follows. In each case we’ve replaced a named node index (e.g. node:author) with an automatic node index (node:node_auto_index).

p.47:

START theater=node:node_auto_index(name='Theatre Royal'), 
      newcastle=node:node_auto_index(name='Newcastle'), 
      bard=node:node_auto_index(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater)
      <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]-> 
      (play)<-[:WROTE_PLAY]-(bard)
RETURN DISTINCT play.title AS play

p.48:

START theater=node:node_auto_index(name='Theatre Royal'),
      newcastle=node:node_auto_index(name='Newcastle'), 
      bard=node:node_auto_index(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater)
      <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]-> 
      (play)<-[w:WROTE_PLAY]-(bard)
WHERE w.year > 1608
RETURN DISTINCT play.title AS play

p.49:

START theater=node:node_auto_index(name='Theatre Royal'), 
      newcastle=node:node_auto_index(name='Newcastle'), 
      bard=node:node_auto_index(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater)
      <-[:VENUE]-()-[p:PERFORMANCE_OF]->()-[:PRODUCTION_OF]-> 
      (play)<-[:WROTE_PLAY]-(bard)
RETURN play.title AS play, count(p) AS performance_count 
ORDER BY performance_count DESC

p.50:

START bard=node:node_auto_index(lastname='Shakespeare') 
MATCH (bard)-[w:WROTE_PLAY]->(play)
WITH play
ORDER BY w.year DESC
RETURN collect(play.title) AS plays

Download Working Examples

The code that we used to generate most of the examples for the book, including the use cases, can be found on my GitHub account.

Atom   RSS 2.0   Email

Progressive .NET Tutorial

August 2nd, 2011  |  Published in Events, REST by Ian Robinson  |  3 Comments

On September 7th I’ll be running a half-day tutorial on building RESTful Web services as part of the Skills Matter and London .NET User Group’s Progressive .NET Tutorials.

This will be a very hands-on tutorial. Over the course of three and a half hours we’ll build a hypermedia-driven service and client in .NET using the new WCF Web APIs. The subject matter hearkens back to a favourite subject of mine, Hydras and Hypermedia, for the application we’ll build will be a simple text-based pick-your-path-to-adventure game.

This isn’t a tutorial on games design, however. Despite the non-enterprisey subject matter, the tutorial serves to illustrate how machine clients and services can cooperate to achieve a useful application goal. Business application analogies lurk around every corner and surprise you with every encounter.

The tutorial comprises three exercises. With each exercise you’ll have to fix a number of broken unit and functional tests. Bit by bit, you’ll build a working application:

  • Exercise 1: Here we implement the server resources that make up the dungeon – the chambers, tunnels and caves in which our daring client will venture.
  • Exercise 2: Next we build a crafty client that can discover a path through the dungeon. By the end of the exercise, the client ought to be able to navigate the dungeon from entrance to exit.
  • Exercise 3: Last, we add an element of danger, populating the dungeon with encounters (more resources) that the client must overcome before it achieves its application goal.

Along the way you’ll learn about:

  • The Atom Syndication Format
  • “What if” client-side intelligence
  • The client as arbiter of application state
  • Hypermedia controls: links, link relations and forms
  • DRY URIs

So come and join me for what promises to be an entertaining hack-n-slash workshop. Attendees should be familiar with C# and unit testing, and have some knowledge of HTTP.

Full details and registration for the entire event can be found here.

3 Comments  |  Atom   RSS 2.0   Email

The Power of the Daleks (and Neo4j)

July 11th, 2011  |  Published in Neo4j by Ian Robinson  |  5 Comments

Jim Webber and I have been busy these last few months writing a tutorial for Neo4j. As we did with REST in Practice, we’ve chosen an example domain that falls a little outside the enterprise norm. With REST in Practice we chose the everyday world of the coffee shop; this time round, we’ve gone for the grand old universe of Doctor Who.

As we’ve worked on the tutorial, we’ve built up a Doctor Who data model that shows how Neo4j can be used to address several different data and domain concerns. For example, part of the dataset includes timeline data, comprising seasons, stories and episodes; elsewhere we’ve social network-like data, with characters connected to one another through being companions, allies or enemies of the Doctor. It’s a messy and densely-connected dataset – much like the data you might find in a real-world enterprise. Some of it is of high quality, some of it is lacking in detail. And for every seeming invariant in the domain, there’s an awkward exception – again, much like the real world. For example, each incarnation of the Doctor has been played by one actor, except the first, who was originally played by William Hartnell, and then later by Richard Hurdnall, who reprised the tetchy first incarnation some years after William Hartnell’s death for the twenty-fifth anniversary story, The Five Doctors. Each episode has a title; at least, they did when the series first began, in 1963, and they do today. But towards the end of the third season, the original series stopped assigning individual episode titles (in the original series, stories typically took place over several episodes); a story’s episodes were simply labelled Part 1, Part 2, etc. And so on.

The Dalek Props

Dalek prop

The earliest alien monsters to appear in Doctor Who, the Daleks have long held a fascination for the British viewing public, and for the enquiring minds of Doctor Who fandom. In 2002, Jon Green began researching the history of the Dalek props created for the original series, which ran from 1963 to 1988. Jon was later joined by a second researcher, Gav, and the result of their hard work is the wonderful Dalek 6388.

Recently, I imported some of this Dalek research into our Doctor Who database. From the wealth of detail available on Dalek 6388 I chose to focus on the ways the different prop parts were reused in different stories.

Each episode or story featuring the Daleks made use of one or more Dalek props. Each prop came in two parts: the top half, called the shoulders, and the bottom half, called the skirt. These parts were handmade, and as a result there were many differences between them: the hemisphere spacing, collars, slats, eyestalks, and gun boxes in particular varied in subtle but noticeable ways. Reconstructing the history of the props has been largely a matter of identifying specific parts in screenshots and photos based on these visual clues.

Over time, the different shoulders and skirts were mixed and matched. For example, the shoulders originally belonging to Dalek 1, one of the props created for the very first Dalek story, The Daleks, were later paired with the skirt belonging to Dalek 5 for the story Day of the Daleks. This composite prop is known as Dalek One-5. A couple of seasons later, the shoulders were paired with the skirt belonging to Dalek 7 for the Daleks’ last outing with Jon Pertwee’s Doctor, Death to the Daleks. This composite prop is known as Dalek One-7.

Structuring the Data

With the details from Dalek 6388 added to our dataset we now have something that resembles a supply chain traceability scheme – another excellent use for Neo4j. Here’s how some of that data is structured (click the image for a closeup view):

Dalek props

The screenshot above shows some of the Dalek prop data as viewed in Neoclipse. Neoclipse allows you to view, create and edit nodes, as well as search any indexes you’ve created for your store. Neo4j’s browser-based Webadmin tool also includes a visualisation tab for viewing a database running in server mode.

The screenshot above show three of the stories in which the Daleks appeared: The Daleks, The Dalek Invasion of Earth and The Power of the Daleks. (They’ve appeared in many more stories – the view here shows only a fraction of the data we have. Even here I’ve only shown full details of the props for one of the stories, The Power of the Daleks. Portions of the data associated with the other two stories are then included to show how the data is interconnected.)

Below each story node is a node representing the group of Dalek props used in that story. I’ve chosen to create these group nodes because there is potentially other data to be included at the group level. For example, each story was assigned a budget for building or maintaining the props used in that story. Moreover, in the research, the voice artists and operators responsible for bringing the Daleks to life are not associated with individual props but with the group as a whole (the details of the voice artists and operators are not, however, currently in our dataset). Hence the Dalek props group nodes USED_IN each story.

Focusing now on the props used in The Power of the Daleks, we can see that four props appeared in the story: Dalek 1, Dalek 2, Dalek 7, and Dalek Six-5. Daleks 1 and 2 also appeared in the two earlier stories, The Daleks and The Dalek Invasion of Earth (together with some other props not shown here), as well as in several other stories not shown in the screenshot. Daleks 7 and Six-5 are here not associated with any other stories, but in the full dataset, Dalek 7 is shown to have appeared in two other stores, The Evil of the Daleks and The Chase, while Dalek Six-5 appeared in four other stories following its debut in the The Power of the Daleks.

Dalek Six-5 is one of those composite props I mentioned previously. It is COMPOSED_OF shoulders whose ORIGINAL_PROP was Dalek 6, and a skirt originally made for Dalek 5. The shoulders, as the graph in the screenshot shows, appeared as part of at least one other composite prop, Dalek Six-7, where they were paired with the skirt belonging to Dalek 7, which also appears in The Power of the Daleks. Dalek Six-5‘s skirt, whose ORIGINAL_PROP was Dalek 5, was at some other time paired with the shoulders from Dalek 1 to create the composite prop Dalek One-5. (To find out which story Dalek One-5 appeared in, all we’d have to do is expand the graph beyond the nodes shown here.)

Querying the Data

The Dalek prop data captures many of the significant domain concepts and relationships described in the Dalek 6388 research. Once we’d built the dataset, I set about asking it some tricky questions. At the top of my list of pressing questions was this one: what’s the hardest working Dalek prop part in showbiz? That is, which of those shoulders and skirts has appeared in more stories than any other part?

Neo4j offers several different ways to query a graph. There’s the Core API, which allows you to deal imperatively with nodes and relationships, and the original Traverser API, which offers a slightly more declarative route into a graph. For more sophisticated traversals there’s the Traversal Framework, which, being more powerful than the Traversal API, has a slightly steeper learning curve. For querying the server there’s the REST API. Next, there’s Gremlin, a powerful graph traversal language with a plugin for Neo4j, and the Pattern Matching library, which is like a regex language for graphs. And finally, introduced in the latest 1.4 milestones, there’s Cypher, a new declarative graph pattern matching language – a SQL for graphs.

In the rest of this post we’ll look at three different ways of figuring out which was the hardest working Dalek prop part. In the first example we’ll look at the Traversal Framework. In the next we’ll use the Pattern Matching library. Finally, we’ll use Cypher.

There’s quite a bit of code here. If you haven’t the patience to wade through it all, do just one thing: go look at the Cypher query. It’s a thing of Dalek-like precision.

Traversal Framework

Here’s what the query to find the hardest working Dalek prop part looks like using the Traversal Framework:

@Test
public void shouldFindTheHardestWorkingPropsUsingTraversalFramework() {
  Node theDaleks = database.index().forNodes("species")
    .get("species", "Dalek").getSingle();

  Traverser traverser = Traversal.description()
    .relationships(Rels.APPEARED_IN, Direction.OUTGOING)
    .relationships(Rels.USED_IN, Direction.INCOMING)
    .relationships(Rels.MEMBER_OF, Direction.INCOMING)
    .relationships(Rels.COMPOSED_OF, Direction.OUTGOING)
    .relationships(Rels.ORIGINAL_PROP, Direction.OUTGOING)
    .depthFirst()
    .evaluator(new Evaluator() {
      @Override
      public Evaluation evaluate(Path path) {
         if (path.lastRelationship() != null && 
            path.lastRelationship().isType(Rels.ORIGINAL_PROP)){
          return Evaluation.INCLUDE_AND_PRUNE;
        }
                        
        return Evaluation.EXCLUDE_AND_CONTINUE;
                        
       }
    })
    .uniqueness(Uniqueness.NONE)
    .traverse(theDaleks);
        
    assertHardestWorkingPropParts(getSortedResults(traverser),
      "Dalek 1 shoulder", 12,
      "Dalek 5 skirt", 12,
      "Dalek 6 shoulder", 12);
}

This example shows an idiomatic usage of the Neo4j Traversal Framework. First, we lookup a starting node for the traversal in an index. In this instance, we look up the Dalek species node. Then we build a traversal description and execute the traversal from this starting node. The traversal crawls the subgraph below the starting node. With each node that it visits, it calls the evaluate() method on the custom evaluator supplied with the description. This method determines whether the traverser is positioned over a node of interest (in this case, an original prop node). If it is, the traverser returns the path to this node to the calling code, and then begins a new travesal further up the subtree. If it isn’t a node of interest, the traverser continues deeper into the subgraph.

The description specifies the types of relationship we’re prepared to follow, and their direction. Every relationship in Neo4j has a type label and a direction. The direction helps provide semantic clarity when dealing with asymmetric relationships between nodes. While a relationship between two nodes is always directed – it has a start node, a direction, and an end node – a traverser can be programmed to follow either an INCOMING or an OUTGOING relationship, and even to ignore the direction entirely (using Direction.BOTH).

Note that we’ve specified that the traversal proceed depth first. This means that from the Dalek species node it will strike out on a depth first hunt for an original prop, following the first APPEARED_IN relationship to an episode and then searching the nodes below that episode. Once it has found a first matching original prop, the traverser returns the path to the prop to the calling code (INCLUDE_AND_PRUNE), and then moves successively back up that path, trying alternate branches, until it works its way all the way back to the species node, from where it tries the next APPEARED_IN relationship.

Here’s the route the traverser takes from the starting Dalek species node through all the nodes below The Power of the Daleks:

  • (Dalek)-[:APPEARED_IN]->(The Power of the Daleks)<-[:USED_IN]-(Daleks)<-[:MEMBER_OF]-(Dalek 7)-[:COMPOSED_OF]->(skirt)-[:ORIGINAL_PROP]->(Dalek 7) INCLUDE_AND_PRUNE
  • -[:COMPOSED_OF]->(shoulder)-[:ORIGINAL_PROP]->(Dalek 7) INCLUDE_AND_PRUNE
  • <-[:MEMBER_OF]-(Dalek Six-5)-[:COMPOSED_OF]->(skirt)-[:ORIGINAL_PROP]->(Dalek 5) INCLUDE_AND_PRUNE
  • -[:COMPOSED_OF]->(shoulder)-[:ORIGINAL_PROP]->(Dalek 6) INCLUDE_AND_PRUNE
  • <-[:MEMBER_OF]-(Dalek 2)-[:COMPOSED_OF]->(skirt)-[:ORIGINAL_PROP]->(Dalek 2) INCLUDE_AND_PRUNE
  • -[:COMPOSED_OF]->(shoulder)-[:ORIGINAL_PROP]->(Dalek 2) INCLUDE_AND_PRUNE
  • <-[:MEMBER_OF]-(Dalek 1)-[:COMPOSED_OF]->(skirt)-[:ORIGINAL_PROP]->(Dalek 1) INCLUDE_AND_PRUNE
  • -[:COMPOSED_OF]->(shoulder)-[:ORIGINAL_PROP]->(Dalek 1) INCLUDE_AND_PRUNE

Now that the traverser is back at the Dalek species node, it tries the next APPEARED_IN relationship:

  • -[:APPEARED_IN]->(The Dalek Invasion of Earth)<-[:USED_IN]-(Daleks)<-[:MEMBER_OF]-(Dalek 6)-[:COMPOSED_OF]->(skirt)-[:ORIGINAL_PROP]->(Dalek 6) INCLUDE_AND_PRUNE
  • -[:COMPOSED_OF]->(shoulder)-[:ORIGINAL_PROP]->(Dalek 6) INCLUDE_AND_PRUNE

And so on.

When it’s run, our traverser will return all the paths from the Dalek species node to the original props nodes associated with each prop part. Each path contains all the details of the nodes and relationships along that path, starting with the Dalek species node and ending with the original prop node, with the relevant episode and prop part nodes somewhere in between.

By itself, the traverser result doesn’t tell us which prop part has been used the most; for that, we need to do some accumulation. To generate the counted and sorted results, our code calls getSortedResults(), which is simply a helper method that delegates to a ResultsAssembler instance. This ResultsAssembler is responsible for accumulating the results for each prop part.

Here’s the getSortedResults() method:

private Map<String, Integer> getSortedResults(Traverser traverser) {
  Map<String, Integer> results = new ResultsAssembler<Path>(traverser)
    .sortResults(new PathAccessor());
  return results;
}

And here is the code for the ResultsAssembler:

public interface QueryResultAccessor<T> {
  String getOriginalProp(T queryResult);
  String getPart(T queryResult);
}
  
public class ResultsAssembler<T>{
  private final Iterable<T> queryResults;

  public ResultsAssembler(Iterable<T> queryResults) {
    super();
    this.queryResults = queryResults;
  }
    
  public Map<String, Integer> sortResults(QueryResultAccessor<T> accessor){
    Map<String, Integer> unsortedResults = new HashMap<String, Integer>();
      
    for (T queryResult : queryResults){
      String originalProp = accessor.getOriginalProp(queryResult);
      String part = accessor.getPart(queryResult);
            
      String key = originalProp + " " + part;
      if (!unsortedResults.containsKey(key)){
        unsortedResults.put(key, 0);
      }
      unsortedResults.put(key, unsortedResults.get(key) + 1);
    }
      
    Comparator<String> valueComparator = Ordering.natural()
      .onResultOf(Functions.forMap(unsortedResults))
      .reverse().compound(Ordering.natural());
    return ImmutableSortedMap.copyOf(unsortedResults, valueComparator);
  }
    
}

The ResultsAssembler builds a map of prop part keys and episode count values. The assembler uses a QueryResultAccessor to retrieve the original prop name and prop part from a query result. Here the query result is a path, but when we get round to the next example, which uses the Pattern Matching library, it’s a pattern match object. Hence the generic parameters.

Here’s the PathAccessor we use to access the original prop name and prop part from a path. The original prop details are held in the last node in the path, while the prop part details are held in the penultimate node. The getOriginalProp() method can retrieve the original prop details quite easily from the path’s endNode() property. To get the part name (shoulders or skirt) the getPath() method uses the path’s nodes() iterator and an iterator helper function to access the last but one node.

private class PathAccessor implements QueryResultAccessor<Path>{

  @Override
  public String getOriginalProp(Path queryResult) {
    Node originalPropNode = queryResult.endNode();
    return getPropertyValueOrNull(originalPropNode, "prop");
  }

  @Override
  public String getPart(Path queryResult) {
    Iterable<Node> pathNodes = IteratorUtil.asIterable(
      queryResult.nodes().iterator());
    Node partNode = IteratorUtil.fromEnd(pathNodes, 1);
    return getPropertyValueOrNull(partNode, "part");
  }
    
  private String getPropertyValueOrNull(Node node, String property){
    if (!node.hasProperty(property)){
      return null;
    }
    return node.getProperty(property).toString();
  }
    
}

Having accumulated and sorted the results, our unit test can then assert that the results contain the top three most used prop parts: Dalek 1‘s shoulders, Dalek 5‘s skirt, and Dalek 6‘s shoulders. Each of these parts appeared in 12 stories.

What do Dalek 1‘s shoulders look like today, nearly 50 years after they first appeared on TV? Dalek 6388, of course, has the answer.

Pattern Matching

The Traversal Framework helped us find the hardest working Dalek prop parts in showbiz, but we still had to do quite a bit of work to accumulate the results. While the traversal description itself was relatively trivial, having to iterate nodes in each of the returned paths proved quite tedious. With the Pattern Matching library we can avoid some of the messy iterative code that cropped up in the PathAccessor.

Using the Pattern Matching library we create an abstract subgraph that describes the graph patterns we want to match in our real graph. When matching, we anchor this abstact graph to a starting node in the real graph, much as we started the traversal in the previous example from a node we’d looked up in an index. The matcher then matches subgraph instances corresponding to the nodes, relationships and constraints defined in our abstract subgraph.

@Test
public void shouldFindTheHardestWorkingPropsUsingPatternMatchers(){
  Node theDaleksNode = database.index().forNodes("species")
    .get("species", "Dalek").getSingle();
      
  final PatternNode theDaleks = new PatternNode();    
  final PatternNode episode = new PatternNode();
  final PatternNode props = new PatternNode();
  final PatternNode prop = new PatternNode();
  final PatternNode part = new PatternNode();
  final PatternNode originalProp = new PatternNode();

  theDaleks.setAssociation(theDaleksNode);
  theDaleks.createRelationshipTo(episode, Rels.APPEARED_IN);
  props.createRelationshipTo(episode, Rels.USED_IN);
  prop.createRelationshipTo(props, Rels.MEMBER_OF);
  prop.createRelationshipTo(part, Rels.COMPOSED_OF);
  part.createRelationshipTo(originalProp, Rels.ORIGINAL_PROP);
        
  PatternMatcher pm = PatternMatcher.getMatcher();
  final Iterable<PatternMatch> matches = pm.match(theDaleks, theDaleksNode);
        
  assertHardestWorkingPropParts(getSortedResults(matches, originalProp, part),
    "Dalek 1 shoulder", 12,
    "Dalek 5 skirt", 12,
    "Dalek 6 shoulder", 12); 
}

The pattern node objects that we define as part of our abstract subgraph can then be used as keys into the match results. Below is the getSortedResults() method and MatchAccessor class for the pattern matching example. You’ll notice that we pass the originalProp and part pattern node objects into the MatchAccessor. The accessor then calls the match’s getNodeFor() method with one or other of these pattern nodes in order to retrieve the real, matched node.

private Map<String, Integer> getSortedResults(Iterable<PatternMatch> matches, 
    PatternNode originalProp, PatternNode part) {
  Map<String, Integer> results = new ResultsAssembler<PatternMatch>(matches)
    .getSortedResults(new MatchAccessor(originalProp, part));
  return results;
}

private class MatchAccessor implements QueryResultAccessor<PatternMatch>{

  private final PatternNode originalProp;
  private final PatternNode part;
    
  public MatchAccessor(PatternNode originalProp, PatternNode part) {
    super();
    this.originalProp = originalProp;
    this.part = part;
  }

  @Override
  public String getOriginalProp(PatternMatch queryResult) {
    return queryResult.getNodeFor(originalProp).getProperty("prop").toString();
  }

  @Override
  public String getPart(PatternMatch queryResult) {
    return queryResult.getNodeFor(part).getProperty("part").toString();
  }  
}

Cypher

The Pattern Matching example still required us to accumulate the results in order to work out which was the hardest working prop part in the original series. The abstract pattern nodes provided convenient keys into matched nodes, but we still had then to pass the matched nodes’ contents to the ResultsAssembler in order to build a map of prop parts and usage counts. With Cypher, all that manual accumulation disappears. Better still, the entire query is expressed using a very simple, declarative syntax. Unencumbered by a “fluent” interface, a Cypher query concisely describes our desired route into the graph:

public void shouldFindTheHardestWorkingPropsUsingCypher() throws Exception {
  CypherParser parser = new CypherParser();
  ExecutionEngine engine = new ExecutionEngine(universe.getDatabase());
  
  String cql = "start daleks=(Species,species,'Dalek')"
    + " match (daleks)-[:APPEARED_IN]->(episode)"
    + "<-[:USED_IN]-(props)"
    + "<-[:MEMBER_OF]-(prop)"
    + "-[:COMPOSED_OF]->(part)"
    + "-[:ORIGINAL_PROP]->(originalprop)"
    + " return originalprop.prop, part.part, count(episode.title)"
    + " order by count(episode.title) desc"
    + " limit 3";

  Query query = parser.parse(cql);
   ExecutionResult result = engine.execute(query);
      
  assertHardestWorkingPropParts(result.javaIterator(),
    "Dalek 1", "shoulder", 12,
    "Dalek 5", "skirt", 12,
    "Dalek 6", "shoulder", 12);
}
  
private void assertHardestWorkingPropParts(
    Iterator<Map<String, Object>> results, Object... partsAndCounts) {
  for (int index = 0; index < partsAndCounts.length; index = index + 3){
    Map<String, Object> row = results.next();
    assertEquals(partsAndCounts[index], row.get("originalprop.prop"));
    assertEquals(partsAndCounts[index + 1], row.get("part.part"));
    assertEquals(partsAndCounts[index + 2], row.get("count(episode.title)"));
  }
  
  assertFalse(results.hasNext());
}

Most of that is just boilerplate code for setting up the Cypher engine and executing the query. The really concise part is the query itself. Here it is in full:

start daleks=(Species,species,'Dalek')
match (daleks)-[:APPEARED_IN]->(episode)<-[:USED_IN]-(props)<-[:MEMBER_OF]-
      (prop)-[:COMPOSED_OF]->(part)-[:ORIGINAL_PROP]->(originalprop)
return originalprop.prop, part.part, count(episode.title)
order by count(episode.title) desc
limit 3

Spend a couple of moments looking at it and you’ll soon appreciate its simplicity and declarative expressiveness. If only the Daleks had such power…

Cypher is still in its infancy, and its syntax is still in flux. In the 1.4 release timeframe, which culminates this week with the release of Neo4j 1.4 GA, Cypher performs only non-mutating operations. Over the course of the 1.5 release, we expect to add mutating operations.

The Tutorial

Our Neo4j tutorial is freely available from GitHub. Download it now and try some of the exercises. And keep checking back for updates: as Neo4j grows, so will the tutorial.

Jim and I are also running one- and two-day versions of the tutorial at a number of conferences and other events:

(Dalek prop image belongs to the Project Dalek Forum gallery, another excellent resource for Dalek prop researchers and Dalek builders.)

5 Comments  |  Atom   RSS 2.0   Email