Thinking in Patterns Starts at Data Modeling
Informative and Flexible Patterns
Thinking in patterns is the key to interacting with a graph database like ONgDB. One of the main challenges I see for people with deep relational database experience who are moving to a graph database is that they carry a relational approach into their queries. Querying a graph database efficiently requires updating the mental model for how query interactions are approached. We’ll look at some examples of making this transition to thinking in patterns, using the graph query language Geequel.
The overuse of relational query techniques most often shows up as a tendency to rely exclusively on WHERE clauses to filter and compare across multiple complete sets of nodes, rather than letting ONgDB discard nodes as it expands from the starting set in the MATCH clause. The goal when querying the Open Native Graph Database (ONgDB) with Geequel should be to get to the smallest starting set as quickly as possible, which maximizes the benefit of constant-time, index-free adjacency traversals within the local network around each starting node.
To query ONgDB in a pattern-centric manner that is sympathetic to the data layout, the data model itself must be built around the patterns that matter. One key to modeling the data is knowing that each relationship of a node is literally a memory pointer to another node, and that the relationships around a node are grouped by their type. This allows constant-time traversal from one node to the set of nodes connected to it by a single relationship type. Let’s look at an example…
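Before the full example, here is a minimal sketch of that single-type expansion. It assumes a City node with a name property and LIVES_IN relationships, as in the model used later in this post; the name property on Person is an assumption added purely for illustration.

```cypher
// Anchor on one City node, then expand only along its incoming LIVES_IN
// relationships. Because relationships are grouped by type (as described
// above), the traversal can target this one type directly.
MATCH (wooster:City {name: "Wooster"})<-[:LIVES_IN]-(resident:Person)
// resident.name is a hypothetical property used only for this sketch.
RETURN resident.name AS resident;
```

The point of the sketch is the shape of the query: a single, selective starting node followed by a typed expansion, rather than a scan of every Person followed by a property filter.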
Assume we want to find individuals from Wooster, Ohio who acted in a movie and see whether any of them worked with the same directors. A denormalized, RDBMS-style way to model this would be to put isActor, isDirector, city, state and movies properties on each Person node. Here’s a somewhat extreme example of how the query could look:
MATCH (actor:Person)
WHERE actor.isActor = true AND actor.state = "Ohio" AND actor.city = "Wooster"
WITH actor, actor.movies AS movies
UNWIND movies AS movie
MATCH (director:Person)
WHERE director.isDirector = true AND movie IN director.movies
RETURN director, collect(DISTINCT actor) AS actors;
The issue with this approach is that it requires scanning every node with the Person label, first to find the actors from Wooster, Ohio and then to find the directors, and then intersecting the values in the movies arrays of those two sets.
A more graph-friendly and contextually specific approach is to recognize that each movie should be its own node, and to connect each Person who is an actor to the movies in which they acted via an ACTED_IN relationship. The lookup query would then be:
MATCH (ohio:State {name: "Ohio"})<-[:APART_OF]-(wooster:City {name: "Wooster"})<-[:LIVES_IN]-(actor:Person)-[:ACTED_IN]->(m)<-[:DIRECTED]-(director:Person)-[:DIRECTED]->(m2)<-[:ACTED_IN]-(a2)-[:LIVES_IN]->(wooster)
RETURN actor, director, a2;
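To make the shape of this model concrete, here is a small sketch of sample data that the pattern above could match. Everything in it is hypothetical: the person names, the movie titles, the Movie label and the title property are assumptions for illustration, while the State, City and Person labels and the relationship types are taken from the query above.

```cypher
// Hypothetical sample data; all names and titles are made up.
CREATE (ohio:State {name: "Ohio"})
CREATE (wooster:City {name: "Wooster"})-[:APART_OF]->(ohio)
// Two actors who both live in Wooster.
CREATE (alice:Person {name: "Alice"})-[:LIVES_IN]->(wooster)
CREATE (bob:Person {name: "Bob"})-[:LIVES_IN]->(wooster)
// One director; no city is recorded for this person.
CREATE (dana:Person {name: "Dana"})
// Movies are their own nodes (Movie label and title property are assumed).
CREATE (m1:Movie {title: "First Film"})
CREATE (m2:Movie {title: "Second Film"})
// Each actor appears in a different movie directed by the same person.
CREATE (alice)-[:ACTED_IN]->(m1)<-[:DIRECTED]-(dana)
CREATE (bob)-[:ACTED_IN]->(m2)<-[:DIRECTED]-(dana);
```

With this data, the pattern anchors on the single Wooster node and walks outward through Alice, her movie, Dana and Dana’s other movie to reach Bob, without ever comparing property arrays across the full set of Person nodes.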
Expanding contextually meaningful pieces of information, such as the movies, into their own node entities within the graph allows many significant patterns to be built around them and takes advantage of the traversal performance where ONgDB excels. By using these patterns and getting to the smallest possible starting set as quickly as possible, complex relationships can be leveraged to build incredibly meaningful results much more quickly than by starting with two large sets of entities and searching across all of them for an intersection based on many property checks.
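One way to check that a query really does begin from a small starting set is sketched below. It assumes ONgDB accepts the Cypher 3.x-style index syntax and the PROFILE prefix that Geequel inherits; verify both against the version you run, since neither is confirmed by this post.

```cypher
// Assumed syntax: a schema index on the property used to anchor the pattern.
CREATE INDEX ON :City(name);

// PROFILE runs the query and reports the plan with row counts per operator,
// which shows whether the plan starts from the single City node or from a
// scan of a whole label.
PROFILE
MATCH (wooster:City {name: "Wooster"})<-[:LIVES_IN]-(actor:Person)-[:ACTED_IN]->(m)
RETURN actor, m;
```

The index statement and the profiled query should be run as separate statements. If the plan opens with a label scan rather than a lookup of the anchor node, that is usually a sign the pattern is not yet starting from its most selective point.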