In this post I’ll present the approach I am using to convert a Path Query to Cypher. The basics of Path Query to Cypher conversion were discussed in my previous post.
Observations
A Path Query consists of many paths. Each path represents some data in the database. For example, consider the following Path Query.
It contains the following four paths.
Gene.length
Gene.homologues.dataSets.name
Gene.homologues.dataSets.publication.title
Gene.homologues.evidence.publications.abstractText
One observation we can make from these paths is that they share prefixes. For example, Gene.homologues is common to the last three paths, Gene.homologues.dataSets is common to the second and third, and the prefix Gene appears in all of them.
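To make the observation concrete, here is a small Python sketch that counts how many of the four paths share each prefix. The helper name `prefixes` is made up for this illustration and is not part of the InterMine code base.

```python
from collections import Counter

# The four paths from the example Path Query.
paths = [
    "Gene.length",
    "Gene.homologues.dataSets.name",
    "Gene.homologues.dataSets.publication.title",
    "Gene.homologues.evidence.publications.abstractText",
]


def prefixes(path):
    """Yield every proper dot-separated prefix of a path, shortest first."""
    parts = path.split(".")
    for i in range(1, len(parts)):
        yield ".".join(parts[:i])


# Count how many paths start with each prefix.
prefix_counts = Counter(p for path in paths for p in prefixes(path))

# Keep only the prefixes that more than one path has in common.
shared = {p: n for p, n in prefix_counts.items() if n > 1}

for prefix, n in sorted(shared.items()):
    print(f"{prefix}: shared by {n} paths")
```

Only three prefixes are shared by more than one path, and each of them is an extension of the previous one, which is what hints at a hierarchy.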
In fact, while building queries in the query builder, we start from a model and add its attributes, references & collections to our query. Then we move on to any of the references/collections and add its attributes, references & collections to our Path Query as required, and so on. Thus, all the paths will have some prefix in common, and there is a hierarchy associated with the different components of a path.
With a bit of thought, I concluded that a tree would be the ideal representation for all the paths of a Path Query. A tree is a popular hierarchical data structure consisting of a root node and subtrees of children, where every node except the root has a parent.
PathTree Representation
Let us generate a Tree using the four paths of the example Path Query shown above. Since it is a Tree made up of paths, we can call it a Path Tree.
A path tree is made up of many TreeNodes. Each tree node represents a component of the path. For example, the path Gene.homologues.dataSets.name would be represented by four TreeNodes - Gene, homologues, dataSets & name. Each of these components in turn corresponds to an entity in the InterMine Neo4j graph: a Node, a Relationship or a Property. All the paths that share a prefix share the corresponding ancestor TreeNodes in the PathTree. This way we can represent the hierarchy among the various components of the paths and avoid storing redundant information.
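The idea can be sketched in a few lines of Python. This is only an illustration of the data structure, not the Java implementation in the InterMine Neo4j project; the method names `add_path` and `nodes` are assumptions made for this example.

```python
class TreeNode:
    """One component of a path, e.g. 'homologues' in Gene.homologues."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # component name -> TreeNode, in insertion order


class PathTree:
    """A tree in which paths with a common prefix share ancestor nodes."""

    def __init__(self):
        self.root = None

    def add_path(self, path):
        """Insert a dot-separated path, reusing nodes for any shared prefix."""
        parts = path.split(".")
        if self.root is None:
            self.root = TreeNode(parts[0])
        elif self.root.name != parts[0]:
            raise ValueError("all paths must start at the same root class")
        node = self.root
        for part in parts[1:]:
            if part not in node.children:
                node.children[part] = TreeNode(part, parent=node)
            node = node.children[part]
        return node

    def nodes(self):
        """Yield every TreeNode in pre-order (parent before its children)."""
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            yield node
            # Push children in reverse so the first child is visited first.
            stack.extend(reversed(list(node.children.values())))


tree = PathTree()
for p in [
    "Gene.length",
    "Gene.homologues.dataSets.name",
    "Gene.homologues.dataSets.publication.title",
    "Gene.homologues.evidence.publications.abstractText",
]:
    tree.add_path(p)

names = [node.name for node in tree.nodes()]
print(len(names), names)
```

The four paths contain 16 components in total, but the tree needs only 10 TreeNodes, because Gene, homologues and dataSets are each stored once.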
Generating Cypher using Path Tree
In Cypher, we can assign Nodes, Relationships & even Paths to variables. These variables can then be used in place of those Nodes/Relationships/Paths in the rest of the query. For example, consider the following Cypher query.
In the example, we first match all the Genes and assign them to the variable n. Then n is used to MATCH a relationship from Genes to Chromosomes, and in the WHERE clause it is used to compare the length of the Genes.
This way, while converting a Path Query to Cypher, we can assign a unique variable name to each TreeNode in the Path Tree and then use it in the remaining MATCH, WHERE, RETURN, ORDER BY & OPTIONAL MATCH clauses. The high-level approach, in algorithmic form, is presented as follows.
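As a rough illustration of the first step only (giving every TreeNode a unique variable name), here is a minimal Python sketch. The nested-dict tree and the `name_index` naming scheme are assumptions made for this example; they are not necessarily what the code in org.intermine.neo4j.cypher does, and the sketch stops short of generating any clauses.

```python
from itertools import count


def build_tree(paths):
    """Build a path tree as nested dicts: component name -> subtree."""
    tree = {}
    for path in paths:
        level = tree
        for part in path.split("."):
            level = level.setdefault(part, {})
    return tree


def assign_variables(tree):
    """Map each tree node, identified by its full path, to a unique variable.

    A running counter is appended to the component name, so two nodes that
    happen to share a name in different branches still get distinct variables.
    """
    counter = count()
    variables = {}

    def visit(subtree, prefix):
        for name, children in subtree.items():
            full = prefix + (name,)
            variables[".".join(full)] = f"{name}_{next(counter)}"
            visit(children, full)

    visit(tree, ())
    return variables


paths = [
    "Gene.length",
    "Gene.homologues.dataSets.name",
    "Gene.homologues.dataSets.publication.title",
    "Gene.homologues.evidence.publications.abstractText",
]
variables = assign_variables(build_tree(paths))

for path, var in variables.items():
    print(f"{var:16} <- {path}")
```

Once every TreeNode has a variable, each clause generator can look up the variable for a path component instead of repeating the pattern that matched it.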
In the next post, I’ll explain the generation of each clause - MATCH, RETURN, ORDER BY, WHERE & OPTIONAL MATCH - separately. Meanwhile, you can have a look at the Path Query to Cypher conversion code in the org.intermine.neo4j.cypher package.