June 11, 2015

Using Graph Databases in Neo4j With Cypher


Before we get into our Neo4j example, let’s get an overview of Neo4j and the Cypher syntax. Learning the syntax of Cypher is pretty straightforward; applying it to real-life situations and tailoring it to your needs is the part that needs more attention.

Neo4j and Cypher Documentation

You might be wondering if you can just go through the Cypher documentation and gather all you need from there?

Well, yes and no. Yes, because frankly, the Cypher documentation is one of the most comprehensive and detailed sets of documentation that you will come across. If you go through it chapter by chapter and example by example, it will most definitely lay down an amazing groundwork for you to become a Neo4j veteran in no time. And no, because some of the examples are disjointed (in my opinion, at least): if example A shows you how to construct a query using START, it will most probably not tell you how to use the built-in functions or how to create collections. You get the drift.

Why Use Neo4j and Cypher?

Now you might also wonder: What exactly would I gain from learning about Neo4j and Cypher? To that, I say, “Excellent question!” If you want to learn the intricacies of how graph databases work, how they can help you in ways SQL never could (decreased retrieval time among many others), and how that is achieved with the help of Cypher, then learning Neo4j might lead you to that goal.

Now, with all that in mind, let’s commence operation “Neo4j and Cypher” with a few definitions to help get you in the groove as soon as possible:

What is Neo4j?

Neo4j is a graph database management system and the subject of this post. It applies the mathematics of graphs to store information in the form of nodes and inter-node relationships, which is where its potential for fast information retrieval comes from.

What is Cypher?

Cypher is the language used to structure all the queries in Neo4j -- much in the way that SQL is used to structure queries in relational database management systems like MySQL or Microsoft SQL Server.


Our Neo4j Example

In this post, we will install a local instance of Neo4j, take a more detailed look at the syntax of Cypher, see how information retrieval and modification are achieved with it, and, as promised, introduce a scenario that will most probably make the learning process fun and seamless. So, here we go!


Installing Neo4j and the Cypher Syntax

As was touched upon in the previous article, Cypher enables you to work with a database built on graph storage, i.e. one that manages information in the form of nodes and relationships. Consider the following example:

Ms. Alpha Abbott works in the same company (RandomGreekAlphabets Corp.) as Mr. Lambda Longbottom. Both are supervised by Ms. Gamma Greyback, who has been working in the company since 2002.
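For readers who like to see things in query form, here is a minimal sketch of how this scenario could be created in Cypher. The labels (Employee, Company), the relationship types (WORKS_IN, SUPERVISES) and the property names are illustrative assumptions, and not necessarily the ones used for the graph shown in this post.

```cypher
// Assumed labels, relationship types and property names; adjust to your own model.
CREATE (company:Company {name: 'RandomGreekAlphabets Corp.'}),
       (alpha:Employee {name: 'Alpha Abbott'}),
       (lambda:Employee {name: 'Lambda Longbottom'}),
       (gamma:Employee {name: 'Gamma Greyback'}),
       (alpha)-[:WORKS_IN]->(company),
       (lambda)-[:WORKS_IN]->(company),
       // The starting year is stored on the relationship itself
       (gamma)-[:WORKS_IN {since: 2002}]->(company),
       (gamma)-[:SUPERVISES]->(alpha),
       (gamma)-[:SUPERVISES]->(lambda);

// Run separately afterwards: who supervises whom?
MATCH (boss:Employee)-[:SUPERVISES]->(report:Employee)
RETURN boss.name, report.name;
```

Keeping the year on the WORKS_IN relationship rather than on the Gamma node is a modelling choice: it describes the employment, not the person.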

The next image is a correct representation of the aforementioned situation:

[Image: Neo4j example graph]

The screenshot above has been grabbed from the local Neo4j Webadmin’s Data browser view. How did I do this, you ask? Well, here’s the modus operandi:

    1. Go to http://neo4j.com/download/other-releases/ and select either a *.zip archive or an executable.
    2. Once downloaded, run the file as an Administrator, agree to the user license, select the destination directory and voila! Neo4j is installed on your system.
    3. Now, go to $destination_directory\Neo4j Community\bin and execute the file neo4j-community.exe.
    4. The Neo4j Community dialog box which opens mentions the database location which we are going to work with later on. It is basically where Neo4j stores all the nodes, the inter-node relationships, the indices etc.
    5. Click the Start button to fire up the Neo4j server, which listens on port 7474, and wait while it starts up. Grant it the access it asks for in case you happen to see a Windows Security Alert window.
    6. Once that is done, you will see a green status strip informing you that Neo4j is ready.

    [Image: Neo4j Community start screen]

Click on the link provided: http://localhost:7474/. Check whether the browser view looks something like the screenshot displayed below. If yes, congratulations! If no, retrace your steps to see if something inadvertently got missed.

[Image: Connect to Neo4j screen]

  1. Enter the password and click Connect.
  2. The browser is quite user-friendly. Play around with it for some time to get the proper hang of it. You might want to go through some tutorials and walk-throughs in the process, at the pace which you find most comfortable. There's a left-side panel where you can find most of the options, like system configuration, creating and accessing nodes in a jiffy, the REST API and information about styling your graphs.
  3. The textarea at the top is where all the queries, which we are going to go through in a while, go. Type or copy them in and you will see the magic unfurl before you. Bon voyage! Remember to stop the server by clicking the Stop button in the Neo4j Community dialog box once you are finished.

One last thing: the startup window won’t look the same the next time you decide to start the Neo4j instance. It may look something like the following:

[Image: Neo4j start screen]

You might also want to check how it currently looks by visiting the Neo4j docs page. The view shown here is called the browser view. There is a more technically oriented view of Neo4j, called the Webadmin view, which may be accessed in one of the following two ways:

  • Open http://localhost:7474/webadmin.
  • Click the information button, (i), on the left-side panel to reveal the Information Menu, and click the hyperlink “Webadmin” present at the bottom.

In the Webadmin view you will find many options which we can safely ignore until we're administering an actual production database. This console displays information such as the number of nodes created to date and the memory being utilized, along with a graph which updates itself every 3 seconds to reflect the changes brought about by your data manipulation queries.

[Image: Neo4j Webadmin dashboard]

There are two main panels which we are going to discuss next, apart from the Dashboard displayed above: the Data Browser panel and the Console panel. The Data Browser looks something like the following:

[Image: Neo4j Data Browser showing a query result]

It is where you can get graph snapshots of your data, like the one this post is about. And the following image is of the Console panel, where you can write and execute queries, old-school style!

[Image: Neo4j Webadmin console]

You can paste the query that creates your graph either into the browser view textarea which we discussed a while back, or into the Console view. Wait for the query to execute. Once that is done, write the following query to see the output:


MATCH (n) RETURN n;


Now, just as there are two ways to execute your queries (the Neo4j browser view and the Console view), there are two ways of viewing the results of the above query in graphical form: the Neo4j browser itself, and the Data Browser.

  • In case you executed the queries in the browser, click the “Graph” button on the left-hand side of the query output sub-panel, which should give you something like the following: [Image: query result rendered as a graph in the Neo4j browser]
  • In case you executed the query in the Webadmin Console view, head over to the Data Browser panel, type the following query in the textarea at the top and click the search button to reveal the following graph display, whose style can be customized in any way which suits your needs:
    
    START root=node(0)
    MATCH (n)
    RETURN n
    

    [Image: graph display in the Webadmin Data Browser]

Now that we have gone through the installation and a basic introduction to node and relationship creation, we are finally ready to have a look at how information retrieval takes place. This is the area in which graphs are supposed to score immensely over other methods.

How to Query Information With Cypher

The following is possibly the nerdiest graph data structure I could think of (it took me two whole days to research and materialize), which we will use to understand how information extraction is achieved in Cypher. And the following are the questions we will try to answer via Cypher.

Cypher Information Query Example 1

  1. What are the names of the actors, the names of the characters and the television show in which they portrayed the said role? Arrange the list by both the franchise and the actor names.

This example illustrates the most basic of the Neo4j clauses: MATCH, RETURN, WHERE and ORDER BY and the role they play in extracting the information that is desired. To solve it, we need to ask ourselves how the actors are related to the characters they played, and the franchises to which those characters belong. The simplest way is to consider the following pretty straightforward scenario:


ACTOR -[:PLAYED_THE_ROLE_OF]-> CHARACTER -[:WAS_A_CHARACTER_IN]-> FRANCHISE


This makes it easier to understand the relationship chain that we are required to follow: we need to find each actor, the character they played, and the franchise that character was a part of, in that order. Remember, there may be better and more efficient ways to approach this problem: this is just one of many, and the same holds for all the queries that we shall discuss. Now, let’s craft the query, shall we? Have a look at it below:


MATCH (people:Actor)-[:PLAYED_THE_ROLE_OF]->(character:Character), 
(franchise:TelevisionShow)
WHERE (character)-[:WAS_A_CHARACTER_IN]->(franchise)
RETURN people.name AS ACTOR_NAME, character.name AS CHARACTER_NAME, 
franchise.name AS FRANCHISE_NAME
ORDER BY FRANCHISE_NAME, ACTOR_NAME;


Here's a short explanation of what's happening in the query:


MATCH (people:Actor)-[:PLAYED_THE_ROLE_OF]->(character:Character), 
(franchise:TelevisionShow)


The clause MATCH is used to specify the nodes which you require on your journey from start to finish, i.e. you want to start from an ‘actor’ node (label: Actor) and reach the ‘franchise’ node (label: TelevisionShow) via the ‘character’ node (label: Character). The notation (n)-[r]->(m) signifies that there is an outgoing relationship r (in this case, PLAYED_THE_ROLE_OF) from a node ‘n’ to a node ‘m’. The colon before the name of the relationship is used whenever you explicitly state the type of the relationship. A general notation like ‘r’ is used instead when you are either not sure of the relationship, or simply do not care which relationship there is between the two nodes.
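To make the notation concrete, here is a small sketch of the same actor-to-character hop written three ways. It only uses the labels and the relationship type from the query above. The three statements are independent and meant to be run one at a time.

```cypher
// (n)-[r]->(m): any outgoing relationship, bound to r so that it can be inspected
MATCH (people:Actor)-[r]->(character:Character)
RETURN people.name, type(r), character.name;

// (n)-[:TYPE]->(m): only relationships of the named type; no variable is needed
MATCH (people:Actor)-[:PLAYED_THE_ROLE_OF]->(character:Character)
RETURN people.name, character.name;

// (m)<-[:TYPE]-(n): the same relationship, read from the character's side
MATCH (character:Character)<-[:PLAYED_THE_ROLE_OF]-(people:Actor)
RETURN character.name, people.name;
```

The second and third forms describe the same relationships; only the reading direction of the pattern changes, not the direction of the relationship.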


WHERE (character)-[:WAS_A_CHARACTER_IN]->(franchise)


The clause WHERE works in much the same way as its SQL counterpart: it specifies constraints on the information being retrieved. In this particular case, the WHERE clause adds a constraint on top of the one specified in the preceding MATCH clause: the ‘character’ node must be related to the ‘franchise’ node by way of an outgoing WAS_A_CHARACTER_IN relationship.


RETURN people.name AS ACTOR_NAME, character.name AS CHARACTER_NAME, 
franchise.name AS FRANCHISE_NAME


For all the SQL aficionados out there, RETURN is the CQL version of the SQL SELECT clause. Here, it says: after matching the nodes according to the relationship chains specified in the MATCH and WHERE clauses, return the node properties people.name as ACTOR_NAME, character.name as CHARACTER_NAME and franchise.name as FRANCHISE_NAME.


ORDER BY FRANCHISE_NAME, ACTOR_NAME;


And the similarities just do not seem to end, do they? The ORDER BY clause makes sure to arrange the results first in the alphabetical order of the franchise names, and then by the actor names. Here's how the output of this query looks in the Neo4j Webadmin Console view on my machine:

[Image: query output listing actor, character and franchise names]

Sweet, right?
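As one of the "other ways" to approach this problem, the WHERE pattern can also be folded into the MATCH clause so that the whole chain reads in one go. This is a sketch rather than a drop-in replacement: it should give the same rows as long as a character has at most one WAS_A_CHARACTER_IN relationship to a given show.

```cypher
// The whole actor -> character -> franchise chain expressed as a single pattern
MATCH (people:Actor)-[:PLAYED_THE_ROLE_OF]->(character:Character)
      -[:WAS_A_CHARACTER_IN]->(franchise:TelevisionShow)
RETURN people.name AS ACTOR_NAME, character.name AS CHARACTER_NAME,
       franchise.name AS FRANCHISE_NAME
ORDER BY FRANCHISE_NAME, ACTOR_NAME;
```

Which form you prefer is mostly a matter of readability: the single pattern mirrors the ACTOR, CHARACTER, FRANCHISE chain sketched earlier.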


Cypher Information Query Example 2

  1. Find all the actors from the UK, who acted in the Sherlock franchise and were born after 1970.

This example shows the role of built-in functions in Neo4j and the retrieval of information by using a multi-step relationship chain.

The author would like to suggest to the reader at this point that developing a CQL query is quite straightforward, provided you have a clear picture in your mind of what information you need to extract. Here’s another piece of the aforementioned author’s mind: take the good ol’ pencil and notepad and draw a diagram, and don’t be afraid if it looks stupid. What matters is what you understand from it, and that it clearly defines the situation that you intend to simulate via your query, much like the chain we sketched for the previous question. Now, with that image on your notepad and in your mind, let’s proceed to view what the query should look like:


MATCH (actor:Actor)-[*2]->
  (franchise:TelevisionShow {name: 'SHERLOCK'})
WHERE actor.nationality='United Kingdom' AND 
toInt(substring(actor.born, 7)) > 1970
RETURN actor.name;


Much like what we saw in the previous example, this query makes use of the MATCH clause to decide which nodes should be encountered as part of the journey and the relationship.

A little difference, though, can be seen here. We have pinned down the ‘franchise’ node by specifying its name property, which we can do when we are sure of what we are looking for. Any number of properties can be specified, which is good, because the more information you give, the faster the lookup.

If I told you that I lived on the outskirts of the Mandora crater on Mars, would you be able to look me up? No. I would need to tell you the address of my bio-dome beforehand. It’s as straightforward as that.

We have made use of the logical operator AND to combine the two conditions which the query requires: that the actor under consideration should be a national of the United Kingdom, and that they should have been born later than 1970. The year of birth is extracted with substring, a string manipulation function built into Neo4j. And to parse that into an integer, so that it can work with comparison operators, another built-in function called toInt comes to our rescue.
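Here is a tiny sketch of that substring/toInt combination on its own. The date string is a made-up sample in an assumed DD-Mon-YYYY format, chosen only because its year starts at the 0-based position 7; check how the born property is actually stored in your graph before relying on that offset.

```cypher
// '19-Jul-1976' is a hypothetical sample value, not taken from the article's data.
// substring(text, 7) returns everything from the 0-based position 7 onwards.
RETURN substring('19-Jul-1976', 7)        AS yearAsText,
       toInt(substring('19-Jul-1976', 7)) AS yearAsNumber;
```

The first column is the text '1976' and the second is the integer 1976, which is what makes the > 1970 comparison meaningful. In newer Neo4j releases the function is called toInteger instead of toInt.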

You will notice that, unlike what was discussed earlier, we have specified the relationship between the nodes differently, by mentioning a 2 after the asterisk. What it implies is this: we don’t care which relationships bind the two nodes, as long as the ‘franchise’ node is exactly 2 outbound hops away from the ‘actor’ node. An asterisk alone would have meant that one node could be any number of hops away from the other. But if we think about it, that would definitely have taken much more time to process than our version.
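For comparison, here is a sketch of a few variable-length forms side by side, using the same labels as the query above. The explicit chain at the end assumes that the two hops are PLAYED_THE_ROLE_OF followed by WAS_A_CHARACTER_IN, as in the first example. Run the statements one at a time.

```cypher
// Exactly two outbound hops of any type (the form used above)
MATCH (actor:Actor)-[*2]->(franchise:TelevisionShow {name: 'SHERLOCK'})
RETURN actor.name;

// Between one and three outbound hops of any type
MATCH (actor:Actor)-[*1..3]->(franchise:TelevisionShow {name: 'SHERLOCK'})
RETURN actor.name;

// Two hops spelled out, with the relationship types named explicitly
MATCH (actor:Actor)-[:PLAYED_THE_ROLE_OF]->(:Character)
      -[:WAS_A_CHARACTER_IN]->(franchise:TelevisionShow {name: 'SHERLOCK'})
RETURN actor.name;
```

The spelled-out form is longer to type, but it will not accidentally pick up an unrelated two-hop path if the graph grows new relationship types later.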

The output for this query looks like the one in the next image.

[Image: output of the query]


Cypher Information Query Example 3

  1. Which novels/ novel series were released in the 19th Century?

Ah, the printed word!

This example, as you can see from the description, asks for the novels and novel series which were released in the 19th century. We will make use of more built-in functions and introduce two new clauses: SET, used for property modification among other things (as described below), and WITH, for piping one part of a query into the next.

Now, there is one very important thing that needs to be mentioned. I have said this once or twice already, and will probably do so again, that just because NoSQL, and hence Neo4j, is new and hip, it does not imply that it is suited to ALL the situations that you encounter. Sometimes, SQL gives us faster results as compared to graphs, and this example happens to be one of many such scenarios.

We need to make sure that we commence from nodes of the type Novel or NovelSeries, and start defining the relationships. But wait! Which node should be on the other side of the relationship? None, because we only need to look at the properties of standalone nodes in this case. The following are the tasks which should be performed in order to design the query:

  • Step 1: As mentioned, we only want nodes of the Novel and NovelSeries types. This is achieved with the has function, which returns true in case a particular node has a specific property: if the node ‘planet’ has a property called ‘orbital_period_around_the_Sun’, then has(planet.orbital_period_around_the_Sun) will return true, otherwise false. Here we check for the released and title properties, which in this graph only those nodes are expected to carry.
  • Step 2: Now, we need a temporary property, which we will call temp, in which we can store the integer referring to the year of publication of the Novel/Novel Series, extracted exactly like we saw in the previous example. In Cypher, this is achieved by the use of the SET clause.
  • Step 3: Since we have defined and specified quite a lot by now, we start afresh with the next part of our two-part query. For that to happen, we pipe the node forward using the WITH clause. It is now bound by the constraints we defined in the WHERE clause and carries the new temporary property.
  • Step 4: We start with MATCH again, and now the WHERE clause mentions the most critical condition of the query: that the Novel/Novel Series should have been published in the 19th century.
  • Step 5: Though it is not strictly necessary, it is good practice to remove temporary properties after they have served their purpose, because temp was never actually a property of the node to begin with. Thus, we use the REMOVE clause to do so, but not before we store the value of the doomed property in an alias for display purposes and pass this value forward to the final RETURN clause, which does the displaying for us.

So, after this much exercise, let’s finally have a look at the query and how the output looks:


MATCH (nOrNS) WHERE has(nOrNS.released) AND has(nOrNS.title)
SET nOrNS.temp = toInt(substring(nOrNS.released, 7))
WITH nOrNS
MATCH (nOrNS) WHERE nOrNS.temp >= 1801 AND nOrNS.temp <= 1900
WITH nOrNS, nOrNS.temp AS YearOfPublication
REMOVE nOrNS.temp
RETURN nOrNS.title AS Title, YearOfPublication;


The output that you might see then looks like this.

[Image: output of the query]
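Since the year is only needed while the query runs, the same result can probably be had without writing anything to the graph. The sketch below is an alternative under that assumption: WITH carries the computed year forward as an alias, so SET and REMOVE are not needed. It keeps the has and toInt functions used above and counts the 19th century as the years 1801 to 1900.

```cypher
// Read-only variant: the year lives in an alias, not in a temporary property
MATCH (nOrNS)
WHERE has(nOrNS.released) AND has(nOrNS.title)
WITH nOrNS, toInt(substring(nOrNS.released, 7)) AS YearOfPublication
WHERE YearOfPublication >= 1801 AND YearOfPublication <= 1900
RETURN nOrNS.title AS Title, YearOfPublication;
```

The SET/REMOVE version is still a useful demonstration of how properties are modified; this one simply avoids touching the stored data when all you want is a report.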

Cypher Information Query Example 4

  1. Name the character from the franchise Dr. Who, who belonged to the Slytherin House in the Harry Potter franchise.

This query, quite differently from the previous one, requires you to visualize the relationship chain from the person whom we are looking for to the node which represents the Slytherin house.

Now, we know that the person is an actor, that he played the role of a character in the franchise Dr. Who, and that he also played a role in the Harry Potter franchise, where he was a part of the Slytherin house. Guesses, anyone? No? Well, let’s let Cypher do this job for us. Carefully notice the directions in which the arrows point. The direction of the relationship, whether inbound or outbound, is of the utmost importance.


MATCH (drWho:TelevisionShow {name: 'DR. WHO'})
  <-[:WAS_A_CHARACTER_IN]-(drWhoCharacter:Character)
  <-[:PLAYED_THE_ROLE_OF]-(actor:Actor)
  -[:PLAYED_THE_ROLE_OF]->(harryPotterCharacter:Character)
  -[:BELONGED_TO]->(house:House {name: 'SLYTHERIN'})
RETURN drWhoCharacter.name, harryPotterCharacter.name;


Output:

[Image: output of the query]


Cypher Information Query Example 5

  1. What was the name of the ring whose wielder had something in common with Dobby, the house-elf?

This example continues to explore the role of relationship chains.

Now we all know that Dobby was an elf, and we are required to find the name of the ring whose wielder had something in common with Dobby. We will use the bottom-up approach here, by starting from the ring and gradually moving towards the target character. Again, it is just a matter of how clearly you picture the relationship chain. So, here we go:


MATCH (ring:Rings)<-[:WAS_THE_WIELDER_OF]-(targetCharacter) -[:CHOSE_TO_LIVE_AS]->(elf:Race {type: 'ELF'})
<-[:WAS_AN]-(dobby:Character {name: 'DOBBY'})
RETURN ring.name;


The output on my machine is the following:

[Image: output of the query]


Cypher Information Query Example 6

  1. List the names of all the persons associated with more than one franchise.

This one is quite self-explanatory, but I will dabble in the details just a little. We are in need of all those Actor nodes which have more than one outgoing PLAYED_THE_ROLE_OF relationship leading towards franchise nodes. Here’s the query and, subsequently, the output. But before that, a couple of pointers:

  1. The collect function gathers values, such as strings, into a collection, as is required for the formulation of this query.
  2. The length function can be used either to find the length of a string or the size of a collection. Talk about versatility!

MATCH (person:Actor)-[:PLAYED_THE_ROLE_OF*1..]->
(character:Character)-[:WAS_A_CHARACTER_IN|FEATURED_IN]->
(franchise)
WITH person, collect(character.name) AS characterCollection, 
collect(franchise) AS franchiseCollection
WHERE length(franchiseCollection) > 1
RETURN person.name AS ActorName, characterCollection, 
franchiseCollection;


Now that we have the query, let's check out the output that it gives on my machine:

[Image: output of the query]
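One thing to watch out for: collect keeps duplicates, so an actor who played two characters in the same franchise would also end up with a collection longer than 1. If that matters for your data, a hedged variant is to collect distinct franchise nodes instead. The sketch below assumes a single PLAYED_THE_ROLE_OF hop and keeps the length function used above; recent Neo4j releases use size for collections instead.

```cypher
// Count each franchise only once per actor by collecting DISTINCT nodes
MATCH (person:Actor)-[:PLAYED_THE_ROLE_OF]->(character:Character)
      -[:WAS_A_CHARACTER_IN|FEATURED_IN]->(franchise)
WITH person,
     collect(DISTINCT character.name) AS characterCollection,
     collect(DISTINCT franchise)      AS franchiseCollection
WHERE length(franchiseCollection) > 1
RETURN person.name AS ActorName, characterCollection, franchiseCollection;
```

Whether this returns fewer rows than the original depends entirely on the data, so compare the two before picking one.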


Cypher Information Query Example 7

  1. Name of the Author of the book series in whose dependent franchise the person who voices Smaug plays the role of Sherlock.

This is the last one, I promise. In this case, the relationship chain seems to be quite complex, but is not. Thought of the answer already? I told you the pen and notepad trick would work, didn’t I? Let’s see what Cypher has in store for us this time:


MATCH (author)<-[:WAS_WRITTEN_BY]-(nOrNS: Novel)
<-[:WAS_A_CHARACTER_IN]-(smaug:Character {name: 'SMAUG'})
<-[:VOICED]-()-[:PLAYED_THE_ROLE_OF]->(sherlock: Character 
{name: 'SHERLOCK HOLMES'})
RETURN author.name;


My output here is below. Looks good, doesn't it?

[Image: output of the query]

Tired? Mind if I ask just one more query which I was unable to form?

Even if you do, here it is: Actor William Russell is the father of Alfred Enoch. Find the next shortest linkage between the Father and Son duo. That is, there is a direct one step relationship between them, propagating as an IS_THE_FATHER_OF relationship from William Russell to Alfred Enoch. You are to write a query to find another relationship chain between these two nodes with the minimum length, obviously with length greater than 1.

HINT: A path variable can be assigned to a chain. And there is a method by the name of shortestPath which may also come in handy.
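As a possible starting point, here is an untested sketch that sidesteps shortestPath and simply orders the candidate paths by length. Everything specific in it is an assumption: the Actor label on both nodes, the upper-case spelling of the names, and the upper bound of 6 hops, which is only there to keep the search from running away.

```cypher
// Assumed labels, name spellings and hop limit; adjust to the actual graph.
MATCH (father:Actor {name: 'WILLIAM RUSSELL'}), (son:Actor {name: 'ALFRED ENOCH'})
// Bind the chain to a path variable; at least 2 hops, in either direction
MATCH path = (father)-[*2..6]-(son)
// Leave the direct family link out of the chain altogether
WHERE NONE(r IN relationships(path) WHERE type(r) = 'IS_THE_FATHER_OF')
RETURN path, length(path) AS hops
ORDER BY hops
LIMIT 1;
```

Enumerating paths like this can get slow on a large graph, which is exactly why a shortestPath-based solution is still worth looking for.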

To seek help with this, the official Neo4j documentation might be a great place to start. You can also download the *.pdf version of the documentation for the latest Neo4j release from their website. In case you face any difficulties while trying to produce this query, look for the answer in the next part of this series. Maybe I’ll be able to work it out somehow by then!

One important thing: the Neo4j Webadmin console requires you to place a ‘;’ terminator after every query, but the Neo4j browser does not, though it is always a good practice to do so. Also, always comment your queries and try keeping the variable names as verbose as you possibly can: it increases the readability and understandability, and subsequently the maintainability of your code.

Neo4j, or graph databases in general, are really intriguing, and the way Cypher reflects the way the human mind works in terms of flows and relationships is one of the features which made me interested in them in the first place.

Final Thoughts

In this article, we saw how to install a local instance of Neo4j on a machine and looked at the various associated aspects.

  • We had a look at the two different views: the browser view and the Webadmin view, each with its own set of unique features.
  • We witnessed a lot of querying in this article.
  • We went through the usage of clauses like MATCH, RETURN, WHERE and WITH, among a whole lot of others, which help you design the graph database scenario that you want, the way you want it, and as efficient as you want it to be.
  • We created a graph database using the CREATE clause, effectively generating nodes, the relationships between them, and their respective properties.
  • We learnt via another graph database scenario and some dependent examples as to how to extract relevant information from a graph storage, which was performed using n-step relationship hops, multi-directional relationships and conditional extraction, among others.
  • We also went through a little bit of data manipulation through the use of the SET clause, and an example related to piping, the one with the WITH clause, where you can transfer control from the result of one sub-query to the next.

In the next article in the series, we will have a look at the REST API for Neo4j and how to manipulate a Neo4j database from Java code. That’s it for this time, folks. 

Additional Resources

There are a couple of books out there which can be checked out in case you develop a liking for Neo4j: there is O'Reilly Media’s Graph Databases, and then there’s Manning Publications’ Neo4j in Action. For further information, kindly scour the internet and bug the experts on the forums, or just ping Michael Hunger. He's a great person and he'd be happy to help!

And as you are aware by now, graphs can be used to solve and visualize many situations which are difficult to wrap your head around using the conventional methods of learning. In case you would like to witness more proof of the amazingness of graphs, here are two links which you may like to follow. The articles are interesting, thought-provoking and fun: all at the same time!

My personal favorite is http://www.allthingsgraphed.com/2014/12/05/stellar-navigation-using-network-analysis/ where Caleb Jones tries to explain and visualize navigation between galaxies and nebulae from Earth by building a model of our immediate stellar neighborhood (which may not be feasible any time soon, but hey! Who’s to stop you from dreaming and making it a reality one day?)
