Hot Ca$h: A Note on Performance Testing Neo4j

 A quick note on performance testing Neo4j:

When trying to performance test the queries key to your use-case, a common mistake is running your tests on a cold cache.

Cold Cache: When all (or most) items that are accessed must be loaded into memory from disk.
Warm Cache: When none (or few) items that are accessed must be loaded into memory from disk.

In production, your database will nearly always be acting on a warm cache, meaning it will already have pulled the items your queries act on into memory. Our goal is to test how your database will function under a production load, not the I/O on our SSDs.

Spoiler alert: a single query on a cold cache isn’t a representative test of a live application.

Neo4j-Enterprise has a two-part cache, which consists of an object cache and a file cache. #highPerformance

From the Neo4j Docs:

“The file buffer cache caches the Neo4j data in the same format as it is represented on the durable storage media. The purpose of this cache layer is to improve both read and write performance. The file buffer cache improves write performance by writing to the cache and deferring durable writes.”

“The object cache caches individual nodes and relationships and their properties in a form that is optimized for fast traversal of the graph. There are two different categories of object caches in Neo4j.”

What does that mean? It means that in an ideal world, your hardware will be sized such that you have twice as much memory as your dataset on disk.

For example: if I have a 1GB dataset, it would be ideal to have at least 2GB of memory allocated to my JVM.

Which leads to my second key to reasonable performance testing:

 Size your JVM. 

Inside your neo4j-enterprise-2.2.1/conf/neo4j-wrapper.conf file, modify the amount of memory allocated to your JVM Heap:

[Screenshot: JVM heap settings in neo4j-wrapper.conf]

In the above I’ve set my heap to 4GB.
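To double-check what an instance is actually configured with, something like the sketch below can help. It assumes the stock Neo4j 2.2 property names wrapper.java.initmemory and wrapper.java.maxmemory (values in megabytes); show_heap_settings is a hypothetical helper name, not something that ships with Neo4j.

```shell
# Sketch only: assumes the Neo4j 2.2 property names wrapper.java.initmemory
# and wrapper.java.maxmemory (values in MB). show_heap_settings is a
# hypothetical helper, not part of Neo4j.
show_heap_settings() {
  conf_file="$1"
  # Print only the uncommented heap lines; commented-out defaults are skipped.
  grep -E '^wrapper\.java\.(initmemory|maxmemory)=' "$conf_file"
}

# Usage (path taken from the post):
#   show_heap_settings neo4j-enterprise-2.2.1/conf/neo4j-wrapper.conf
```

Under that assumption, a 4GB heap would show up as both properties set to 4096.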

Our next critical step is to size our page cache, found within: /conf/. A good general rule for sizing your page cache is the sum of the on-disk size of all your nodes, relationships, and properties. For my test data of 2GB, I’ve set my page cache to 2GB.

[Screenshot: page cache setting]
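One way to get a number to size against is to add up the store files on disk. This is a rough sketch, assuming the neostore.*.db files (the same glob used for pre-warming later in this post) are a fair proxy for nodes, relationships, and properties; store_size_bytes is a made-up helper name.

```shell
# Sketch only: sums the size in bytes of every neostore.*.db file in a
# store directory. store_size_bytes is a hypothetical helper name.
store_size_bytes() {
  store_dir="$1"
  total=0
  for file in "$store_dir"/neostore.*.db; do
    # Skip the unexpanded glob when no store files are present.
    [ -f "$file" ] || continue
    size=$(wc -c < "$file")
    total=$((total + size))
  done
  echo "$total"
}

# Usage (store path taken from the post):
#   store_size_bytes data/graph.db
```

The result is in bytes, so divide by 1024 three times to compare it against a page cache size in GB.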

My testing environment is a 2014 MacBook Air, i7, SSD with 8GB RAM. Our database has 5MM relationships and 350k nodes (avg. relationship density of 14.3).

Data Model:

[Figure: data model]
Complicated, I know.
What about the density of my “test” node?
[Screenshot: node density of the test node]
I have ~10,000 friends. So, we’re testing against probably one of the densest nodes in our graph. Typically a place where query performance will be weakest.

Testing Neo4j

Let’s take a cold cache with a basic recommendation query:

“Who should I be friends with: show me 20 of my friends’ friends?”
MATCH (n:User {username:"Kevin"})-[:FRIENDSHIP]-(friend)-[r:FRIENDSHIP]-(newFriend)
WHERE NOT (n)-[:FRIENDSHIP]-(newFriend)
WITH newFriend LIMIT 20
RETURN newFriend;
[Screenshot: cold cache query result]
Uselessly slow. 2295 ms.
Let’s try warming up the cache. Here’s a bit of Cypher the Neo4j BlackOps Team (aka Max De Marzi) wrote to warm up Neo4j:

START n=node(*)
OPTIONAL MATCH n -[r]-> ()
WITH count(n.property_i_do_not_have) + count(r.property_i_do_not_have) as counted
RETURN counted;

Once is good, thrice is probably redundant.

[Screenshot: warm-up query running in the terminal]
Yes, this takes time. Yes, it’s important. Yes, you can write a script to warm up your cache each time you bring an instance of Neo4j back up. For example, on a Unix-based system you can pre-warm the filesystem cache of the operating system with something like:
for file in data/graph.db/neostore.*.db; do
  dd if="$file" of=/dev/null bs=1000000
done
Remember, we’re attempting to touch every node in the graph (and bring it into memory).
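Wrapped up as something you could call from a startup script, the filesystem pre-warm might look like the sketch below. warm_fs_cache is a hypothetical name, and the store directory is passed in rather than hard-coded. Note that this only reads the files through the operating system; it does not populate Neo4j's own caches, which is what the Cypher warm-up is for.

```shell
# Sketch only: reads every neostore.*.db file once so the OS can cache it.
# warm_fs_cache is a hypothetical helper name.
warm_fs_cache() {
  store_dir="$1"
  warmed=0
  for file in "$store_dir"/neostore.*.db; do
    # Skip the unexpanded glob when no store files are present.
    [ -f "$file" ] || continue
    # Read the whole file and discard it; dd's statistics go to stderr.
    dd if="$file" of=/dev/null bs=1000000 2>/dev/null || return 1
    warmed=$((warmed + 1))
  done
  echo "warmed $warmed store files"
}

# Usage (store path taken from the post):
#   warm_fs_cache data/graph.db
```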
Let’s rerun our basic recommendation query:
MATCH (n:User {username:"Kevin"})-[:FRIENDSHIP]-(friend)-[r:FRIENDSHIP]-(newFriend)
WHERE NOT (n)-[:FRIENDSHIP]-(newFriend)
WITH newFriend LIMIT 20
RETURN newFriend;
[Screenshot: warm cache query result]

We see a 5,000% difference between cold cache and hot cache. Remember, in production you’ll nearly always be working with a warm cache, so make sure that’s the environment you’re testing against.

