Neo4j High Availability: An Infographic on Scaling Your Graph to Handle Big Data

Setting up HA (Neo4j High Availability) and Clustering (replicated distribution) with your Graph Database

Scaling with Neo4j High Availability

Thank you @DavidMontag for writing the original “Understanding Neo4j Scalability” White Paper, upon which most of this infographic is based.

There are two parts to each Neo4j instance: the database itself and the cluster management layer. The cluster management layer continuously stays in sync with all instances in the cluster and keeps track of any instances joining or leaving it. When a master election becomes necessary, the cluster management layer automatically ensures that a new master is elected. The database layer handles everything else, including replication of the data: slave instances pull transactional updates from the master.

In a Neo4j HA cluster, the full graph is replicated to every instance, so each instance contains the complete graph. Regardless of how many instances fail, all of your mission-critical data is kept safe so long as even one instance remains available.

Most applications perform many more reads than writes. Neo4j's write operations are coordinated with the elected master, while read operations are served locally on each slave. This means that the read capacity of the HA cluster increases linearly with the number of servers, which provides an extremely high tolerance for real-world loads without compromising performance.

Neo4j also supports sharding of the graph in memory along natural “chunks.” These sub-graphs may be kept hot in memory on specific instances of Neo4j within a cluster. Developers can then route queries that map to those chunks to the pre-heated instances. This allows users to maintain high performance at scale without extreme hardware costs.

While maintaining fully ACID transactions, Neo4j is able to commit more than 10,000 transactions per second, all the while maintaining data integrity across enormous real-world data sets on a global scale.
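To make the routing idea above more concrete, here is a minimal sketch of a client-side router. It is an illustration under stated assumptions, not part of Neo4j: the `ClusterRouter` class, its method names, and the host strings are all hypothetical. It sends writes to the master, which is a simplification of the write path described above, and picks a slave for reads using a stable hash of a shard key, so that queries for the same sub-graph keep landing on the same warmed-up instance.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterRouter:
    """Hypothetical client-side router; not a Neo4j API."""
    master: str
    slaves: List[str] = field(default_factory=list)

    def route_write(self) -> str:
        # Simplification: send every write straight to the elected master.
        return self.master

    def route_read(self, shard_key: str) -> str:
        # With no slaves available, fall back to the master for reads.
        if not self.slaves:
            return self.master
        # A stable hash maps the same shard key to the same slave on every
        # call, so that slave's in-memory cache stays warm for that sub-graph.
        digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.slaves)
        return self.slaves[index]


# Example usage with made-up host names.
router = ClusterRouter(
    master="neo4j-1:7474",
    slaves=["neo4j-2:7474", "neo4j-3:7474"],
)
write_target = router.route_write()
read_target = router.route_read("region:emea")
```

Note that a plain modulo hash reshuffles most keys whenever the number of slaves changes; a production router would more likely use consistent hashing and would track cluster membership instead of a fixed list.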
No successful business operates without backups of its mission-critical data. Neo4j supports both full and incremental backups from running clusters.

Reporting instances allow ad-hoc reporting and analytics jobs to be run without compromising production capacity.

Neo4j can be configured to run as a global cluster by extending the main cluster with slave-only satellite clusters, which can be placed close to end-users. These clusters stay up to date with the main cluster in real time.

A single cluster is not protected against the failure of an entire data center, such as during a fire or natural disaster. To mitigate this risk, businesses with critical applications set up Neo4j with disaster recovery clusters housed in separate locations.
