Docker and Neo4j for Developers Who Can’t Read Good and Want to Do Other Stuff Good Too

TL;DR

If you don’t feel like reading and just want a command that spins up Neo4j in a Docker container with the right ports open, and with data, logs, imports, and configuration persisted outside of the container:

docker run \
        -p 7474:7474 \
        -p 7687:7687 \
        -p 7473:7473 \
        -v $HOME/neo4j/data:/data \
        -v $HOME/neo4j/logs:/logs \
        -v $HOME/neo4j/import:/var/lib/neo4j/import \
        -v $HOME/neo4j/conf/:/conf/ \
        neo4j:latest
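If you’d rather keep that in a script, here’s a minimal bash sketch that wraps the same command in a function so the image tag and host directory are easy to change. The function name start_neo4j and the NEO4J_TAG, NEO4J_HOST_DIR, and DRY_RUN variables are made-up names for this example, not anything Docker or the Neo4j image defines.

```shell
#!/usr/bin/env bash
# Sketch: start_neo4j, NEO4J_TAG, NEO4J_HOST_DIR and DRY_RUN are hypothetical
# names chosen for this example, not part of Docker or the Neo4j image.
start_neo4j() {
  local tag="${NEO4J_TAG:-latest}"
  local dir="${NEO4J_HOST_DIR:-$HOME/neo4j}"
  # Same flags as the command above: 7474 = HTTP, 7687 = Bolt, 7473 = HTTPS,
  # plus data, logs, import and conf directories mounted from the host.
  local -a args=(
    run
    -p 7474:7474
    -p 7687:7687
    -p 7473:7473
    -v "$dir/data:/data"
    -v "$dir/logs:/logs"
    -v "$dir/import:/var/lib/neo4j/import"
    -v "$dir/conf/:/conf/"
    "neo4j:$tag"
  )
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it.
    echo docker "${args[@]}"
  else
    docker "${args[@]}"
  fi
}
```

For example, NEO4J_TAG=3.0.6-enterprise start_neo4j starts the Enterprise image, and DRY_RUN=1 start_neo4j just prints the docker command it would run.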

A step-by-step guide to Neo4j and Docker:

I’ll admit it: I’m late to the party. Really late to the party. For the past few years, every conference talk I’ve seen has been some version of: “Web-scale map-reduced machine-learning with artificially intelligent sentiment aware recommendations on Docker in the cloud.” Naturally, because it was new and trendy, I assumed it was stupid and would die soon. I also thought Uber was a bad idea. So I’m probably not the best litmus test for these things.
It’s now 2016 and containers are here to stay. They’re probably going to continue growing in popularity, especially as more and more enterprises deploy their infrastructure in the cloud. This guide is for all the other curmudgeons out there.

What is a Container and Why Do I Care?

A container is basically (but not actually) a very flexible, very lightweight, easier-to-manage virtual machine. In the olden days, when COBOL developers roamed the earth, it was one application, one (physical) server. If you were a hardware salesperson, times were good. If you were a developer, well… I’m sorry.
[Figure: One Server, One Application.]
Then came the hypervisor, which allowed a physical server to host virtual “computers” within itself, so a single server could run several different operating systems, each with its own libraries and applications. Each VM (virtual machine) functions as an isolated partition within the host computer, made up of its own resources (disk, memory, etc.) as well as a guest operating system. VMs made cloud hosting possible: users can now easily provision a VM on a large, communally shared machine.
[Figure: Virtualized Hardware.]

A container, by contrast, doesn’t require a guest operating system: a subtle but important distinction. By sharing the host operating system’s kernel, tools like Docker can run many applications in isolation, without the additional bulk of dozens of operating systems that aren’t actually contributing much. Docker Engine and its affiliated tools make container management simple.
[Figure: Dockerized Containers, Many Apps, One Server.]

Step by Step Guide to Setting Up Neo4j on Docker 

Step 1: Install Docker

If you haven’t already, go and download Docker. Select the appropriate version for your operating system; Docker has great documentation for this part of the process, so I won’t duplicate their efforts.

Step 2: Start Docker 

If you’re on OS X, navigate to (or search for) the Docker Engine application and make sure it’s running. Then open your terminal.
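You can also check from the terminal that the Docker daemon is actually up, since docker info fails when it can’t reach the daemon. A small bash sketch; docker_running is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report whether the Docker daemon answers. docker_running is a
# hypothetical helper name; `docker info` fails when the daemon isn't reachable.
docker_running() {
  if docker info >/dev/null 2>&1; then
    echo "Docker daemon is running"
  else
    echo "Docker daemon is not reachable; is the Docker app started?" >&2
    return 1
  fi
}
```

Run docker_running before moving on; if it complains, start the Docker application and try again.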

Step 3: Start Neo4j 

One of the things that makes Docker so great is how easy it is to pull and run different container images. A Docker image is an application blueprint. You can build these yourself or pull them from Docker Hub, Docker’s equivalent of an app store.
To start a container, we’ll use the run command. The syntax for this is:

docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]

For example:

docker run neo4j:latest echo "i got that graphy feeling"

The above tells Docker to start a container from the Neo4j image tagged “latest” and then echo the string “i got that graphy feeling.” (Because we passed a command, the container runs echo instead of starting Neo4j, and exits as soon as it’s done.) There are many different versions of Neo4j on Docker Hub. For example, the tag “enterprise” pulls the latest version of Neo4j Enterprise, or you can pin a specific version with a tag such as “3.0.6” or “3.0.6-enterprise.”
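To make the tag convention concrete, here’s a bash sketch that builds an image reference from a version and an edition: “<version>” for Community, “<version>-enterprise” for Enterprise, and “latest” or “enterprise” when no version is given. neo4j_image is a hypothetical helper, and the mapping only reflects the tags mentioned above; check Docker Hub for the tags that actually exist.

```shell
#!/usr/bin/env bash
# Sketch: build a Neo4j image reference from an optional version and edition.
# neo4j_image is a hypothetical helper; it only encodes the tag patterns
# described in the text (latest, enterprise, <version>, <version>-enterprise).
neo4j_image() {
  local version="${1:-}"
  local edition="${2:-community}"
  local tag
  if [ -z "$version" ]; then
    # No version: fall back to the floating tags.
    if [ "$edition" = "enterprise" ]; then tag="enterprise"; else tag="latest"; fi
  elif [ "$edition" = "enterprise" ]; then
    tag="${version}-enterprise"
  else
    tag="$version"
  fi
  echo "neo4j:${tag}"
}
```

For example, docker run "$(neo4j_image 3.0.6 enterprise)" is the same as docker run neo4j:3.0.6-enterprise.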

Opening your container to the outside world:

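If you start Neo4j with a command like the one above (without the echo, e.g. docker run neo4j:latest), you’ll notice that you can’t actually reach Neo4j or the Neo4j Browser from your host machine: by default, a container’s ports aren’t published to the host.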

We need to publish some ports using the “-p” option.

docker run \
        -p 7474:7474 \
        -p 7687:7687 \
        -p 7473:7473 \
        neo4j:3.0.6-enterprise

What we’re doing above is mapping ports on our host machine to ports inside the container; each -p value is written host:container. By default, Neo4j uses three ports for remote access:
  • 7474 for HTTP.
  • 7473 for HTTPS.
  • 7687 for Bolt.
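Because each -p value is host:container, only the left-hand side needs to change if one of those ports is already taken on your machine. A bash sketch that prints the three flags with overridable host ports; neo4j_port_flags and the NEO4J_HTTP_PORT, NEO4J_HTTPS_PORT, and NEO4J_BOLT_PORT variables are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: print -p flags for Neo4j's default ports (HOST:CONTAINER).
# neo4j_port_flags and the NEO4J_*_PORT variables are hypothetical names;
# the container-side ports stay at Neo4j's defaults.
neo4j_port_flags() {
  printf -- '-p %s:7474 -p %s:7473 -p %s:7687\n' \
    "${NEO4J_HTTP_PORT:-7474}" \
    "${NEO4J_HTTPS_PORT:-7473}" \
    "${NEO4J_BOLT_PORT:-7687}"
}
```

Usage: docker run $(neo4j_port_flags) neo4j:3.0.6-enterprise. The substitution is left unquoted on purpose so the shell splits it into separate arguments.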

But what about actually storing our data?

If you’d like to persist the database and logs outside of the container (this is a good idea), you just specify where you’d like to store that data using the -v option:

docker run \
        -p 7474:7474 \
        -p 7687:7687 \
        -p 7473:7473 \
        -v $HOME/neo4j/data:/data \
        -v $HOME/neo4j/logs:/logs \
        neo4j:3.0.6-enterprise
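With -v, Docker creates a host directory that doesn’t exist yet, but on Linux it typically ends up owned by root, which can make the files awkward to manage later. A bash sketch that creates the directories up front; NEO4J_HOST_DIR is a hypothetical variable defaulting to $HOME/neo4j.

```shell
#!/usr/bin/env bash
# Sketch: create the host directories used by the -v flags above.
# NEO4J_HOST_DIR is a hypothetical variable; it defaults to $HOME/neo4j.
NEO4J_HOST_DIR="${NEO4J_HOST_DIR:-$HOME/neo4j}"
for sub in data logs; do
  mkdir -p "$NEO4J_HOST_DIR/$sub"
done
# Anything Neo4j writes into these directories stays on the host,
# even after the container itself is removed with docker rm.
ls -ld "$NEO4J_HOST_DIR/data" "$NEO4J_HOST_DIR/logs"
```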

Step 4: Configuration

The Neo4j image by default declares a volume at /var/lib/neo4j/conf. This contains (depending on the version of Neo4j you’re using) all of Neo4j’s config files. For 3.X, this is a single file called “neo4j.conf”. To have our instances of Neo4j come up with the right configs, we have a few options:
  • We can mount our own configuration files as a volume at run time with docker run -v /path/to/my/neo4j.conf:/conf/neo4j.conf.
  • We could copy the file over when the container starts. To do that, copy the file into the image at a location that isn’t underneath the volume, then call a script from the ENTRYPOINT or CMD that copies it to the correct location and starts Neo4j.
  • We could also clone the project behind the official Neo4j image and edit the Dockerfile to add our own config file before the VOLUME is declared (anything added before the VOLUME instruction is automatically copied into the volume at run time).
  • We could bring up an interactive terminal in the container and configure Neo4j from there.
  • We can pass in environment variables when starting the container:
    • e.g., --env=NEO4J_dbms_memory_pagecache_size=4G
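The environment-variable option relies on a naming scheme from the Neo4j Docker documentation: prefix the setting with NEO4J_, write each underscore in the setting name as a double underscore, and each dot as a single underscore. A bash sketch of that translation; conf_to_env is a hypothetical helper, and older image versions may only recognize a fixed set of variables, so check the docs for your tag.

```shell
#!/usr/bin/env bash
# Sketch: turn a neo4j.conf setting name into the NEO4J_ environment
# variable form. conf_to_env is a hypothetical helper name.
conf_to_env() {
  local name="${1//_/__}"   # "_" becomes "__" (must happen before the next step)
  name="${name//./_}"       # "." becomes "_"
  printf 'NEO4J_%s\n' "$name"
}
```

Usage: docker run --env="$(conf_to_env dbms.memory.pagecache.size)=4G" neo4j:3.0.6-enterprise produces the same flag as the example in the list.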
My preference is to store the .conf file outside of the container and then just point the container to the /conf/ directory on startup. This makes it a lot easier when spinning up a cluster (more on that in a follow-up blog post).
To do that, on startup we’ll use something like:

docker run \
        -p 7474:7474 \
        -p 7687:7687 \
        -p 7473:7473 \
        -v $HOME/neo4j/data:/data \
        -v $HOME/neo4j/logs:/logs \
        -v $HOME/neo4j/conf/:/conf/ \
        neo4j:3.0.6-enterprise

Here, $HOME/neo4j/conf/ on the host (a folder called neo4j/conf in my home directory) holds my desired configuration files.
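One way to get a starting point for that folder is to copy the default neo4j.conf out of the image and edit it. A bash sketch, assuming the image keeps its config at /var/lib/neo4j/conf/neo4j.conf (the path from Step 4) and includes cat; seed_neo4j_conf is a hypothetical helper. The image’s startup script may still adjust some settings, so treat the copied file as a starting point rather than the final word.

```shell
#!/usr/bin/env bash
# Sketch: copy the image's default neo4j.conf to the host conf directory.
# seed_neo4j_conf is a hypothetical helper; it assumes the image has `cat`
# and stores its config at /var/lib/neo4j/conf/neo4j.conf.
seed_neo4j_conf() {
  local image="${1:-neo4j:3.0.6-enterprise}"
  local conf_dir="${2:-$HOME/neo4j/conf}"
  if [ -e "$conf_dir/neo4j.conf" ]; then
    echo "$conf_dir/neo4j.conf already exists; not overwriting" >&2
    return 1
  fi
  mkdir -p "$conf_dir"
  # --rm removes the throwaway container; --entrypoint cat prints the file
  # instead of starting Neo4j. Write to a temp file so a failure leaves nothing behind.
  if docker run --rm --entrypoint cat "$image" /var/lib/neo4j/conf/neo4j.conf \
      > "$conf_dir/neo4j.conf.tmp"; then
    mv "$conf_dir/neo4j.conf.tmp" "$conf_dir/neo4j.conf"
  else
    rm -f "$conf_dir/neo4j.conf.tmp"
    return 1
  fi
}
```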

Step 5: Interacting with Neo4j

Since we’ve mapped container port 7474 to host port 7474, we can open our browser and use the Neo4j Browser as usual at localhost:7474. But what if we’d like to access the shell?
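If you start the container in the background (docker run -d ...) or from another terminal, Neo4j can take a few seconds before the Browser answers. A minimal bash sketch that polls the HTTP port with curl; wait_for_neo4j is a hypothetical helper, and its default URL assumes the 7474:7474 mapping used above.

```shell
#!/usr/bin/env bash
# Sketch: poll Neo4j's HTTP port until it answers or the timeout runs out.
# wait_for_neo4j is a hypothetical helper; the default URL assumes the
# 7474:7474 mapping above, and the timeout is in seconds.
wait_for_neo4j() {
  local url="${1:-http://localhost:7474/}"
  local timeout="${2:-60}"
  local waited=0
  # curl -s exits 0 as soon as the server sends back any HTTP response.
  until curl -s -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Neo4j did not answer at $url within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Neo4j is up at $url"
}
```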
We’ll need the name of the container we’ve just spun up. To find it, we can list all containers with the ps command:
docker ps -a
[Screenshot: output of docker ps -a, showing the Neo4j container’s generated name]

We can then start Neo4j’s shell using the exec command, which is used to operate against an already running container:

docker exec -ti $NAMEOFCONTAINER /var/lib/neo4j/bin/neo4j-shell

For the container in the screenshot above, named fervent_pare, the command would be:

docker exec -ti fervent_pare /var/lib/neo4j/bin/neo4j-shell

If you’re using Neo4j 3.1 or above, you can use the new shell, cypher-shell (/var/lib/neo4j/bin/cypher-shell), which connects over the Bolt protocol.

The -i flag keeps STDIN open so you can type into the container’s process, and the -t flag allocates a pseudo-terminal.
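If you’d rather not copy container names by hand, a bash sketch can look up the running container by image and then exec the shell. neo4j_shell is a hypothetical helper; its default image tag and shell path match the examples above, and on 3.1+ you can pass /var/lib/neo4j/bin/cypher-shell as the second argument.

```shell
#!/usr/bin/env bash
# Sketch: open a shell in the first running container started from an image.
# neo4j_shell is a hypothetical helper; defaults mirror the examples above.
neo4j_shell() {
  local image="${1:-neo4j:3.0.6-enterprise}"
  local shell="${2:-/var/lib/neo4j/bin/neo4j-shell}"
  local name
  # docker ps lists running containers; the ancestor filter matches the image
  # and --format prints only the container names.
  name="$(docker ps --filter "ancestor=$image" --format '{{.Names}}' | head -n 1)"
  if [ -z "$name" ]; then
    echo "No running container found for $image" >&2
    return 1
  fi
  docker exec -ti "$name" "$shell"
}
```

Another option is to give the container a fixed name when you start it (docker run --name neo4j ...), so you can run docker exec -ti neo4j /var/lib/neo4j/bin/neo4j-shell without looking anything up.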

Step 6: LOAD CSV with Docker and Neo4j

By default, LOAD CSV in Neo4j 3.x reads files from the import directory under the Neo4j home directory (/var/lib/neo4j/import in the Docker image). So we’ll need to map that directory the same way we did with the conf and data volumes.

docker run \
        -p 7474:7474 \
        -p 7687:7687 \
        -p 7473:7473 \
        -v $HOME/neo4j/data:/data \
        -v $HOME/neo4j/logs:/logs \
        -v $HOME/neo4j/import:/var/lib/neo4j/import \
        -v $HOME/neo4j/conf/:/conf/ \
        neo4j:3.0.6-enterprise

Now we can copy any CSV files we want to import into $HOME/neo4j/import on the host and work with them as usual, referring to them in LOAD CSV as file:///<filename>.
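To make that concrete, here’s a bash sketch that writes a tiny CSV into the host import directory and prints a Cypher statement to load it. The file name people.csv, its columns, and the :Person label are made-up sample data, NEO4J_IMPORT_DIR is a hypothetical variable, and toInt is the 3.0-era conversion function (later versions use toInteger).

```shell
#!/usr/bin/env bash
# Sketch: drop a small sample CSV into the mounted import directory and print
# a LOAD CSV statement for it. people.csv, its columns and the :Person label
# are made-up sample data; NEO4J_IMPORT_DIR is a hypothetical variable.
import_dir="${NEO4J_IMPORT_DIR:-$HOME/neo4j/import}"
mkdir -p "$import_dir"
cat > "$import_dir/people.csv" <<'EOF'
name,age
Alice,34
Bob,29
EOF
# Inside Neo4j, files in the import directory are addressed as file:///<name>.
cat <<'EOF'
LOAD CSV WITH HEADERS FROM "file:///people.csv" AS row
CREATE (:Person {name: row.name, age: toInt(row.age)});
EOF
```

Paste the printed statement into the Neo4j Browser or the shell from Step 5.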
 
 I’ll shortly be adding a follow-up post that will explain Neo4j 3.1’s Causal Clustering as well as how to spin up a cluster with Docker.
