Slashco: Using Neo4j to Master the World of Warcraft and Multi-Level Marketing

Using Neo4j’s Graphdatabase to Model MLM Compensation Planning

slashcoLogo

Graphs: The Right Tool for the Job Neo4j Graph database big data

Compensation planning and calculation in complex sales organizations can be incredibly taxing on traditional databases. Sufficiently large organizations often will end up batching these processes overnight or into a weekly job. For the purposes of this blog post, let’s take a look at Slashco: the hottest multi-level marketing “guild” in the World of Warcraft. In this guild, “associates” may join up and sell Slashco’s fine adventuring gear to the many peoples of Azeroth as well as recruit their friends and family to also sell Slashco products. When an associate recruits someone to join the guild, that new member becomes a part of the associate’s “down-stream.” As a royalty for bringing on this new member, associates will receive a small commission of every item sold by members “down-stream” of them.
In this compensation schedule we have sales-mages who will both sell products directly as well as recruit people to also sell products on behalf of Slashco. I created a sample model based on popular multi-level marketing model. It comprises of four income streams:

 

  • Direct Sales: For each item a team-member sells directly, they earn a commission.
  • Downstream Royalty: A percentage earned of the retail price of items sold by people “within their downstream,” meaning those that they have recruited to work for Slashco and by extension those who their recruits have recruited
  • Wholesale Profit: A percentage earned on the goods sold to those “within their downstream” (these goods will then be re-sold to consumers
  • Global Sales Royalty: A percentage of all sales made by those within Slashco. Revenue sharing.
Here’s an example of how a typical MLM organization might compensate its associates:
compPlanSlashco
We see that depending on what level you’re at, how you’re compensated both for your direct sales and also for your passive income streams can vary greatly. As such, calculating compensation can be hairy to say the least.

If we were to migrate our data from a legacy sales compensation planner, it would be best to push all of it out into a series of .csv files. In the database world, comma separated value files are often the lowest common denominator—thus for 90% of the projects I work on, I start there.  Let’s pretend we have pushed our old MySQL data into a series of 3 .csv files:

These are idealized dumps where I’ve synthesized what would normally be about a dozen csv files into 3 simply to show an example of how the data looks before and after Neo4j’s ETL process (LOAD CSV). When inspected by Excel (you could look at them with any other text editor as well), we’ll see a series of headers and rows with values that correspond to those headers. For example in the image below, we can see that a transaction with an ID of 1000000 has a value of 30 under the header of “salesRepID”, 22 under the header of “period”, 7780 under the header of “item1”, etc. etc. We’ll need to tell Neo4j how to interpret this information into a series of nodes and relationships instead of square “row and column” data.
Screen Shot 2015-09-10 at 2.49.45 PM

Before we actually load any data into Neo4j, it’s important to know what questions will be important to our business. In this case, we’re modeling a sales compensation tool. Our top queries might be something like:

  1. Commission due to each rep, by period in accord with compensation rules
  2. Commission due to each rep, annually in accord with compensation rules
  3. Sales Leaderboard 
    • Top Sales Rep by Total Sales
    • Top Sales Rep by Largest Deal
  4. Global reporting
    • Sales by period
    • Top selling items
  5. Recommendations
    • What items are most frequently sold together?

When we know these questions, we can then build a model that makes it easy to answer those questions quickly and efficiently.

As a general rule, if a piece of data is important to your use-case “make it physical” meaning, if you’re going to be traversing or filtering based on a specific bit of information make it a node or relationship. For example, Because we know that “deals” are important to our Sales Leaderboard, we’re going to make the idea of a deal (transaction) a specific type of node. We’ll then add information about that transaction, like when it happened, who owns it, and what items the deal contained. However, as a “last step” we’ll be doing some basic math operations on price, which is specific to that item— so we’ll keep that as a property.

Slashco Sales Comp. Application Data Model:

MLM-Model - New Page (2)
The scripts I’m using to build this blog post are located in this git repo.
Now that we have a data model, let’s fire up Neo4j and pass in our import script. Found Here. Essentially what we’re doing is creating a few constraints and indexes, then telling Neo4j how to interpret our csv files into the above model.
loading
Now that we’ve loaded in all of our data, let’s open up our browser and start answering some of our top queries.
                   nodeExplode

We’ll work backwards:

     5. Recommendations
  • What items are most frequently sold together?
//recommendation engine, what items are most frequently co-sold?
MATCH path = (item:Item)-[:CONTAINS]-(:Transaction)-[:CONTAINS]-(item2:Item)
WHERE id(item) > id(item2)
WITH item, item2, count(distinct path) as instances
ORDER BY instances DESC
LIMIT 3
RETURN item.name, item2.name, instances;
  Screen Shot 2015-09-11 at 9.33.14 AM
     4. Global reporting
  • Sales by period
  • Top selling items
//total sales volume by period descending
MATCH (p:Period)-[:OCCURED_IN]-(t:Transaction)-[:CONTAINS]-(i:Item)
WITH sum(i.price) as sales, p
ORDER BY sales DESC
LIMIT 10
RETURN sales, p.period;
Screen Shot 2015-09-11 at 9.33.43 AM
MATCH (t:Transaction)-[:CONTAINS]-(i:Item)
WITH count(distinct(t)) as itemSales, i
ORDER BY itemSales DESC
LIMIT 5
RETURN i.name as name, itemSales as count;
Screen Shot 2015-09-11 at 9.34.30 AM
 
     3. Sales Leaderboard 
  • Top Sales Rep by Total Sales Volume
  • Top Sales Rep by Largest Deal
//Who has sold the most volume?
MATCH (rep)-[:SOLD]-(txn)-[:CONTAINS]-(itm)
WITH rep, round(sum(itm.price)) as volume
ORDER BY volume DESC
LIMIT 5
RETURN rep.name as name, volume;
Screen Shot 2015-09-10 at 10.01.40 PM
//Who closed the largest deal?
MATCH (rep)-[:SOLD]-(txn)
WITH rep, txn
MATCH (txn)-[:CONTAINS]-(itm)
WITH rep, txn, round(sum(itm.price)) as dealSize
ORDER BY dealSize DESC
LIMIT 5
RETURN rep.name as name, txn.transactionID as transction, dealSize as `deal size`;
Screen Shot 2015-09-10 at 10.00.24 PM
     2. Commission due to each rep, annually in accord with compensation rules
 
Due to the complexity of the queries, I decided to run them with each level of rep separated out into its own query, however they all follow the basic form of the “what do I do with all this gold” query:
//level 6 comp
MATCH (transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.05) as globalRoyalty
MATCH (big_boss:Person {level:6})<-[r:REPORTS_TO*..]-(downStreamers)-[:SOLD]-(transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.1)+sum(item.wholesalePrice*.5) + globalRoyalty as downStreamGlobal6, big_boss
MATCH (boss)-[:SOLD]-(transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.65) + downStreamGlobal6 as tc6, big_boss.name as n6

RETURN tc6, n6;

 

comp by rep 
     1. Commission due to each rep, by period in accord with compensation rules

This looks frighteningly similar to our last query, except we’ve added a short pattern (transaction)-[:OCCURRED_IN]-(period {period:35}) which will filter out all transactions that occurred in periods that are not the 35th. We still see that over a reasonable sized dataset (100 employees, 20k items, 10k transactions), neo4j is lightning fast.

//level 6 comp with time period
MATCH (transaction)-[:OCCURRED_IN]-(p:Period {period:35})
WITH transaction, p
MATCH (transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.05) as globalRoyalty, p
MATCH (transaction)-[:OCCURRED_IN]-(p:Period {period:35})
WITH globalRoyalty, p, transaction
MATCH (big_boss:Person {level:6})<-[r:REPORTS_TO*..]-(downStreamers)-[:SOLD]-(transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.1)+sum(item.wholesalePrice*.5) + globalRoyalty as downStreamGlobal6, big_boss, p
MATCH (transaction)-[:OCCURRED_IN]-(p:Period {period:35})
WITH transaction, downStreamGlobal6, big_boss
MATCH (boss)-[:SOLD]-(transaction)-[:CONTAINS]-(item)
WITH sum(item.price*.65) + downStreamGlobal6 as tc6, big_boss.name as n6

RETURN tc6, n6;

 

comptime
 //kvg

4 Responses to “Slashco: Using Neo4j to Master the World of Warcraft and Multi-Level Marketing

Trackbacks & Pings

Leave a Reply

Your email address will not be published.