Matteo Zuccon is a software developer with a passion for web development (RESTful services, JS frameworks), Elasticsearch, Spark, MongoDB and agile processes. He runs whiletrue.run. Follow him on Twitter @matteo_zuccon.
There are many potential relationships among the documents stored in your Elasticsearch indexes. Thanks to Graph, you can explore them.
Graph was released in mid-2016 as a Kibana plugin and, as of Elastic Stack 5.0, is included in the X-Pack extension (Platinum subscription).
Graph is an API- and UI-driven tool, so you can integrate the Graph capabilities into your applications or explore the data using the UI.
Some examples of useful information deduced by a graph analysis include:
- Discovering which vendor is responsible for a group of compromised credit cards by exploring the shops where purchases were made.
- Suggesting the next best song for a listener who digs Mozart based on their preferences.
- Identifying potential bad actors and other intruders by looking at external IPs that machines on your network are talking to.
You can install the X-Pack into Elasticsearch using the command:
and into Kibana using the command:
Here you can find useful resources about Graph capabilities and subscriptions:
In this post I am going to show you an example of how Graph works. We will analyze a dataset that lists all current City of Chicago employees, complete with full names, departments, positions, and annual salaries.
You can read more about the dataset and download it here: City of Chicago employees
The dataset contains the following fields:
- Name: name of the employee
- Surname: surname of the employee
- Department: the department where the employee works
- Position: the job position
- Annual Salary: annual salary in dollars
- Income class*: the income class based on the annual salary
- Sex*: male or female
I computed the fields marked with *; you will not find them in the original dataset.
This is what the CSV file looks like:
I converted the CSV file to a JSON file (using a Python script) so that the documents can be indexed into Elasticsearch with the bulk index API.
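A minimal sketch of such a conversion follows; it is not the script used for this post. The CSV header names, the lowercase JSON field names, the income-class thresholds and the sample row are all assumptions made for illustration. The index and type names are the ones used in this post.

```python
import csv
import io
import json

# Hypothetical income-class thresholds (USD per year); the real ones are not stated in the post.
INCOME_CLASSES = [(50_000, "low"), (90_000, "medium")]


def income_class(salary):
    """Map an annual salary to a coarse label using the assumed thresholds."""
    for upper_bound, label in INCOME_CLASSES:
        if salary < upper_bound:
            return label
    return "high"


def csv_to_bulk(csv_file, index="chicagoemployees", doc_type="employee"):
    """Yield bulk-API lines: one action line followed by one document line per CSV row."""
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    for row in csv.DictReader(csv_file):
        # Strip currency formatting such as "$80,000.00" (assumed format).
        salary = float(row["Annual Salary"].replace("$", "").replace(",", ""))
        doc = {
            "name": row["Name"],
            "surname": row["Surname"],
            "department": row["Department"],
            "position": row["Position"],
            "annual_salary": salary,
            "income_class": income_class(salary),
        }
        yield action
        yield json.dumps(doc)


# Made-up sample row, only to exercise the function.
sample = io.StringIO(
    "Name,Surname,Department,Position,Annual Salary\n"
    'Jane,Doe,POLICE,POLICE OFFICER,"$80,000.00"\n'
)
bulk_lines = list(csv_to_bulk(sample))
print("\n".join(bulk_lines))
```

When writing these lines to a file, end every line, including the last one, with a newline, because the bulk API expects newline-delimited JSON.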
The JSON file I produced looks like this (the name of my index is chicagoemployees and the type is employee):
Once you have the JSON file you can index your documents in bulk:
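As a rough sketch of the same step in Python, the request to the `_bulk` endpoint could be built with the standard library as shown here. The host URL and the two sample lines are assumptions, and the snippet only builds the request; sending it needs a running cluster.

```python
import urllib.request


def build_bulk_request(ndjson_lines, host="http://localhost:9200"):
    """Build (but do not send) a POST request to the _bulk endpoint."""
    # The bulk API requires every line, including the last, to end with a newline.
    body = "".join(line + "\n" for line in ndjson_lines).encode("utf-8")
    return urllib.request.Request(
        url=host + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# Made-up action/document pair in the bulk format.
lines = [
    '{"index": {"_index": "chicagoemployees", "_type": "employee"}}',
    '{"name": "Jane", "surname": "Doe", "department": "POLICE"}',
]
request = build_bulk_request(lines)
print(request.get_method(), request.full_url, len(request.data), "bytes")

# To actually send it (X-Pack security may also require credentials):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```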
Now that we have installed X-Pack and indexed our documents, we can start exploring them with Graph.
Here are a few examples of analyses performed on the data:
Which departments have the lowest annual income?
And which have the highest?
We can see that employees working in the Law, Health or Fire departments have a higher annual salary than employees who are working in the Public Library or City Clerk departments.
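The same kind of question can also be put to the Graph explore API instead of the UI. The sketch below only builds the request bodies. The field names `department` and `income_class` and the labels "low" and "high" are assumptions about how the documents were indexed. The 5.x documentation lists the endpoint as `POST chicagoemployees/_xpack/_graph/_explore`; check the docs for your version, since it was renamed in later releases.

```python
import json


def build_explore_body(income_label, size=10):
    """Build an explore body: departments associated with one income class."""
    return {
        # Seed query: sample only documents in the given income class.
        "query": {"match": {"income_class": income_label}},
        # Starting vertices: department terms found in those documents.
        "vertices": [{"field": "department", "size": size}],
        # Connected vertices: income-class terms linked to those departments.
        "connections": {"vertices": [{"field": "income_class"}]},
    }


low_body = build_explore_body("low")
high_body = build_explore_body("high")
print(json.dumps(low_body, indent=2))
```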
Thicker edges represent a stronger relationship (more related documents).
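In an explore API response, each connection refers to its two vertices by position in the `vertices` array and carries a `doc_count` and a `weight`. Assuming edge thickness tracks `doc_count`, the strongest edges could be listed as in this sketch. The response here is made up, and its numbers are illustrative only, not results from the dataset.

```python
def strongest_connections(response, top=3):
    """Return (source term, target term, doc_count) tuples, strongest first."""
    vertices = response["vertices"]
    ranked = sorted(response["connections"], key=lambda c: c["doc_count"], reverse=True)
    return [
        (vertices[c["source"]]["term"], vertices[c["target"]]["term"], c["doc_count"])
        for c in ranked[:top]
    ]


# Made-up response in the shape returned by the explore API.
example_response = {
    "vertices": [
        {"field": "department", "term": "POLICE", "weight": 0.5, "depth": 0},
        {"field": "department", "term": "LAW", "weight": 0.3, "depth": 0},
        {"field": "income_class", "term": "high", "weight": 0.4, "depth": 1},
    ],
    "connections": [
        {"source": 1, "target": 2, "weight": 0.2, "doc_count": 40},
        {"source": 0, "target": 2, "weight": 0.1, "doc_count": 120},
    ],
}

for source, target, doc_count in strongest_connections(example_response):
    print(f"{source} -> {target}: {doc_count} documents")
```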
Next, we can highlight the relationships between Departments and Positions for the female employees that work in the Police department.
We can see that the main relationship is between the Police department and the Police Officer position, but also that the Clerk III position is shared among many departments.
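One way such an exploration might be expressed through the API is sketched below. How the graph was actually built in the UI is not stated here, so treat this purely as an assumption: the field names `sex`, `department` and `position`, and the values "female" and "POLICE", are hypothetical. The body starts from the Police department vertex, hops to positions, then hops back to departments so that shared positions show up.

```python
import json

explore_body = {
    # Seed query: restrict the sample to female employees (assumed field and value).
    "query": {"bool": {"filter": [{"match": {"sex": "female"}}]}},
    # Keep only terms significantly associated with the query (this is the default).
    "controls": {"use_significance": True},
    # Start from a single department vertex.
    "vertices": [{"field": "department", "include": ["POLICE"]}],
    "connections": {
        # First hop: positions held in that department.
        "vertices": [{"field": "position"}],
        # Second hop: other departments that share those positions.
        "connections": {"vertices": [{"field": "department"}]},
    },
}
print(json.dumps(explore_body, indent=2))
```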
The final example shows the relationships between gender and department, demonstrating that some departments are common to male and female employees while others are not.
In this post I demonstrated how to import documents into Elasticsearch and how to use the Graph tool to discover relationships among these documents. Graph is a really powerful tool because it helps you find out what is relevant in your data. That is not an easy task, because popular is not always the same as relevant.
I recommend installing and exploring the Graph tool as it can easily run in your Kibana environment and analyze your existing indexes. Additionally, you do not need to perform any pre-processing on your documents.
You can download all the resources used in this post below: