Should I use a graph database?
8 May 2020
One of our clients asked us the other day:
I'm setting up a web app that stores information about devices and the interconnected dependencies between devices. The app would have a web front for a user to navigate through the graph of devices and be able to click through each node, find related/linked nodes, and dive deeper into a given node.
We're trying to decide what database to use, and thinking about a graph database. Do you have any advice?
Graph databases (like Neo4j, Amazon Neptune, and many others) are pretty amazing, but it's usually only a good idea to use one if you fit one of two criteria:
- You really want to use a graph database. If you want to learn something new, go for it! But our job as consultants normally isn't to use unfamiliar technology.
- You have a "true graph problem."
It can sometimes be difficult to tell if you have a "true graph problem," or just have a relational problem that looks like a graph problem.
"What devices are linked to other devices" just looks like a graph problem. It's pretty easily solved with just two tables:
- devices(id, name, <other columns>) stores the device information (and would probably have foreign keys to/from other application tables).
- devices_network(upstream_id, downstream_id) is a join table specifying the relationships between devices (you may want to add some constraints).
You can then create a graph with some joins between those tables or multiple queries. Fetching the network for any particular node this way is fast and efficient, and it maps really well to associations in Object Relational Mappers (ORMs) with eager loading/preloading.
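Here is a minimal sketch of that two-table layout, using Python's built-in sqlite3 module so it runs anywhere. The table and column names come from the description above; the sample devices, the specific constraints, and the helper `downstream_of` are assumptions made up for illustration, not a recommended schema.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE devices_network (
        upstream_id   INTEGER NOT NULL REFERENCES devices(id),
        downstream_id INTEGER NOT NULL REFERENCES devices(id),
        -- Example constraints (assumed): no duplicate links, no self-links.
        PRIMARY KEY (upstream_id, downstream_id),
        CHECK (upstream_id <> downstream_id)
    );
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO devices (id, name) VALUES (?, ?)",
    [(1, "router"), (2, "switch"), (3, "camera"), (4, "printer")],
)
conn.executemany(
    "INSERT INTO devices_network (upstream_id, downstream_id) VALUES (?, ?)",
    [(1, 2), (2, 3), (2, 4)],
)


def downstream_of(conn, device_id):
    """Return the names of devices directly downstream of device_id."""
    rows = conn.execute(
        """
        SELECT d.name
        FROM devices_network AS n
        JOIN devices AS d ON d.id = n.downstream_id
        WHERE n.upstream_id = ?
        ORDER BY d.name
        """,
        (device_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling `downstream_of(conn, 2)` returns the devices linked below the switch, which is the "click a node, see related nodes" query the web app needs.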
If you're not using an ORM, it's still pretty simple to build an in-memory network structure of some set of the graph to answer questions about deeper relationships.
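As a rough sketch of what that in-memory structure could look like in Python: load the rows of devices_network as (upstream_id, downstream_id) pairs, build an adjacency map, and walk it breadth-first. The function names `build_adjacency` and `reachable_from` are hypothetical, and the traversal only follows links in the downstream direction.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map each upstream id to the set of ids directly downstream of it."""
    adjacency = defaultdict(set)
    for upstream_id, downstream_id in edges:
        adjacency[upstream_id].add(downstream_id)
    return adjacency


def reachable_from(adjacency, start):
    """Return every id reachable downstream of start (excluding start)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            # Skip nodes we have already visited so cycles terminate.
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```

For example, with edges `[(1, 2), (2, 3), (2, 4)]`, `reachable_from(build_adjacency(edges), 1)` gives the set of all devices that sit anywhere below device 1.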
What is a "true graph problem"?
Graph problems are less about connections and nodes, and more about the edges between those nodes. "What is connected to this node?" is an easy question to answer with a relational database, as we saw above. Some questions that are not easy to answer would be "what paths are available to travel between two nodes?" and "what are the costs/constraints between two nodes?"
You can think of air travel. If we're traveling between airports, we're not so concerned with "what airports are connected to JFK," which is a relational problem. We're rather asking "what is the fastest/cheapest/shortest path between JFK and SFO or nearby airports," which can only be solved in terms of a graph, because information about the edges/connections is fundamental to the solution.
Put more abstractly: if the edges between your nodes have important properties, you probably have a graph problem. If the edges between nodes are just connections, you have a relational problem.
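To make the air travel example concrete, here is a small sketch of a cheapest-route search where the answer depends entirely on edge properties. It uses Dijkstra's algorithm over a plain Python dictionary; the routes and fares are invented for illustration, and `cheapest_path` is a hypothetical helper rather than the API of any graph database.

```python
import heapq


def cheapest_path(flights, origin, destination):
    """Return (total_cost, path) for the cheapest route, or None if unreachable.

    flights maps an airport code to a list of (neighbor, fare) pairs.
    Fares must be non-negative for Dijkstra's algorithm to be correct.
    """
    queue = [(0, origin, [origin])]
    best = {}  # cheapest cost at which each airport has been expanded
    while queue:
        cost, airport, path = heapq.heappop(queue)
        if airport == destination:
            return cost, path
        if airport in best and best[airport] <= cost:
            continue  # already expanded via a route at least as cheap
        best[airport] = cost
        for neighbor, fare in flights.get(airport, []):
            heapq.heappush(queue, (cost + fare, neighbor, path + [neighbor]))
    return None


# Invented fares, purely illustrative.
flights = {
    "JFK": [("ORD", 120), ("SFO", 400), ("DEN", 180)],
    "ORD": [("SFO", 200), ("DEN", 90)],
    "DEN": [("SFO", 130)],
}
```

With these made-up fares, `cheapest_path(flights, "JFK", "SFO")` picks the connection through DEN over the direct flight. Nothing about which airports are connected changed; only the edge costs decided the answer. Queries like this, run over large and changing networks, are where a graph database starts to earn its keep.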
It's almost always easier to run and develop against a relational database than a graph or any other specialized database. Relational databases are a marvel of technology (especially PostgreSQL 😍), and you shouldn't throw that out lightly. Specialized tech means you’re going to have to do a lot of stuff from scratch, or with vastly more limited libraries. There are far fewer tutorials and much less accumulated knowledge to draw on. You’re just on your own for a whole lot more!
That said, there are times graph and other specialized databases are the way to go. It's just a good idea to make sure you know why you're choosing it.
No matter what you're building, we'd love to help. Please get in touch!