Your Data is a Graph, Model it Like One

Modeling your domain objects is one of the most important steps in designing any service, if not the most important. A well-designed datamodel makes it easy to reason about and manipulate the state your service manages. A poorly designed datamodel does the opposite, and costs you time refactoring the model to support new business needs.

At startups, or on any new project, the datamodel is often in a constant state of flux. New feature requests reveal that the initial model isn't flexible enough to accommodate them. For instance, maybe you have a User model that would be better represented as two distinct User and Customer models. Or maybe a User previously had a relationship to one favorite Book, but now they're allowed to have many favorite Books. With a typical relational database implementation, changes like these require schema migrations.

Migration-less schema

For the average developer, a relational-database schema migration is about as much fun as a trip to the dentist. Devs often have to choose between taking the time to do a migration and hacking something onto the existing datamodel. Unfortunately, the latter can result in technical debt that becomes harder and harder to fix.

But what if I told you that you never had to do a schema migration again? (Cue the Morpheus meme.) Often when designing domain objects, programmers whiteboard something resembling an Entity-Relationship Diagram to visualize how to structure the data. This tells us there are two basic parts to representing any datamodel: Entities and Relationships. It follows that a datamodel can also be described more formally as a graph where entities are the vertices (or nodes) and relationships are the edges.

If a relational-database schema only implemented these two building blocks that are common to every datamodel, you would never need to change it. In practice this might look like:

Entities

id    type      attributes
e1    User      {name: "bob", age: 23}
e2    Coupon    {amount: "$50"}
e3    Plan      {amount: "$10", name: "premium"}

Relationships

id    name            from    to
r1    Discount        e1      e2
r2    Subscription    e1      e3
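
As a rough sketch of how these two tables could be declared, the Python snippet below uses the standard-library sqlite3 module with an in-memory database. The column names from_id and to_id (chosen because FROM and TO are SQL keywords), the JSON-encoded TEXT attributes column, and the helper function names are all assumptions made for illustration, not a description of any particular production schema.

```python
import json
import sqlite3


def create_graph_schema(conn):
    """Create the two fixed tables: one for vertices, one for edges."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE entities (
            id         TEXT PRIMARY KEY,
            type       TEXT NOT NULL,
            attributes TEXT NOT NULL DEFAULT '{}'  -- unstructured JSON
        );
        CREATE TABLE relationships (
            id      TEXT PRIMARY KEY,
            name    TEXT NOT NULL,
            from_id TEXT NOT NULL REFERENCES entities(id),
            to_id   TEXT NOT NULL REFERENCES entities(id)
        );
    """)


def add_entity(conn, entity_id, entity_type, attributes):
    """Store an entity; its attributes are serialized to JSON text."""
    conn.execute(
        "INSERT INTO entities (id, type, attributes) VALUES (?, ?, ?)",
        (entity_id, entity_type, json.dumps(attributes)),
    )


def add_relationship(conn, rel_id, name, from_id, to_id):
    """Store a named, directed edge between two existing entities."""
    conn.execute(
        "INSERT INTO relationships (id, name, from_id, to_id) VALUES (?, ?, ?, ?)",
        (rel_id, name, from_id, to_id),
    )


# Load the example rows from the tables above.
conn = sqlite3.connect(":memory:")
create_graph_schema(conn)
add_entity(conn, "e1", "User", {"name": "bob", "age": 23})
add_entity(conn, "e2", "Coupon", {"amount": "$50"})
add_entity(conn, "e3", "Plan", {"amount": "$10", "name": "premium"})
add_relationship(conn, "r1", "Discount", "e1", "e2")
add_relationship(conn, "r2", "Subscription", "e1", "e3")
```

Adding a new entity type or a new kind of relationship in this sketch is just another row, so neither table definition has to change.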

This design forms a hybrid between a schema-based and a schema-less database. The idea here is to schematize the aspects of the graph that will never change, and leave definitions that are subject to change (entity attributes, relationship cardinality) unstructured and unenforced by the database.

In the example above, it is undefined whether a User can have one or many subscriptions. This can and should still be enforced in code at the application layer, but the database schema doesn't care. Contrast that with a schema in which the User table has a foreign key to Plan.id. If your service needed to start supporting multiple subscriptions per User, a database migration would be needed to move that foreign-key column into a new join table.
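
One way the application-layer check might look is sketched below in Python. Everything here is hypothetical: the Relationship dataclass, the MAX_OUTGOING rule table, and the CardinalityError exception are names invented for this illustration, and relationships are kept in a plain list to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    id: str
    name: str
    from_id: str
    to_id: str


class CardinalityError(Exception):
    """Raised when adding an edge would break an application-level rule."""


# Hypothetical rules: the maximum number of outgoing edges of each name
# that a single entity may have. None (or a missing name) means unlimited.
MAX_OUTGOING = {"Subscription": 1, "Discount": None}


def add_relationship(relationships, new):
    """Append `new` to the list unless it exceeds the configured limit."""
    limit = MAX_OUTGOING.get(new.name)
    if limit is not None:
        existing = sum(
            1
            for r in relationships
            if r.name == new.name and r.from_id == new.from_id
        )
        if existing >= limit:
            raise CardinalityError(
                f"{new.from_id} already has {existing} {new.name} relationship(s)"
            )
    relationships.append(new)
```

In this sketch, allowing several subscriptions per User means changing the limit for "Subscription" in MAX_OUTGOING to None. The stored rows stay exactly as they were.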

Tradeoffs

Programming is the art of choosing between tradeoffs. While the design I've outlined in this post optimizes for reducing churn on a database schema, there are a few tradeoffs to consider.

  • Traversing a relationship requires an extra JOIN, because every hop goes through the relationships table.
  • Indexing and filtering on unstructured data is not supported by all database systems. PostgreSQL does offer a jsonb data type that could be useful for this use case.
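
Both tradeoffs can be seen in a small Python sketch using the standard-library sqlite3 module. The table layout, the from_id and to_id column names, and the related and find_by_attribute helpers are assumptions for illustration. The attribute filter is deliberately done in Python after fetching rows, as a fallback for databases without JSON operators; it is not a substitute for a real index.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (
        id TEXT PRIMARY KEY, type TEXT NOT NULL, attributes TEXT NOT NULL
    );
    CREATE TABLE relationships (
        id TEXT PRIMARY KEY, name TEXT NOT NULL,
        from_id TEXT NOT NULL, to_id TEXT NOT NULL
    );
    INSERT INTO entities VALUES
        ('e1', 'User',   '{"name": "bob", "age": 23}'),
        ('e2', 'Coupon', '{"amount": "$50"}'),
        ('e3', 'Plan',   '{"amount": "$10", "name": "premium"}');
    INSERT INTO relationships VALUES
        ('r1', 'Discount',     'e1', 'e2'),
        ('r2', 'Subscription', 'e1', 'e3');
""")


def related(conn, from_id, name):
    """Follow edges called `name` from one entity to its targets.

    Two JOINs are needed here (source -> relationships -> target); a
    direct foreign key from the source table would need only one.
    """
    rows = conn.execute(
        """
        SELECT target.id, target.type, target.attributes
        FROM entities AS source
        JOIN relationships AS rel ON rel.from_id = source.id
        JOIN entities AS target ON target.id = rel.to_id
        WHERE source.id = ? AND rel.name = ?
        ORDER BY target.id
        """,
        (from_id, name),
    ).fetchall()
    return [(eid, etype, json.loads(attrs)) for eid, etype, attrs in rows]


def find_by_attribute(conn, entity_type, key, value):
    """Filter on an unstructured attribute by decoding JSON in Python.

    This scans every entity of the given type, so it cannot use an index.
    """
    rows = conn.execute(
        "SELECT id, attributes FROM entities WHERE type = ? ORDER BY id",
        (entity_type,),
    ).fetchall()
    return [eid for eid, attrs in rows if json.loads(attrs).get(key) == value]
```

For example, related(conn, "e1", "Subscription") walks from the User to its Plan, and find_by_attribute(conn, "Plan", "name", "premium") looks up entities by a value stored inside the attributes column.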

In Practice

My current company, Trove, uses the strategy described above to great success. In my next post I'll go over the details of a sample implementation of this database design.
