NoSQL Data Modeling: Beyond the Relational Mindset
Most developers approach NoSQL the way they approach relational databases, and that is precisely why their NoSQL projects fail. This article distills the ideas from Rick Houlihan's talk Advanced Design Patterns for Amazon DynamoDB and focuses on when NoSQL makes sense, why it exists in the first place, and how to design data for it without dragging thirty years of relational habits along.
Why does NoSQL exist? A brief history
To understand NoSQL, it helps to remember why the relational model was invented in the 1970s. Storage was outrageously expensive. A 4 MB hard drive in 1974 retailed for $250,000. Under those constraints, the winning strategy was clear: normalize the data, eliminate redundancy, shrink the footprint on disk.
The relational model was, at its core, a storage-optimization technology. The price it paid was CPU: every denormalized view the application needed had to be reconstructed on the fly through joins across multiple tables. That was a fair trade in 1974, because disk was expensive and CPU was cheap.
Fifty years later, the economics have flipped entirely. Storage is nearly free; CPU is the bottleneck. Yet we are still optimizing for the cheap resource while burning the expensive one on every query. NoSQL is, before anything else, the response to that inversion: a model designed for a world where denormalization is the cheap path and joins are the luxury.
The practical takeaway is blunt: joins are the problem, not the solution. A relational query that pulls a product catalog with three entity types may hit four tables, execute three queries, and perform several joins just to assemble one denormalized view for the application. That cost does not scale linearly; it compounds as the dataset grows. NoSQL sidesteps it entirely by storing the denormalized result as the primary data structure.
When to use SQL and when to use NoSQL
NoSQL is not a replacement for relational databases; it is a different tool for a different class of problem. The decision hinges on how predictable your access patterns are.
Choosing the right tool
| Dimension | Relational (SQL) | NoSQL |
|---|---|---|
| Access patterns | Ad-hoc, unknown, exploratory | Known, repeatable, well-defined |
| Workload type | OLAP, BI, analytics | OLTP at scale |
| Query shape | Complex joins, aggregations, reshaping | Simple key lookups, range scans |
| Schema | Flexible queries, rigid schema | Flexible schema, rigid queries |
| Scale | Vertical, breaks down at very large volumes | Horizontal, designed for distribution |
| Good fit example | Dashboards, reporting, data warehousing | Product catalogs, user sessions, event streams |
If you do not know in advance what questions you will ask, use SQL. If you know exactly what the application will ask and it will ask it millions of times a day, use NoSQL. The former optimizes for the unknown; the latter optimizes for the predictable. The common business case, an OLTP application where ordering a product triggers the same sequence of operations every time, falls squarely in NoSQL territory.
The most common mistake: the adoption trap
Teams adopting NoSQL usually follow the same path: they take their relational schema, split it across multiple collections or tables, keep their normalized entities, and then wonder why their new database is slow, expensive, or unreliable. The problem is almost never the database; it is that they are using a new tool the same way they used the old one.
There is a persistent myth that NoSQL is "flexible." It is not. NoSQL is efficient, not flexible. A well-designed NoSQL schema is tightly coupled to its access patterns, and that coupling is the source of its speed. Change the access patterns and you often have to redesign the schema. That rigidity is a feature, not a bug: it is what allows consistently low-latency key lookups at any scale.
The mental flip is this: in SQL you model the entities first and let the queries follow. In NoSQL you model the queries first and let the entities follow. You must know every access pattern up front (reads, writes, aggregations, filters) before you design a single partition key. Skip this step and no amount of horizontal scaling will save you.
Fundamentals of NoSQL data modeling
Four principles carry most of the weight. Internalize these and the rest of the patterns become variations on a theme.
1. One table, not many
The instinct to create one table per entity is relational thinking in disguise. In NoSQL, a single table can, and usually should, hold multiple entity types. Customers, orders, and items can live in the same table, distinguished by their partition key and sort key structure. One table, queried correctly, can serve a dozen or more distinct access patterns with a single round trip each.
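As a rough illustration of the single-table idea, the sketch below models one table as an in-memory dictionary. The helper names (`put_item`, `query`) and the key formats (`CUSTOMER#42`, `ORDER#...`, `PROFILE`) are assumptions made for this example, not the API of any particular database.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for one table:
# partition key -> {sort key -> item}
table = defaultdict(dict)

def put_item(pk, sk, **attrs):
    """Store an item of any entity type under (partition key, sort key)."""
    table[pk][sk] = {"pk": pk, "sk": sk, **attrs}

def query(pk, sk_prefix=""):
    """Return the items of one partition, ordered by sort key,
    optionally restricted to sort keys that start with sk_prefix."""
    partition = table.get(pk, {})
    return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]

# A customer profile and that customer's orders share one partition.
put_item("CUSTOMER#42", "PROFILE", name="Ada")
put_item("CUSTOMER#42", "ORDER#2024-01-15#1001", total=30)
put_item("CUSTOMER#42", "ORDER#2024-01-16#1002", total=12)
put_item("CUSTOMER#7", "PROFILE", name="Bob")

everything = query("CUSTOMER#42")        # profile plus orders, one round trip
orders = query("CUSTOMER#42", "ORDER#")  # only the orders
```

Both access patterns are served from the same partition; the entity type is encoded in the sort key rather than in a separate table.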
2. Partition keys: high cardinality, uniform distribution
The partition key determines which physical storage node holds your data. The golden rule is high cardinality, uniformly accessed over time. Good partition keys: customer IDs, UUIDs, session IDs. Bad partition keys: status flags, gender, boolean fields, anything that collapses your data into a handful of buckets.
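To make the cardinality rule concrete, here is a small sketch that hashes keys onto a fixed number of partitions. The partition count, the use of SHA-256, and the function names are assumptions for illustration; real engines use their own hashing and placement schemes.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical cluster size

def partition_for(key: str) -> int:
    """Map a partition key to a partition by hashing it (illustrative only)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def distribution(keys):
    """Count how many items land on each partition."""
    return Counter(partition_for(k) for k in keys)

# High cardinality: 10,000 distinct customer IDs spread across all partitions.
high_card = distribution(f"customer-{i}" for i in range(10_000))

# Low cardinality: a status flag collapses 10,000 items into at most 2 partitions.
low_card = distribution(["ACTIVE", "INACTIVE"][i % 2] for i in range(10_000))
```

With the status flag as the key, most partitions hold nothing no matter how many nodes the cluster has.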
3. The hot-key anti-pattern
When access is not evenly distributed, you get a hot partition: one storage node absorbing most of the traffic while the others sit idle. Visualized as a heatmap across partitions over time, a healthy NoSQL workload looks like a uniformly speckled "pepperoni pizza." A broken one looks like a single red line, one partition burning while the rest of the cluster does nothing. Every hash-partitioned NoSQL engine is exposed to this failure mode; it is a universal property of distributed hash-partitioned systems, not a DynamoDB quirk.
4. Sort keys: modeling one-to-many inside a partition
The sort key is where NoSQL gets interesting. Within a single partition, items are ordered by their sort key, which means you can execute range queries against them cheaply: "all orders for customer X in the last 24 hours", "all events for device Y between timestamps A and B". This is how one-to-many relationships are modeled in NoSQL: not through foreign keys and joins, but through co-location and ordering.
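The reason range queries are cheap can be sketched with a sorted list and binary search. This is only a model of the idea, assuming ISO-8601 timestamps as sort keys (they sort lexicographically in time order); `range_query` is a hypothetical helper, not a database call.

```python
from bisect import bisect_left, bisect_right

# Sort keys of one partition (all orders for customer X), kept in order.
orders_for_customer_x = sorted([
    "2024-01-14T09:15:00",
    "2024-01-15T10:00:00",
    "2024-01-16T14:20:00",
    "2024-01-17T08:05:00",
])

def range_query(sorted_keys, low, high):
    """Return the keys in [low, high] by locating both ends with binary search,
    so only the matching slice is touched."""
    start = bisect_left(sorted_keys, low)
    end = bisect_right(sorted_keys, high)
    return sorted_keys[start:end]

recent = range_query(
    orders_for_customer_x, "2024-01-15T00:00:00", "2024-01-16T23:59:59"
)
```

Because the items are already ordered, the query jumps to the start of the range and reads forward until the upper bound, instead of inspecting every item in the partition.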
Advanced modeling patterns
Once the fundamentals click, a handful of patterns cover the vast majority of real-world relational structures. They build on each other, so the order matters.
Composite keys and faceted search
A sort key does not have to be a single value. Concatenating fields, for example status_timestamp, creates a composite key that enables faceted search within a partition. You can now query "all items for user X where the sort key begins with PENDING_" and get a selective read rather than scanning the partition and filtering afterwards. The distinction matters: filters apply after the read (you still pay for every item scanned), while sort-key conditions apply before the read (you only pay for what you return).
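The cost difference between a filter and a sort-key condition can be sketched by counting how many items each approach reads. The functions and the sample keys here are illustrative assumptions; the counts stand in for the read cost a real engine would charge.

```python
from bisect import bisect_left

# Composite sort keys (status + timestamp) in one user's partition, kept sorted.
sort_keys = sorted([
    "ACCEPTED_2024-01-14T09:15:00",
    "DECLINED_2024-01-13T08:30:00",
    "PENDING_2024-01-15T10:00:00",
    "PENDING_2024-01-16T14:20:00",
])

def read_then_filter(keys, prefix):
    """Filter approach: read the whole partition, then discard non-matches."""
    items_read = list(keys)
    matches = [k for k in items_read if k.startswith(prefix)]
    return matches, len(items_read)

def key_condition(keys, prefix):
    """Sort-key approach: seek to the first matching key, stop at the first miss."""
    matches = []
    for k in keys[bisect_left(keys, prefix):]:
        if not k.startswith(prefix):
            break
        matches.append(k)
    return matches, len(matches)

filtered, filtered_cost = read_then_filter(sort_keys, "PENDING_")
selected, selected_cost = key_condition(sort_keys, "PENDING_")
```

Both calls return the same two pending items, but the filter reads all four items in the partition while the key condition reads only the two it returns.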
Partition Key: user_id = "bob"
Sort Key: status_timestamp
Items in partition:
PENDING_2024-01-15T10:00:00
PENDING_2024-01-16T14:20:00
ACCEPTED_2024-01-14T09:15:00
DECLINED_2024-01-13T08:30:00
Query: sort_key BEGINS_WITH "PENDING_"
-> Returns only the 2 pending items, selectively.
Linear hierarchies
Geographic, organizational, and taxonomic data tend to form linear hierarchies: country → state → city → office. Instead of four tables joined together, encode the entire path into a composite sort key: STATE#CITY#OFFICE_ID. A single partition keyed on country now supports queries at every level of granularity:
- "Everything in the US" → query by partition key alone
- "Everything in New York state" → sort key begins with NY#
- "Everything in New York City" → sort key begins with NY#NYC#
One table. One query per access pattern. No joins.
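The three queries above can be sketched against an in-memory partition. The office IDs, the `put_office` and `query` helpers, and the sample locations are hypothetical; only the STATE#CITY#OFFICE_ID key layout comes from the text.

```python
from collections import defaultdict

# Partition key: country. Sort key: STATE#CITY#OFFICE_ID.
offices = defaultdict(list)

def put_office(country, state, city, office_id):
    """Encode the hierarchy path into a composite sort key."""
    offices[country].append(f"{state}#{city}#{office_id}")
    offices[country].sort()

def query(country, prefix=""):
    """Return the sort keys in one country's partition that start with prefix."""
    return [sk for sk in offices.get(country, []) if sk.startswith(prefix)]

put_office("US", "NY", "NYC", "OFF1")
put_office("US", "NY", "NYC", "OFF2")
put_office("US", "NY", "ALBANY", "OFF3")
put_office("US", "CA", "SF", "OFF4")

whole_country = query("US")           # everything in the US
one_state = query("US", "NY#")        # everything in New York state
one_city = query("US", "NY#NYC#")     # everything in New York City
```

Each level of the hierarchy is just a longer prefix of the same key, which is why a single partition can answer all three questions.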
Adjacency lists: many-to-many without a join table
Many-to-many relationships, the archetypal case for a join table in SQL, are modeled in NoSQL using an adjacency list. Each entity becomes a partition; each relationship becomes a duplicated item placed in the partition of each entity it connects. A reverse-lookup global secondary index (swapping partition and sort keys) then lets you traverse the relationship in either direction.
Example: contacts and resolver groups with a many-to-many relationship. A contact belonging to three groups is stored three times, once in each group's partition. Querying the table gives you "all contacts for group X"; querying the reverse-lookup index gives you "all groups for contact Y". No join table and no joins: the graph is flattened into the key structure itself.
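A minimal sketch of the contacts-and-groups example follows, with the table and its reverse-lookup index modeled as two dictionaries. The names (`add_membership`, `GROUP#...`, `CONTACT#...`) are assumptions for illustration, and in a real engine the index would be maintained by the database rather than by application code.

```python
from collections import defaultdict

# Table: partition key = group, sort key = contact.
table = defaultdict(set)
# Reverse-lookup index: the same items with partition and sort keys swapped.
reverse_index = defaultdict(set)

def add_membership(group, contact):
    """Write one relationship item; the index entry mirrors it with keys swapped."""
    table[group].add(contact)
    reverse_index[contact].add(group)

add_membership("GROUP#network", "CONTACT#alice")
add_membership("GROUP#database", "CONTACT#alice")
add_membership("GROUP#security", "CONTACT#alice")
add_membership("GROUP#network", "CONTACT#bob")

contacts_in_network = sorted(table["GROUP#network"])      # query the table
groups_for_alice = sorted(reverse_index["CONTACT#alice"])  # query the index
```

Each direction of the relationship is a single key lookup; neither needs a join table.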
Version control with the v0 pattern
When items need version history or lightweight transactional semantics, the v0 pattern is elegant. Every version of a logical item is stored in its partition under a numbered sort key (v1, v2, v3...), and the latest committed version is stored a second time under a canonical v0 sort key.
Readers who want the current state query BEGINS_WITH "v0" and always get the latest committed version. Readers who want history query the numbered versions. To commit a new version, write v3 first, then overwrite v0 with a copy. Between those two steps, readers of v0 still see the previous version (read-committed), while readers of the highest numbered version already see the new one (read-uncommitted). You get an audit trail, version history, and a simple two-step transactional workflow, all without external coordination.
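The two-step commit can be sketched with a dictionary standing in for one partition. The function names and the `version` attribute are assumptions for this example; the sketch is single-threaded and ignores concurrent writers.

```python
def commit(partition, data):
    """Write the next numbered version, then overwrite v0 with a copy of it."""
    latest = partition.get("v0", {}).get("version", 0)
    new_version = latest + 1
    # Step 1: the new version exists but v0 still points at the old state.
    partition[f"v{new_version}"] = {"version": new_version, "data": data}
    # Step 2: v0 now mirrors the new version; readers of v0 see it from here on.
    partition["v0"] = {"version": new_version, "data": data}
    return new_version

def current(partition):
    """Latest committed state: the item stored under v0."""
    return partition["v0"]

def history(partition):
    """All numbered versions in order, excluding the v0 mirror."""
    numbered = [sk for sk in partition if sk != "v0"]
    return [partition[sk] for sk in sorted(numbered, key=lambda sk: int(sk[1:]))]

document = {}  # one partition, keyed by sort key
commit(document, "draft")
commit(document, "reviewed")
commit(document, "final")

latest = current(document)
versions = [item["data"] for item in history(document)]
```

The numeric sort in `history` is a detail of this sketch; with string sort keys, zero-padded version numbers would be needed to keep v10 after v9.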
ACID transactions: when to use them, when to avoid them
Modern NoSQL engines support multi-item ACID transactions, and it is tempting to reach for them whenever things get complex. Resist that instinct. Transactions are a tool, not a crutch for maintaining normalized schemas. If you find yourself needing a transaction because your data is split across five items that "should" be one, the right fix is usually the data model, not the transaction.
Legitimate use cases: committing changes that genuinely span independent entities (creating an order and decrementing inventory across two different aggregates), conditional batch operations, and multi-item consistency checks. Illegitimate use case: simulating foreign-key integrity across a normalized relational design. If you are using transactions to prop up a relational model, you have already lost most of the benefits that brought you to NoSQL.
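As a sketch of the legitimate case (creating an order and decrementing inventory together), the function below checks every condition before writing anything, so either both items change or neither does. Everything here is a hypothetical in-memory model: real engines provide this atomicity themselves, and this single-threaded code only mimics the all-or-nothing outcome.

```python
class TransactionCancelled(Exception):
    """Raised when any condition fails; nothing is written in that case."""

def place_order(table, order_id, sku, quantity):
    """Create an order item and decrement stock as one all-or-nothing unit."""
    order_key = ("ORDER#" + order_id, "DETAILS")
    stock_key = ("SKU#" + sku, "INVENTORY")
    stock = table.get(stock_key, {}).get("stock", 0)

    # Phase 1: evaluate all conditions before touching any item.
    if order_key in table:
        raise TransactionCancelled("order already exists")
    if stock < quantity:
        raise TransactionCancelled("insufficient stock")

    # Phase 2: apply both writes.
    table[order_key] = {"sku": sku, "quantity": quantity}
    table[stock_key] = {"stock": stock - quantity}

table = {("SKU#book-1", "INVENTORY"): {"stock": 5}}
place_order(table, "1001", "book-1", 2)

try:
    place_order(table, "1002", "book-1", 10)  # more than the remaining stock
    rejected = False
except TransactionCancelled:
    rejected = True
```

The order and the inventory record are independent aggregates with their own keys, which is what makes a transaction appropriate here rather than a sign of a normalized design.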
Pre-computed aggregations: compute on write, not on read
NoSQL databases are bad at complex aggregations: counts, sums, averages, and medians across millions of rows. That is not a flaw; it is a design choice. The NoSQL answer to the aggregation problem is to invert the cost model: instead of computing on read (every time a dashboard loads), compute on write (once, when the data arrives).
The mechanism is a change stream: every insert, update, or delete is published to a log, and a downstream process (a serverless function, a stream consumer, a worker) consumes it and updates a pre-computed aggregation item stored back in the same table. Readers then fetch a single item instead of scanning thousands.
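The mechanism can be sketched with a queue standing in for the change stream and a function standing in for the downstream consumer. All names here (`put_order`, `consume`, the `AGG#ORDERS` sort key) are assumptions for illustration, and only inserts are handled to keep the example short.

```python
from collections import deque

table = {}               # (partition key, sort key) -> item
change_stream = deque()  # stands in for the database's change log

def put_order(customer, order_id, total):
    """Write an order and publish the change to the stream."""
    table[(f"CUSTOMER#{customer}", f"ORDER#{order_id}")] = {"total": total}
    change_stream.append({"event": "INSERT", "customer": customer, "total": total})

def consume(stream, table):
    """Downstream worker: fold each change into a pre-computed aggregate item
    stored in the same table, in the customer's own partition."""
    while stream:
        record = stream.popleft()
        if record["event"] != "INSERT":
            continue  # updates and deletes are omitted in this sketch
        key = (f"CUSTOMER#{record['customer']}", "AGG#ORDERS")
        aggregate = table.setdefault(key, {"count": 0, "sum": 0})
        aggregate["count"] += 1
        aggregate["sum"] += record["total"]

put_order("ada", "1001", 10)
put_order("ada", "1002", 20)
put_order("ada", "1003", 30)
consume(change_stream, table)

summary = table[("CUSTOMER#ada", "AGG#ORDERS")]  # one item read, no scan
```

The aggregate is updated once per write, so a dashboard that needs the order count and total reads a single item regardless of how many orders exist.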
The principle generalizes far beyond any specific database. If a value is read a thousand times for every time it is written, compute it at write time. Time-series data is the canonical sweet spot: once a time window has closed, its aggregates never change, so pre-computing them once and serving them a million times is nearly free. The same logic applies to leaderboards, running totals, histograms, and any metric where reads far outnumber writes.
[Figure: The streams aggregation pattern]
Why NoSQL is becoming the default for modern workloads
There is a shift worth naming. For most of the last two decades, the advice was "use NoSQL only if you are at scale." That advice is becoming obsolete. The ordinary application is, increasingly, a big-data application. Mobile clients, IoT sensors, event logs, user interactions, and third-party integrations push data volumes into territory where relational joins collapse under their own weight.
The talk that inspired this article closes with a telling case study: an audiobook synchronization service with twenty distinct access patterns, spanning e-books, audio products, audio files, and sync metadata with multiple many-to-many relationships between them. The relational instinct would be a dozen tables and an index per query. The NoSQL solution: one table, three secondary indexes, twenty access patterns satisfied, each with a single round trip.
That is not an edge case. It is what a well-modeled NoSQL schema looks like once the mental model shifts. The limiting factor is never the database; it is how well you understood the access patterns before you started writing the schema.
Closing thoughts
NoSQL is not non-relational. The data is still relational; it is just stored differently. The entity-relationship diagram still matters; you still think about one-to-many and many-to-many relationships; you still care about integrity. What changes is the physical representation: instead of splitting entities across tables and reassembling them with joins, you co-locate them by access pattern and retrieve them with key lookups.
Relational databases are not obsolete. For ad-hoc queries, analytical workloads, and exploratory analysis, they remain the right tool. But for the OLTP workloads that make up the bulk of modern applications, especially at scale, NoSQL is not an alternative anymore. It is the baseline.