Deep Modules — and Why They Matter More in the AI Era

I watched Matt Pocock’s talk “Software Fundamentals Matter More Than Ever,” and one idea kept circling back: deep modules. The talk pointed at Ousterhout’s “A Philosophy of Software Design” and said, roughly, that the old advice transfers to AI codebases for new reasons. Pocock made the claim land in two minutes. I wanted to actually understand it instead of nodding along.

So I sat down with Claude and worked through it from the ground up: what a module actually is, why narrow interfaces beat wide ones, how to tell deep from shallow without hand-waving, and what the AI-era twist adds beyond Ousterhout’s original argument. The full walkthrough, with a module-shape slider, a side-by-side diff explorer, a decision tree, and a LOC-growth chart you can drag, lives at barnessn.com/deep-modules. This idea now guides me when I build with agents; it is the lens behind every interface decision I make.

A module is just a line drawn around some code

Forget the metaphor and think mechanism. When you put functions in a file and pick which ones to export, you’ve drawn a line. Exported means the outside can see it. Not exported means it stays inside. The boundary’s literal form changes from layer to layer — public/private on a class, a URL surface for a REST API, “what we own vs what we ask another team for” for a team — but the concept is the same. Something on this side, something on that side, and a deliberate boundary between them.

Every module has two parts: an interface (the surface a caller has to learn) and an implementation (everything inside that the caller doesn’t have to know). The whole point of drawing a line is to declare a contract: “outside this line, you only need to know X; inside, the complexity is my problem.”
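To make the line concrete, here is a minimal TypeScript sketch. The slug module and every function name in it are hypothetical, invented for illustration: one exported function is the interface, and the two unexported helpers are implementation that no caller can import.

```typescript
// Hypothetical "slug" module. Only toSlug is exported; that one signature is the
// interface. The two helpers are the implementation and stay inside the line.

// Not exported: code outside this file cannot import it.
function stripDiacritics(text: string): string {
  // NFD splits "é" into "e" plus a combining accent; the regex drops the accents.
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Not exported either.
function collapseSeparators(text: string): string {
  // Any run of non-alphanumerics becomes one hyphen; leading/trailing hyphens go.
  return text.replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Exported: the whole contract a caller has to learn.
export function toSlug(title: string): string {
  return collapseSeparators(stripDiacritics(title).toLowerCase());
}
```

Callers learn one signature, toSlug(title), and the normalization rules can change without touching any of them.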

Deep vs shallow

Picture a module as a rectangle. The width is the interface — how much a caller has to learn. The depth is the functionality behind it.

A deep module is narrow on top, fat below: one simple call, lots happening inside. Unix’s read(fd, buf, n) is the canonical example — three arguments hiding filesystems, page caches, device drivers, networking.

A shallow module is the opposite: wide on top, thin below — an eight-argument wrapper that just forwards to another function. You pay full interface cost for almost no hiding.

The rule is one sentence: a module earns its keep by hiding more than it exposes. If it doesn’t hide much, it’s noise — the caller would be better off reading the body inline.
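A small, hypothetical TypeScript illustration of the two shapes. Both functions end up calling the same helper; the difference is how much each one asks of its caller. The function names, the eight-parameter signature, and the formatting rules are all invented for this sketch.

```typescript
// Shared helper (hypothetical): glue each number to its unit label.
function joinParts(values: number[], labels: string[], separator: string, pad: boolean): string {
  return values
    .map((value, i) => (pad && value < 10 ? "0" : "") + value + labels[i])
    .join(separator);
}

// Shallow: eight parameters and a one-line forwarding body. The caller must already
// have split the duration and picked labels, separator, and padding, so learning
// this signature costs about as much as reading the body inline.
function formatDurationShallow(
  hours: number, minutes: number, seconds: number,
  hourLabel: string, minuteLabel: string, secondLabel: string,
  separator: string, pad: boolean,
): string {
  return joinParts([hours, minutes, seconds], [hourLabel, minuteLabel, secondLabel], separator, pad);
}

// Deep: one parameter. Splitting into units, dropping zero units, and labeling
// all happen inside, where the caller never has to look.
function formatDuration(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const units: Array<[number, string]> = [
    [Math.floor(totalSeconds / 3600), "h"],
    [Math.floor((totalSeconds % 3600) / 60), "m"],
    [totalSeconds % 60, "s"],
  ];
  const shown = units.filter(([value]) => value > 0);
  if (shown.length === 0) return "0s";
  return joinParts(shown.map(([value]) => value), shown.map(([, label]) => label), " ", false);
}
```

The deep function’s contract fits in one sentence (“turns milliseconds into a short human-readable duration”); the shallow one’s takes roughly as many words as its body.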

The cover-body test, sharpened

The classic diagnostic is: cover the implementation and read only the interface. Can you predict what it does? But “predict” can mean two different things: predicting the line-by-line implementation, or predicting the contract. A deep module hides the first but exposes the second. From read(fd, buf, n) you can’t predict the page cache, but you can predict the contract: “reads up to n bytes from fd into buf and returns how many it read.” That’s a useful, compressed summary.

A shallow module fails differently. The contract of buildHeaders(userId, env, contentType, accept, traceId) is barely smaller than its body — five inputs, five outputs, one-to-one. The summary is the same size as the body. The abstraction didn’t compress anything.

The corrected test, the one I’d actually use: can you state the contract in fewer words than it would take to read the body? If yes, deep. If no, shallow.
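Here is the corrected test applied to a hypothetical TypeScript version of buildHeaders. The header names, the Session type, and the trace-ID counter are assumptions made for the sketch; the point is only to compare how many words each contract takes.

```typescript
// Shallow (hypothetical header names): five inputs land in five headers, one-to-one.
// Its contract, "put these five values into these five headers", is as long as the body.
function buildHeaders(
  userId: string, env: string, contentType: string, accept: string, traceId: string,
): Record<string, string> {
  return {
    "X-User-Id": userId,
    "X-Env": env,
    "Content-Type": contentType,
    "Accept": accept,
    "X-Trace-Id": traceId,
  };
}

// Hypothetical caller context that already knows who is asking and where.
interface Session {
  userId: string;
  env: "dev" | "prod";
  traceId?: string;
}

// Stand-in ID source for the sketch; a real service would use a proper generator.
let traceCounter = 0;
function nextTraceId(): string {
  traceCounter += 1;
  return "trace-" + traceCounter;
}

// Deeper: the contract fits in one line, "headers for a JSON request made by this
// session". Media types and the trace-ID fallback are decided inside.
function jsonHeadersFor(session: Session): Record<string, string> {
  return buildHeaders(
    session.userId,
    session.env,
    "application/json",
    "application/json",
    session.traceId ?? nextTraceId(),
  );
}
```

Wrapping the shallow function does not make it deeper by itself; the depth comes from jsonHeadersFor taking on decisions its callers no longer make.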

Why this transfers to AI — for a different reason than Ousterhout meant

Ousterhout’s original argument was about human working memory. Human working memory is famously small (Miller’s 7±2), so a deep module lets a reader page out the implementation and reason about the interface alone. That argument still applies.

But language models have the opposite cognitive shape. They have a very large working memory inside one context window, zero persistent memory across sessions, and they generate plausible nonsense when given a wide surface to misuse. So the deep-module rule transfers, but through different mechanics that produce the same shape: a wide interface costs more tokens for the same operation, exposes more public API surface for the agent to misuse, and widens the blast radius when the agent gets the internals wrong.

Pocock’s framing — “go back to old books” — is right in outcome but a little misleading in cause. The fundamentals aren’t transferring to AI nostalgically. They’re transferring because the failure modes line up.

The shallow shape compounds

Here’s where I stopped nodding and started seeing it everywhere.

Imagine four near-identical API routes for different content types. Each is ~350 lines, each does 95% the same auth, quota, streaming, and persistence scaffolding, and each diverges only on a schema import, a sources function, and a prompt name. Add a fifth content type by copy-paste and you’ve added another 350 lines of accidental duplication and a fifth place to fix the next abort, quota, or persistence bug.

AI both produces this shape and fails to navigate it. In any one turn the agent sees only part of the codebase, so small isolated functions feel safer to write, and its training has pattern-matched “good code” to “small functions.” Once a codebase tilts shallow, every new feature adds another fan-out to the dependency graph, and the next agent run gets lost in it.

The fix isn’t clever. Notice that the orchestration is duplicated. Notice that the lower layers — the router, the answer streamer, the history loader — are already deep, parameterized by content type and called the same way everywhere. The leak is at the route shell.

Extract that shell as another deep module: narrow surface (the per-type config), fat insides (the streaming, auth, quota, persistence plumbing). New content types ship as 80-line callers instead of 350-line copies.
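A minimal, self-contained sketch of that shell in TypeScript. Everything here is hypothetical: the RouteConfig fields mirror the three points of divergence described above (schema, sources function, prompt name), and in-memory stand-ins replace real auth, quota, streaming, and persistence so the example runs on its own.

```typescript
// Hypothetical content types and request/result shapes for the sketch.
type ContentType = "podcast" | "video" | "ebook";

interface ApiRequest {
  userId?: string;
  itemId: string;
  body: unknown;
}

interface ApiResult {
  status: number;
  chunks: string[];
}

// The narrow surface: only what actually differs per content type.
interface RouteConfig {
  contentType: ContentType;
  parseBody: (body: unknown) => { question: string } | null; // the schema
  loadSources: (itemId: string) => string[];                  // the sources function
  promptName: string;                                          // the prompt
}

// In-memory stand-ins for the shared plumbing (assumptions, not a real backend).
const quotaRemaining: Record<string, number> = { alice: 2 };
const savedAnswers: Array<{ userId: string; contentType: ContentType; answer: string }> = [];

// Stand-in for a streamed model call: one chunk for the prompt, one per source.
function answerChunks(promptName: string, question: string, sources: string[]): string[] {
  return ["[" + promptName + "] " + question, ...sources.map((s) => " <" + s + ">")];
}

// The fat insides: auth, quota, streaming, and persistence live here, once.
function createContentRoute(config: RouteConfig): (req: ApiRequest) => ApiResult {
  return (req) => {
    const userId = req.userId;
    if (!userId) return { status: 401, chunks: [] };
    const remaining = quotaRemaining[userId] ?? 0;
    if (remaining <= 0) return { status: 429, chunks: [] };
    const parsed = config.parseBody(req.body);
    if (!parsed) return { status: 400, chunks: [] };

    const chunks = answerChunks(config.promptName, parsed.question, config.loadSources(req.itemId));
    quotaRemaining[userId] = remaining - 1;
    savedAnswers.push({ userId, contentType: config.contentType, answer: chunks.join("") });
    return { status: 200, chunks };
  };
}

// A content type is now a small config object instead of a copied route.
const podcastRoute = createContentRoute({
  contentType: "podcast",
  parseBody: (body) => {
    const question = (body as { question?: unknown } | null)?.question;
    return typeof question === "string" ? { question } : null;
  },
  loadSources: (itemId) => [itemId + "#transcript"],
  promptName: "podcast-qa",
});
```

In this sketch a video route would be one more createContentRoute call with its own parseBody, loadSources, and promptName, and a fix to the quota or persistence logic lands once, in the shell.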

The deepest move is in the schema

This was the part I almost missed.

The application-level collapse only works because the database already encoded the deep-module move. A content_items parent table holds every kind of content with a contentType discriminator. A content_transcripts child table holds the transcribed text with no type column.

The medium-specific work — speech-to-text for audio, captions for video — happens upstream in the ingestion pipeline. By the time something hits content_transcripts, it’s just text. One loader serves all transcribed types because the data shape is the same.
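The same idea at the data layer, sketched in TypeScript with arrays standing in for the two tables. The table names and the contentType discriminator come from the description above; the other columns (id, title, contentItemId, text), the type names, and the sample rows are hypothetical.

```typescript
// Hypothetical transcribed types; the medium-specific work already ran upstream.
type ContentType = "podcast" | "video";

// content_items: the parent table, one row per item, with a contentType discriminator.
interface ContentItemRow {
  id: string;
  contentType: ContentType;
  title: string;
}

// content_transcripts: the child table. Deliberately no type column.
interface ContentTranscriptRow {
  contentItemId: string;
  text: string;
}

const contentItems: ContentItemRow[] = [
  { id: "p1", contentType: "podcast", title: "Episode 1" },
  { id: "v1", contentType: "video", title: "Conference talk" },
];

const contentTranscripts: ContentTranscriptRow[] = [
  { contentItemId: "p1", text: "output of speech-to-text" },
  { contentItemId: "v1", text: "output of captioning" },
];

// One loader for every transcribed type. It never branches on contentType,
// because by the time a row lands in content_transcripts it is just text.
function loadTranscript(itemId: string): { title: string; text: string } | undefined {
  const item = contentItems.find((row) => row.id === itemId);
  const transcript = contentTranscripts.find((row) => row.contentItemId === itemId);
  if (!item || !transcript) return undefined;
  return { title: item.title, text: transcript.text };
}
```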

The decision rule the schema encodes is the whole insight in one sentence: share when the shape is the same; separate when the shape legitimately differs. Audio vs video is different in real life. Their transcripts are not. The schema gets to make that distinction, and the application code inherits it.

What stays separate is part of the design

A deep module that refuses to unify what doesn’t fit is more honest than a deep module that swallows everything. Two examples of refusing, both real:

If one content type is stored as structured markdown in object storage with custom page markers and another is a row in Postgres, those loaders stay separate. Forcing them through one function would push storage-specific logic into the shared loader and undo the abstraction.
If one content type’s query log has different foreign keys and different analytics needs, it gets its own table. A union of every per-type log shape isn’t worth the complexity it would force into the shell.

The shell’s name and type should say exactly what it covers. Whatever doesn’t fit marks the boundary the abstraction earns by staying narrow.
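One way to make a shared loader’s type say exactly what it covers, sketched in TypeScript. The type names, the “ebook” label for the markdown-in-object-storage type, the page-marker syntax, and both loaders are hypothetical; the point is that the storage-specific path never enters the shared loader, so neither loader has to know about the other’s storage.

```typescript
// Hypothetical: two transcribed types share one path; ebooks do not.
type TranscribedType = "podcast" | "video";
type ContentType = TranscribedType | "ebook";

// Type guard that tells a caller which loader a content type belongs to.
function isTranscribed(type: ContentType): type is TranscribedType {
  return type === "podcast" || type === "video";
}

// Shared loader: its parameter type names exactly what it covers, so
// loadTranscriptText("ebook", ...) is a compile-time error, not a runtime branch.
function loadTranscriptText(type: TranscribedType, itemId: string): string {
  return type + ":" + itemId + ":transcript"; // stand-in for the real transcript query
}

// Separate loader for markdown kept in object storage. The "<!-- page:N -->"
// marker format is invented for this sketch.
function loadEbookPages(markdown: string): string[] {
  return markdown.split(/<!-- page:\d+ -->/);
}
```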

The lesson, in one breath

The leverage point in the AI era moves from “writing code” to “drawing the right boundary.” You might expect AI to make senior judgment matter less. It’s the opposite: AI can hand-roll the implementation behind any interface you give it, but choosing the right interface is the part that doesn’t automate.

The refactor that demonstrates this — the one in the interactive walkthrough — isn’t a clever architectural move. It’s three small acts of judgment, in order:

Notice that the routes are 95% identical at the orchestration layer.
Notice that the lower layers were already deep.
Refuse to unify the parts whose shape really differs.

None of those are mechanical. They’re shape-sensing decisions. The mechanical part — writing the shell, writing the new thin route — is the easy half.

If you take one thing from this post: the right question is never “what kind of thing is this in the world?” It’s “what shape is the data, and where does the medium-specific work end?” That question separates the parts that should collapse from the parts that legitimately stay separate. Everything else is consequence.

The interactive version is where the diagrams click. A module-shape slider you can drag from deep to shallow and watch the rectangle invert. A side-by-side diff explorer with a toggle for highlighting what’s identical vs what differs. A decision tree that walks you through “should I collapse these two functions?” question by question. A growth chart that shows the shallow shape adding roughly 350 lines with every new type while the deep shape adds roughly 80. If the prose above feels abstract, the interactive piece is where it lands.
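The two slopes can be checked with back-of-envelope arithmetic. This TypeScript sketch uses the post’s rough figures (about 350 lines per copied route, about 80 per thin caller). The one-time size of the shared shell is not given anywhere, so it is a parameter, and the 400-line value used below is purely hypothetical.

```typescript
// Rough per-type costs taken from the prose; both are approximations.
const LINES_PER_COPY = 350;  // one copy-pasted route
const LINES_PER_CALLER = 80; // one thin caller of the shared shell

// Copy-paste: every content type carries its own full route.
function shallowTotal(types: number): number {
  return LINES_PER_COPY * types;
}

// Shell: a one-time cost (hypothetical, passed in) plus one thin caller per type.
function deepTotal(types: number, shellLines: number): number {
  return shellLines + LINES_PER_CALLER * types;
}

// Smallest number of types at which the shell version is no longer than copy-paste:
// shellLines + 80n <= 350n, i.e. n >= shellLines / 270.
function breakEven(shellLines: number): number {
  return Math.ceil(shellLines / (LINES_PER_COPY - LINES_PER_CALLER));
}
```

With a hypothetical 400-line shell, five types cost shallowTotal(5) = 1750 lines copy-pasted versus deepTotal(5, 400) = 800 with the shell, and breakEven(400) is 2. Both totals grow linearly; what the refactor changes is the slope.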