Cairn: Compile Knowledge, Don't Retrieve It

RAG rediscovers your knowledge on every query. Cairn compiles it once, at ingest — an agent with a router tool, built on Karpathy's LLM-wiki pattern, plus the verification step that makes it hold up.

ai llm rag-alternative context-engineering agents pattern

I built an agent that answers questions over a content creator’s entire archive: podcasts, articles, a back catalog of written work. Claude picked the name, Cairn, after the stacked-stone trail markers that many hands build up over time. It chose the name for the compounding metaphor, and I kept it because that is exactly what the knowledge base does: every source adds a stone, the pile compounds, and it marks the path for whoever comes next.

The name is doing real work, because compounding is exactly what makes Cairn different from a normal RAG system. This post covers the pattern, the one decision that defines it, and the lesson that surprised me while building it.

The one decision: when do you do the synthesis?

Answering questions over a body of content needs synthesis — deciding what matters, connecting related ideas across different sources, noticing where two sources contradict each other. Someone, or something, has to do that work. The only real question is when.

A normal RAG system does it at question time. Every time a user asks something, it embeds the question, searches a vector index, pulls back the most similar chunks of raw text, and hands the model a fresh pile to reason over. The synthesis happens from scratch, inside that one answer call, and then it is thrown away. Ask the same archive a thousand similar questions and it does that same rediscovery a thousand times. Nothing accumulates.

Cairn does it at ingest time — the moment a new source is added, long before anyone asks a question. A new podcast transcript comes in, and the agent reads it, writes a structured summary, updates the index, and walks the related pages to patch them. By the time a question actually arrives, the synthesis is already done and written down. The question is cheap because the expensive part already happened, once.

If you have written software you already know this trade-off under another name. It is the difference between an interpreter and a compiler. An interpreter re-parses the source every time it runs; a compiler does that work once and produces an artifact you can run cheaply forever after. RAG is the interpreter. Cairn is the compiler — it compiles the archive into a living, structured knowledge base, and keeps it current as new sources land.

Where the pattern comes from

The shape comes from Andrej Karpathy’s LLM-wiki gist. The idea: instead of retrieving chunks at query time, the LLM incrementally builds and maintains a wiki of structured, interlinked markdown files that sit between you and the raw sources. It has three layers:

  • Raw sources at the bottom, immutable — the LLM reads them but never edits them.
  • The wiki in the middle — summaries, concept pages, cross-references — owned entirely by the LLM.
  • A schema document on top, defining how the wiki is structured and maintained.

Ingesting a source is not “add a row to an index.” It is: read the source, write its summary page, update the index, patch the related concept pages, log the ingest. One source touches ten to fifteen pages. That sounds expensive until you remember the alternative is paying a smaller version of that cost on every single question, forever, and keeping none of it.
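The ingest steps above can be sketched in a few lines. This is a minimal illustration, not Cairn's actual code: the `Wiki` class, the page-naming scheme, and the `summarize` and `extract_concepts` callables (stand-ins for LLM calls) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Wiki:
    """In-memory stand-in for the markdown wiki (hypothetical structure)."""
    pages: dict[str, str] = field(default_factory=dict)        # page name -> markdown body
    index: dict[str, list[str]] = field(default_factory=dict)  # concept -> summary pages
    log: list[str] = field(default_factory=list)               # one line per ingest


def ingest(
    wiki: Wiki,
    source_id: str,
    raw_text: str,
    summarize: Callable[[str], str],               # assumed LLM call
    extract_concepts: Callable[[str], list[str]],  # assumed LLM call
) -> list[str]:
    """Read one source and update every page it touches. The raw text is only read."""
    # 1. Write the source's own summary page.
    summary_page = f"sources/{source_id}"
    wiki.pages[summary_page] = summarize(raw_text)
    touched = [summary_page]

    for concept in extract_concepts(raw_text):
        # 2. Update the index so the concept points at this source.
        wiki.index.setdefault(concept, []).append(summary_page)
        # 3. Patch the related concept page with a cross-reference.
        concept_page = f"concepts/{concept}"
        existing = wiki.pages.get(concept_page, f"# {concept}\n")
        wiki.pages[concept_page] = existing + f"- see [[{summary_page}]]\n"
        touched.append(concept_page)

    # 4. Log the ingest.
    wiki.log.append(f"ingested {source_id}: touched {len(touched)} pages")
    return touched
```

With stub callables, `ingest(Wiki(), "ep-001", text, lambda t: t[:80], lambda t: ["habits"])` returns the list of touched pages, which makes the "one source touches many pages" cost visible per ingest.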

What it looks like in production

Cairn runs over a creator’s full archive — five full-length written works plus around eighty podcast episodes.

The query side is an agent with a router tool. Hand the router a question and it searches the index, then returns the handful of sources most likely to hold the answer. The agent decides whether to call it at all: on a follow-up in a conversation, when the relevant sources are already in hand, it skips routing and answers directly. The search runs inside the tool, so the full index never has to sit in the agent’s context. Once the agent has the right sources, it loads them and writes the answer itself, quoting directly with citations, so the whole archive is never in a single prompt.
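As a rough sketch of what a router tool's contract might look like: the function below takes a question and an index, and returns a short ranked list of source ids. The post does not say how the real router searches, so the term-overlap scoring, the `route` name, the index shape (source id mapped to a set of terms), and `top_k` are all assumptions for illustration.

```python
import re


def route(question: str, index: dict[str, set[str]], top_k: int = 3) -> list[str]:
    """Return up to top_k source ids whose index terms best match the question."""
    # Lowercase word set of the question, punctuation stripped.
    words = set(re.findall(r"[a-z0-9']+", question.lower()))

    scored = []
    for source_id, terms in index.items():
        # A term counts as a hit when every word of the term appears in the question.
        hits = sum(1 for term in terms if set(term.lower().split()) <= words)
        if hits:
            scored.append((hits, source_id))

    # Most hits first; ties broken by source id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [source_id for _, source_id in scored[:top_k]]
```

The point of the shape is that only the returned ids go back to the agent. The index itself stays on the tool side of the call.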

The storage side has two shapes, chosen by content type:

  • Authored writing — the author’s own words and terminology are the point, so there is no paraphrase layer; the agent quotes raw sources directly. Cairn still takes on open, discussion-style questions, not only direct lookups — but anything substantive in the answer stays a direct quote with a citation, never an unsourced blend.
  • Transcribed material — paraphrasing is acceptable, so Cairn uses the full three-layer structure from Karpathy’s pattern: the raw transcripts at the bottom, the LLM-maintained wiki of summaries and cross-references on top of them, and the schema that defines how that wiki is organized.

Agent vs pipeline

An earlier version of Cairn was a fixed pipeline: one router call, then one answer call, every time. What runs now is an agent, and the difference is not that the agent does more — it does less. A pipeline routes on every question whether it needs to or not. An agent reads the question first and decides: when it already has what it needs — a follow-up in a conversation, the sources still in hand — it skips the router and answers directly. That deciding is what makes it an agent rather than a script.

The catch is that a decision can be wrong. The moment the agent can choose not to route, it can choose not to route when it should have — and answer from stale context, or from the model’s own memory, instead of going to look. A fixed pipeline cannot make that mistake: it is wasteful, but it is safe. An agent trades that safety for judgment.

So the judgment has to be inspectable. Every decision the agent makes is written to a database — what it was asked, whether it routed, what the router returned, what it answered. That log is the difference between trusting the agent’s judgment and being able to check it. An agent you cannot audit is just a pipeline with worse failure modes.
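A decision log like that can be very small. The sketch below uses Python's built-in `sqlite3` module. The table name, column names, and helper functions are hypothetical, chosen to mirror the four things listed above (what was asked, whether it routed, what the router returned, what it answered).

```python
import json
import sqlite3


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the decision log. Schema is an assumption for this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS decisions (
               id            INTEGER PRIMARY KEY,
               asked_at      TEXT DEFAULT CURRENT_TIMESTAMP,
               question      TEXT NOT NULL,
               routed        INTEGER NOT NULL,  -- 1 if the router was called, else 0
               router_result TEXT,              -- JSON list of source ids, NULL if not routed
               answer        TEXT NOT NULL
           )"""
    )
    return conn


def log_decision(conn, question, routed, router_result, answer) -> int:
    """Record one agent turn and return its row id."""
    cur = conn.execute(
        "INSERT INTO decisions (question, routed, router_result, answer) "
        "VALUES (?, ?, ?, ?)",
        (question, int(routed), json.dumps(router_result) if routed else None, answer),
    )
    conn.commit()
    return cur.lastrowid


def unrouted(conn) -> list[tuple[str, str]]:
    """The turns worth auditing first: answers given without calling the router."""
    return conn.execute(
        "SELECT question, answer FROM decisions WHERE routed = 0 ORDER BY id"
    ).fetchall()
```

The `unrouted` query is the audit that matters most here: it lists exactly the turns where the agent chose not to go and look.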

The lesson that surprised me: verify the compiler

Here is the thing I did not expect.

The compiler hallucinates. When you have an LLM build out a routing index — go through the content and list, for each entry, the concepts and terms that appear in it — roughly a quarter of what it proposes is wrong. Not subtly wrong: either a term cross-contaminated from a neighboring entry the model also happens to know, or a plausible-sounding paraphrase that never appears literally in the source. Left in the index, it actively breaks the system. The router matches on that term, picks that entry, the answer call loads the content behind it, the term is not there, and the user gets confident prose with nothing behind it.

The fix is mechanical and boring. For every term the LLM proposed, grep the content that entry points to. If the term is not literally there, drop it. In one index build, 1038 proposed terms came down to 791 after the prune — 247 dropped. A bash loop eliminated a whole class of failure in an afternoon.
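The prune described above was a bash loop around grep. The same check, sketched in Python under a few assumptions: the `prune_index` name and the two dict shapes are illustrative, and the match is a case-insensitive literal substring test (roughly what `grep -i -F` does), which may be looser or stricter than the original loop.

```python
def prune_index(
    proposed: dict[str, list[str]],  # entry id -> terms the LLM proposed for it
    content: dict[str, str],         # entry id -> raw text that entry points to
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Keep only terms that literally appear in the entry's content."""
    kept: dict[str, list[str]] = {}
    dropped: dict[str, list[str]] = {}
    for entry_id, terms in proposed.items():
        # Missing content is treated as empty, so every term for that entry is dropped.
        text = content.get(entry_id, "").lower()
        kept[entry_id] = [t for t in terms if t.lower() in text]
        dropped[entry_id] = [t for t in terms if t.lower() not in text]
    return kept, dropped
```

Keeping the dropped terms around, rather than discarding them silently, makes the prune itself auditable. For scale, the build quoted above dropped 247 of 1038 proposed terms, about 24%, which is where "roughly a quarter" comes from.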

This generalizes well past routing indexes. Anywhere an LLM writes structured data that another LLM then trusts — extracted function-call arguments, assigned tags, classifications that drive routing — there is usually a cheap deterministic check available: a grep, a regex, a schema validation, an existence lookup. It is fast, it is dumb, and it compounds. The compiler being fluent and plausible is the whole point, and the whole risk. Verification is what lets you push the compiler aggressively and still trust what comes out.
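For a non-grep example of the same idea, here is a sketch of an existence lookup on a classification an LLM has produced. The label set, the output shape (`label` and `source_id` keys), and the function name are hypothetical.

```python
ALLOWED_LABELS = {"podcast", "article", "book"}  # hypothetical label set


def check_classification(output: dict, known_source_ids: set[str]) -> list[str]:
    """Return a list of problems with an LLM-produced classification; empty means it passes."""
    problems = []
    # The label must be one the downstream router actually understands.
    label = output.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"unknown label: {label!r}")
    # The source id must refer to something that exists.
    source_id = output.get("source_id")
    if source_id not in known_source_ids:
        problems.append(f"unknown source_id: {source_id!r}")
    return problems
```

Nothing here needs a model. It runs in microseconds and rejects outputs that look plausible but point at nothing.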

The direction is right

After I had been running Cairn for about a month, Pinecone announced a new product built around what they call a “context compiler” — structure the knowledge upstream into ready artifacts instead of retrieving raw documents at query time. Same core idea, arrived at independently, from a company whose whole business is the retrieval side. When two parties reach for the same word — compiler — without coordinating, that is usually a sign the direction is right.

It is also a reminder that you do not necessarily need the heavy infrastructure to get there. Cairn is markdown files and an LLM that does not get bored updating cross-references. At the scale most of these problems actually live at — one creator’s archive, not an entire enterprise’s data warehouse — that is the whole stack. The synthesis is resolved once and kept, instead of re-derived on every question. The knowledge compounds. Each source adds a stone.

Compile your knowledge. Then verify the compiler.

The three-layer structure here is Andrej Karpathy’s LLM-wiki pattern. The two-layer routing variant for authored writing, and the mechanical verification step, are additions from running it in production.