Karpathy's LLM wiki workflow
Build a source-grounded knowledge base where an LLM agent curates the wiki from the raw material you feed it. Every claim traces back to a preserved source. Maps directly to the Knowledge base starter pack.
A persistent, compounding knowledge base where you bring the raw material and the LLM does the summarization, cross-referencing, and maintenance. The pattern is Andrej Karpathy's, from his April 2026 gist on LLM-curated wikis. Open Knowledge ships the Knowledge base starter pack as a direct implementation, with one extension Karpathy doesn't formalize: the wiki layer is split into research/ (status: provisional) and articles/ (status: canonical), with the consolidate workflow as the explicit promotion step, so premature canonicalization becomes a thing you opt into rather than something that drifts in.
The point of this guide is to show how Open Knowledge's features compose to make the pattern feel native: the starter-pack picker scaffolds the layout in one click; per-folder templates + agent-readable folder frontmatter teach the LLM the conventions (Karpathy's centralized CLAUDE.md / AGENTS.md schema, distributed per-folder); the workflow tool's three pipeline kinds (ingest, research, consolidate) cover the sources → provisional → canonical pipeline; the WYSIWYG editor + CRDT keep editing frictionless; the activity panel attributes every write; the links tool keeps the source graph clean. Every step uses three or four of these together. That's the product.
Who this is for
- You're new to LLM-curated PKM. You've felt the pull of "the LLM should remember my context for me" but you don't want a vault that turns into a junk drawer of unprocessed clips. Start here.
- You already practice Karpathy's pattern in another tool (Obsidian, plain folders, Claude projects). You want to know what Open Knowledge does differently, and what you keep.
Either way, by the end of this guide you'll have a working source-grounded knowledge base, a routine for adding sources, and a daily-driver agent setup that interrogates the vault on your behalf.
Before you begin
You need:
- The Open Knowledge desktop app. The native macOS app is the canonical surface: WYSIWYG editor, file sidebar, agent activity panel, version timeline, and the starter-pack picker all live there. Install it from the Quickstart.
- An MCP-capable agent assistant. Claude Code, Cursor, or Codex. The OK desktop app's first-launch flow detects them and wires them up.
- A read of the source pattern. Andrej Karpathy's LLM wiki gist is the canonical description of what we're implementing here. It's a 10-minute read and worth doing first; this guide builds on it directly.
If you're brand new to OK, run the Quickstart (≤5 minutes to first agent-driven edit) and come back here.
The scenario
You're evaluating a new agent framework for an upcoming architecture decision. You've collected five sources:
- The framework's GitHub README
- A 12-page architecture overview from the docs site
- A skeptical Hacker News thread
- A paper on the underlying coordination model
- A Twitter thread from a maintainer responding to the HN thread
By the end of the afternoon you can ask your agent "what does the framework do when two sub-agents race on the same write?" and get an answer that cites the specific paragraph in the architecture doc, contextualized with the maintainer's clarification on Twitter, without re-reading any of the five sources yourself.
That's the payoff. Below is how you get there in Open Knowledge.
What's in your vault after one afternoon
your-project/
├── external-sources/
│ ├── framework-readme.md
│ ├── framework-architecture-overview.md
│ ├── hn-skeptical-thread.md
│ ├── coordination-model-paper.md
│ └── maintainer-twitter-thread.md
├── research/
│ └── agent-framework-evaluation.md
├── articles/ (empty; nothing canonical yet)
└── log.md (append-only audit trail)

- external-sources/. Five .md files, each carrying the verbatim source text plus YAML frontmatter with the original URL, fetch date, and author metadata. Immutable after capture. The agent reads these but never edits them. (Karpathy's gist calls this folder raw/; OK's pack ships it as external-sources/ for clearer intent.)
- research/. One provisional article synthesizing the five sources into an evaluation. Every claim cites a specific path in external-sources/. Status: provisional (you can change it).
- articles/. Still empty. You consolidate to here only when you've actually decided (e.g., "yes we're adopting this framework"). Premature consolidation is how wikis go stale. (Karpathy's gist puts both provisional and canonical pages in a single wiki/ folder; OK splits them so the promotion step is explicit.)
- log.md. One append-only file at project root recording each ingest, research pass, and consolidation. The audit trail. (Direct from Karpathy's pattern.)
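To make the shape of a captured source concrete, here is a minimal sketch of verbatim text under a small YAML header. The field names `url`, `fetched`, and `author`, the helper `render_source`, and the example date are all assumptions for illustration; the pack's actual clip template may use different keys.

```python
from datetime import date
from typing import Optional


def render_source(body: str, url: str, fetched: date, author: Optional[str] = None) -> str:
    """Return a Markdown document: YAML frontmatter followed by the verbatim body.

    The frontmatter keys are hypothetical, not the pack's documented schema.
    """
    lines = ["---", f'url: "{url}"', f"fetched: {fetched.isoformat()}"]
    if author:
        lines.append(f'author: "{author}"')
    lines.append("---")
    # The body is appended untouched: no summary, no analysis.
    return "\n".join(lines) + "\n\n" + body


doc = render_source(
    "# Framework\n\nVerbatim README text.",
    url="https://github.com/example/framework",
    fetched=date(2026, 1, 15),
)
print(doc)
```

The point of the sketch is the separation: metadata goes in the header, and the body stays byte-for-byte what was fetched.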
The workflow behind it
Karpathy's pattern, summarized from the gist:
| Element | What it is | Karpathy's filenames |
|---|---|---|
| Raw sources | Immutable collection of the actual documents (articles, papers, repo READMEs, transcripts). The LLM reads but does not modify. | raw/ (with raw/assets/ for attachments) |
| The wiki | LLM-generated markdown files: summaries, entity pages, concept pages, cross-references. Single layer; the LLM owns it entirely. | wiki/ |
| The log | Append-only record of ingests, queries, lint passes: "what happened and when." | log.md |
| The index | Content-oriented catalog listing every page with link, summary, optional metadata. | index.md |
| The schema | Config document telling the LLM how the wiki is structured. | CLAUDE.md (Claude Code) or AGENTS.md (other hosts) |
Two operations turn the layers into a living artifact:
- Ingest. When a new source arrives, the LLM reads it, extracts the key information, integrates findings into existing wiki pages, updates the index, and appends to the log. "A single source might touch 10-15 wiki pages."
- Query. When you ask a question, the LLM searches relevant pages and synthesizes an answer. "Good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered: these are valuable and shouldn't disappear into chat history." This is the part most people miss.
Plus periodic lint for contradictions, stale claims, orphan pages, missing concepts, dangling cross-references, and data gaps.
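Two of those lint checks, dangling cross-references and orphan pages, reduce to a small graph walk. The sketch below assumes pages link to each other with `[[wikilink]]` syntax and that a page's name is its link target; both are assumptions for illustration, and `lint_links` is a hypothetical helper rather than part of any tool.

```python
import re

# Captures the target of [[target]], [[target|alias]], or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_links(pages: dict) -> tuple:
    """Return (dangling targets, orphan pages) for a {page name: page text} mapping."""
    names = set(pages)
    dangling = set()
    inbound = {name: 0 for name in pages}
    for source, text in pages.items():
        for raw in WIKILINK.findall(text):
            target = raw.strip()
            if target not in names:
                dangling.add(target)      # link points at a page that does not exist
            elif target != source:
                inbound[target] += 1      # self-links do not count as inbound
    orphans = {name for name, count in inbound.items() if count == 0}
    return dangling, orphans


pages = {
    "agent-framework-evaluation": "See [[coordination-model]] and [[race-conditions]].",
    "coordination-model": "Background for [[agent-framework-evaluation]].",
    "scratch": "No links here.",
}
dangling, orphans = lint_links(pages)
print(sorted(dangling), sorted(orphans))
```

Contradictions and stale claims need the LLM to read the prose; the structural checks shown here are the mechanical part of a lint pass.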
Karpathy's framing of why this matters:
the wiki is a persistent, compounding artifact. The cross-references are already there.
How Open Knowledge implements (and extends) this
The Knowledge base pack scaffolds Karpathy's pattern with one significant addition: the wiki layer is split into research/ (provisional) and articles/ (canonical), with consolidate as the explicit promotion ritual.
| Karpathy | Open Knowledge |
|---|---|
| raw/ (verbatim sources, immutable) | external-sources/ (OK's name; same role) + clip template + ingest workflow |
| wiki/ (single LLM-owned layer) | Split: research/ (provisional, status: provisional, sources cited) + articles/ (canonical, status: canonical, supersedes chain); consolidate promotes |
| log.md (append-only record) | log.md at project root, written by the pack |
| index.md (static content catalog) | Provided dynamically via exec (ls) + the file sidebar. The agent reads folder frontmatter + per-doc metadata on every list call, so a static index isn't required. You can still hand-write index.md if you prefer. |
| CLAUDE.md / AGENTS.md (centralized schema) | <folder>/.ok/frontmatter.yml + per-folder templates; schema lives next to the action |
| LLM agent | Claude Code, Cursor, Codex |
When your agent lists external-sources/, it reads the folder description telling it to ingest verbatim and not analyze in those files. That's the schema layer doing its job, without you hand-writing a CLAUDE.md from scratch.
Step-by-step: getting to the scenario above
1. Open the desktop app and pick a project folder
Launch Open Knowledge (the macOS app). Either drag-and-drop a folder, open one from Pick Existing Project, or click Start Fresh to create a new vault. The first-launch consent dialog scaffolds .ok/ and offers to wire up any MCP-capable editors it detects (Claude Code, Cursor, Codex).
Don't have the desktop app yet? See the Quickstart.
2. Initialize the Knowledge base starter pack
In the editor, on a fresh project, click the empty-state Pick a starter pack button → select Knowledge base → confirm the subfolder (leave blank for project-root, or use something like brain/ to nest the layers). Apply.
You'll get external-sources/, research/, articles/, the three matching templates, and log.md at the chosen root. The folders carry descriptions written for the agent; see them in the file sidebar's folder tooltips, and in any exec directory listing from an agent.
The starter-pack picker is idempotent (safe to re-run on the same project).
3. Confirm your agent host is wired
If you accepted the desktop app's first-launch offer to wire up your agent assistant, you're done. If not, in your agent assistant (Claude Code, Cursor, or Codex), ask:
> list the workflow tools available

You should see mcp__open-knowledge__workflow (its kind covers ingest / research / consolidate / discover), alongside the standard reads/writes (exec, search, write, links, etc.). The workflow guides cover the sources → provisional → canonical pipeline described in the mapping table above.
4. Ingest the five sources
For each source, paste the URL into your agent and say:
> ingest this: https://github.com/example/framework

The agent fetches the URL, calls ingest, and writes the verbatim content to external-sources/framework-readme.md with frontmatter capturing the URL, fetch date, and any author metadata it can extract. Never copy-paste raw URLs into chat as "sources" without ingesting. The knowledge base is closed-loop. Every claim must cite a local doc.
Five ingests, five files. After the fifth, ask:
> what's in external-sources/, summary of each

Your agent uses exec("ls external-sources/") to read frontmatter and per-file summaries.
5. Synthesize into a research log
> research the agent-framework question; synthesize the five sources in external-sources/

Running workflow({ kind: "research", topic: "agent-framework evaluation" }) spins up a step-by-step research flow: it confirms scoping, lists what's covered by the existing sources, identifies gaps (maybe one source you haven't ingested yet), and produces a research/agent-framework-evaluation.md with status: provisional and a sources: array listing the five paths.
Every claim in the log cites a specific source path. You can now ask:
> what does the framework do when two sub-agents race on the same write?

The agent reads research/agent-framework-evaluation.md first, follows the citation chain into external-sources/, and returns a synthesized answer with traceable evidence.
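The closed-loop rule (every citation resolves to a local doc) can be checked mechanically. The sketch below is one way to do it under stated assumptions: the research doc's `sources:` array has already been parsed into a list of strings, and `check_sources` is a hypothetical helper, not an Open Knowledge API.

```python
from pathlib import PurePosixPath


def check_sources(sources: list, existing: set) -> list:
    """Return human-readable problems with a research doc's `sources:` list.

    `existing` is the set of paths actually present in the vault.
    """
    problems = []
    for entry in sources:
        if "://" in entry:
            problems.append(f"{entry}: live URL, ingest it first")
        elif PurePosixPath(entry).parts[:1] != ("external-sources",):
            problems.append(f"{entry}: not under external-sources/")
        elif entry not in existing:
            problems.append(f"{entry}: file not found")
    return problems


existing = {
    "external-sources/framework-readme.md",
    "external-sources/hn-skeptical-thread.md",
}
sources = [
    "external-sources/framework-readme.md",
    "https://example.com/thread",
    "external-sources/missing.md",
]
problems = check_sources(sources, existing)
for line in problems:
    print(line)
```

An empty result means the citation chain can be followed end to end without leaving the vault.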
What's inside .ok/ after seeding
The pack doesn't just create three folders. It scaffolds the agent-readable schema layer that makes the LLM behave per Karpathy's conventions without you hand-writing a CLAUDE.md.
Each folder gets a .ok/frontmatter.yml (the agent reads it on every exec directory listing) plus a .ok/templates/<name>.md (the agent picks it on every write({ document: { template } }) call). external-sources/.ok/frontmatter.yml, for example, opens with:
> Raw sources SAVED verbatim, not just cited. Produced by ingest... Immutable after capture. No analysis in these files; that belongs in research/.
Result: the agent learns each layer's discipline without a separate prompt or skills file. research/ and articles/ carry analogous descriptions. This is Karpathy's centralized CLAUDE.md schema, distributed per-folder, so schema lives closest to the agent action. Customize via mcp__open-knowledge__edit (with a folder target).
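As a rough illustration of how a per-folder description can ride along with a directory listing, the sketch below reads `<folder>/.ok/frontmatter.yml` and puts its description on the first line of the listing. It is not how Open Knowledge's exec tool is implemented: the top-level `description` key, the listing format, and both helper names are assumptions, and a naive line scan stands in for a real YAML parser.

```python
import tempfile
from pathlib import Path


def folder_description(folder: Path) -> str:
    """Read a single-line `description:` value from <folder>/.ok/frontmatter.yml.

    Assumption: the file has a top-level `description` key on one line.
    """
    meta = folder / ".ok" / "frontmatter.yml"
    if not meta.is_file():
        return ""
    for line in meta.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip().strip('"')
    return ""


def list_with_description(folder: Path) -> str:
    """Return a listing whose first line carries the folder's description."""
    names = sorted(p.name for p in folder.iterdir() if not p.name.startswith("."))
    header = f"# {folder.name}/: {folder_description(folder)}"
    return "\n".join([header, *names])


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "external-sources"
    (src / ".ok").mkdir(parents=True)
    (src / ".ok" / "frontmatter.yml").write_text(
        'description: "Raw sources saved verbatim. Immutable after capture."\n',
        encoding="utf-8",
    )
    (src / "framework-readme.md").write_text("verbatim text\n", encoding="utf-8")
    listing = list_with_description(src)
    print(listing)
```

The design idea being illustrated is locality: the instruction arrives in the same response as the file names, so the agent sees the rule at the moment it is about to act on that folder.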
The consolidate step, deferred
You don't consolidate yet in our scenario; the team hasn't decided. Maybe in two weeks you adopt the framework. Then:
> consolidate the agent-framework research into a canonical article

The workflow tool's consolidate kind starts with a STOP gate asking whether the decision is actually made. If yes, it writes articles/agent-framework-evaluation.md with status: canonical and a supersedes: chain pointing back to the research log. That research/ doc isn't deleted; it's superseded, so the evidence chain stays intact.
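The promotion logic can be pictured as a pure function over frontmatter. This is a sketch under assumptions, not the consolidate workflow itself: `promote` is a hypothetical helper, the frontmatter is modeled as a plain dict, and `supersedes` is assumed to be a list of paths.

```python
from copy import deepcopy


def promote(research_meta: dict, research_path: str, decided: bool) -> dict:
    """Build frontmatter for a canonical article from a provisional research doc."""
    if not decided:
        # Mirrors the STOP gate: no decision, no canonical article.
        raise ValueError("decision not made; keep the doc in research/")
    if research_meta.get("status") != "provisional":
        raise ValueError("only provisional docs can be promoted")
    article = deepcopy(research_meta)   # the research doc itself is left untouched
    article["status"] = "canonical"
    article["supersedes"] = [research_path]
    return article


research_meta = {
    "status": "provisional",
    "sources": ["external-sources/framework-readme.md"],
}
article_meta = promote(
    research_meta, "research/agent-framework-evaluation.md", decided=True
)
print(article_meta)
```

Because the function copies instead of mutating, the provisional doc keeps its status and remains available as the evidence behind the canonical article.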
What just composed. Five sources became one canonical article via: the Knowledge base starter pack (three folders, three templates, agent-readable folder frontmatter, one click), the workflow tool's three pipeline kinds (ingest/research/consolidate, mapping 1:1 to the pack's external-sources/, research/, and articles/ layers), the agent activity panel (every fetch and write attributed), the wikilink graph plus frontmatter sources: arrays (a citation chain you can audit doc by doc), and the WYSIWYG editor (review and refine without leaving the same surface). The same vault, the same product, the whole stack working as one thing.
The promotion rhythm
When to use each tool, and when not to:
| Trigger | Tool | Why |
|---|---|---|
| Source arrives (URL, PDF, transcript) | ingest | Preserve verbatim before analyzing |
| You searched the web yourself to ground a claim | ingest | Closed-loop: the KB doesn't cite the live web, only local docs |
| You're synthesizing 2+ sources into an answer | research | Provisional article, citations required, status:provisional |
| Team decided; this is the canonical position | consolidate | Writes status:canonical with supersedes chain |
| You just want to write a note | write | Not every doc needs the workflow (scratch notes, project pages, runbooks) |
| You ran a one-off query and the answer's useful | Save the chat to a new doc in research/ | Per Karpathy: "valuable query results become new wiki pages"; don't let them die in chat history |
Anti-pattern: consolidating too early. If you consolidate before the team has actually decided, you'll be rewriting canonical articles every week. Status provisional exists for a reason. Keep things in research/ until decisions are real.
Anti-pattern: ingesting your own thoughts. ingest is for external sources, preserved verbatim. Your reflections, hypotheses, and reactions go in research/ (provisional analysis) or a separate notes/ folder. Don't pollute the immutable layer.
Cadence
| When | Do | Composes |
|---|---|---|
| As sources arrive | ingest (30 sec per source) | MCP tool + external-sources/.ok/ frontmatter + clip template + agent activity panel |
| Weekly | A research pass to synthesize recent ingests; ask the agent to flag contradictions | research workflow + folder frontmatter discipline + wikilink graph |
| Per-decision | consolidate to canonical | consolidate workflow + supersedes chain + articles/ template |
| Monthly | "find stale claims in articles/, orphans, missing cross-refs" | links (orphans, dead links, backlinks) |
The compounding move: end each ingest session with one synthesis query. The answer becomes a research/ doc. The wiki grows.
Tips
- Folder descriptions are agent-readable. The pack writes <folder>/.ok/frontmatter.yml so the agent learns layer discipline on every exec directory listing, with no separate CLAUDE.md.
- log.md doubles as a journal. Every ingest/research/consolidate lands there with timestamps. Underrated.
- New from template… in the sidebar gives you the right frontmatter shape instantly. Don't hand-author.
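For a sense of what an append-only, timestamped log amounts to, here is a minimal sketch. The entry format and the `append_log` helper are assumptions for illustration; the pack's real log.md layout may differ.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_log(log_path: Path, kind: str, target: str, when: datetime) -> str:
    """Append one timestamped line to the log and return it. Never rewrites."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")
    entry = f"- {stamp} UTC | {kind} | {target}\n"
    with log_path.open("a", encoding="utf-8") as fh:  # mode "a": existing lines are never touched
        fh.write(entry)
    return entry


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "log.md"
    append_log(log, "ingest", "external-sources/framework-readme.md",
               datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc))
    append_log(log, "research", "research/agent-framework-evaluation.md",
               datetime(2026, 1, 15, 14, 5, tzinfo=timezone.utc))
    contents = log.read_text(encoding="utf-8")
    print(contents)
```

Opening the file in append mode is what makes the journal trustworthy: earlier entries cannot be silently edited by a later write.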
Looking for entity tracking instead?
This guide is the source-grounded posture: bring sources, the LLM curates the wiki. If what you actually want is to track people, companies, and meetings (who's in your network, what was said, what changes over time), the Entity vault (GBrain-compatible) workflow guide is the better starting point. Different pattern, same Open Knowledge editor.
Further reading
Source pattern
- Karpathy's LLM wiki gist. The original three-layer pattern, in his words. Read this before or alongside this guide.
Open Knowledge internals that support this workflow
- Agent activity. How every agent edit lands in the shadow repo with attribution.
- Claude Code, Cursor, Codex. Agent-assistant integrations.
- STARTER_PACKS registry. Canonical source for the Knowledge base pack content.
Adjacent OK workflow
- Entity vault (GBrain-compatible) workflow in Open Knowledge. Entity-vault counterpart, if the source-grounded posture isn't your fit.