| 1 | +--- |
| 2 | +name: codebase-knowledge-builder |
| 3 | +description: >- |
| 4 | + Deep-dive into any codebase and produce structured knowledge artifacts |
| 5 | + that turn a coding agent into a codebase specialist. This skill should be |
| 6 | + used when the user asks to "study this repo", "understand this codebase", |
| 7 | + "document this project", "onboard me onto this code", "create codebase |
| 8 | + knowledge", "map this architecture", or when asked to produce knowledge |
| 9 | + artifacts for any agent working on an unfamiliar repository. |
| 10 | +version: 1.0.0 |
| 11 | +--- |
| 12 | + |
| 13 | +# Codebase Knowledge Builder |
| 14 | + |
| 15 | +Transform from a generalist into a codebase specialist by systematically studying a repository and producing high-quality knowledge artifacts. The process follows a strict "read first, write later" principle across four sequential phases. |
| 16 | + |
| 17 | +## Prerequisites |
| 18 | + |
| 19 | +- File read access to the target repository (cloned locally or accessible via tools) |
| 20 | +- Bash access for file counting and structure discovery |
| 21 | +- Write access to produce scratch files and final artifacts |
| 22 | + |
| 23 | +## Workflow |
| 24 | + |
| 25 | +1. **Reconnaissance** -- Build a broad mental model of the entire repo |
| 26 | +2. **Deep-Dive Study** -- Investigate each requested topic in isolation |
| 27 | +3. **Artifact Authoring** -- Synthesize findings into polished knowledge artifacts |
| 28 | +4. **Delivery** -- Package and deliver artifacts to the user |
| 29 | + |
| 30 | +--- |
| 31 | + |
| 32 | +### Phase 1: Reconnaissance |
| 33 | + |
| 34 | +Clone the repo and build a high-level map before touching any specific topic. |
| 35 | + |
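| 36 | +1. Run `find . -type f \( -name '*.js' -o -name '*.ts' -o -name '*.py' \) | head -50` to sample the source files, then pipe the same `find` expression into `wc -l` to gauge scale. The parentheses are required: without them, `-type f` applies only to the first `-name` test. Adjust the extensions to the repo's languages.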
| 37 | +2. Read the main entry point file end-to-end. |
| 38 | +3. Follow the checklist in `references/recon-checklist.md` to systematically discover architecture, entry points, config systems, and key abstractions. |
| 39 | +4. Save a structured summary to a scratch file (`recon_findings.md`) with: tech stack, directory map, module responsibilities, design patterns, and open questions. |
| 40 | + |
| 41 | +**Do not proceed to Phase 2 until the repo's architecture can be described in one paragraph.** |
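
As a rough illustration of the scale check in step 1, the sketch below counts source files per extension while skipping vendored and generated directories. The extension set and the `SKIP_DIRS` names are assumptions chosen for the example, not part of this skill; adjust both for the repository being studied.

```python
import os
from collections import Counter
from pathlib import Path

# Assumed defaults for illustration; adjust for the repository being studied.
SOURCE_EXTENSIONS = {".js", ".ts", ".py"}
SKIP_DIRS = {".git", "node_modules", "vendor", "dist", "build", "__pycache__"}


def count_source_files(root, extensions=SOURCE_EXTENSIONS, skip_dirs=SKIP_DIRS):
    """Return a Counter mapping file extension -> number of matching files."""
    counts = Counter()
    for _dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            ext = Path(name).suffix
            if ext in extensions:
                counts[ext] += 1
    return counts


if __name__ == "__main__":
    counts = count_source_files(".")
    for ext, n in counts.most_common():
        print(f"{ext}\t{n}")
    print(f"total\t{sum(counts.values())}")
```

A total far above the monorepo threshold mentioned under Limitations is a signal to scope reconnaissance to specific directories first.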
| 42 | + |
| 43 | +### Phase 2: Deep-Dive Study |
| 44 | + |
| 45 | +For each topic the user requests, perform a focused investigation. Study each topic **separately** -- do not mix concerns. |
| 46 | + |
| 47 | +1. Read `references/deep-dive-methodology.md` for file reading strategies, tracing patterns, and note-taking protocol. |
| 48 | +2. Start from the subsystem's entry point and follow imports outward (dependency order, not alphabetical). |
| 49 | +3. Trace three paths per subsystem: **happy path**, **error path**, **edge cases**. |
| 50 | +4. After every 2-3 files, save key findings to a scratch file. Do not rely on context memory alone. |
| 51 | +5. For each file, capture: purpose (one sentence), key functions, what it calls, what calls it, and gotchas. |
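
One possible shape for the per-file notes in steps 4 and 5 is sketched below. The `FileNote` class and `append_note` helper are hypothetical names introduced for this example; the fields mirror the five items listed in step 5, and the Markdown layout is an assumption rather than a format the methodology file prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class FileNote:
    """Hypothetical record of what was learned from one source file."""
    path: str
    purpose: str  # one sentence
    key_functions: list = field(default_factory=list)
    calls: list = field(default_factory=list)
    called_by: list = field(default_factory=list)
    gotchas: list = field(default_factory=list)

    def to_markdown(self):
        def bullets(items):
            # Make empty lists explicit so gaps are visible when re-reading notes.
            return "\n".join(f"- {item}" for item in items) if items else "- (none found)"

        return (
            f"### `{self.path}`\n\n"
            f"**Purpose:** {self.purpose}\n\n"
            f"**Key functions:**\n{bullets(self.key_functions)}\n\n"
            f"**Calls:**\n{bullets(self.calls)}\n\n"
            f"**Called by:**\n{bullets(self.called_by)}\n\n"
            f"**Gotchas:**\n{bullets(self.gotchas)}\n"
        )


def append_note(scratch_file, note):
    """Append one note to the scratch file so findings survive context loss."""
    with open(scratch_file, "a", encoding="utf-8") as fh:
        fh.write(note.to_markdown() + "\n")
```

Appending rather than overwriting keeps earlier findings intact, which matches the advice not to rely on context memory alone.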
| 52 | + |
| 53 | +### Phase 3: Artifact Authoring |
| 54 | + |
| 55 | +Synthesize each topic's findings into a standalone knowledge artifact. |
| 56 | + |
| 57 | +1. Copy the template from `templates/knowledge_artifact.md` for each topic. |
| 58 | +2. Fill **every section** -- Overview, Architecture, Key Components table, Data & Control Flow, Key Functions table, Configuration table, Gotchas, Extension Points, and Visual Flow diagram. |
| 59 | +3. Include Mermaid diagrams: use `sequenceDiagram` for flows, `graph TD` for architecture. |
| 60 | +4. Each artifact must be self-contained -- a developer reading only that artifact should understand the subsystem completely. |
| 61 | + |
| 62 | +### Phase 4: Delivery |
| 63 | + |
| 64 | +Attach all completed Markdown artifacts to a message to the user. Include a brief summary of what each artifact covers. |
| 65 | + |
| 66 | +--- |
| 67 | + |
| 68 | +## Limitations |
| 69 | + |
| 70 | +- Large monorepos (>10,000 files) may require scoping to specific directories or packages before starting reconnaissance. |
| 71 | +- Binary files, compiled assets, and vendored dependencies should be excluded from study. |
| 72 | +- Knowledge artifacts reflect the codebase at a point in time. Major refactors may invalidate sections. |
| 73 | + |
| 74 | +## Quality Checklist |
| 75 | + |
| 76 | +Before delivering any artifact, verify: |
| 77 | + |
| 78 | +| Check | Criteria | |
| 79 | +| :--- | :--- | |
| 80 | +| **Completeness** | Every template section is filled with codebase-specific detail, not placeholders. | |
| 81 | +| **Accuracy** | File paths, function names, and parameter descriptions match the actual code. | |
| 82 | +| **Gotchas** | At least two non-obvious behaviors, historical fixes, or race conditions documented. |
| 83 | +| **Visuals** | At least one Mermaid diagram per artifact. | |
| 84 | +| **Self-contained** | A reader with no prior context can understand the subsystem from the artifact alone. | |
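
Parts of this checklist can be checked mechanically. The sketch below is a minimal example that assumes each template section appears as a Markdown heading containing the section name from Phase 3, and that unfilled placeholders show up as `TODO` or `TBD`; both are assumptions, since the actual layout of `templates/knowledge_artifact.md` is not shown here. Accuracy and self-containedness still need a human or agent read-through.

```python
import re

# Section names taken from the Phase 3 list; the heading format is an assumption.
REQUIRED_SECTIONS = [
    "Overview", "Architecture", "Key Components", "Data & Control Flow",
    "Key Functions", "Configuration", "Gotchas", "Extension Points", "Visual Flow",
]
FENCE = "`" * 3  # built programmatically to avoid a literal fence inside this block


def check_artifact(markdown_text):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    headings = [
        m.group(1)
        for m in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown_text, re.MULTILINE)
    ]
    for section in REQUIRED_SECTIONS:
        if not any(section.lower() in h.lower() for h in headings):
            problems.append(f"missing section: {section}")
    # Visuals: look for at least one fenced block tagged as mermaid.
    if not re.search(r"^" + re.escape(FENCE) + r"[ \t]*mermaid\b", markdown_text, re.MULTILINE):
        problems.append("no Mermaid diagram found")
    # Completeness: flag obvious leftover placeholders (assumed markers).
    if re.search(r"\b(TODO|TBD)\b", markdown_text):
        problems.append("placeholder text remains")
    return problems
```

Running `check_artifact` on each artifact's text before Phase 4 gives a quick first pass over the Completeness and Visuals rows.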
| 85 | + |
| 86 | +## Bundled Resources |
| 87 | + |
| 88 | +| Resource | Path | When to Read | |
| 89 | +| :--- | :--- | :--- | |
| 90 | +| Recon Checklist | `references/recon-checklist.md` | At the start of Phase 1 | |
| 91 | +| Deep-Dive Methodology | `references/deep-dive-methodology.md` | At the start of each Phase 2 topic | |
| 92 | +| Artifact Template | `templates/knowledge_artifact.md` | At the start of Phase 3 for each topic | |