Initial release: codebase-knowledge-builder skill

OthmanAdi · OthmanAdi · commit bee4a9bfc5ac · 2026-03-08T19:24:28.000+01:00
Agent skill that systematically studies any repository and produces
structured knowledge artifacts. Four-phase workflow: reconnaissance,
deep-dive study, artifact authoring, and delivery.

Includes recon checklist, deep-dive methodology reference, and
knowledge artifact template.

Co-authored-by: OthmanAdi <othmanadi@users.noreply.github.com>
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,32 @@
+---
+name: Bug report
+about: Something isn't working as expected
+title: ''
+labels: bug
+assignees: ''
+---
+
+## What happened
+
+Describe what went wrong. Be specific.
+
+## What you expected
+
+What should have happened instead.
+
+## Steps to reproduce
+
+1. Installed the skill via `npx skills add OthmanAdi/codebase-knowledge-builder`
+2. Asked the agent to "..."
+3. ...
+
+## Environment
+
+- **Agent**: (Claude Code / Cursor / OpenCode / other)
+- **Agent version**: 
+- **OS**: 
+- **Target codebase language/framework**: 
+
+## Additional context
+
+Error messages, screenshots, or the artifact output (if relevant).
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,23 @@
+---
+name: Feature request
+about: Suggest an improvement to the skill
+title: ''
+labels: enhancement
+assignees: ''
+---
+
+## Problem
+
+What's missing or inconvenient? Describe the situation, not just a solution.
+
+## Proposed solution
+
+How would you solve it? If you're not sure, that's fine -- just describe the problem clearly.
+
+## Alternatives considered
+
+Other approaches you thought about and why they didn't fit.
+
+## Additional context
+
+Links, screenshots, or examples from other skills that handle this well.
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -0,0 +1,24 @@
+## What this PR does
+
+Describe the change in 1-3 sentences.
+
+## Why
+
+What problem does this solve? Link to an issue if there is one.
+
+## What changed
+
+- [ ] SKILL.md
+- [ ] References (`references/`)
+- [ ] Template (`templates/`)
+- [ ] README or docs
+- [ ] Repo config (CI, templates, etc.)
+
+## Checklist
+
+- [ ] SKILL.md is under 3,000 words
+- [ ] SKILL.md uses imperative form (no "you should")
+- [ ] Description uses third person ("This skill should be used when...")
+- [ ] All referenced files exist
+- [ ] Tested with at least one real codebase
+- [ ] No placeholder text left in templates
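Two of the checklist items above (word count and referenced files) can be checked mechanically before opening a PR. The sketch below is hypothetical and not part of the repo. It assumes SKILL.md refers to bundled files as backticked `references/...` or `templates/...` paths, as its Bundled Resources table does, and it treats `wc -w` as a good-enough word count.

```bash
# Hypothetical helper (not part of the repo): automates two PR checklist items.
# Usage: check_skill [skill_dir]   (defaults to the current directory)
check_skill() {
  local skill_dir="${1:-.}"
  local skill_md="$skill_dir/SKILL.md"
  local status=0 words refs ref

  if [ ! -f "$skill_md" ]; then
    echo "missing: $skill_md" >&2
    return 1
  fi

  # "SKILL.md is under 3,000 words": wc -w counts whitespace-separated
  # tokens, so front matter and Markdown syntax are included in the total.
  words=$(wc -w < "$skill_md" | tr -d '[:space:]')
  if [ "$words" -ge 3000 ]; then
    echo "SKILL.md has $words words (limit: under 3,000)" >&2
    status=1
  fi

  # "All referenced files exist": collect backticked references/... and
  # templates/... paths and confirm each one is present on disk.
  refs=$(grep -oE '`(references|templates)/[^`]+`' "$skill_md" | tr -d '`' | sort -u)
  for ref in $refs; do
    if [ ! -e "$skill_dir/$ref" ]; then
      echo "referenced file not found: $ref" >&2
      status=1
    fi
  done

  return "$status"
}
```

Run it as `check_skill skills/codebase-knowledge-builder` from the repo root; a non-zero return means at least one item failed. The imperative-form, third-person, and real-codebase items still need a human read.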
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,5 @@
+.DS_Store
+Thumbs.db
+*.swp
+*.swo
+*~
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,41 @@
+# Contributing
+
+Thanks for considering a contribution. Here's how to do it well.
+
+## Reporting bugs
+
+Open an issue using the **Bug report** template. Include:
+
+- What you expected to happen
+- What actually happened
+- The agent you're using (Claude Code, Cursor, OpenCode, etc.)
+- Any error messages or unexpected output
+
+## Suggesting improvements
+
+Open an issue using the **Feature request** template. Describe the problem you're solving, not just the solution you want. Context helps.
+
+## Pull requests
+
+1. Fork the repo and create a branch from `main`
+2. Make your changes
+3. Make sure the SKILL.md still passes [skillcheck](https://getskillcheck.com) validation
+4. Open a PR using the pull request template
+5. Describe what changed and why
+
+### What makes a good PR
+
+- One concern per PR. Don't bundle unrelated changes.
+- If you're modifying SKILL.md, keep it under 3,000 words. Move detailed content to `references/`.
+- If you're adding a reference file, make sure SKILL.md points to it in the Bundled Resources table.
+- Test with at least one real codebase before submitting.
+
+### Style
+
+- SKILL.md uses imperative form ("Run the command", not "You should run the command")
+- SKILL.md description uses third person ("This skill should be used when...")
+- README and human-facing docs: write like a person, not a press release
+
+## Code of conduct
+
+Be direct, be helpful, don't waste people's time. That covers it.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Ahmad Othman Ammar Adi (OthmanAdi)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,74 @@
+# codebase-knowledge-builder
+
+An agent skill that studies any repository and produces structured knowledge artifacts. Drop it into Claude Code, Cursor, OpenCode, or any agent that supports the [agentskills](https://agentskills.io) spec, point it at a codebase, and get back documentation that actually helps.
+
+[![skillcheck passed](https://raw.githubusercontent.com/olgasafonova/skillcheck-free/main/skill-check/passed.svg)](https://getskillcheck.com)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+[![skills.sh](https://img.shields.io/badge/skills.sh-codebase--knowledge--builder-black)](https://skills.sh/othmanadi/codebase-knowledge-builder)
+
+## What it does
+
+Most agents forget what they read three files ago. This skill fixes that by following a four-phase process:
+
+1. **Reconnaissance** -- scan the repo structure, identify the tech stack, map module boundaries
+2. **Deep-dive study** -- trace happy paths, error paths, and edge cases through each subsystem
+3. **Artifact authoring** -- fill a structured template covering architecture, key functions, gotchas, and Mermaid diagrams
+4. **Delivery** -- hand back self-contained Markdown artifacts that any developer (or agent) can read cold
+
+The output is a set of knowledge artifacts. Each one covers a single subsystem and stands on its own. No prior context needed.
+
+## When to use it
+
+- Onboarding onto an unfamiliar codebase
+- Producing documentation for a repo that has none
+- Preparing knowledge files so other agents can work on the project without re-reading everything
+- Studying a specific subsystem (auth, database layer, API routing, etc.) in depth
+
+## Install
+
+```bash
+npx skills add OthmanAdi/codebase-knowledge-builder
+```
+
+Or manually: copy the `skills/codebase-knowledge-builder/` directory into your agent's skills folder.
+
+## What's inside
+
+```
+skills/codebase-knowledge-builder/
+  SKILL.md                              # Skill definition and workflow
+  references/
+    recon-checklist.md                  # Phase 1 checklist
+    deep-dive-methodology.md            # File reading and tracing strategies
+  templates/
+    knowledge_artifact.md               # Output template for each subsystem
+```
+
+SKILL.md stays lean (about 90 lines). Detailed methodology lives in `references/` and only gets loaded when needed. The template in `templates/` defines the exact structure of every knowledge artifact the skill produces.
+
+## Example output
+
+After running the skill on a Node.js API, each artifact includes:
+
+- Architecture overview with design pattern identification
+- Key components table (component, file path, responsibility)
+- Step-by-step data and control flow
+- Key functions table with parameters and return values
+- Configuration and environment variable mapping
+- Gotchas and pitfalls (race conditions, caching quirks, historical fixes)
+- Extension points for adding new functionality
+- Mermaid diagrams for visual flow
+
+## How it works under the hood
+
+The skill uses progressive disclosure. When an agent triggers it, only the SKILL.md body loads into context (~600 words). The references and template load on demand during each phase. This keeps the context window clean for the actual codebase files being studied.
+
+Scratch files (`recon_findings.md`, per-file notes) are saved during study so the agent doesn't lose findings as it reads more files. The quality checklist at the end catches incomplete sections, missing diagrams, and placeholder text before delivery.
+
+## Contributing
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on submitting issues and pull requests.
+
+## License
+
+[MIT](LICENSE)
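The quality checklist's "missing diagrams" and "placeholder text" checks can also be approximated with a script before delivery. This is a hypothetical sketch, not part of the skill: the placeholder markers (`TODO`, `TBD`, `FIXME`) are assumptions, since the template's actual placeholder style is not shown here, and the Mermaid check only looks for a fenced block whose info string is `mermaid`.

```bash
# Hypothetical pre-delivery check for one knowledge artifact (not part of the
# skill). The placeholder markers below are assumptions; match them to the
# actual template.
check_artifact() {
  local artifact="$1"
  local status=0

  # Visuals: at least one fenced Mermaid diagram, i.e. a line that starts
  # with three backticks followed by "mermaid".
  if ! grep -qE '^`{3}mermaid' "$artifact"; then
    echo "$artifact: no Mermaid diagram found" >&2
    status=1
  fi

  # Completeness: print any leftover placeholder markers with line numbers.
  if grep -nE 'TODO|TBD|FIXME' "$artifact" >&2; then
    echo "$artifact: placeholder text found" >&2
    status=1
  fi

  return "$status"
}
```

Looping it over every artifact (`for f in *.md; do check_artifact "$f"; done`) gives a quick pass/fail list; accuracy and self-containment still need a human or agent review.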
diff --git a/skills/codebase-knowledge-builder/SKILL.md b/skills/codebase-knowledge-builder/SKILL.md
@@ -0,0 +1,92 @@
+---
+name: codebase-knowledge-builder
+description: >-
+  Deep-dive into any codebase and produce structured knowledge artifacts
+  that turn a coding agent into a codebase specialist. This skill should be
+  used when the user asks to "study this repo", "understand this codebase",
+  "document this project", "onboard me onto this code", "create codebase
+  knowledge", "map this architecture", or when asked to produce knowledge
+  artifacts for any agent working on an unfamiliar repository.
+version: 1.0.0
+---
+
+# Codebase Knowledge Builder
+
+Transform from a generalist into a codebase specialist by systematically studying a repository and producing high-quality knowledge artifacts. The process follows a strict "read first, write later" principle across four sequential phases.
+
+## Prerequisites
+
+- File read access to the target repository (cloned locally or accessible via tools)
+- Bash access for file counting and structure discovery
+- Write access to produce scratch files and final artifacts
+
+## Workflow
+
+1. **Reconnaissance** -- Build a broad mental model of the entire repo
+2. **Deep-Dive Study** -- Investigate each requested topic in isolation
+3. **Artifact Authoring** -- Synthesize findings into polished knowledge artifacts
+4. **Delivery** -- Package and deliver artifacts to the user
+
+---
+
+### Phase 1: Reconnaissance
+
+Clone the repo and build a high-level map before touching any specific topic.
+
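+1. Run `find . -type f \( -name '*.js' -o -name '*.ts' -o -name '*.py' \) | head -50` to sample the source files (adjust the extensions to the repo's languages), then pipe the same `find` to `wc -l` to gauge scale.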
+2. Read the main entry point file end-to-end.
+3. Follow the checklist in `references/recon-checklist.md` to systematically discover architecture, entry points, config systems, and key abstractions.
+4. Save a structured summary to a scratch file (`recon_findings.md`) with: tech stack, directory map, module responsibilities, design patterns, and open questions.
+
+**Do not proceed to Phase 2 until the repo's architecture can be described in one paragraph.**
+
+### Phase 2: Deep-Dive Study
+
+For each topic the user requests, perform a focused investigation. Study each topic **separately** -- do not mix concerns.
+
+1. Read `references/deep-dive-methodology.md` for file reading strategies, tracing patterns, and note-taking protocol.
+2. Start from the subsystem's entry point and follow imports outward (dependency order, not alphabetical).
+3. Trace three paths per subsystem: **happy path**, **error path**, **edge cases**.
+4. After every 2-3 files, save key findings to a scratch file. Do not rely on context memory alone.
+5. For each file, capture: purpose (one sentence), key functions, what it calls, what calls it, and gotchas.
+
+### Phase 3: Artifact Authoring
+
+Synthesize each topic's findings into a standalone knowledge artifact.
+
+1. Copy the template from `templates/knowledge_artifact.md` for each topic.
+2. Fill **every section** -- Overview, Architecture, Key Components table, Data & Control Flow, Key Functions table, Configuration table, Gotchas, Extension Points, and Visual Flow diagram.
+3. Include Mermaid diagrams: use `sequenceDiagram` for flows, `graph TD` for architecture.
+4. Each artifact must be self-contained -- a developer reading only that artifact should understand the subsystem completely.
+
+### Phase 4: Delivery
+
+Attach all completed Markdown artifacts to a single message for the user, with a brief summary of what each artifact covers.
+
+---
+
+## Limitations
+
+- Large monorepos (>10,000 files) may require scoping to specific directories or packages before starting reconnaissance.
+- Exclude binary files, compiled assets, and vendored dependencies from study.
+- Knowledge artifacts reflect the codebase at a point in time. Major refactors may invalidate sections.
+
+## Quality Checklist
+
+Before delivering any artifact, verify:
+
+| Check | Criteria |
+| :--- | :--- |
+| **Completeness** | Every template section is filled with codebase-specific detail, not placeholders. |
+| **Accuracy** | File paths, function names, and parameter descriptions match the actual code. |
+| **Gotchas** | At least two non-obvious behaviors, historical fixes, or race conditions documented. |
+| **Visuals** | At least one Mermaid diagram per artifact. |
+| **Self-contained** | A reader with no prior context can understand the subsystem from the artifact alone. |
+
+## Bundled Resources
+
+| Resource | Path | When to Read |
+| :--- | :--- | :--- |
+| Recon Checklist | `references/recon-checklist.md` | At the start of Phase 1 |
+| Deep-Dive Methodology | `references/deep-dive-methodology.md` | At the start of each Phase 2 topic |
+| Artifact Template | `templates/knowledge_artifact.md` | At the start of Phase 3 for each topic |
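Step 1 of Phase 1 (gauging scale) can be wrapped in a small helper that reports files and lines per extension. This is a hypothetical sketch, not part of the skill: the `gauge_scale` name, the three extensions, and the pruned directory names (`node_modules`, `.git`, `dist`) are assumptions to adapt per repo. It also shows why the `-name` tests need grouping.

```bash
# Hypothetical recon helper: per-extension file and line counts.
# The \( ... \) grouping makes -type f apply to every -name test; without it,
# find binds -type f only to the first -name. The pruned directory names are
# assumptions; extend the list for the repo at hand.
gauge_scale() {
  local root="${1:-.}"
  local ext files lines
  for ext in js ts py; do
    files=$(find "$root" \( -name node_modules -o -name .git -o -name dist \) -prune \
      -o -type f -name "*.$ext" -print | wc -l | tr -d '[:space:]')
    lines=$(find "$root" \( -name node_modules -o -name .git -o -name dist \) -prune \
      -o -type f -name "*.$ext" -exec cat {} + | wc -l | tr -d '[:space:]')
    echo "$ext: $files files, $lines lines"
  done
}
```

`gauge_scale path/to/repo` prints one line per extension in the form `ext: N files, M lines`. Pruning vendored and build directories up front also follows the Limitations note about excluding them from study.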
diff --git a/skills/codebase-knowledge-builder/references/deep-dive-methodology.md b/skills/codebase-knowledge-builder/references/deep-dive-methodology.md
@@ -0,0 +1,56 @@
+# Deep-Dive Methodology
+
+## Table of Contents
+
+1. File Reading Strategy
+2. Tracing Patterns
+3. Note-Taking Protocol
+4. Common Subsystem Types
+
+## 1. File Reading Strategy
+
+Read files in dependency order, not alphabetical order. Start from the entry point of the subsystem being studied and follow imports outward. For each file:
+
+1. Read the module-level docstring or header comment first -- it often explains the "why."
+2. Identify the exported functions/classes -- these are the public API.
+3. Read the constructor or initialization logic -- this reveals dependencies.
+4. Read the primary execution method -- this is the core logic.
+5. Scan for error handling, edge cases, and commented-out code -- these reveal history.
+
+When a file is too long (>500 lines), use range-based reading. Start with lines 1-100 to get the imports and class definition, then jump to the method needed.
+
+## 2. Tracing Patterns
+
+For each subsystem, trace these three paths:
+
+**Happy Path**: The normal, successful execution flow from trigger to completion. This is the backbone of the artifact.
+
+**Error Path**: What happens when things go wrong. Look for try/catch blocks, error classes, fallback logic, and retry mechanisms.
+
+**Edge Cases**: Caching behavior, race conditions, concurrent access, timeout handling. These are the gotchas that make the difference between a junior and a senior developer.
+
+## 3. Note-Taking Protocol
+
+After every 2-3 files read, save key findings to a scratch file. Do not rely on context window memory alone. Structure notes as:
+
+```
+## [File Path]
+- Purpose: [one sentence]
+- Key functions: [list]
+- Calls: [what it calls]
+- Called by: [what calls it]
+- Gotchas: [non-obvious behavior]
+```
+
+## 4. Common Subsystem Types
+
+Different subsystem types require different investigation angles:
+
+| Subsystem Type | Primary Focus | Key Questions |
+| :--- | :--- | :--- |
+| **Request Pipeline** | Middleware chain, request/response transformation | What is the middleware order? What gets injected at each stage? |
+| **Agent/LLM System** | Model loading, prompt assembly, tool binding | How are models selected? How are prompts composed? What middleware wraps the LLM? |
+| **Streaming/Real-time** | Event emission, section management, callback handlers | What events are emitted? How are sections structured? What handles backpressure? |
+| **Data Access Layer** | Connection pooling, query building, caching | How are connections managed? What caching strategy is used? How are schemas loaded? |
+| **Worker/Task System** | Task delegation, result aggregation, error propagation | How are tasks routed? How are results collected? What happens on failure? |
+| **Configuration System** | Config sources, override hierarchy, validation | What is the config precedence? How are defaults applied? What validates config? |
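Range-based reading and the note-taking format described above map onto two small shell functions. This is a hypothetical sketch, not part of the skill: `read_range` and `append_note` are illustrative names, and the note fields simply mirror the scratch-file structure (Purpose, Key functions, Calls, Called by, Gotchas).

```bash
# Hypothetical helpers for the deep-dive methodology (not part of the skill).

# Range-based reading: print lines FIRST..LAST of FILE with line numbers,
# e.g. read_range path/to/file 1 100 for imports and class definitions.
read_range() {
  awk -v a="$2" -v b="$3" \
    'NR > b { exit }; NR >= a { printf "%6d  %s\n", NR, $0 }' "$1"
}

# Note-taking protocol: append one entry in the scratch-file format.
# Arguments: notes file, path, purpose, key functions, calls, called by, gotchas.
append_note() {
  local notes="$1"
  shift
  {
    printf '## %s\n' "$1"
    printf '%s\n' "- Purpose: $2"
    printf '%s\n' "- Key functions: $3"
    printf '%s\n' "- Calls: $4"
    printf '%s\n' "- Called by: $5"
    printf '%s\n' "- Gotchas: $6"
    printf '\n'
  } >> "$notes"
}
```

`read_range` stops reading as soon as it passes the last requested line, which keeps long files cheap to sample. Calling `append_note` after every 2-3 files keeps findings on disk rather than only in the context window.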
diff --git a/skills/codebase-knowledge-builder/references/recon-checklist.md b/skills/codebase-knowledge-builder/references/recon-checklist.md
diff --git a/skills/codebase-knowledge-builder/templates/knowledge_artifact.md b/skills/codebase-knowledge-builder/templates/knowledge_artifact.md