Commit bee4a9b

Initial release: codebase-knowledge-builder skill
Agent skill that systematically studies any repository and produces structured knowledge artifacts. Four-phase workflow: reconnaissance, deep-dive study, artifact authoring, and delivery. Includes recon checklist, deep-dive methodology reference, and knowledge artifact template.

Co-authored-by: OthmanAdi <othmanadi@users.noreply.github.com>
0 parents  commit bee4a9b

11 files changed: +465 -0 lines changed

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
---
2+
name: Bug report
3+
about: Something isn't working as expected
4+
title: ''
5+
labels: bug
6+
assignees: ''
7+
---
8+
9+
## What happened
10+
11+
Describe what went wrong. Be specific.
12+
13+
## What you expected
14+
15+
What should have happened instead.
16+
17+
## Steps to reproduce
18+
19+
1. Installed the skill via `npx skills add OthmanAdi/codebase-knowledge-builder`
20+
2. Asked the agent to "..."
21+
3. ...
22+
23+
## Environment
24+
25+
- **Agent**: (Claude Code / Cursor / OpenCode / other)
26+
- **Agent version**:
27+
- **OS**:
28+
- **Target codebase language/framework**:
29+
30+
## Additional context
31+
32+
Error messages, screenshots, or the artifact output (if relevant).
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
---
2+
name: Feature request
3+
about: Suggest an improvement to the skill
4+
title: ''
5+
labels: enhancement
6+
assignees: ''
7+
---
8+
9+
## Problem
10+
11+
What's missing or inconvenient? Describe the situation, not just a solution.
12+
13+
## Proposed solution
14+
15+
How would you solve it? If you're not sure, that's fine -- just describe the problem clearly.
16+
17+
## Alternatives considered
18+
19+
Other approaches you thought about and why they didn't fit.
20+
21+
## Additional context
22+
23+
Links, screenshots, or examples from other skills that handle this well.

.github/pull_request_template.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
## What this PR does
2+
3+
Describe the change in 1-3 sentences.
4+
5+
## Why
6+
7+
What problem does this solve? Link to an issue if there is one.
8+
9+
## What changed
10+
11+
- [ ] SKILL.md
12+
- [ ] References (`references/`)
13+
- [ ] Template (`templates/`)
14+
- [ ] README or docs
15+
- [ ] Repo config (CI, templates, etc.)
16+
17+
## Checklist
18+
19+
- [ ] SKILL.md is under 3,000 words
20+
- [ ] SKILL.md uses imperative form (no "you should")
21+
- [ ] Description uses third person ("This skill should be used when...")
22+
- [ ] All referenced files exist
23+
- [ ] Tested with at least one real codebase
24+
- [ ] No placeholder text left in templates

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
.DS_Store
2+
Thumbs.db
3+
*.swp
4+
*.swo
5+
*~

CONTRIBUTING.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
# Contributing
2+
3+
Thanks for considering a contribution. Here's how to do it well.
4+
5+
## Reporting bugs
6+
7+
Open an issue using the **Bug report** template. Include:
8+
9+
- What you expected to happen
10+
- What actually happened
11+
- The agent you're using (Claude Code, Cursor, OpenCode, etc.)
12+
- Any error messages or unexpected output
13+
14+
## Suggesting improvements
15+
16+
Open an issue using the **Feature request** template. Describe the problem you're solving, not just the solution you want. Context helps.
17+
18+
## Pull requests
19+
20+
1. Fork the repo and create a branch from `main`
21+
2. Make your changes
22+
3. Make sure the SKILL.md still passes [skillcheck](https://getskillcheck.com) validation
23+
4. Open a PR using the pull request template
24+
5. Describe what changed and why
25+
26+
### What makes a good PR
27+
28+
- One concern per PR. Don't bundle unrelated changes.
29+
- If you're modifying SKILL.md, keep it under 3,000 words. Move detailed content to `references/`.
30+
- If you're adding a reference file, make sure SKILL.md points to it in the Bundled Resources table.
31+
- Test with at least one real codebase before submitting.
32+
33+
### Style
34+
35+
- SKILL.md uses imperative form ("Run the command", not "You should run the command")
36+
- SKILL.md description uses third person ("This skill should be used when...")
37+
- README and human-facing docs: write like a person, not a press release
38+
39+
## Code of conduct
40+
41+
Be direct, be helpful, don't waste people's time. That covers it.

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2025 Ahmad Othman Ammar Adi (OthmanAdi)
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
1+
# codebase-knowledge-builder
2+
3+
An agent skill that studies any repository and produces structured knowledge artifacts. Drop it into Claude Code, Cursor, OpenCode, or any agent that supports the [agentskills](https://agentskills.io) spec, point it at a codebase, and get back documentation that actually helps.
4+
5+
[![skillcheck passed](https://raw.githubusercontent.com/olgasafonova/skillcheck-free/main/skill-check/passed.svg)](https://getskillcheck.com)
6+
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
7+
[![skills.sh](https://img.shields.io/badge/skills.sh-codebase--knowledge--builder-black)](https://skills.sh/othmanadi/codebase-knowledge-builder)
8+
9+
## What it does
10+
11+
Most agents forget what they read three files ago. This skill fixes that by following a four-phase process:
12+
13+
1. **Reconnaissance** -- scan the repo structure, identify the tech stack, map module boundaries
14+
2. **Deep-dive study** -- trace happy paths, error paths, and edge cases through each subsystem
15+
3. **Artifact authoring** -- fill a structured template covering architecture, key functions, gotchas, and Mermaid diagrams
16+
4. **Delivery** -- hand back self-contained Markdown artifacts that any developer (or agent) can read cold
17+
18+
The output is a set of knowledge artifacts. Each one covers a single subsystem and stands on its own. No prior context needed.
19+
20+
## When to use it
21+
22+
- Onboarding onto an unfamiliar codebase
23+
- Producing documentation for a repo that has none
24+
- Preparing knowledge files so other agents can work on the project without re-reading everything
25+
- Studying a specific subsystem (auth, database layer, API routing, etc.) in depth
26+
27+
## Install
28+
29+
```bash
30+
npx skills add OthmanAdi/codebase-knowledge-builder
31+
```
32+
33+
Or manually: copy the `skills/codebase-knowledge-builder/` directory into your agent's skills folder.
34+
35+
## What's inside
36+
37+
```
38+
skills/codebase-knowledge-builder/
39+
SKILL.md # Skill definition and workflow
40+
references/
41+
recon-checklist.md # Phase 1 checklist
42+
deep-dive-methodology.md # File reading and tracing strategies
43+
templates/
44+
knowledge_artifact.md # Output template for each subsystem
45+
```
46+
47+
The SKILL.md stays lean (~90 lines). Detailed methodology lives in `references/` and only gets loaded when needed. The template in `templates/` defines the exact structure of every knowledge artifact the skill produces.
48+
49+
## Example output
50+
51+
After running the skill on a Node.js API, each artifact includes:
52+
53+
- Architecture overview with design pattern identification
54+
- Key components table (component, file path, responsibility)
55+
- Step-by-step data and control flow
56+
- Key functions table with parameters and return values
57+
- Configuration and environment variable mapping
58+
- Gotchas and pitfalls (race conditions, caching quirks, historical fixes)
59+
- Extension points for adding new functionality
60+
- Mermaid diagrams for visual flow
61+
62+
## How it works under the hood
63+
64+
The skill uses progressive disclosure. When an agent triggers it, only the SKILL.md body loads into context (~600 words). The references and template load on demand during each phase. This keeps the context window clean for the actual codebase files being studied.
65+
66+
Scratch files (`recon_findings.md`, per-file notes) are saved during study so the agent doesn't lose findings as it reads more files. The quality checklist at the end catches incomplete sections, missing diagrams, and placeholder text before delivery.
67+
68+
## Contributing
69+
70+
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on submitting issues and pull requests.
71+
72+
## License
73+
74+
[MIT](LICENSE)
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
1+
---
2+
name: codebase-knowledge-builder
3+
description: >-
4+
Deep-dive into any codebase and produce structured knowledge artifacts
5+
that turn a coding agent into a codebase specialist. This skill should be
6+
used when the user asks to "study this repo", "understand this codebase",
7+
"document this project", "onboard me onto this code", "create codebase
8+
knowledge", "map this architecture", or when asked to produce knowledge
9+
artifacts for any agent working on an unfamiliar repository.
10+
version: 1.0.0
11+
---
12+
13+
# Codebase Knowledge Builder
14+
15+
Transform from a generalist into a codebase specialist by systematically studying a repository and producing high-quality knowledge artifacts. The process follows a strict "read first, write later" principle across four sequential phases.
16+
17+
## Prerequisites
18+
19+
- File read access to the target repository (cloned locally or accessible via tools)
20+
- Bash access for file counting and structure discovery
21+
- Write access to produce scratch files and final artifacts
22+
23+
## Workflow
24+
25+
1. **Reconnaissance** -- Build a broad mental model of the entire repo
26+
2. **Deep-Dive Study** -- Investigate each requested topic in isolation
27+
3. **Artifact Authoring** -- Synthesize findings into polished knowledge artifacts
28+
4. **Delivery** -- Package and deliver artifacts to the user
29+
30+
---
31+
32+
### Phase 1: Reconnaissance
33+
34+
Clone the repo, or confirm it is accessible through file tools, and build a high-level map before touching any specific topic.
35+
36+
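1. Run `find . -type f \( -name '*.js' -o -name '*.ts' -o -name '*.py' \) | head -50` to sample the source files, then pipe the same `find` (without `head`) into `wc -l` to gauge scale. The parentheses make `-type f` apply to every extension, not just the first.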
37+
2. Read the main entry point file end-to-end.
38+
3. Follow the checklist in `references/recon-checklist.md` to systematically discover architecture, entry points, config systems, and key abstractions.
39+
4. Save a structured summary to a scratch file (`recon_findings.md`) with: tech stack, directory map, module responsibilities, design patterns, and open questions.
40+
41+
**Do not proceed to Phase 2 until the repo's architecture can be described in one paragraph.**
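Where Bash is awkward or unavailable, the scale check from step 1 can be approximated in Python. This is a minimal sketch, not part of the skill itself: the function name `survey`, the skipped directory names, and the extension set are all assumptions to adjust per repository.

```python
import os
from collections import Counter

# Assumed names of vendored or generated directories; adjust per repo.
SKIP_DIRS = {".git", "node_modules", "vendor", "dist", "build", "__pycache__"}
# Assumed source extensions, mirroring the find command in step 1.
SOURCE_EXTS = {".js", ".ts", ".py"}


def survey(root):
    """Return (files per extension, total line count) for source files under root."""
    files_by_ext = Counter()
    total_lines = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext not in SOURCE_EXTS:
                continue
            files_by_ext[ext] += 1
            # Read in binary mode so odd encodings cannot abort the count.
            with open(os.path.join(dirpath, name), "rb") as handle:
                total_lines += sum(1 for _ in handle)
    return files_by_ext, total_lines
```

Calling `survey(".")` from the repository root gives a rough file and line count to record in `recon_findings.md`. A very large file count is a signal to scope the study to specific directories first.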
42+
43+
### Phase 2: Deep-Dive Study
44+
45+
For each topic the user requests, perform a focused investigation. Study each topic **separately** -- do not mix concerns.
46+
47+
1. Read `references/deep-dive-methodology.md` for file reading strategies, tracing patterns, and note-taking protocol.
48+
2. Start from the subsystem's entry point and follow imports outward (dependency order, not alphabetical).
49+
3. Trace three paths per subsystem: **happy path**, **error path**, **edge cases**.
50+
4. After every 2-3 files, save key findings to a scratch file. Do not rely on context memory alone.
51+
5. For each file, capture: purpose (one sentence), key functions, what it calls, what calls it, and gotchas.
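Step 2's "follow imports outward" can be made concrete with a small traversal. The sketch below is illustrative only and assumes a Python target whose modules are plain `.py` files resolved from a single root. It ignores relative imports and packages, and the helper names `local_imports` and `reading_order` are hypothetical.

```python
import ast
from collections import deque
from pathlib import Path


def local_imports(path, root):
    """Yield files under root that `path` imports (absolute imports only)."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Map a dotted module name to a file path; third-party and
            # standard-library modules will not exist under root and are skipped.
            candidate = root / (name.replace(".", "/") + ".py")
            if candidate.is_file():
                yield candidate


def reading_order(entry, root):
    """Breadth-first order: the entry point first, then what it imports, and so on."""
    entry, root = Path(entry), Path(root)
    order, seen, queue = [], {entry}, deque([entry])
    while queue:
        current = queue.popleft()
        order.append(current)
        for dep in local_imports(current, root):
            if dep not in seen:  # guards against import cycles
                seen.add(dep)
                queue.append(dep)
    return order
```

The returned list is a suggested reading order, not a complete dependency graph. For other languages the same breadth-first idea applies, but the import resolution step would need to be replaced.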
52+
53+
### Phase 3: Artifact Authoring
54+
55+
Synthesize each topic's findings into a standalone knowledge artifact.
56+
57+
1. Copy the template from `templates/knowledge_artifact.md` for each topic.
58+
2. Fill **every section** -- Overview, Architecture, Key Components table, Data & Control Flow, Key Functions table, Configuration table, Gotchas, Extension Points, and Visual Flow diagram.
59+
3. Include Mermaid diagrams: use `sequenceDiagram` for flows, `graph TD` for architecture.
60+
4. Each artifact must be self-contained -- a developer reading only that artifact should understand the subsystem completely.
61+
62+
### Phase 4: Delivery
63+
64+
Attach all completed Markdown artifacts to a message to the user. Include a brief summary of what each artifact covers.
65+
66+
---
67+
68+
## Limitations
69+
70+
- Large monorepos (>10,000 files) may require scoping to specific directories or packages before starting reconnaissance.
71+
- Binary files, compiled assets, and vendored dependencies should be excluded from study.
72+
- Knowledge artifacts reflect the codebase at a point in time. Major refactors may invalidate sections.
73+
74+
## Quality Checklist
75+
76+
Before delivering any artifact, verify:
77+
78+
| Check | Criteria |
79+
| :--- | :--- |
80+
| **Completeness** | Every template section is filled with codebase-specific detail, not placeholders. |
81+
| **Accuracy** | File paths, function names, and parameter descriptions match the actual code. |
82+
| **Gotchas** | At least two non-obvious behaviors, historical fixes, or race conditions are documented. |
83+
| **Visuals** | At least one Mermaid diagram per artifact. |
84+
| **Self-contained** | A reader with no prior context can understand the subsystem from the artifact alone. |
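Parts of this checklist can be checked mechanically before a manual read-through. The following is a rough sketch under stated assumptions: the placeholder markers (`TODO`, `TBD`, bracketed `...`) and the "Gotchas" heading text are guesses about the template, and `check_artifact` is a hypothetical helper, not something the skill ships. Accuracy and self-containedness still need human or agent judgment.

```python
import re

# Assumed placeholder markers; the real template may use different ones.
PLACEHOLDER = re.compile(r"\bTODO\b|\bTBD\b|\[[^\]\n]*\.\.\.[^\]\n]*\]")
FENCE = "`" * 3  # built this way so the sketch can sit inside a Markdown fence


def check_artifact(text):
    """Return a list of problems found in one knowledge artifact's Markdown."""
    problems = []
    if PLACEHOLDER.search(text):
        problems.append("placeholder text remains")
    if FENCE + "mermaid" not in text:
        problems.append("no Mermaid diagram")
    # Count bullets between an assumed "Gotchas" heading and the next heading.
    match = re.search(
        r"^#+ [^\n]*Gotchas[^\n]*\n(.*?)(?=^#+ |\Z)", text, re.M | re.S
    )
    bullets = re.findall(r"^[ \t]*[-*] ", match.group(1), re.M) if match else []
    if len(bullets) < 2:
        problems.append("fewer than two gotchas")
    return problems
```

An empty list means only that these three mechanical checks passed. The placeholder pattern can produce false positives, for example on Mermaid node labels that contain an ellipsis.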
85+
86+
## Bundled Resources
87+
88+
| Resource | Path | When to Read |
89+
| :--- | :--- | :--- |
90+
| Recon Checklist | `references/recon-checklist.md` | At the start of Phase 1 |
91+
| Deep-Dive Methodology | `references/deep-dive-methodology.md` | At the start of each Phase 2 topic |
92+
| Artifact Template | `templates/knowledge_artifact.md` | At the start of Phase 3 for each topic |
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
# Deep-Dive Methodology
2+
3+
## Table of Contents
4+
5+
1. File Reading Strategy
6+
2. Tracing Patterns
7+
3. Note-Taking Protocol
8+
4. Common Subsystem Types
9+
10+
## 1. File Reading Strategy
11+
12+
Read files in dependency order, not alphabetical order. Start from the entry point of the subsystem being studied and follow imports outward. For each file:
13+
14+
1. Read the module-level docstring or header comment first -- it often explains the "why."
15+
2. Identify the exported functions/classes -- these are the public API.
16+
3. Read the constructor or initialization logic -- this reveals dependencies.
17+
4. Read the primary execution method -- this is the core logic.
18+
5. Scan for error handling, edge cases, and commented-out code -- these reveal history.
19+
20+
When a file is too long (>500 lines), use range-based reading. Start with lines 1-100 to get the imports and class definition, then jump to the method needed.
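Range-based reading can be sketched as follows for agents without a built-in ranged read tool. Both helpers are hypothetical, and `find_definition` assumes Python-style `def` and `class` lines. Other languages would need a different pattern.

```python
import re
from itertools import islice


def read_range(path, start, end):
    """Return lines start..end (1-indexed, inclusive) without loading the whole file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(islice(handle, start - 1, end))


def find_definition(path, name):
    """Return the 1-indexed line of the first def/class named `name`, else None."""
    pattern = re.compile(rf"\s*(?:async\s+)?(?:def|class)\s+{re.escape(name)}\b")
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.match(line):
                return number
    return None
```

A typical use is `read_range(path, 1, 100)` for the imports and class definition, then `find_definition` to locate the method of interest and a second `read_range` call starting at that line.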
21+
22+
## 2. Tracing Patterns
23+
24+
For each subsystem, trace these three paths:
25+
26+
**Happy Path**: The normal, successful execution flow from trigger to completion. This is the backbone of the artifact.
27+
28+
**Error Path**: What happens when things go wrong. Look for try/catch blocks, error classes, fallback logic, and retry mechanisms.
29+
30+
**Edge Cases**: Caching behavior, race conditions, concurrent access, timeout handling. These are the gotchas that make the difference between a junior and a senior developer.
31+
32+
## 3. Note-Taking Protocol
33+
34+
After every 2-3 files read, save key findings to a scratch file. Do not rely on context window memory alone. Structure notes as:
35+
36+
```
37+
## [File Path]
38+
- Purpose: [one sentence]
39+
- Key functions: [list]
40+
- Calls: [what it calls]
41+
- Called by: [what calls it]
42+
- Gotchas: [non-obvious behavior]
43+
```
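One way to keep scratch notes consistent is to generate each entry from the same fields. This is a minimal sketch that assumes notes are appended to a single Markdown scratch file. `append_note` and its parameters are illustrative names, not part of the skill.

```python
from pathlib import Path


def append_note(scratch_path, file_path, purpose, key_functions, calls, called_by, gotchas):
    """Append one per-file entry to the scratch file in the note format above."""
    entry = "\n".join([
        f"## {file_path}",
        f"- Purpose: {purpose}",
        f"- Key functions: {', '.join(key_functions)}",
        f"- Calls: {', '.join(calls)}",
        f"- Called by: {', '.join(called_by)}",
        f"- Gotchas: {gotchas}",
        "",  # the two empty items leave a blank line between entries
        "",
    ])
    # Append mode preserves earlier findings if the session is interrupted.
    with Path(scratch_path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

Because the file is opened in append mode, calling this after every two or three files builds up the notes incrementally rather than rewriting them.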
44+
45+
## 4. Common Subsystem Types
46+
47+
Different subsystem types require different investigation angles:
48+
49+
| Subsystem Type | Primary Focus | Key Questions |
50+
| :--- | :--- | :--- |
51+
| **Request Pipeline** | Middleware chain, request/response transformation | What is the middleware order? What gets injected at each stage? |
52+
| **Agent/LLM System** | Model loading, prompt assembly, tool binding | How are models selected? How are prompts composed? What middleware wraps the LLM? |
53+
| **Streaming/Real-time** | Event emission, section management, callback handlers | What events are emitted? How are sections structured? What handles backpressure? |
54+
| **Data Access Layer** | Connection pooling, query building, caching | How are connections managed? What caching strategy is used? How are schemas loaded? |
55+
| **Worker/Task System** | Task delegation, result aggregation, error propagation | How are tasks routed? How are results collected? What happens on failure? |
56+
| **Configuration System** | Config sources, override hierarchy, validation | What is the config precedence? How are defaults applied? What validates config? |
