Stable, reusable, and observable code context infrastructure for AI agents
Hybrid Retrieval · Project Memory · MCP Server · Retrieval Observability
简体中文 · Docs · First Use · Deployment · CLI · MCP
ContextAtlas is an open-source context infrastructure for AI coding agents — providing hybrid code retrieval, project memory, and retrieval observability as a CLI, MCP server, or embeddable library. It combines tree-sitter semantic chunking, LanceDB vector search, SQLite FTS5 full-text search, and token-aware context packing to deliver structured, high-quality code context to tools like Claude Code, Codex, and custom agent workflows.
2026-04-10:codebase-retrievalnow includes the lightweight direct graph summary by default, with MCP metadata, tests, and changelog docs updated in sync.2026-04-06: tightened the default user path, memory governance, and operational visibility to make first use, feedback loops, and health checks clearer.2026-04-07: improved the indexing pipeline with lighter planning, snapshot copy reduction, queue observability, fallback hardening, and repeatable benchmarks.2026-04-08: added the embedding gateway, local caching and multi-upstream routing, plus Hugging Face integration and MCP context lifecycle tools.2026-04-09: added churn / cost-aware index planning, moved long-term memory into dedicated tables + FTS5, and finished default-path hardening, threshold configuration, ops alert threshold alignment, and doc sync.
- Why ContextAtlas
- Where it fits
- Core Capabilities
- Tech Stack
- Installation
- Configuration
- Quick Start
- Integration Modes
- Architecture Overview
- Documentation
- Friendly Links
- License
ContextAtlas is not just a code search tool. It addresses a more practical engineering problem:
- can an agent find the right code faster in a large repository?
- can repository understanding be persisted instead of rediscovered every session?
- can retrieval, indexing, and memory quality be observed and improved over time?
If you are building Claude Code workflows, MCP clients, or custom agent systems, ContextAtlas provides a context infrastructure layer: retrieval, memory, context packing, and observability.
In real projects, agent failures are often not caused by a weak model. They come from weak context systems:
- the relevant code is not found
- the returned code is too fragmented and lacks surrounding context
- the same module has to be re-understood again and again
- indexes become stale, retrieval quality degrades, and token budgets get exhausted without clear signals
ContextAtlas turns this into a composable set of capabilities:
- Find: hybrid retrieval narrows down the relevant implementation
- Expand: graph expansion and token packing turn hits into usable local context
- Store: project memory, long-term memory, and a cross-project hub preserve knowledge
- Observe: health checks, telemetry, usage analysis, and alerts make the system diagnosable
- As a repository retrieval backend for coding agents
- As an MCP server for external clients that need code retrieval and memory tools
- As a local CLI / skill backend for scripts, CI, and workflow automation
- As a cross-project knowledge layer for reusable module knowledge and decision history
| Capability | Description |
|---|---|
| Hybrid Retrieval | Vector recall + FTS lexical recall + RRF fusion + rerank |
| Context Expansion | Local context expansion based on neighbors, breadcrumbs, and imports |
| Token-aware Packing | Keeps the highest-value context inside a limited token budget |
| Project Memory | Feature Memory, Decision Record, and Project Profile |
| Long-term Memory | Rules, preferences, and external references that cannot be derived reliably from code |
| Cross-project Hub | Reuse module memories, dependency chains, and relations across repositories |
| Async Indexing | SQLite queue + daemon consumer + atomic snapshot switch |
| Observability | Retrieval monitor, usage report, index health, memory health, and alert evaluation |
ContextAtlas decides what context to provide, not how the task should be executed. It does not handle agent reasoning, workflow orchestration, or business API actions.
- TypeScript / Node.js 20+
- Tree-sitter for semantic chunking
- SQLite + FTS5 for metadata, retrieval, queues, and memory hub storage
- LanceDB for vector storage
- Model Context Protocol SDK for MCP integration
npm install -g @codefromkarl/context-atlasProduct identity mapping:
- Repository:
ContextAtlas - npm package:
@codefromkarl/context-atlas - CLI command:
contextatlas
Available commands:
contextatlascw(short alias)
The docs use contextatlas as the primary command name. cw remains as a compatibility alias.
Initialize the config directory and example environment file first:
contextatlas init
contextatlas setup:localDefault config file location:
~/.contextatlas/.envMinimum required configuration:
EMBEDDINGS_API_KEY=
EMBEDDINGS_BASE_URL=
EMBEDDINGS_MODEL=
RERANK_API_KEY=
RERANK_BASE_URL=
RERANK_MODEL=Index update planning also supports these optional knobs:
INDEX_UPDATE_CHURN_THRESHOLD=0.35
INDEX_UPDATE_COST_RATIO_THRESHOLD=0.65
INDEX_UPDATE_MIN_FILES=8
INDEX_UPDATE_MIN_CHANGED_FILES=5INDEX_UPDATE_CHURN_THRESHOLD: when the changed-file ratio crosses this value,index:plan/index:updatewill favorfullINDEX_UPDATE_COST_RATIO_THRESHOLD: triggersfullwhen the estimated incremental cost is close to a full rebuildINDEX_UPDATE_MIN_FILES/INDEX_UPDATE_MIN_CHANGED_FILES: require both repo size and change size to clear a minimum bar before escalation is allowed
initwrites an editable example.env, including default SiliconFlow endpoints and recommended model settings.setup:localalso wires Claude / Codex / Gemini MCP configs, prompt docs, and the Codex ContextAtlas skill in one step.
If you are onboarding for the first time, start with the First use guide.
contextatlas start /path/to/repocontextatlas init
# edit ~/.contextatlas/.envcontextatlas index /path/to/repocontextatlas search \
--repo-path /path/to/repo \
--information-request "How is the authentication flow implemented?"contextatlas daemon startcontextatlas mcpUseful for:
- custom agent skills
- shell workflows and CI scripts
- local debugging and retrieval analysis
Example:
# retrieval
contextatlas search --repo-path /path/to/repo --information-request "Where is the payment retry policy implemented?"
# project memory
contextatlas memory:find "search"
contextatlas decision:list
# health
contextatlas health:fullUseful for:
- desktop clients that support MCP
- agent systems that need standard tool-based access to ContextAtlas capabilities
Claude Desktop configuration example:
{
"mcpServers": {
"contextatlas": {
"command": "contextatlas",
"args": ["mcp"]
}
}
}ContextAtlas MCP tools cover:
- code retrieval
- project memory
- long-term memory
- cross-project hub operations
- auto-recording and memory suggestion flows
Indexing: Crawler / Scanner → Chunking → Indexing → Vector / SQLite Storage
Retrieval: Vector + FTS Recall → RRF → Rerank → Graph Expansion → Context Packing
Memory: Project Memory / Long-term Memory / Hub → CLI / MCP Tools
ContextAtlas focuses on what context to provide, not how the task should be executed. For a fuller architecture explanation, see repository positioning and engineering positioning.
| Document | Purpose |
|---|---|
| Docs index | Unified entry for stable docs, plans, changelog, and archived delivery material |
| 2026-04-10 update summary | Default-on graph context for codebase retrieval plus MCP/docs sync |
| First use guide | Fast onboarding path for the default contextatlas loop |
| 2026-04-06 update summary | Summary of the new main path, memory governance, operations, release gate, and team metrics |
| 2026-04-07 update summary | Summary of the seven indexing phases covering lightweight planning, snapshot copy reduction, health repair, observability, fallback hardening, storage trimming, and benchmarks |
| 2026-04-09 update summary | Summary of index planning thresholds, long-term memory table split, and delivery sync |
| Deployment guide | Installation, deployment patterns, MCP integration, operations |
| CLI reference | CLI commands, categories, and examples |
| MCP reference | MCP tools, parameters, and calling patterns |
| Project memory guide | Feature Memory, Decision Record, and Catalog routing |
| Latest delivery bundle | Bundled handoff package for the latest verified delivery |
| Repository positioning | Repository role, design thinking, and system boundaries |
| Engineering positioning | Where ContextAtlas fits in harness engineering |
| Product roadmap | Future versions and product direction |
Ways to improve ContextAtlas:
- open issues for bugs or documentation gaps
- submit PRs for retrieval, memory, monitoring, or documentation improvements
- contribute real-world usage patterns, deployment notes, and benchmark data
- improve README, CLI docs, and MCP examples
Before submitting code, it helps to:
- run
pnpm buildand make sure the repo still builds - keep command examples, README, and docs aligned with the implementation
- update functionality, documentation, and operational notes together when possible
pnpm build
pnpm build:release
pnpm dev
node dist/index.jsMIT
