AperturePlus · Feb 22, 2026 · Feb 21, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,172 @@
+# AGENTS.md - Engineering and Agent Governance for ACI
+
+This file defines mandatory engineering rules for this repository.
+
+Applies to:
+- Human contributors
+- AI coding agents
+
+## 1. Purpose and Scope
+
+This document is the single governance contract for:
+- Requirements
+- Design
+- Programming
+- Testing
+- CI/CD
+- Git workflow
+- AI safety constraints
+- AI commit discipline
+
+If a change conflicts with this document, this document takes precedence unless a maintainer explicitly overrides it in the task context.
+
+## 2. Roles and Audience
+
+- Humans MUST follow these standards when planning, coding, testing, and reviewing.
+- AI agents MUST follow these standards when proposing, editing, validating, and reporting changes.
+
+## 3. Requirement Engineering Standards
+
+- Every non-trivial task MUST define:
+  - Objective
+  - In-scope
+  - Out-of-scope
+  - Acceptance criteria
+- Bugfixes MUST include:
+  - Reproducible symptom
+  - Expected behavior
+- If acceptance criteria are undefined, implementation MUST NOT start.
+
+## 4. Design Standards
+
+- Non-trivial changes MUST include a brief design rationale in PR/task notes.
+- Design rationale MUST state:
+  - Selected approach
+  - Alternatives considered
+  - Tradeoffs
+- Layering MUST be respected:
+  - `core`: domain logic and cross-cutting primitives
+  - `infrastructure`: external systems and adapters
+  - `services`: orchestration and business workflows
+  - `cli` / `http` / `mcp`: entrypoint adapters only
+
+## 5. Programming Standards
+
+- New code MUST include type annotations.
+- Error handling and logs MUST preserve root causes and actionable context.
+- Security-sensitive data MUST NOT be hardcoded or logged.
+- Changes SHOULD minimize diff surface; avoid opportunistic refactors.
+- Existing repository conventions MUST be preserved unless the task explicitly requests a redesign.
+
+## 6. Testing Standards
+
+- Logic changes MUST include tests appropriate to the change:
+  - Unit tests
+  - Property tests
+  - Integration tests
+- Flaky or non-deterministic tests MUST NOT be merged.
+- Minimum local validation before PR:
+  - `uv run ruff check src tests`
+  - `uv run pytest tests/ -v --tb=short -q --durations=10`
+- Type checking SHOULD be run and reviewed:
+  - `uv run mypy src --ignore-missing-imports --no-error-summary`
+
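The local validation sequence above could be wrapped in a small helper. This is only a sketch: the command lists are copied from the bullets, while `run_local_validation` and its injectable `run` parameter are hypothetical names introduced for illustration and do not exist in the repository.

```python
import subprocess
from typing import Callable, Sequence

# Commands copied from the "Minimum local validation" bullets.
REQUIRED_CHECKS = [
    ["uv", "run", "ruff", "check", "src", "tests"],
    ["uv", "run", "pytest", "tests/", "-v", "--tb=short", "-q", "--durations=10"],
]
# Type checking is a SHOULD, so it is treated as advisory here.
ADVISORY_CHECKS = [
    ["uv", "run", "mypy", "src", "--ignore-missing-imports", "--no-error-summary"],
]


def run_local_validation(run: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Return 0 when every required check passes, 1 otherwise (hypothetical helper)."""
    for cmd in REQUIRED_CHECKS:
        if run(cmd) != 0:
            return 1  # stop at the first failing required check
    for cmd in ADVISORY_CHECKS:
        run(cmd)  # output is reviewed by a human; the exit code is ignored
    return 0
```

Calling `run_local_validation()` with no arguments executes the real commands; passing a fake `run` makes the helper testable without `uv` installed.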
+## 7. CI/CD Standards
+
+- CI source of truth:
+  - `.github/workflows/test.yml`
+  - `.github/workflows/release.yml`
+- Required CI commands are:
+  - `uv run ruff check src tests`
+  - `uv run pytest tests/ -v --tb=short -q --durations=10`
+- Current mypy execution in CI is informational/non-blocking:
+  - `uv run mypy src --ignore-missing-imports --no-error-summary || true`
+- Release tags MUST follow `v*`.
+- A release is marked as a prerelease when the tag name contains `-`.
+- PRs MUST NOT merge with failing required checks.
+
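The two tag rules above can be expressed as a tiny classifier. This is a minimal sketch: `classify_release_tag` is a hypothetical helper, and the real logic lives in `.github/workflows/release.yml`, which may implement it differently.

```python
def classify_release_tag(tag: str) -> str:
    """Classify a Git tag under the rules above (hypothetical helper)."""
    if not tag.startswith("v"):
        return "invalid"  # release tags MUST follow `v*`
    # A tag name containing `-` marks a prerelease.
    return "prerelease" if "-" in tag else "release"
```

For example, `v1.2.0` would be a regular release and `v1.2.0-rc.1` a prerelease.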
+## 8. Git Workflow and PR Standards
+
+- Workflow MUST be trunk-based development with pull requests.
+- Commit messages MUST follow Conventional Commits with scope:
+  - `type(scope): short imperative summary`
+- Allowed types:
+  - `feat`, `fix`, `refactor`, `test`, `docs`, `chore`, `ci`, `perf`
+- Branch naming SHOULD use:
+  - `feat/<topic>`
+  - `fix/<topic>`
+  - `chore/<topic>`
+  - `docs/<topic>`
+- Every PR MUST include:
+  - What changed
+  - Why
+  - Test evidence (commands + outcomes)
+  - Risk/rollback notes for non-trivial changes
+
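The commit header rule above can be checked mechanically. The sketch below assumes a scope is any non-empty run of characters without spaces or parentheses; that detail, and the names `HEADER_RE` and `is_valid_header`, are assumptions and not part of the policy text.

```python
import re

# Allowed types copied from the policy above.
ALLOWED_TYPES = ("feat", "fix", "refactor", "test", "docs", "chore", "ci", "perf")

# type(scope): short imperative summary
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)\((?P<scope>[^()\s]+)\): (?P<summary>\S.*)$")


def is_valid_header(header: str) -> bool:
    """Return True when the header has an allowed type, a scope, and a summary."""
    match = HEADER_RE.match(header)
    return match is not None and match.group("type") in ALLOWED_TYPES
```

A header without a scope, such as `feat: add thing`, is rejected because the policy requires the scoped form.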
+## 9. AI Safety and Dangerous-Command Policy
+
+### 9.1 Strict Blocking Rules
+
+AI MUST NOT run destructive or high-risk commands.
+
+Default prohibited patterns:
+- `rm -rf`
+- `del /s /q`
+- `format`
+- `mkfs`
+- `dd`
+- shutdown/reboot commands
+- recursive permission/ownership changes outside task scope
+- `git reset --hard`
+- `git clean -fdx`
+- `git checkout -- .`
+- force push to protected branches
+
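One way an agent harness might screen commands against the list above is a pattern table checked before execution. This is a conservative sketch, not an existing component: the regular expressions are assumptions, `format`, `dd`, and shutdown/reboot are only matched at the start of a command so that `ruff format` is not blocked, and any force push or recursive `chmod`/`chown` is flagged because the sketch cannot know which branches are protected or what the task scope is.

```python
import re
from typing import Optional

# Label -> pattern. All patterns are illustrative assumptions.
PROHIBITED = {
    "rm -rf": re.compile(r"\brm\s+-rf\b"),
    "del /s /q": re.compile(r"(?i)\bdel\s+/s\s+/q\b"),
    "format": re.compile(r"^\s*format\b"),
    "mkfs": re.compile(r"\bmkfs\b"),
    "dd": re.compile(r"^\s*(?:sudo\s+)?dd\b"),
    "shutdown/reboot": re.compile(r"^\s*(?:sudo\s+)?(?:shutdown|reboot)\b"),
    "recursive chmod/chown": re.compile(r"\b(?:chmod|chown)\s+-R\b"),
    "git reset --hard": re.compile(r"\bgit\s+reset\s+--hard\b"),
    "git clean -fdx": re.compile(r"\bgit\s+clean\s+-fdx\b"),
    "git checkout -- .": re.compile(r"\bgit\s+checkout\s+--\s+\.(?:\s|$)"),
    "force push": re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)"),
}


def find_violation(command: str) -> Optional[str]:
    """Return the label of the first prohibited pattern found, or None."""
    for label, pattern in PROHIBITED.items():
        if pattern.search(command):
            return label
    return None
```

A pattern table like this is only a first line of defense; it cannot catch commands hidden behind aliases or shell scripts, so the other constraints in this section still apply.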
+### 9.2 Additional Safety Constraints
+
+- AI MUST NOT modify files outside repository task scope without explicit user instruction.
+- AI MUST explicitly call out risk before any potentially destructive operation.
+- AI MUST prefer reversible operations.
+- AI MUST preserve unrelated local changes.
+
+## 10. AI Commit Discipline and Commit Message Standard
+
+### 10.1 Milestone-Based Commit Discipline
+
+- AI MUST commit at each verifiable milestone, not only at final completion.
+- A milestone is:
+  - A complete logical subtask
+  - With relevant checks passing
+  - With coherent rollback boundaries
+- AI MUST avoid oversized mixed-purpose commits.
+- AI SHOULD keep one intent per commit.
+
+### 10.2 AI Commit Message Template
+
+Header (required):
+- `type(scope): short imperative summary`
+
+Body (required):
+- `Why:` context/problem
+- `What:` key changes
+- `Test:` commands executed and outcomes
+
+Optional footer:
+- `BREAKING CHANGE:` when applicable
+
+Example:
+- `fix(indexing): retry qdrant upsert on transient timeout`
+
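The required body labels in the template above can be verified with a short check. This is a sketch under the assumption that each label starts its own line after the header; `missing_body_labels` is a hypothetical helper, and the body text in the usage note is invented for illustration.

```python
# Labels copied from the "Body (required)" list above.
REQUIRED_BODY_LABELS = ("Why:", "What:", "Test:")


def missing_body_labels(message: str) -> list[str]:
    """Return the required labels that no body line starts with."""
    body_lines = [line.strip() for line in message.splitlines()[1:]]
    return [
        label
        for label in REQUIRED_BODY_LABELS
        if not any(line.startswith(label) for line in body_lines)
    ]
```

A message whose header is the example above, followed by lines such as `Why: upserts fail on transient timeouts`, `What: add bounded retry`, and `Test: uv run pytest tests/ -q (passed)`, yields an empty list.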
+## 11. Definition of Done (DoD) Checklist
+
+Before marking work complete, ALL applicable items MUST be satisfied:
+
+- Requirements are explicit and testable
+- Design rationale is captured for non-trivial changes
+- Code is scoped and consistent with project architecture
+- Tests are added/updated and passing
+- Lint checks are passing
+- CI/CD impact is considered
+- Documentation is updated when behavior changes
+- Commit/PR metadata follows policy
+
diff --git a/README.md b/README.md
@@ -1,6 +1,7 @@
 # Project ACI - Augmented Codebase Indexer
 
 Language: **English** | [简体中文](doc/README.zh-CN.md)
+Development governance: see [AGENTS.md](AGENTS.md)
 
 A Python tool for semantic code search with precise line-level location results.
 

diff --git a/doc/CHUNKING_ALGORITHM.zh-CN.md b/doc/CHUNKING_ALGORITHM.zh-CN.md
@@ -0,0 +1,109 @@
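+# Chunking Algorithm Principles (Current Implementation)
+
+Based on the current code in `src/aci/core/chunker`, this document explains how ACI splits source code into retrievable chunks during indexing.
+
+## 1. Overall Flow
+
+The main flow of `Chunker.chunk(file, ast_nodes)`:
+
+1. Extract the import list by language first (written into each chunk's metadata).
+2. If AST nodes exist: take the **semantic chunking (AST-based)** path.
+3. If there are no AST nodes: take the **fixed-line chunking (fixed-size fallback)** path.
+4. If a `summary_generator` is configured, produce function/class/file summary artifacts in parallel.
+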
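The four steps above amount to a simple dispatch. The sketch below is not the real `Chunker.chunk`: it takes the collaborators as plain callables, returns a tuple instead of a `ChunkingResult`, and every parameter name and callable signature is a hypothetical stand-in.

```python
from typing import Any, Callable, Optional, Sequence


def chunk_file(
    content: str,
    language: str,
    ast_nodes: Sequence[Any],
    extract_imports: Callable[[str, str], list],
    chunk_by_ast: Callable[[str, Sequence[Any], list], list],
    chunk_fixed: Callable[[str, list], list],
    summary_generator: Optional[Callable[[str, Sequence[Any]], list]] = None,
) -> tuple:
    """Dispatch between the AST path and the fixed-size fallback (illustrative only)."""
    imports = extract_imports(content, language)  # step 1: imports go into metadata
    if ast_nodes:
        chunks = chunk_by_ast(content, ast_nodes, imports)  # step 2: semantic path
    else:
        chunks = chunk_fixed(content, imports)  # step 3: fixed-size fallback
    # Step 4: summaries are produced only when a generator is configured.
    summaries = summary_generator(content, ast_nodes) if summary_generator else []
    return chunks, summaries
```

The sketch runs the summary step sequentially for simplicity; whether the real implementation runs it concurrently is not shown here.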
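+## 2. AST Semantic Chunking (Preferred Path)
+
+When the parser can produce AST nodes:
+
+- Each AST node (`function/class/method`) is a chunk candidate by default.
+- Metadata is enriched with structured information:
+  - `function_name`
+  - `class_name`
+  - `parent_class` (for methods)
+  - `imports`, `file_hash`, `language`
+- If a node has a docstring, it is normalized first and then prepended to the chunk content with a separator, which improves semantic retrievability.
+
+### Token Limit Control
+
+For each candidate node:
+
+- `token_count <= max_tokens`: emit a single chunk directly.
+- `token_count > max_tokens`: hand the node to `SmartChunkSplitter` for smart splitting.
+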
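The token gate and docstring prefix described above might look roughly like this. Everything concrete here is an assumption made for illustration: the whitespace tokenizer, the `\n---\n` separator, the whitespace-collapsing normalization, and the function names.

```python
from typing import Callable, Optional


def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return len(text.split())


def with_docstring_prefix(code: str, docstring: Optional[str], separator: str = "\n---\n") -> str:
    """Prepend a normalized docstring to the chunk content, if there is one."""
    if not docstring:
        return code
    normalized = " ".join(docstring.split())  # assumed normalization: collapse whitespace
    return f"{normalized}{separator}{code}"


def chunk_node(
    code: str,
    docstring: Optional[str],
    max_tokens: int,
    split_oversized: Callable[[str, int], list],
) -> list:
    """Emit one chunk when the node fits, otherwise delegate to a splitter."""
    if count_tokens(code) <= max_tokens:
        return [with_docstring_prefix(code, docstring)]
    return split_oversized(code, max_tokens)  # plays the role of SmartChunkSplitter
```

Whether the real token count includes the docstring prefix is not stated above; the sketch counts only the node's code.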
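+## 3. SmartChunkSplitter Smart Splitting Strategy
+
+Goal: stay within the token constraint while breaking code syntax/semantic boundaries as little as possible.
+
+### 3.1 Split Priority
+
+Inside an oversized node, splits are preferred at these positions, in order:
+
+1. Blank lines
+2. Statement boundaries (`def/class/if/for/while/try/except/return/...` patterns)
+3. Lines with lower indentation (block boundaries)
+4. As a last resort, cut at the largest range that fits
+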
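The priority order above can be sketched as a backward scan from the largest cut that fits. This is an illustration only: the regular expression covers just the keywords listed, `pick_split_point` is a hypothetical name, and the real `SmartChunkSplitter` may scan or score candidates differently.

```python
import re

# Only the keywords named above; the real pattern list is longer ("...").
STATEMENT_RE = re.compile(r"^\s*(?:def|class|if|for|while|try|except|return)\b")


def _indent(line: str) -> int:
    return len(line) - len(line.lstrip())


def pick_split_point(lines: list, start: int, max_end: int) -> int:
    """Return cut so the chunk is lines[start:cut], with start < cut <= max_end."""
    if max_end >= len(lines):
        return len(lines)  # the remainder fits, no split needed
    candidates = range(max_end, start, -1)  # prefer the latest possible cut
    for cut in candidates:  # 1. end the chunk on a blank line
        if not lines[cut - 1].strip():
            return cut
    for cut in candidates:  # 2. start the next chunk at a statement boundary
        if STATEMENT_RE.match(lines[cut]):
            return cut
    for cut in candidates:  # 3. start the next chunk where indentation drops
        if _indent(lines[cut]) < _indent(lines[cut - 1]):
            return cut
    return max_end  # 4. fall back to the largest range that fits
```

Here `max_end` is an exclusive bound, so the returned `cut` can be used directly as a slice end.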
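+### 3.2 Finding the "Largest Range That Fits"
+
+- A binary search, `_find_max_end_index`, finds the farthest `end_idx` from `start_idx` whose token count stays within the limit.
+- The splitter then backtracks within `[start_idx, end_idx]` to pick the "best split point".
+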
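A binary search of this kind might look like the sketch below. It assumes the token count never decreases as lines are appended, takes the tokenizer as a parameter, and always returns at least `start_idx` so the caller makes progress; these choices and the public-style name are assumptions, not the actual `_find_max_end_index`.

```python
from typing import Callable


def find_max_end_index(
    lines: list,
    start_idx: int,
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> int:
    """Return the largest inclusive end_idx with tokens(lines[start_idx..end_idx]) <= max_tokens."""
    lo, hi = start_idx, len(lines) - 1
    best = start_idx  # keep at least one line even if it alone exceeds the limit
    while lo <= hi:
        mid = (lo + hi) // 2
        text = "\n".join(lines[start_idx : mid + 1])
        if count_tokens(text) <= max_tokens:
            best = mid  # fits: try to extend further
            lo = mid + 1
        else:
            hi = mid - 1  # too large: shrink the range
    return best
```

Compared with growing the range one line at a time, this needs only a logarithmic number of tokenizer calls per split.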
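+### 3.3 Context Compensation
+
+After splitting, the subsequent sub-chunks get a context prefix so they do not lose their surrounding context:
+
+- Method: `# Context: class <Parent>`
+- Function: `# Context: function <Name>`
+- Class: `# Context: class <Name>`
+
+In addition:
+
+- The docstring prefix is attached only to the first sub-chunk.
+- Metadata marks fields such as `is_partial / part_index / total_parts`.
+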
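The prefix and partial-chunk markers above could be assembled as follows. This is a sketch with plain dictionaries: the function names are hypothetical, and whether `part_index` is zero-based in the real code is an assumption.

```python
from typing import Optional


def context_prefix(chunk_type: str, name: str, parent_class: Optional[str] = None) -> str:
    """Build the context line for follow-up sub-chunks."""
    if chunk_type == "method":
        return f"# Context: class {parent_class}"
    if chunk_type == "function":
        return f"# Context: function {name}"
    if chunk_type == "class":
        return f"# Context: class {name}"
    return ""


def assemble_parts(
    parts: list,
    chunk_type: str,
    name: str,
    parent_class: Optional[str] = None,
    docstring_prefix: str = "",
) -> list:
    """Attach the docstring to part 0 and the context line to every later part."""
    prefix = context_prefix(chunk_type, name, parent_class)
    result = []
    for index, text in enumerate(parts):
        if index == 0:
            content = docstring_prefix + text
        else:
            content = f"{prefix}\n{text}" if prefix else text
        metadata = {
            "is_partial": len(parts) > 1,
            "part_index": index,  # assumed zero-based
            "total_parts": len(parts),
        }
        result.append({"content": content, "metadata": metadata})
    return result
```

The first part needs no context line because it still contains the original signature.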
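+## 4. Fixed-Line Chunking (fallback)
+
+When a language does not yet support AST parsing (or the AST is empty):
+
+- Chunks are cut every `fixed_chunk_lines` lines (default 50).
+- Adjacent chunks keep an overlap of `overlap_lines` lines (default 5) to reduce semantic breaks across chunks.
+- Each chunk is still token-checked; if it exceeds the limit, lines are removed from the end of the chunk until it fits (at least 1 line is kept).
+- The chunk type is marked as `fixed`.
+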
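The fallback above can be sketched as a sliding window with overlap and tail trimming. The defaults come from the text; the tokenizer parameter, the tuple output, and the way the window advances after a trimmed chunk are assumptions made for this sketch.

```python
from typing import Callable


def fixed_chunks(
    lines: list,
    max_tokens: int,
    count_tokens: Callable[[str], int],
    fixed_chunk_lines: int = 50,
    overlap_lines: int = 5,
) -> list:
    """Return (start_line, end_line, content) tuples with 1-based inclusive lines."""
    chunks = []
    start = 0
    while start < len(lines):
        end = min(start + fixed_chunk_lines, len(lines))  # exclusive
        # Drop lines from the tail until the chunk fits, keeping at least one line.
        while end - start > 1 and count_tokens("\n".join(lines[start:end])) > max_tokens:
            end -= 1
        chunks.append((start + 1, end, "\n".join(lines[start:end])))
        if end >= len(lines):
            break
        # Step back by the overlap, but always move forward by at least one line.
        start = max(end - overlap_lines, start + 1)
    return chunks
```

With 12 lines, a window of 5, and an overlap of 2, the sketch yields the ranges 1-5, 4-8, 7-11, and 10-12.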
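+## 5. Import Extraction Strategy
+
+Imports are extracted before chunking and written into metadata:
+
+- Python: recognizes `import ...` / `from ...`
+- JS/TS: recognizes `import ...` and `const ... require(...)`
+- Go: supports `import (...)` blocks and single-line imports
+- Other languages: empty implementation (returns an empty list)
+
+This lets retrieval and downstream summarization models make use of dependency context.
+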
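A line-based extractor in the spirit of the list above might look like this. The regular expressions, the language keys, and the returned string format are all assumptions; the real extractor in `src/aci/core/chunker` may normalize imports differently.

```python
import re

PY_IMPORT = re.compile(r"^\s*(?:import\s+\S|from\s+\S+\s+import\s)")
JS_IMPORT = re.compile(r"^\s*(?:import\s|const\s+.+=\s*require\()")


def _go_imports(lines: list) -> list:
    """Collect entries from `import (...)` blocks and single-line imports."""
    found, in_block = [], False
    for line in lines:
        stripped = line.strip()
        if in_block:
            if stripped.startswith(")"):
                in_block = False
            elif stripped:
                found.append(stripped)
        elif stripped.startswith("import ("):
            in_block = True
        elif stripped.startswith("import "):
            found.append(stripped[len("import "):].strip())
    return found


def extract_imports(source: str, language: str) -> list:
    """Return import statements (or Go import paths) found in the source."""
    lines = source.splitlines()
    if language == "python":
        return [line.strip() for line in lines if PY_IMPORT.match(line)]
    if language in ("javascript", "typescript"):
        return [line.strip() for line in lines if JS_IMPORT.match(line)]
    if language == "go":
        return _go_imports(lines)
    return []  # other languages: empty implementation
```

Being line-based, the sketch does not follow multi-line Python or JS import statements past their first line.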
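+## 6. Output Data Shape
+
+The final `ChunkingResult` contains two kinds of output:
+
+- `chunks: list[CodeChunk]`
+- `summaries: list[SummaryArtifact]`
+
+`CodeChunk` is the primary indexed object and carries:
+
+- Line range (1-based, end line inclusive)
+- Original or post-split content
+- Chunk type (`function/class/method/fixed`)
+- metadata (including imports, symbol names, partial-chunk markers, and so on)
+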
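As a rough picture of these shapes, the dataclasses below mirror the description. Only `chunks`, `summaries`, and the type names come from the text; every other field name, and the `line_count` helper, is a hypothetical placeholder rather than the actual model definition.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CodeChunk:
    start_line: int  # 1-based
    end_line: int  # inclusive
    content: str  # original or post-split text
    chunk_type: str  # "function", "class", "method", or "fixed"
    metadata: dict[str, Any] = field(default_factory=dict)

    def line_count(self) -> int:
        """Both bounds are inclusive, hence the +1."""
        return self.end_line - self.start_line + 1


@dataclass
class SummaryArtifact:
    kind: str  # assumed: "function", "class", or "file"
    text: str


@dataclass
class ChunkingResult:
    chunks: list[CodeChunk] = field(default_factory=list)
    summaries: list[SummaryArtifact] = field(default_factory=list)
```

For example, a chunk covering lines 10 through 12 reports a `line_count()` of 3.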
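+## 7. Design Tradeoff Summary
+
+The current algorithm is a hybrid of "**semantic first + token safeguard + line-based fallback**":
+
+- Advantages:
+  - Aligns with language structure (functions/classes/methods) as far as possible, so retrieval granularity is more natural.
+  - Oversized nodes can be split smartly while keeping context, which reduces semantic loss.
+  - Still works for languages without AST support (high engineering practicality).
+- Potential limitations:
+  - Statement-boundary patterns are currently Python-flavored regular expressions and are not fully accurate for other languages.
+  - The fixed chunking path relies mainly on line counts and overlap, so its semantic consistency is weaker than the AST path.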
+