fix(#1839): add technical doc accuracy to correctness sub-agent #1840

Open
fullsend-ai-coder[bot] wants to merge 1 commit into main from agent/1839-correctness-technical-docs
@fullsend-ai-coder

The correctness sub-agent was declaring "zero correctness surface area" on documentation-only PRs, even when those documents contained implementation plans with verifiable technical claims (algorithm descriptions, pseudocode, CLI flag semantics, API behavior claims). Human reviewers on PR #1804 found 9 confirmed technical accuracy issues that the bot missed.

Changes:

  • Updated the correctness sub-agent definition to own technical
    accuracy in implementation plans and design documents, with
    specific evaluation guidance for algorithm logic, API/library
    behavior claims, design document alignment, internal
    consistency, and edge case correctness.
  • Updated SKILL.md section 3b to classify docs/plans/ files and
    technical documentation as having correctness surface area,
    ensuring the correctness sub-agent is dispatched for such PRs.
  • Added an implementation plan example to the dispatch table.

Note: make lint could not run (sandbox Go toolchain permission error unrelated to these markdown-only changes). Pre-commit encountered the same infrastructure error (exit code 3). The post-script runs authoritative pre-commit on the runner.


Closes #1839

Post-script verification

  • Branch is not main/master (agent/1839-correctness-technical-docs)
  • Secret scan passed (gitleaks — 1088f9b74b9ed046b902bf25e6ce4204339c99ee..HEAD)
  • Pre-commit hooks passed (authoritative run on runner)
  • Tests ran inside sandbox

Signed-off-by: fullsend-code <fullsend-code@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 3, 2026

Site preview

Preview: https://c7e4c249-site.fullsend-ai.workers.dev

Commit: 542ab8c498a4d488ac3b40b41087a9b13746cca8

@fullsend-ai-review

Review

Findings

Medium

  • [edge-case] internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md:196 — The category table in section 3a was not updated with documentation-accuracy categories. The correctness sub-agent now owns technical accuracy in implementation plans, but the re-review routing table only lists code-oriented categories (logic-error, nil-deref, off-by-one, etc.). New categories the sub-agent may produce (e.g., algorithm-error, api-claim-incorrect, design-inconsistency) will only route correctly via the fallback rule ("to correctness as a fallback"). If the fallback rule is ever changed, these findings would be misrouted during re-reviews.
    Remediation: Add documentation-accuracy categories (e.g., algorithm-error, api-claim-incorrect, design-inconsistency, edge-case-gap) to the correctness row of the category table in section 3a.

  • [incomplete-doc] docs/problems/code-review.md:55 — The Correctness agent section in the code-review problem doc describes the sub-agent's scope (logic errors, edge cases, test adequacy, split-payload attacks) but does not mention the new responsibility for technical documentation with correctness surface area. This doc is now stale relative to the expanded correctness.md definition.
    Remediation: Add a bullet noting that technical documentation with correctness surface area (algorithm logic, API behavior claims, design documents under docs/plans/) is also reviewed by the correctness sub-agent.
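The routing concern in the first Medium finding can be illustrated with a minimal sketch. The table shape, the function names `route_finding` and `with_doc_categories`, and the placeholder owner `"other"` are assumptions for illustration only; the actual category table lives in section 3a of SKILL.md and is not reproduced here. The category names are taken from the finding text.

```python
# Hypothetical sketch of category-based re-review routing with a fallback.
# Category names come from the review finding; the table shape is assumed.

CATEGORY_OWNERS = {
    "logic-error": "correctness",
    "nil-deref": "correctness",
    "off-by-one": "correctness",
}

# Categories the remediation proposes adding to the correctness row.
DOC_ACCURACY_CATEGORIES = (
    "algorithm-error",
    "api-claim-incorrect",
    "design-inconsistency",
    "edge-case-gap",
)

FALLBACK_OWNER = "correctness"


def route_finding(category, table, fallback=FALLBACK_OWNER):
    """Return (owner, used_fallback) for a finding category."""
    if category in table:
        return table[category], False
    return fallback, True


def with_doc_categories(table):
    """Return a copy of the table that lists the documentation categories explicitly."""
    updated = dict(table)
    for category in DOC_ACCURACY_CATEGORIES:
        updated[category] = "correctness"
    return updated
```

In this sketch, `route_finding("algorithm-error", CATEGORY_OWNERS)` reaches correctness only through the fallback, so changing the fallback changes the owner. After `with_doc_categories`, the same category is routed by an explicit entry and no longer depends on the fallback rule.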

Low

  • [pattern-inconsistency] internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md:215 — The new classification bullet uses an em dash (—) as an explanatory aside before the arrow (→), a pattern not used in the other bullets in this list. Minor stylistic inconsistency; the em dash serves a legitimate clarification purpose.

  • [design-direction] internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md:214 — The new classification criterion introduces content-based detection patterns ("algorithm descriptions, pseudocode, data structure definitions") but existing dispatch triage in section 3b is primarily domain-based (file paths, changed symbols). It is unclear whether the orchestrator should inspect file contents or rely solely on the docs/plans/ path prefix. See also: [edge-case] finding at SKILL.md:196.

  • [code-organization] internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md:253 — The new dispatch example "Implementation plan in docs/" is inserted at the top of the table. The existing table appears roughly ordered by complexity. Consider placing it after "Typo fix in README" since both are documentation-focused.

  • [pattern-inconsistency] internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/correctness.md:14 — The Own: section now mixes parenthetical clarifying questions with a declarative addition, slightly breaking parallel structure.
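The ambiguity raised in the [design-direction] finding (path prefix versus content inspection) can be made concrete with a small sketch. Everything here is hypothetical: the function names, the `inspect_contents` switch, and the keyword pattern are assumptions, since the criterion text names "algorithm descriptions, pseudocode, data structure definitions" without saying how the orchestrator should detect them.

```python
import re
from pathlib import PurePosixPath

# Assumed keyword markers standing in for content-based detection.
CONTENT_MARKERS = re.compile(
    r"\b(algorithm|pseudocode|data structure)\b", re.IGNORECASE
)


def matches_plan_path(path):
    """Path-based check: is the file under docs/plans/?"""
    parts = PurePosixPath(path).parts
    return parts[:2] == ("docs", "plans")


def matches_content(text):
    """Content-based check: does the text mention any assumed marker?"""
    return CONTENT_MARKERS.search(text) is not None


def has_correctness_surface(path, text, inspect_contents):
    """Classify a changed doc; content is consulted only when inspect_contents is True."""
    if matches_plan_path(path):
        return True
    return inspect_contents and matches_content(text)
```

With `inspect_contents=False`, a README that describes an algorithm is not dispatched to the correctness sub-agent; with `inspect_contents=True`, it is. Which behavior is intended is the open question the finding raises.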

Info

  • [architectural-conflict] internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md:13 — Pre-existing ADR-0018 deviation. The skill already documents this as an "approved temporary exception." Not introduced by this PR.

  • [api-contract] internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/correctness.md:22 — The guidance to "cross-check against known behavior" for API/library claims relies on model training knowledge, which has a cutoff date. For claims about internal APIs, tool-based verification against repo source would be more reliable.
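As a rough illustration of the tool-based verification suggested in the [api-contract] finding, the sketch below checks whether an identifier named in a document actually appears in repository source. The helper name `find_symbol` and the default `.go` suffix are assumptions (the PR mentions a Go toolchain but does not describe any such tool), and a plain substring match is only a first approximation of real verification.

```python
from pathlib import Path


def find_symbol(repo_root, symbol, suffixes=(".go",)):
    """Return sorted (relative_path, line_number) pairs where `symbol` appears."""
    root = Path(repo_root)
    hits = []
    for path in root.rglob("*"):
        # Only scan regular files with one of the assumed source suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if symbol in line:
                hits.append((path.relative_to(root).as_posix(), line_number))
    return sorted(hits)
```

An empty result for a symbol that a plan claims exists would be a signal to flag the claim for human review, rather than relying on recalled knowledge of the API.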

fullsend-ai-review Bot added the requires-manual-review label Jun 3, 2026
Labels

requires-manual-review (Review requires human judgment)

Development

Successfully merging this pull request may close these issues.

Review correctness sub-agent should evaluate technical accuracy in implementation plans