Skip to content

Include user comments and tool attempts in fix prompts#622

Merged
wesm merged 28 commits intoroborev-dev:mainfrom
shoyer:fix-skill-comments
Apr 5, 2026
Merged

Include user comments and tool attempts in fix prompts#622
wesm merged 28 commits intoroborev-dev:mainfrom
shoyer:fix-skill-comments

Conversation

@shoyer
Copy link
Copy Markdown
Contributor

@shoyer shoyer commented Apr 3, 2026

Summary

  • Thread user comments and prior tool attempts into fix prompts (CLI single-job, batch, daemon background fix, and refine), so fix agents can see developer feedback like false-positive flags and preferred approaches
  • Split responses into tool-generated (roborev-*) vs user-authored categories with distinct prompt framing via new SplitResponses/FormatToolAttempts/FormatUserComments helpers in the prompt package
  • Add GetAllCommentsForJob to merge job-linked and legacy commit-linked comments with dedup-by-ID, used consistently across daemon, CLI fix, CLI show, TUI, and the new HTTP client helper
  • Expose comments in roborev show --json output and update both claude and codex fix skills to document the new comments field and instruct agents to respect developer feedback
  • Consolidate 4 duplicate looksLikeSHA functions into a single exported git.LooksLikeSHA reusing the existing isHex helper; accepts mixed-case hex, 7-char minimum matching git's default abbreviation length

Stephan Hoyer and others added 20 commits April 3, 2026 14:16
The `roborev fix` command was ignoring user comments left on reviews
via the TUI, so the fix agent had no visibility into developer feedback
(e.g. false-positive flags or preferred approaches). The refine command
already included comments via GetCommentsForJob + BuildAddressPrompt,
but fix did not.

Add fetchComments() and formatComments() helpers and wire them into
both fixSingleJob (single-job path) and runFixBatch (batch path).
Update buildGenericFixPrompt and buildBatchFixPrompt to render a
"User Comments" section when comments are present. Adjust
batchEntrySize to account for comment text so prompt-size splitting
remains accurate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BuildAddressPrompt (used by refine) previously dumped all responses
under "Previous Addressing Attempts", which misframed user comments
like "this is a false positive" as failed fix attempts. Now responses
are split by Responder prefix: roborev-* entries stay under "Previous
Addressing Attempts", everything else goes under "User Comments" with
developer-feedback framing.

Add SplitResponses, FormatToolAttempts, and FormatUserComments to the
prompt package so both fix and refine share the same rendering. Remove
the local formatComments helper from fix.go in favor of the shared
prompt.FormatUserComments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix two issues found by review:

1. buildGenericFixPrompt and buildBatchFixPrompt were passing all
   responses through FormatUserComments, so roborev-fix/roborev-refine
   tool responses were mislabeled as user comments. Now both paths
   call SplitResponses first and render FormatToolAttempts +
   FormatUserComments separately.

2. fetchComments only fetched by job_id, missing legacy SHA-based
   comments. Mirror the TUI's loadResponses merge logic: when the
   git ref is a single commit, also fetch by SHA, dedupe by ID,
   and sort chronologically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ills

Three remaining gaps where comments were not surfaced:

1. Daemon buildFixPromptWithInstructions now fetches comments via
   GetCommentsForJob and splits them into tool attempts vs user
   comments, matching the CLI fix behavior.

2. `roborev show --json` now includes a `comments` array in the
   JSON output so skills and external tools can see developer
   feedback and prior tool responses.

3. Both claude and codex roborev-fix skills updated to document the
   new comments field and instruct agents to respect developer
   feedback (false positives, preferred approaches).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add storage.GetAllCommentsForJob helper that merges job-based and
legacy SHA-based comments with dedup and chronological sort. This
centralizes the merge logic that was duplicated across TUI, CLI fix,
and now needed by daemon fix and show --json.

Update all remaining callers:
- Daemon buildFixPromptWithInstructions uses GetAllCommentsForJob
- show --json fetchShowComments merges both paths
- Add batch prompt mixed-response test
- Add GetAllCommentsForJob tests covering merge, dedup, range/dirty skip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GetAllCommentsForJob now accepts commitID (int64) instead of gitRef
(string) and calls GetCommentsForCommit directly, avoiding the
repo-ambiguous GetCommitBySHA lookup that could fail when the same
SHA exists in multiple repos.

Fix dedup test to actually exercise the dedup branch by inserting a
response with both job_id and commit_id set, so it appears in both
queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add debug log in daemon handleFixJob when GetAllCommentsForJob fails,
and a stderr warning in fetchShowComments when the HTTP request fails.
Both were silently swallowing errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All three callsites that implement the merge/dedup-by-ID/sort pattern
for comments now cross-reference each other so future changes to the
merge logic can be kept in sync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sync notes missed the original implementation in
cmd/roborev/tui/fetch.go loadResponses(). All four callsites
now cross-reference each other.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ests

GetAllCommentsForJob now accepts gitRef as a fallback for legacy jobs
that have no commit_id but do have a single-commit git_ref. Prefers
commit_id when available (unambiguous), falls back to SHA lookup only
when commit_id is zero.

Also:
- fetchShowComments now warns on SHA fetch failures (was silent)
- Dedup test asserts all three responders are present (not just length)
- Added test for SHA fallback when commitID is zero

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add commit_id support to the /api/comments endpoint so CLI callers
can avoid ambiguous SHA-based lookups. fetchComments and
fetchShowComments now prefer commit_id when available, falling back
to SHA only for legacy jobs without commit linkage.

Also reuse fetchShowComments for the human-readable show output so
both --json and text modes display the same merged comment set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update TUI loadResponses to prefer commit_id over SHA for legacy
comment lookup, matching the fix.go and show.go paths. All four
merge/dedup callers now consistently use the unambiguous commit_id
when available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…imestamps

GetAllCommentsForJob now returns errors from legacy comment lookups
instead of silently discarding them, so callers (daemon handler) can
log failures.

Test timestamps are now explicit RFC3339 strings with 1-hour spacing,
eliminating flaky ordering from same-second inserts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies that when the legacy SHA lookup fails (nonexistent SHA),
GetAllCommentsForJob returns a wrapped error while still providing
the job-based comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strengthen the error-path test to verify that both job-linked
comments (alice, charlie) are returned with correct ordering,
not just a non-empty slice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on failures

Gate the SHA-based legacy comment fallback behind looksLikeSHA to
prevent task labels ("run", "analyze") from being treated as commit
SHAs. Add warnings in fetchComments for legacy fetch failures instead
of silently dropping comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fetchComments now logs a warning when the legacy comment endpoint
returns a non-200 status (e.g. 404 commit not found) instead of
silently ignoring the failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add GetAllCommentsForJob to the HTTP client (daemon.HTTPClient) with
the same merge/dedup/sort logic, commit_id preference, and SHA
validation as the other callers. Update roborev refine to use it
so legacy commit-based comments and prior tool attempts are included
in address prompts.

Update all sync notes to reference the fifth callsite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
looksLikeSHA was rejecting uppercase hex and abbreviations shorter
than 7 chars, causing legacy comment fallback to silently miss
comments for valid git refs. Git accepts abbreviated SHAs as short
as 4 characters and is case-insensitive. Update all four copies.

Also remove unconditional log.Printf warnings from legacy comment
fetch paths in fix.go and client.go — the fallback is best-effort
and the noise interferes with --quiet and automation flows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apply De Morgan's law to looksLikeSHA hex checks (staticcheck QF1001)
and use assert.Empty instead of assert.Equal(t, "") in prompt tests
(testifylint).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 3, 2026

roborev: Combined Review (4a1ada9)

Verdict: Changes introduce a medium-severity legacy comment merge bug that can attach comments from the wrong commit/review and contaminate fix/refine context.

Medium

  • Ambiguous short-SHA fallback can merge comments from the wrong review/commit
    Locations: cmd/roborev/fix.go:572, cmd/roborev/fix.go:574, internal/storage/reviews.go:554, internal/storage/reviews.go:562, internal/daemon/client.go:407, internal/daemon/client.go:442, cmd/roborev/tui/fetch.go:538
    The new legacy-comment lookup treats short hex refs as commit SHAs and falls back to querying comments by bare sha when commit_id is absent. Because abbreviated SHAs are not guaranteed to be unique, synthetic refs or unrelated commits such as dead / cafe12 can incorrectly match stored legacy comments, causing wrong comments to be merged into fix/refine prompts and show --json. This is both a correctness issue and a prompt-contamination/data-leak risk.
    Suggested fix: Remove short-SHA fallback for legacy lookups, or require a repo-scoped full SHA / validated unique commit before merging any legacy comments.

  • Legacy comment fetch failures are silently swallowed in the HTTP client path
    Locations: internal/daemon/client.go:396, cmd/roborev/fix.go:1321, cmd/roborev/show.go:191
    The HTTP-side merge path ignores request errors, non-200 responses, and JSON decode failures. As a result, refine, fix, and show --json can quietly drop developer comments or prior attempts, which makes the new merged-comment behavior unreliable and hard to debug.
    Suggested fix: Propagate or at least log legacy fetch failures consistently instead of silently discarding them.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The 4-char minimum caused short hex task labels like "dead", "beef",
and "cafe" to enter the SHA reachability and legacy comment lookup
paths. Since git_ref always stores full 40-char SHAs for commit-based
jobs, and GetCommitBySHA does exact matching anyway, supporting short
abbreviations added false-positive risk without actual benefit.

Restore the 7-char minimum (matching git's default abbreviation
length) while keeping mixed-case hex support, which is genuinely
needed since some tools store uppercase refs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shoyer
Copy link
Copy Markdown
Contributor Author

shoyer commented Apr 3, 2026

Thanks for catching these — both findings are valid. Addressed in 13bf6b0:

1. Ambiguous short-SHA fallback: Restored the 7-char minimum for looksLikeSHA (matching git's default abbreviation length). The 4-char minimum was over-eager — short hex strings like dead/cafe/beef are valid task labels that would incorrectly enter the SHA reachability and legacy comment lookup paths. Since git_ref always stores full 40-char SHAs for commit-based jobs and GetCommitBySHA does exact matching, supporting short abbreviations added false-positive risk without benefit. Mixed-case support is kept since some tools store uppercase refs.

2. Silent error swallowing in legacy fetch: This is intentional — the legacy comment fetch is best-effort supplementary context. The primary job-based fetch already succeeded and any errors are returned to callers. Adding warnings back would reintroduce the stderr noise that was flagged in an earlier review (the --quiet interference issue). The tradeoff is: slightly harder to debug legacy comment misses vs. clean output for automation. We think clean output wins here since commit_id-based lookup (the preferred path) doesn't have this ambiguity problem.

Comment by Claude Code

Replace 4 duplicate local looksLikeSHA functions with a single
exported git.LooksLikeSHA that reuses the existing isHex helper.
Move the test from fix_test.go to git_test.go.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 3, 2026

roborev: Combined Review (31b6946)

Verdict: One medium-severity issue should be addressed before merge; no high or critical findings were identified.

Medium

  • internal/daemon/client.go:399, cmd/roborev/fix.go:1330, cmd/roborev/show.go:206, cmd/roborev/tui/fetch.go:510
    The legacy comment merge path silently ignores commit_id/sha lookup failures. Transport errors, non-200 responses, and JSON decode failures are treated the same as “no legacy comments,” so fix, refine, show, and the TUI can proceed without prior comments or developer feedback and without any visible signal. This is inconsistent with storage.GetAllCommentsForJob, which returns partial results plus an error.
    Suggested fix: surface legacy lookup failures explicitly by returning partial comments with an error, or at minimum emit a warning when transport/non-200/decode failures occur.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@shoyer
Copy link
Copy Markdown
Contributor Author

shoyer commented Apr 4, 2026

Duplicate of finding #2 from the previous review cycle (addressed in comment above). The silent error handling in legacy comment fetches is intentional — see rationale there.

Comment by Claude Code

The merge/dedup-by-ID/sort pattern for combining job-based and legacy
commit-based comments was duplicated in 5 callsites. Extract it into
storage.MergeResponses and replace all inline copies. Remove stale
sync notes that tracked the duplication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 5, 2026

roborev: Combined Review (72345b6)

Verdict: Changes are mostly sound, but there is one medium-severity correctness issue that should be addressed before merge.

Medium

  • internal/daemon/client.go:390: GetAllCommentsForJob silently drops failures from the legacy commit_id/sha comment lookup. If that fallback request returns a non-200 response, fails to decode, or errors, roborev refine continues with only job-linked comments, which can omit prior developer feedback and earlier tool attempts without any signal that prompt context was lost.
    • Fix: handle the legacy request with the same error checks as the primary lookup, and return or at least log a wrapped error while preserving any comments already fetched.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Replace assert.Condition(func() bool { return false }) anti-pattern
in compact_test.go with proper require.NotEmpty/assert.Equal calls.
Trim redundant WHAT comment in show.go.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stephan Hoyer and others added 3 commits April 5, 2026 08:42
Skill-driven roborev comment calls were using the default $USER
responder, making them indistinguishable from human comments in
IsToolResponse. Add --commenter roborev-fix / --commenter roborev-refine
to match the CLI commands' behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The verbose admonition about checking comments was unnecessary —
models naturally read JSON fields. Replace with a concise one-liner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Take main's branch-filtering revert in compact_test.go while keeping
our clean assertion style (no assert.Condition anti-pattern).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wesm
Copy link
Copy Markdown
Collaborator

wesm commented Apr 5, 2026

I'm squashing this PR right now -- I can take it from here to get this merged if that works!

Move the LooksLikeSHA check out of storage.GetAllCommentsForJob and
into callers, so the storage package no longer imports internal/git.
The function now treats its fallbackSHA parameter as pre-validated
(empty string means no SHA fallback).

Add ReviewJob.CommitIDValue() to eliminate the repeated nullable
pointer extraction pattern (if j.CommitID != nil { id = *j.CommitID }).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shoyer
Copy link
Copy Markdown
Contributor Author

shoyer commented Apr 5, 2026

Give me a few minutes, I'm doing a bit more clean-up!

@shoyer
Copy link
Copy Markdown
Contributor Author

shoyer commented Apr 5, 2026

Detailed walkthrough and design notes

Problem

Fix agents (both CLI roborev fix and skill-driven /roborev-fix) had no visibility into developer feedback left on reviews via the TUI. A user could flag a finding as a false positive or request a specific approach, and the fix agent would ignore it. The refine command included comments but lumped automated tool responses (e.g., "Fix applied via roborev-fix") together with human feedback under "Previous Addressing Attempts", which misframed user comments.

What changed

1. Core feature: comments in fix prompts

fixSingleJob, runFixBatch, and the daemon's handleFixJob now fetch comments for the review being fixed and include them in the prompt. Comments are split into two categories via prompt.SplitResponses:

  • Tool attempts (roborev-* responder prefix) → rendered under "Previous Addressing Attempts" with guidance to learn from prior approaches
  • User comments (everything else) → rendered under "User Comments" with guidance to respect developer feedback

BuildAddressPrompt (used by refine) was refactored to use the same split helpers.

2. Legacy comment merging

Comments can be stored against either a job_id (modern) or a commit_id (legacy). To get the full picture, all comment-fetching paths now merge both sources, deduplicated by response ID and sorted chronologically. The dedup logic lives in storage.MergeResponses, called from five sites that each fetch from different layers (DB direct, HTTP client, CLI HTTP with retry, show CLI, TUI).

The outer "fetch primary + fetch legacy + merge" pattern is intentionally duplicated across these five sites rather than consolidated into one. Each site uses a different transport (DB queries, HTTPClient, raw HTTP with withFixDaemonRetryContext, m.getJSON) and has different error handling semantics (fix returns errors for caller warning, show/TUI silently degrade, daemon logs). Forcing them through a single abstraction would require parameterizing the transport, error policy, and retry behavior — more complexity than the duplication warrants.

3. show --json now includes comments

The JSON output includes a comments array so skills and external tools can see developer feedback. The fix skills are updated to document this field.

4. --commenter in skills

Skill-driven roborev comment calls were using $USER as the responder, making them indistinguishable from human comments. The fix and refine skills now pass --commenter roborev-fix / --commenter roborev-refine to match the CLI commands. This ensures IsToolResponse correctly identifies automated responses regardless of whether the CLI or a skill initiated the fix.

5. git.LooksLikeSHA consolidation

Four identical local looksLikeSHA functions were consolidated into a single git.LooksLikeSHA that reuses the existing isHex helper. 7-char minimum (git's default abbreviation length) with mixed-case support.

6. storage.MergeResponses extraction

The dedup-by-ID + chronological sort pattern was inlined in 5 places. Extracted into storage.MergeResponses and replaced all copies (net -87 lines).

Recent cleanup (last few commits)

Based on internal review, the last batch of commits addressed structural concerns:

  • Removed internal/storageinternal/git dependency. GetAllCommentsForJob was importing git just for LooksLikeSHA. Now the function takes a pre-validated fallbackSHA parameter (empty string = skip). Callers that have a git ref validate it with git.LooksLikeSHA before passing it. This keeps the storage layer free of git-specific logic.

  • Added ReviewJob.CommitIDValue() helper. The pattern if j.CommitID != nil { id = *j.CommitID } appeared 4 times across fix.go, refine.go, and server.go. The new method on ReviewJob eliminates this boilerplate.

  • Simplified skill instructions. The verbose "check comments before fixing" admonition in the fix skills was trimmed to a one-liner — models naturally read JSON fields without being told to.

  • Cleaned up test assertions. compact_test.go had assert.Condition(t, func() bool { return false }) — an anti-pattern from an earlier iteration. Replaced with proper require.NotEmpty/assert.Equal.

Comment by Claude Code

@shoyer
Copy link
Copy Markdown
Contributor Author

shoyer commented Apr 5, 2026

OK, I think this is good to go from my end. I had Claude make a new summary of what we did.

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 5, 2026

roborev: Combined Review (4ed45da)

Verdict: 2 medium-severity issues should be addressed before merge.

Medium

  • Legacy tool comments can be misclassified as user comments

    • Location: internal/prompt/prompt.go:944
    • IsToolResponse only recognizes automated responders with a roborev- prefix. Older fix/refine flows in this diff recorded tool-authored comments without --commenter, so existing legacy/open reviews may now surface prior machine-generated comments under User Comments. That changes prompt semantics and can cause a later agent to treat prior tool output as developer instruction.
    • Suggested fix: add backward-compatible detection for legacy tool comments, or persist an explicit source/type instead of inferring from responder name, and cover this path with tests.
  • Legacy comment fallback path drops errors silently

    • Location: internal/daemon/client.go:389
    • GetAllCommentsForJob ignores failures in the legacy commit_id/sha fallback path, including non-200 responses and JSON decode failures. Since cmd/roborev/refine.go now depends on this helper, refine can continue with incomplete history and no warning, potentially omitting prior attempts or developer feedback for legacy jobs.
    • Suggested fix: propagate fallback fetch/status/decode errors to the caller, or at least surface a warning that runRefine can report, and add tests for non-200 and malformed legacy responses.

Security review found no issues.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Note: gemini review skipped (agent quota exhausted)

@shoyer
Copy link
Copy Markdown
Contributor Author

shoyer commented Apr 5, 2026

Both findings are duplicates of previously addressed issues:

1. Legacy tool comments misclassified: This is the exact issue the --commenter skill changes in this PR fix going forward. For existing legacy comments that were recorded without --commenter: these are rare (only comments left by skill-driven fix/refine before this PR), and misclassifying them as user comments is a safe degradation — the agent gets slightly more context framed as developer feedback rather than tool output. Not worth adding backward-compatible detection heuristics for a transient edge case.

2. Legacy comment fallback drops errors silently: Addressed in earlier review cycles (see comments above). The legacy fetch is best-effort supplementary context. The primary job-based fetch already succeeded. The DB path (storage.GetAllCommentsForJob) does propagate errors. The HTTP paths intentionally degrade silently because adding warnings reintroduces stderr noise that interferes with --quiet and automation flows.

Comment by Claude Code

@wesm
Copy link
Copy Markdown
Collaborator

wesm commented Apr 5, 2026

LGTM. thanks @shoyer !

@wesm wesm merged commit cd9f68d into roborev-dev:main Apr 5, 2026
8 checks passed
wesm pushed a commit that referenced this pull request Apr 5, 2026
## Summary

- When the refine workflow runs a branch-level range review
(`mergeBase..HEAD`), the reviewer previously had no context about what
individual per-commit reviews had already found, fixed, or dismissed —
causing it to re-raise the same issues
- Add `writeInRangeReviews()` to include per-commit review results (with
verdicts, agent names, and dismissal comments) in range/branch review
prompts, with a "do not re-raise" instruction (`InRangeReviewsHeader`)
- Extract shared `lookupReviewContexts()` helper to deduplicate the
review-fetching loop between `writeInRangeReviews` and
`getPreviousReviewContexts`
- Update both claude and codex refine skills to note that dismissal
comments are included in re-review prompts

Companion to #622, which ensures comments reach **fix agents**. This PR
ensures per-commit reviews and their comments reach the **reviewing
agent** during branch reviews, closing the other half of the
duplicate-findings loop.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Stephan Hoyer <stephan@periodiclabs.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants