Skip to content

Conversation

@Gkrumbach07
Copy link
Collaborator

@Gkrumbach07 Gkrumbach07 commented Jan 9, 2026

Summary

Fixes hanging behaviour on second message by ensuring fresh Claude SDK client per message turn, plus adds a custom MCP tool for Claude to restart itself.

What Changed

Fresh Client Per Turn Pattern

  • Each message creates a new ClaudeSDKClient (connect → query → receive → disconnect)
  • Using continue_conversation=True maintains state from disk (.claude/ directory)
  • Prevents the SDK blocking issue where receive_response() hangs indefinitely on subsequent calls

Race Condition Fix

  • Moved current_message_id from instance variable to local variable in _run_claude_agent_sdk
  • Prevents cross-contamination between concurrent runs

Custom Restart MCP Tool (NEW)

  • Added mcp__session__restart_session tool that Claude can call to request a session restart
  • Uses @tool decorator and create_sdk_mcp_server() from the Claude Agent SDK
  • When called, sets a flag and emits session_restart_requested event after run completes
  • Allows Claude to recover from broken states or request a fresh connection

Feature Branch on Runtime Clone

  • clone_repo_at_runtime creates ambient/<session-id> branches when repos are added dynamically
  • Removed from hydrate.sh startup (repos stay on original branch at init)

Idempotent Repo Addition

  • /add-repo endpoint now returns was_newly_cloned flag
  • Prevents duplicate notifications when both backend and operator call the endpoint

Trade-offs

Performance:

  • ❌ ~1-2 second overhead per message from subprocess initialisation
  • ✅ System actually works without hanging

Correctness:

  • ✅ Conversation history maintained via disk state
  • ✅ No blocking or freezing
  • ✅ Reliable multi-turn conversations
  • ✅ Claude can self-recover via restart tool

Testing

Tested with multiple consecutive messages - no hanging, proper conversation flow maintained.

@codecov
Copy link

codecov bot commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2026

Claude Code Review

Summary

This PR reverts persistent Claude SDK client functionality due to blocking behavior on second calls to receive_response(). The implementation correctly falls back to a fresh-client-per-turn pattern while maintaining conversation state via disk persistence. Additionally, it adds automatic feature branch creation (ambient/<session-id>) for better git workflow management.

Verdict: ✅ Approved with minor recommendations

The solution is pragmatic and well-reasoned. The PR properly handles SDK limitations and includes good defensive programming patterns.


Issues by Severity

🚫 Blocker Issues

None - Code is safe to merge.


🔴 Critical Issues

None - No security vulnerabilities or critical bugs detected.


🟡 Major Issues

1. Token Redaction Pattern Incomplete (Security)

Location: main.py:622-625

if github_token:
    error_msg = error_msg.replace(github_token, "***REDACTED***")
if gitlab_token:
    error_msg = error_msg.replace(gitlab_token, "***REDACTED***")

Issue: Token redaction only happens in the error case. However, tokens are embedded in URLs passed to git clone which could appear in stdout as well.

Security Risk: Medium - tokens could leak in subprocess output if git prints the URL.

Recommendation: Also redact tokens from stdout:

stdout_str = stdout.decode()
stderr_str = stderr.decode()
if github_token:
    stdout_str = stdout_str.replace(github_token, "***REDACTED***")
    stderr_str = stderr_str.replace(github_token, "***REDACTED***")
if gitlab_token:
    stdout_str = stdout_str.replace(gitlab_token, "***REDACTED***")
    stderr_str = stderr_str.replace(gitlab_token, "***REDACTED***")
logger.error(f"Failed to clone repo: {stderr_str}")

2. Race Condition in Feature Branch Creation

Location: main.py:632-645

Issue: If two calls to add_repo happen concurrently for the same repo, both could pass the existence check (repo_final.exists()) and attempt to create the feature branch simultaneously. While the second clone will fail (directory exists), there's a small window where git operations could conflict.

Impact: Low probability but could cause transient failures.

Recommendation: Use a file lock around the clone operation:

from filelock import FileLock

lock_path = repos_dir / f"{name}.lock"
async with FileLock(lock_path, timeout=30):
    if repo_final.exists():
        return True, str(repo_final), False
    # ... rest of clone logic

Alternative: Keep current implementation but document the idempotency behavior in a comment.


🔵 Minor Issues

1. Incomplete Docstring

Location: adapter.py:786-795

async def disconnect_client(self) -> None:
    """
    No-op method for compatibility (client is no longer persistent).
    
    Previously used when client was persistent across runs.
    Now each run creates and destroys its own client.
    """
    logger.debug("disconnect_client() called but client is not persistent (no-op)")

Issue: Method exists solely for backward compatibility but doesn't explain why it's still needed.

Recommendation: Add context about callers:

"""
No-op method for backward compatibility.

This method is called by main.py in change_workflow(), add_repo(), and remove_repo()
to ensure clean state transitions. With the fresh-client-per-turn pattern, no cleanup
is needed since clients self-destruct after each run.

Previously used when client was persistent across runs.
"""

2. Missing Error Context in Feature Branch Creation

Location: main.py:641-645

if checkout_process.returncode \!= 0:
    logger.warning(f"Failed to create feature branch '{feature_branch}': {checkout_stderr.decode()}")
    # Continue anyway - repo is still usable on the original branch

Issue: Warning logged but no indication to user why they might see different branch than expected.

Recommendation: Emit a RawEvent notification (similar to adapter.py patterns) so the user sees it in the UI:

if checkout_process.returncode \!= 0:
    logger.warning(f"Failed to create feature branch '{feature_branch}': {checkout_stderr.decode()}")
    # Consider emitting event via adapter notification system

3. Inconsistent Logging Verbosity

Location: adapter.py:560-561

async for message in client.receive_response():
    message_count += 1
    logger.info(f"[ClaudeSDKClient Message #{message_count}]: {message}")

Issue: This will log every single streaming message at INFO level, which could be very noisy (potentially hundreds of messages per turn).

Recommendation: Reduce to DEBUG level:

logger.debug(f"[ClaudeSDKClient Message #{message_count}]: {message}")

Or add a throttle:

if message_count % 10 == 0:  # Log every 10th message
    logger.info(f"Processed {message_count} messages...")

4. Shell Script Error Handling

Location: hydrate.sh:182-183

if (cd "" && git checkout -b "" 2>&1); then
    echo "  ✓ Created feature branch: "

Issue: If the branch already exists (e.g., from a previous failed run), git checkout -b will fail. Should use git checkout -B (force create) or check existence first.

Recommendation:

if (cd "" && git checkout -B "" 2>&1); then
    echo "  ✓ Created/switched to feature branch: "

Positive Highlights

1. ✅ Excellent Problem Analysis

The PR description clearly explains the root cause (SDK limitation) and trade-offs. This kind of transparency is valuable for future maintainers.

2. ✅ Proper Revert Strategy

Rather than attempting complex workarounds, the team recognized when to revert. This shows good engineering judgment.

3. ✅ Reconnection Logic

Location: adapter.py:528-551

The query reconnection handler is defensive and well-structured:

try:
    await client.query(prompt)
except Exception as query_error:
    if any(keyword in error_str for keyword in ["connection", "subprocess", "pipe", "broken"]):
        # Reconnect with fresh client
        client = create_sdk_client(options)
        await client.connect()
        await client.query(prompt)
    else:
        raise

This follows the error handling patterns from .claude/patterns/error-handling.md.

4. ✅ Idempotency in add_repo

Location: main.py:760-776

The was_newly_cloned flag prevents duplicate state updates and notifications:

if was_newly_cloned:
    # Only update state and notify if actually cloned
    os.environ["REPOS_JSON"] = json.dumps(repos)
    asyncio.create_task(trigger_repo_added_notification(name, url))
else:
    logger.info(f"Repo '{name}' already existed, skipping notification (idempotent call)")

This is good defensive programming for concurrent operations.

5. ✅ Race Condition Fix

Location: adapter.py:292-293

Moving current_message_id from instance variable to local variable:

# Per-run state - NOT instance variables to avoid race conditions with concurrent runs
current_message_id: Optional[str] = None

This is exactly the right fix per the commit message. Prevents cross-contamination between concurrent runs.

6. ✅ Feature Branch Automation

The automatic creation of ambient/<session-id> branches is a great UX improvement for git workflow management. Users get isolated branches per session automatically.


Recommendations

Priority 1: Fix Token Redaction (Security)

Address the incomplete token redaction in stdout (see Major Issue #1).

Priority 2: Reduce Logging Verbosity

Change streaming message logs from INFO to DEBUG level (see Minor Issue #3).

Priority 3: Documentation

  • Add comments explaining the SDK limitation and why fresh-client-per-turn is necessary
  • Document the was_newly_cloned pattern for future endpoint authors
  • Explain why disconnect_client() is kept as no-op

Priority 4: Testing

Consider adding integration tests for:

  • Concurrent add_repo calls with same repo URL
  • Feature branch creation when branch already exists
  • Token redaction in error scenarios

Code Quality Assessment

Criteria Rating Notes
Security 🟡 Good Minor token redaction gap, otherwise solid
Error Handling 🟢 Excellent Follows established patterns, defensive programming
Code Organization 🟢 Excellent Clear separation of concerns
Documentation 🟡 Good Code comments present, could expand on "why"
Performance 🟢 Good Accepts ~1-2s overhead as documented trade-off
Testing ⚪ N/A No test changes in PR

Final Verdict

LGTM with minor fixes recommended

This PR correctly addresses a blocking SDK issue and includes thoughtful improvements to git workflow management. The code follows established project patterns and includes good defensive programming practices.

Suggested merge strategy:

  1. Fix token redaction in stdout (5-minute change)
  2. Reduce log verbosity to DEBUG (1-minute change)
  3. Merge to main
  4. Consider follow-up PR for filelock-based concurrency handling if needed

Estimated time to address recommendations: 15 minutes


References

  • ✅ Follows error handling patterns from .claude/patterns/error-handling.md
  • ✅ Proper logging practices (though verbosity could be reduced)
  • ✅ No violations of security standards from .claude/context/security-standards.md
  • ⚠️ Token redaction should be more comprehensive (see security standards)

🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2026

Claude Code Review

Summary

This PR reverts an attempted optimization to reuse Claude SDK clients across message turns, which caused the system to hang indefinitely on the second message. The root cause is a limitation in the Python Claude SDK where the receive_response() iterator cannot be reused. The solution is sound: revert to fresh-client-per-turn while using continue_conversation=True to maintain state via disk persistence.

Verdict: Acceptable with minor improvements recommended. The revert addresses a critical blocking issue, and the trade-off (1-2s latency vs. working system) is appropriate.


Issues by Severity

Blocker Issues

None. The PR addresses a blocker (hanging on second message) and unblocks the system.


Critical Issues

1. Race Condition Risk with current_message_id (adapter.py:292-295)

The PR converts _current_message_id and _current_tool_id from instance variables to local variables to avoid race conditions. However:

  • _current_tool_id is still declared as an instance variable (line 75 comment) but is never initialized as a local variable in _run_claude_agent_sdk
  • The code references _current_tool_id elsewhere (outside the diff), which may still use the stale instance variable

Recommendation: Search for all uses of _current_tool_id and convert them to local variables, or remove the instance variable declaration if unused.


Major Issues

1. Incomplete Error Handling in Reconnection Logic (adapter.py:527-550)

The reconnection logic is good but has gaps:

  • Only catches connection-related errors (connection, subprocess, pipe, broken)
  • Other transient errors (timeouts, rate limits) are not handled
  • No maximum retry count - could retry indefinitely on persistent issues

Recommendation: Add retry limits (MAX_RETRIES = 2) and handle timeout errors

2. Logging Verbosity May Impact Performance (adapter.py:561-563)

The PR adds per-message logging in the hot path. For sessions with many tool calls or long responses, this could generate massive logs and impact performance.

Recommendation: Use logger.debug() instead or add conditional logging (every 10th message)

3. Feature Branch Creation Without Error Recovery (main.py:628-643)

The feature branch creation logic logs a warning if it fails but continues anyway, and doesn't update the user or context about the actual branch being used.

Recommendation: Return the actual branch name and update prompt context

4. Duplicate Logic Between hydrate.sh and clone_repo_at_runtime

Both create feature branches with nearly identical logic. This violates DRY and creates maintenance burden.

Recommendation: Extract branch creation logic to a shared shell function or Python helper.


Minor Issues

  1. Inconsistent logging format mixing styles
  2. Magic strings for error detection (string matching is fragile)
  3. Return value not documented (was_newly_cloned semantics)
  4. Environment variable fallback inconsistency
  5. Commented-out code removed (Good!)

Positive Highlights

  • Clear problem diagnosis
  • Correct solution approach
  • Backwards compatibility maintained
  • Logging improvements for debugging
  • Feature branch support added
  • Idempotency improvement (was_newly_cloned flag)
  • Dead code removal
  • Comprehensive commit messages

Recommendations

Priority 1 (Should Fix Before Merge)

  1. Verify _current_tool_id usage - ensure no references to the removed instance variable remain
  2. Add retry limits to reconnection logic (prevent infinite loops)
  3. Reduce logging verbosity in hot path (use .debug() or throttle)

Priority 2 (Address in Follow-up)

  1. Return actual branch name from clone_repo_at_runtime for accurate context
  2. Extract feature branch creation to shared utility
  3. Use exception types instead of string matching for error detection
  4. Add inline comments for non-obvious return semantics

Priority 3 (Nice-to-Have)

  1. Standardize logging format
  2. Create helper for session ID extraction

Testing Recommendations

Already Tested: Multiple consecutive messages (no hanging)

Recommended Additional Tests:

  • Test with connection interruption mid-message (validate reconnection)
  • Test with very long responses (validate logging doesn't impact performance)
  • Test feature branch creation on repos with existing ambient branches
  • Test idempotency of add_repo endpoint (call twice, verify single notification)

Final Verdict

Approve with Recommendations

This PR fixes a critical blocking issue and makes the system usable again. The trade-off between latency and correctness is appropriate. The suggested improvements are mostly polish and future-proofing, not blockers.

Confidence Level: High (reviewed against established patterns, tested for hanging issue)


Review conducted using memory system context: backend-development.md, error-handling.md, and Python configuration standards from CLAUDE.md


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@Gkrumbach07 Gkrumbach07 changed the title Revert persistent Claude client - fix hanging on second message fix: Fresh client per turn pattern and race condition fixes for Claude SDK Jan 9, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2026

Claude Code Review

Summary

This PR successfully fixes the SDK hanging issue on second messages by implementing a fresh client per turn pattern with disk-based conversation state. The approach is pragmatic and well-reasoned given SDK behavior. Additional improvements include race condition fixes, feature branch automation, and reconnection handling.

Issues by Severity

🚫 Blocker Issues

None - No blocking issues found.

🔴 Critical Issues

None - No critical security or correctness issues.

🟡 Major Issues

1. Python Code Standards - Missing Docstrings (adapter.py:292-293)

Issue: While the comment is helpful, Python standards (PEP 257) require docstrings for functions. The function docstring exists but does not mention the race condition fix.

Recommendation: Update the function docstring to explicitly mention the race condition fix.

2. Error Handling - Broad Exception Catching (adapter.py:529-551)

Issue: Catching Exception is too broad and relies on string matching for error classification. This could mask unexpected errors.

Recommendation: Document the specific SDK exceptions, consider catching more specific exception types, add logging for the else branch before re-raising.

3. Git Operations Without Depth Limit (main.py:608, hydrate.sh:175)

Issue: Removed --depth 1 to support branch operations, but this means cloning full history which could be slow/large for repos with extensive history.

Recommendation: Use --depth 1 initially, then use git fetch --unshallow only if branch creation fails.

🔵 Minor Issues

1. Logging Verbosity Change (adapter.py:560-561)

Changed from logger.info to logger.debug for message streaming. This reduces visibility into message processing.

Recommendation: Consider keeping INFO level but throttling or using structured logging levels.

2. Feature Branch Name Pattern (main.py:631, hydrate.sh:180)

Uses AGENTIC_SESSION_NAME which is typically a UUID. Document this branch naming convention.

3. Race Condition Fix Documentation (adapter.py:73-75)

The comment mentions both _current_message_id and _current_tool_id but only current_message_id is visible. Clean up the comment.

4. Idempotency Return Value (main.py:789)

Good: The newly_cloned field helps callers understand if this was a new clone. Consider documenting this in the endpoint docstring.

Positive Highlights

✅ Excellent Problem Analysis: Clear explanation of root cause and trade-offs
✅ Proper Race Condition Fix: Moving current_message_id to local variable
✅ Feature Branch Automation: Consistent ambient/ branches
✅ Idempotent Design: was_newly_cloned prevents duplicate notifications
✅ Reconnection Resilience: Graceful handling of transient failures
✅ Code Cleanup: Removed dead _persistent_client references
✅ Consistent Logging: Good use of redacted logging for tokens

Recommendations

Immediate Actions (Before Merge)

  1. Update function docstring to mention race condition fix
  2. Add logging before re-raising in reconnection logic
  3. Clean up comment about _current_tool_id

Follow-up Tasks (After Merge)

  1. Performance Optimization: Consider --depth 1 + --unshallow pattern
  2. SDK Exception Handling: Document specific SDK exceptions
  3. Logging Strategy: Throttled INFO logging for production visibility

Testing Checklist

  • ✅ Multiple consecutive messages tested
  • ✅ Conversation history maintained via disk state
  • ⚠️ Recommend: Test concurrent run scenarios
  • ⚠️ Recommend: Test reconnection logic with simulated failures
  • ⚠️ Recommend: Test feature branch creation with various session ID formats

Architecture Compliance

Python Standards:

  • ✅ Uses async/await appropriately
  • ✅ Type hints present
  • ⚠️ Could improve docstring coverage

Error Handling:

  • ✅ Proper logging with context
  • ✅ Error messages do not expose internals
  • ⚠️ Broad exception catching could be more specific

Code Quality:

  • ✅ Clear variable names
  • ✅ Appropriate comments
  • ✅ Consistent code style

Conclusion

Recommendation: APPROVE with minor improvements

This PR effectively solves the SDK hanging issue with a well-reasoned approach. The race condition fix, feature branch automation, and idempotency handling are excellent. The suggested improvements are minor and could be addressed in follow-up work.

The trade-off of 1-2 second overhead per message for reliability is acceptable given the alternative is a non-functional system.

Key Strengths:

  • Pragmatic solution to SDK limitations
  • Proper race condition fix
  • Good code cleanup
  • Thoughtful idempotency design

Minor Improvements:

  • Docstring coverage
  • Exception specificity
  • Git clone performance optimization

Great work on identifying and fixing a critical reliability issue! 🎉


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

…tory handling

- Updated ClaudeCodeAdapter to use local variables for message IDs to prevent race conditions during concurrent runs.
- Introduced a new `restart_session_tool` for handling session restarts via MCP tools.
- Enhanced repository cloning logic to create and checkout feature branches named 'ambient/<session-id>' when cloning at runtime.
- Modified the add_repo function to only trigger notifications for newly cloned repositories, preventing duplicate notifications.
- Improved logging for better observability of session and repository operations.

This refactor enhances the reliability and clarity of session management and repository interactions within the ClaudeCodeAdapter.
@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2026

Claude Code Review

Summary

Fresh client per turn pattern fixes SDK hanging. Well-documented with race condition fixes, feature branch automation, and idempotency improvements.

Issues by Severity

🚫 Blocker Issues

None

🔴 Critical Issues

None

🟡 Major Issues

1. Git Clone Performance (main.py:608)
Removed --depth 1, could cause issues with large repos.
Recommendation: Use --depth 1 with --unshallow fallback.

2. Branch Failure Handling (main.py:641-645)
Warns if branch creation fails but prompt claims success.
Recommendation: Update workspace prompt to reflect actual status.

🔵 Minor Issues

  1. Comment mentions unused _current_tool_id (adapter.py:73-75)
  2. Type hint could be dict[str, Any] (adapter.py:413)

Positive Highlights

✅ Excellent problem analysis
✅ Proper race condition fix
✅ Feature branch automation
✅ Idempotent API design
✅ Creative session control tool
✅ Good code cleanup

Recommendations

Before Merge: Restore --depth 1 with fallback, update workspace prompt

After Merge: Add test coverage, monitor overhead

Conclusion

APPROVE with Minor Improvements - Fixes critical SDK issue. Priority: git clone performance.


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@Gkrumbach07 Gkrumbach07 changed the title fix: Fresh client per turn pattern and race condition fixes for Claude SDK fix: Fresh client per turn + custom restart tool for Claude SDK Jan 9, 2026
Copy link
Collaborator

@sallyom sallyom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only auto-generate a new branch if a user does not provide a branch in the context modal. Other than that this LGTM

if repos_cfg:
session_id = os.getenv('AGENTIC_SESSION_NAME', '').strip()
feature_branch = f"ambient/{session_id}" if session_id else None

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update #498 to use this branch name - but I think we don't need to always auto-create a branch - we should only auto-create a branch if a user does not provide a branch name in the context modal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants