Skip to content

Conversation

@joshadambell
Copy link

@joshadambell joshadambell commented Jan 15, 2026

Summary

Fixes controller reconciliation deadlock by removing the application-level upsertLock mutex. Database-level concurrency control (atomic upserts and transactions) now handles synchronization.

Problem

The shared upsertLock mutex blocked all Agent reconciliations when RemoteMCPServer reconciliation performed slow network I/O while holding the lock.

Solution

Phase 1: Reduced lock scope to exclude network I/O
Phase 2: Fixed Agent predicate to ensure Create events are always processed
Phase 3: Removed mutex entirely - database handles concurrency via:

  • Atomic upserts: INSERT ... ON CONFLICT DO UPDATE
  • Transactions: Atomic tool replacement in RefreshToolsForServer()

Phase 4: Documentation and tests

  • Added architecture doc explaining concurrency model
  • Added idempotence tests for StoreAgent and StoreToolServer
  • Added IMPORTANT comment to RefreshToolsForServer about transaction scope

@joshadambell joshadambell changed the title docs: add controller reconciliation architecture documentation fix(controller): resolve mutex deadlock between agent and remotemcpserver reconcilers Jan 15, 2026
…MCPServer

Signed-off-by: Josh Bell <joshadambell@me.com>
…re processed

Signed-off-by: Josh Bell <joshadambell@me.com>
Signed-off-by: Josh Bell <joshadambell@me.com>
@joshadambell joshadambell force-pushed the bugfix/fix-reconciler-mutex-deadlock branch from 56607a2 to 8a74de3 Compare January 15, 2026 23:18
Copy link
Contributor

@EItanya EItanya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch on the predicate issues with agents! That's so weird, I'll need to dig into the impl of those predicates.

In terms of the lock, I hate to say it but I think we can actually get rid of those locks altogether. Those are a relic of an older impl which didn't have a DB. The DB should function as the atomic record for us.

Signed-off-by: Josh Bell <joshadambell@me.com>
@joshadambell
Copy link
Author

Great catch on the predicate issues with agents! That's so weird, I'll need to dig into the impl of those predicates.

In terms of the lock, I hate to say it but I think we can actually get rid of those locks altogether. Those are a relic of an older impl which didn't have a DB. The DB should function as the atomic record for us.

@EItanya you are right. I completely removed the app level locks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants