-
Notifications
You must be signed in to change notification settings - Fork 73
Description
Description
This issue covers the migration of existing services from the current polling-based approach to the event-driven architecture using the library implemented in the issue #731 .
Current State
Services currently run polling loops that periodically query the database:
- EvmReader: Polls L1 for inputs and epoch events, writes to DB
- Advancer: Polls for
CLOSEDepochs and unprocessed inputs - Validator: Polls for
INPUTS_PROCESSEDepochs - Claimer: Polls for
CLAIM_CALCULATEDepochs - PRT: Polls for
CLAIM_CALCULATEDepochs and dispute events
Target State
Services will:
- Subscribe to relevant events on startup
- React to events as they arrive
- Perform periodic sync as fallback (hybrid pattern)
- Publish events when they cause state transitions
Tasks
EvmReader
- Publish
epoch.openedwhen creating a new epoch - Publish
epoch.closedwhen closing an epoch - Publish
input.receivedwhen storing new inputs - Publish
app.registeredwhen registering new applications
Advancer
- Subscribe to
epoch.closedandinput.receivedevents - Replace main polling loop with event-driven processing
- Keep periodic sync for catch-up on startup and missed events
- Publish
epoch.inputs_processedwhen completing epoch processing
Validator
- Subscribe to
epoch.inputs_processedevents - Replace polling loop with event-driven processing
- Keep periodic sync for resilience
- Publish
epoch.claim_calculatedwhen proof is ready
Claimer
- Subscribe to
epoch.claim_calculatedevents - Replace polling loop with event-driven processing
- Keep periodic sync for resilience
- Publish
epoch.claim_submittedwhen submitting to L1 - Publish
epoch.claim_acceptedorepoch.claim_rejectedbased on L1 result - Publish
app.inoperablewhen proof is rejected (quorum)
PRT
- Subscribe to
epoch.claim_calculatedevents - Replace polling loop with event-driven processing
- Keep periodic sync for resilience
- Publish
epoch.claim_submitted,epoch.claim_accepted,epoch.claim_rejected - Publish
app.inoperablewhen proof is rejected
General
- Update service initialization to create event bus
- Ensure graceful shutdown closes event subscriptions
- Add metrics for event processing latency
- Update existing tests to account for event publishing
Hybrid Pattern
Each service should implement the following pattern for resilience:
func (s *Service) Run(ctx context.Context) error {
// 1. Initial sync on startup (catch up on anything missed while offline)
if err := s.syncPendingWork(ctx); err != nil {
return err
}
// 2. Subscribe to relevant events
ch, err := s.bus.Subscribe(ctx, s.eventFilter())
if err != nil {
return err
}
// 3. Periodic sync ticker (catch any missed events)
ticker := time.NewTicker(30 * time.Second)
defer ticker.Stop()
// 4. Main loop
for {
select {
case event := <-ch:
s.handleEvent(ctx, event)
case <-ticker.C:
s.syncPendingWork(ctx)
case <-ctx.Done():
return ctx.Err()
}
}
}This ensures:
- No work is missed during service restarts
- Events that arrive while processing are not lost
- System remains functional even if event delivery fails
Note on event ordering
The
selectstatement in Go does not guarantee ordering when multiple cases are ready. However, this is acceptable for our design because:
- Events are hints, not commands — they signal that work may be available
- The authoritative state is always in the database
- Business invariants (epoch ordering, sequential processing) are enforced by database queries, not event order
- Services must be idempotent — receiving the same event twice or out of order should not cause incorrect behavior
If stricter ordering guarantees are needed in the future, the library can be extended to use a priority queue or sequence numbers.
Acceptance Criteria
- All services use event-driven communication for inter-service coordination
- Polling interval can be significantly increased (e.g., 30s+) as it's now only a fallback
- System behavior remains correct when events are missed (hybrid pattern works)
- No regression in existing functionality
- Integration tests pass with event-driven architecture
- Observability: logs/metrics show event processing
Dependencies
- Prerequisite: Event-driven communication library implementation (Issue Implement event-driven communication library using PostgreSQL LISTEN/NOTIFY #731 )
Migration Strategy
The migration can be done incrementally per service:
- Phase 1: Add event publishing to EvmReader (no consumers yet)
- Phase 2: Migrate Advancer to consume events + publish its own
- Phase 3: Migrate Validator
- Phase 4: Migrate Claimer and PRT
Each phase can be merged independently, and the system will continue working during migration since the polling fallback remains active.
Notes
- Keep polling as fallback, don't remove it entirely
- The periodic sync interval can be tuned based on operational experience
- Monitor event queue depths to detect backpressure issues
Metadata
Metadata
Assignees
Labels
Type
Projects
Status