feat(backend): enhance error recovery for Trustline Manager #858

Merged
emdevelopa merged 1 commit into emdevelopa:main from dannyy2000:feature/be-enhance-error-recovery-for-trustline-manager on May 31, 2026

@dannyy2000
Contributor

Summary

  • Replace the single global circuit breaker with a per-context registry (Map), isolating failure domains so a Horizon outage cannot block unrelated DB operations
  • Add a half-open state to the circuit breaker: once the open timeout elapses, a single probe attempt is let through; if it succeeds the breaker closes, and if it fails the breaker re-opens, so a still-failing dependency is not hit with full traffic straight away
  • Wrap every operation in a configurable hard timeout (default 15 s) via withTimeout() to prevent runaway async calls from locking up the retry loop
  • Introduce an in-memory dead-letter queue (capped at 100 entries) for terminal failures; exposes getDeadLetterQueue() and drainDeadLetterQueue() for inspection and replay
  • Support pluggable fallback handlers — callers can supply an async fallback(err) that returns cached / degraded data when all retries are exhausted or the circuit is open
  • Expose getCircuitBreakerMetrics() for integration with health and monitoring endpoints
  • Extend error classification with dedicated timeout and auth error (401 / 403) categories
  • Increase test coverage from 21 → 48 tests (all passing), adding suites for per-context isolation, half-open probing, timeout, DLQ, and fallback behaviour
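A minimal JavaScript sketch of a per-context breaker registry with a half-open state, assuming a simple consecutive-failure threshold. Only getCircuitBreakerMetrics() is named in this PR; the CircuitBreaker class, getBreaker(), the failureThreshold / resetTimeoutMs options, the injectable clock and the metrics shape are hypothetical.

```javascript
// Hypothetical sketch: class name, option names and defaults are assumptions,
// not the implementation merged in this PR.
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.resetTimeoutMs = resetTimeoutMs;     // how long to stay OPEN before probing
    this.now = now;                           // injectable clock for deterministic tests
    this.state = "CLOSED";
    this.failures = 0;
    this.openedAt = 0;
    this.probeInFlight = false;
  }

  // Returns true if a call may proceed; moves OPEN -> HALF_OPEN once the timeout elapses.
  canAttempt() {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "HALF_OPEN";
      this.probeInFlight = false;
    }
    if (this.state === "CLOSED") return true;
    if (this.state === "HALF_OPEN" && !this.probeInFlight) {
      this.probeInFlight = true; // let exactly one probe through
      return true;
    }
    return false;
  }

  recordSuccess() {
    // A successful call (including the half-open probe) closes the breaker.
    this.state = "CLOSED";
    this.failures = 0;
    this.probeInFlight = false;
  }

  recordFailure() {
    this.failures += 1;
    // A failed probe re-opens immediately; otherwise open once the threshold is hit.
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
      this.probeInFlight = false;
    }
  }
}

// One breaker per context label, so a tripped "horizon" breaker never blocks "db".
// Options are only applied when the breaker for a context is first created.
const breakers = new Map();

function getBreaker(context, options) {
  if (!breakers.has(context)) breakers.set(context, new CircuitBreaker(options));
  return breakers.get(context);
}

// Plain-object snapshot a health endpoint could serialise (shape is assumed).
function getCircuitBreakerMetrics() {
  const metrics = {};
  for (const [context, breaker] of breakers) {
    metrics[context] = { state: breaker.state, failures: breaker.failures };
  }
  return metrics;
}
```

Because each context label owns its own breaker, opening "horizon" leaves "db" CLOSED; the probeInFlight flag is what limits the half-open state to a single trial call.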

Test plan

  • npx vitest run src/lib/trustline-manager.test.js — 48/48 tests pass
  • All pre-existing signature verification, rate limiting, and SQL optimisation tests continue to pass
  • New tests cover: basic execution, all error classifications, per-context circuit breaker isolation, half-open probe success/failure, operation timeout, DLQ push/drain, and fallback handler invocation
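The new timeout and auth categories could slot into a classifier roughly like this. It is a minimal sketch assuming errors carry either an HTTP status (err.status or err.response.status) or a Node-style err.code. Only the timeout and auth categories come from this PR; OperationTimeoutError, the rate_limit / server / network / unknown buckets and the retryable flags are illustrative assumptions.

```javascript
// Hypothetical classifier: only "timeout" and "auth" are named in the PR;
// the other buckets and all retryable flags are assumptions.
class OperationTimeoutError extends Error {
  constructor(ms) {
    super(`Operation timed out after ${ms} ms`);
    this.name = "OperationTimeoutError";
  }
}

function classifyError(err) {
  const status = err?.response?.status ?? err?.status;
  if (err instanceof OperationTimeoutError || err?.code === "ETIMEDOUT") {
    return { type: "timeout", retryable: true };
  }
  if (status === 401 || status === 403) {
    // Retrying with the same credentials cannot succeed, so fail fast.
    return { type: "auth", retryable: false };
  }
  if (status === 429) return { type: "rate_limit", retryable: true };
  if (typeof status === "number" && status >= 500) return { type: "server", retryable: true };
  if (["ECONNRESET", "ECONNREFUSED", "ENOTFOUND"].includes(err?.code)) {
    return { type: "network", retryable: true };
  }
  return { type: "unknown", retryable: false };
}
```

In this sketch the value of a separate auth category is that it short-circuits the retry loop instead of spending the whole retry budget on a request that will keep returning 401 or 403.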

Security notes

  • No secrets or credentials touched
  • Fallback handlers receive the enhanced error object (no raw DB rows or internal state leaked)
  • DLQ stores only context label, error type, sanitised message, and attempt count
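A minimal sketch of a bounded dead-letter queue that keeps only the fields listed above (context, error type, sanitised message, attempt count). getDeadLetterQueue() and drainDeadLetterQueue() are the names used in this PR; pushDeadLetter(), the drop-oldest eviction policy, the 200-character message cap and the redaction of strings shaped like Stellar secret seeds are assumptions made for illustration.

```javascript
// Hypothetical bounded DLQ: eviction policy and sanitising rules are assumptions.
const DLQ_MAX = 100;
const deadLetterQueue = [];

function sanitiseMessage(message) {
  return String(message)
    .replace(/\bS[A-Z2-7]{55}\b/g, "[REDACTED_SECRET]") // Stellar secret-seed shaped strings
    .slice(0, 200); // cap length so large payloads never land in the queue
}

function pushDeadLetter({ context, errorType, message, attempts }) {
  // Store only the four fields named in the security notes.
  deadLetterQueue.push({ context, errorType, message: sanitiseMessage(message), attempts });
  // Drop the oldest entries once the cap is exceeded.
  if (deadLetterQueue.length > DLQ_MAX) {
    deadLetterQueue.splice(0, deadLetterQueue.length - DLQ_MAX);
  }
}

// Copy for inspection; callers cannot mutate the queue through it.
function getDeadLetterQueue() {
  return deadLetterQueue.slice();
}

// Return every entry and empty the queue, e.g. before a replay.
function drainDeadLetterQueue() {
  return deadLetterQueue.splice(0, deadLetterQueue.length);
}
```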

Closes #745
Closes #746
Closes #747
Closes #748

- Replace single global circuit breaker with per-context registry so
  failure domains are fully isolated (a Horizon surge won't block DB ops)
- Add half-open state: one probe attempt before re-closing the breaker
- Wrap every operation in a configurable hard timeout (default 15 s)
  to prevent runaway async calls from stalling the retry loop
- Add in-memory dead-letter queue (capped at 100 entries) for
  unrecoverable failures; expose getDeadLetterQueue / drainDeadLetterQueue
- Support pluggable fallback handlers so callers can return cached /
  degraded data instead of bubbling an error to the client
- Expose getCircuitBreakerMetrics() for health/monitoring endpoints
- Extend error classification with timeout and auth error categories
- Add 27 new targeted tests (48 total, all passing) covering per-context
  isolation, half-open probing, timeout, DLQ, and fallback behaviour
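The hard timeout and fallback path described in this commit message can be sketched as follows, assuming a fixed number of attempts with no backoff. withTimeout() and the fallback(err) hook are named in the PR; executeWithRecovery(), maxRetries and the options object are hypothetical.

```javascript
// Hypothetical sketch: executeWithRecovery and its options are assumptions.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`Operation timed out after ${ms} ms`);
      err.type = "timeout";
      reject(err);
    }, ms);
  });
  // Clear the timer either way so a settled operation does not keep the event loop alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function executeWithRecovery(operation, { timeoutMs = 15000, maxRetries = 3, fallback } = {}) {
  let lastError;
  // No backoff between attempts, for brevity.
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await withTimeout(operation(), timeoutMs);
    } catch (err) {
      lastError = err;
    }
  }
  // All attempts exhausted: return degraded data if the caller supplied a fallback.
  if (fallback) return fallback(lastError);
  throw lastError;
}
```

Promise.race only stops waiting: the underlying call keeps running after the timeout fires. If the operation supports cancellation (for example through an AbortController signal), wiring that in is what actually releases the resource.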

Closes emdevelopa#746

vercel Bot commented May 31, 2026

@dannyy2000 is attempting to deploy a commit to the Emmanuel's projects Team on Vercel.

A member of the Team first needs to authorize it.

drips-wave Bot commented May 31, 2026

@dannyy2000 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

emdevelopa merged commit 2900f1c into emdevelopa:main on May 31, 2026
1 of 4 checks passed