Enhance health check functionality with caching #291
Conversation
📝 Walkthrough
A shared admin storage backend (Redis) is created once during server startup and reused to construct the operator config store and nonce store; startup now seeds the operator config if missing and surfaces clearer IO errors and logs for storage, config, and nonce creation failures.
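To make the shared-backend step concrete, here is a minimal Rust sketch of creating the backend once and wrapping failures in a descriptive IO error; `AdminStorage`, its `connect` constructor, and the error text are hypothetical stand-ins for the repository's actual types.

```rust
use std::io;
use std::sync::Arc;

/// Stand-in for the Redis-backed storage shared by the admin stores.
struct AdminStorage {
    _redis_url: String,
}

impl AdminStorage {
    fn connect(redis_url: &str) -> Result<Self, String> {
        // A real implementation would open a Redis connection pool here.
        Ok(Self { _redis_url: redis_url.to_string() })
    }
}

/// Create the shared backend once; the config store and nonce store
/// both reuse the returned Arc instead of opening their own connections.
fn create_admin_storage_backend(redis_url: &str) -> io::Result<Arc<AdminStorage>> {
    AdminStorage::connect(redis_url).map(Arc::new).map_err(|e| {
        // Wrap the failure in a descriptive IO error, as the walkthrough
        // describes for storage/config/nonce creation failures.
        io::Error::new(
            io::ErrorKind::Other,
            format!("failed to create admin storage backend ({redis_url}): {e}"),
        )
    })
}
```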
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Starter as StartServer
    participant AdminStore as create_admin_storage_backend
    participant ConfigStore as create_admin_config_store
    participant NonceStore as create_admin_nonce_store
    participant Seeder as seed_operator_config
    participant AdminAPI as AdminApiState
    Starter->>AdminStore: initialize shared admin storage (redis_url)
    AdminStore-->>Starter: admin_storage (or error)
    Starter->>ConfigStore: create config store using admin_storage + solver_id
    ConfigStore-->>Starter: config_store (or error)
    Starter->>NonceStore: create nonce store using admin_storage + solver_id + ttl
    NonceStore-->>Starter: nonce_store (or error)
    Starter->>Seeder: ensure operator config exists via config_store
    Seeder-->>Starter: seeded / already present
    Starter->>AdminAPI: build AdminApiState(token_manager, config_store, nonce_store)
    AdminAPI-->>Starter: admin API ready
```
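The seed step at the end of the diagram amounts to a write-if-missing check. A minimal sketch, assuming a hypothetical `ConfigStore` API with an in-memory stand-in for the Redis-backed store:

```rust
use std::io;

/// Hypothetical operator configuration; real fields elided.
#[derive(Default)]
struct OperatorConfig;

/// Hypothetical config store; in the PR this is backed by the shared
/// Redis admin storage, here an in-memory stand-in.
struct ConfigStore {
    current: Option<OperatorConfig>,
}

impl ConfigStore {
    fn has_operator_config(&self) -> io::Result<bool> {
        Ok(self.current.is_some())
    }

    fn put_operator_config(&mut self, cfg: OperatorConfig) -> io::Result<()> {
        self.current = Some(cfg);
        Ok(())
    }
}

/// Seed-if-missing, as in the diagram: write a default config only
/// when none exists, and leave an existing config untouched.
fn seed_operator_config(store: &mut ConfigStore) -> io::Result<()> {
    if !store.has_operator_config()? {
        store.put_operator_config(OperatorConfig::default())?;
        eprintln!("seeded default operator config"); // real code would log
    }
    Ok(())
}
```

Per the diagram, the same shared `admin_storage` handle is also passed to the nonce store constructor along with the solver id and TTL, so no second Redis connection needs to be opened.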
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
shahnami left a comment:
I’m a bit confused about the need for this change. Are we caching the health state for storage readiness? If so, what problem are we trying to solve?
My understanding is that a health check should always reflect the current state of the system. Returning a cached response from e.g. two minutes ago means we’re no longer reporting the actual (real-time) status, which somewhat defeats the purpose of having a health check in the first place?
If the goal is to prevent excessive calls (e.g. to avoid spamming an internal dependency), wouldn’t it be better to adjust the check frequency or caller behavior instead of introducing caching at the health endpoint level? Could you clarify the rationale behind this?
So, the problem we are solving is that /health currently triggers a storage readiness check on every call, which can create Redis connection churn. Regarding the cache: yes, we can remove it. The default TTL is 5 seconds, so it's definitely not an aggressive cache, but I'd prefer to remove it.
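For readers following the thread, a minimal sketch of the kind of TTL-bounded readiness cache under discussion (with the 5-second default mentioned above) follows; `CachedReadiness` and its API are illustrative, not the PR's actual code, and per the comment above the cache was ultimately dropped.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// TTL cache around a readiness probe; all names are hypothetical.
struct CachedReadiness {
    ttl: Duration,
    last: Mutex<Option<(Instant, bool)>>,
}

impl CachedReadiness {
    fn new(ttl: Duration) -> Self {
        Self { ttl, last: Mutex::new(None) }
    }

    /// Serve a fresh-enough cached result; otherwise run `probe`
    /// (e.g. a Redis PING) and cache its outcome.
    fn check(&self, probe: impl Fn() -> bool) -> bool {
        let mut last = self.last.lock().unwrap();
        if let Some((at, ready)) = *last {
            if at.elapsed() < self.ttl {
                return ready; // cached state avoids per-request Redis churn
            }
        }
        let ready = probe();
        *last = Some((Instant::now(), ready));
        ready
    }
}
```

The trade-off shahnami raises is visible in `check`: within the TTL window the endpoint returns the stored `ready` flag rather than probing Redis, so the reported status can lag the real state by up to `ttl`.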
Codecov Report
❌ Patch coverage is
Summary
Testing Process
Checklist
Summary by CodeRabbit