Skip to content

Enhance health check functionality with caching#291

Merged
nahimterrazas merged 4 commits intomainfrom
redis-improve-healthcheck
Feb 13, 2026
Merged

Enhance health check functionality with caching#291
nahimterrazas merged 4 commits intomainfrom
redis-improve-healthcheck

Conversation

@nahimterrazas
Copy link
Copy Markdown
Collaborator

@nahimterrazas nahimterrazas commented Feb 10, 2026

Summary

Testing Process

Checklist

  • Add a reference to related issues in the PR description.
  • Add unit tests if applicable.

Summary by CodeRabbit

  • Refactor
    • Improved storage backend initialization and admin API startup for more reliable configuration loading.
  • Bug Fixes
    • Ensure operator admin configuration is seeded if missing and surface clearer errors during startup.
  • Tests
    • Added tests covering storage initialization and admin config/nonce store behaviors to prevent regressions.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 10, 2026

📝 Walkthrough

Walkthrough

A shared admin storage backend (Redis) is created once during server startup and reused to construct the operator config store and nonce store; startup now seeds operator config if missing and surfaces clearer IO errors and logs for storage/config/nonce creation failures.

Changes

Cohort / File(s) Summary
Server storage & helpers
crates/solver-service/src/server.rs
Added create_admin_storage_backend, create_admin_config_store, create_admin_nonce_store. Start-up flow now creates a single admin storage backend, reuses it for OperatorConfig and Nonce stores, seeds operator config if missing, and adds descriptive IO error mapping and tracing logs.
Tests
crates/solver-service/src/tests/*
Added tests validating error on invalid Redis URL and that config/nonce stores use the shared admin storage backend; updated existing tests to call new helpers and shared-backend paths.

Sequence Diagram(s)

sequenceDiagram
  participant Starter as StartServer
  participant AdminStore as create_admin_storage_backend
  participant ConfigStore as create_admin_config_store
  participant NonceStore as create_admin_nonce_store
  participant Seeder as seed_operator_config
  participant AdminAPI as AdminApiState

  Starter->>AdminStore: initialize shared admin storage (redis_url)
  AdminStore-->>Starter: admin_storage (or error)
  Starter->>ConfigStore: create config store using admin_storage + solver_id
  ConfigStore-->>Starter: config_store (or error)
  Starter->>NonceStore: create nonce store using admin_storage + solver_id + ttl
  NonceStore-->>Starter: nonce_store (or error)
  Starter->>Seeder: ensure operator config exists via config_store
  Seeder-->>Starter: seeded / already present
  Starter->>AdminAPI: build AdminApiState(token_manager, config_store, nonce_store)
  AdminAPI-->>Starter: admin API ready
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • NicoMolinaOZ
  • shahnami
  • zeljkoX

Poem

🐰 I forged one backend neat and tight,
Reused with care from morning to night,
Configs and nonces now sit side-by-side,
A hopping helper for the server's stride. 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title 'Enhance health check functionality with caching' does not match the actual changes, which refactor admin storage helpers and startup flow—not health check caching. Update the title to accurately reflect the main changes, such as 'Refactor admin storage helpers and startup flow' or 'Consolidate admin storage backend initialization'.
Description check ⚠️ Warning The PR description only contains template headings with no actual implementation details, making it impossible to understand the purpose, rationale, or testing approach. Fill in the Summary section with implementation details, explain the rationale for changes, and document the testing process performed. Check the two checklist items if applicable.
✅ Passed checks (1 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch redis-improve-healthcheck

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
crates/solver-service/src/server.rs (3)

76-99: Redundant empty solver_id validation.

The check at lines 83–87 duplicates the validation already present in create_config_store (see config_store.rs lines 178–181). It's not harmful, but it means two code paths need to stay in sync.

That said, it does produce a more contextual error message ("Config store error: solver ID cannot be empty" vs the upstream generic one), so keeping it is reasonable if intentional.


215-216: Unnecessary .clone() on redis_url.

redis_url is not used after this line, so the clone is superfluous. You can pass it directly.

Suggested fix
-				let admin_storage = create_admin_storage_backend(redis_url.clone())?;
+				let admin_storage = create_admin_storage_backend(redis_url)?;

244-276: Duplicate error log for nonce store creation failure.

create_admin_nonce_store (line 112) already logs tracing::error!("Failed to initialize admin nonce store: ..."). The Err arm at line 274 logs the same message again. This results in a duplicated error log entry for a single failure, which can be noisy and confusing during incident triage.

Either remove the log inside the helper (to let callers decide) or remove the one here.

Option A: Remove the duplicate at the call site
 				Err(e) => {
-					tracing::error!("Failed to initialize admin nonce store: {}", e);
 					None
 				},

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread crates/solver-service/src/server.rs Outdated
Copy link
Copy Markdown
Collaborator

@shahnami shahnami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m a bit confused about the need for this change. Are we caching the health state for storage readiness? If so, what problem are we trying to solve?

My understanding is that a health check should always reflect the current state of the system. Returning a cached response from e.g. two minutes ago means we’re no longer reporting the actual (real-time) status, which somewhat defeats the purpose of having a health check in the first place?

If the goal is to prevent excessive calls (e.g. to avoid spamming an internal dependency), wouldn’t it be better to adjust the check frequency or caller behavior instead of introducing caching at the health endpoint level? Could you clarify the rationale behind this?

@nahimterrazas
Copy link
Copy Markdown
Collaborator Author

I’m a bit confused about the need for this change. Are we caching the health state for storage readiness? If so, what problem are we trying to solve?

My understanding is that a health check should always reflect the current state of the system. Returning a cached response from e.g. two minutes ago means we’re no longer reporting the actual (real-time) status, which somewhat defeats the purpose of having a health check in the first place?

If the goal is to prevent excessive calls (e.g. to avoid spamming an internal dependency), wouldn’t it be better to adjust the check frequency or caller behavior instead of introducing caching at the health endpoint level? Could you clarify the rationale behind this?

So, the problem we are solving is that /health currently triggers storage readiness that can create Redis connection churn,regarding to cache yes, we can remove it, although the default is 5 seconds, definitely not aggresive cache in that case, but I prefer remove it.

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 89.42308% with 11 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/solver-service/src/server.rs 89.4% 11 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Collaborator

@shahnami shahnami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nahimterrazas nahimterrazas merged commit e366e06 into main Feb 13, 2026
8 checks passed
@nahimterrazas nahimterrazas deleted the redis-improve-healthcheck branch February 13, 2026 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants