refactor: migrate services with existing config.rs to shared TOML config#1204
Conversation
geoffjay
left a comment
There was a problem hiding this comment.
Review: refactor: migrate services with existing config.rs to shared TOML config
Stack position: issue-1196 ← issue-1195 (needs-rework) ← issue-1199 ← issue-1194 (needs-rework). The chain is broken two levels up. This PR cannot merge until #1201 and #1203 are reworked and the stack restacks. Reviewing now so the feedback is ready to act on immediately.
The migration pattern — load() as the new entry point, from_env() deprecated and delegating to load(), ENV_LOCK added to tests, remove_var moved before assertions — is consistently applied and correct. One blocking issue: silent config-file errors.
Blocking
unwrap_or_default() swallows malformed config-file errors with no diagnostic
Every migrated service uses the same pattern:
let shared = agentd_common::config::load().unwrap_or_default();agentd_common::config::load() returns Ok(AgentdConfig::default()) for a missing file (intentional), but returns Err(...) for a malformed file (parse error). unwrap_or_default() treats both cases identically — the service starts on compiled defaults with no indication to the operator that their config.toml was ignored.
Consequence: a user with a typo in config.toml (wrong key name, bad value type, stray character) will see the service start on port 17006 instead of their configured 19006, with no log line explaining why. This is a silent misconfiguration failure.
The fix is one line per service. Since init_tracing() is called before Config::load() in every main.rs, tracing is available:
let shared = agentd_common::config::load().unwrap_or_else(|e| {
tracing::warn!("failed to load config file, using compiled defaults: {e:#}");
agentd_common::config::AgentdConfig::default()
});This applies to all six load() / load_with_base() callers: hook, monitor, mcp, memory::EmbeddingConfig, memory::LanceConfig, and ui. The index service inherits via load_with_base which receives a &SharedIndexConfig already extracted from the shared config, so the warning only needs to be in IndexConfig::load() (which is not shown in the diff — presumably it calls EmbeddingConfig::load_with_base and LanceConfig::load_with_base after its own agentd_common::config::load() call).
Partial migrations — expected, but annotate clearly
Several fields in MonitorConfig and HookConfig still fall back to hardcoded literals because the corresponding fields don't exist in the shared schema yet (flagged in the #1201 review):
| Service | Fields still hardcoded (pending #1201 schema additions) |
|---|---|
monitor |
cpu_alert_threshold (90.0), memory_alert_threshold (90.0), disk_alert_threshold (90.0), history_size (120) |
hook |
notify_on_failure (true), notify_on_long_running (true), long_running_threshold_ms (30_000) |
index.embedding |
endpoint ("http://localhost:11434/v1"), api_key |
index.lance |
table ("code_chunks") |
This is the correct approach given the current state of the shared schema. Add a // TODO(#1201): migrate when shared schema adds <field> comment on each hardcoded fallback so the follow-up is obvious when #1201 lands.
Non-blocking observations
1. Config file parsed twice for memory
EmbeddingConfig::load() and LanceConfig::load() each call agentd_common::config::load() independently. From memory/main.rs:
let lance_config = LanceConfig::load(); // → parses config.toml
let embedding_config = EmbeddingConfig::load(); // → parses config.toml againThe file is read and deserialised twice on startup. Consider adding MemoryConfig::load() -> (LanceConfig, EmbeddingConfig) that calls agentd_common::config::load() once and passes the section to both sub-configs (mirroring the load_with_base pattern already used in the index crate).
2. UiConfig derives Debug, Clone — minor scope addition
Not required for the migration, but harmless and likely needed. Fine to keep; just note it in the commit message or PR description so it doesn't get lost.
3. Index test env-var save/restore doesn't protect against panics
env::remove_var("AGENTD_PORT");
env::remove_var("AGENTD_INDEX_PORT");
let config = IndexConfig::from_env(); // ← if this panics, restore code never runs
if let Some(v) = port_saved { env::set_var("AGENTD_PORT", v); }If from_env() panics, the vars are left unset for subsequent tests. Since ENV_LOCK is held, no other test will run concurrently, but the next test that acquires the lock will see absent vars. The risk is low (from_env is unlikely to panic), and the pattern matches the rest of the test suite. Noting for awareness.
What's correct
load()→from_env()deprecation chain — correct; delegates cleanly ✓#[deprecated]+#[allow(deprecated)]in tests — applied consistently across all six crates ✓- MCP
remove_varmoved before assertion — correct cleanup-before-assert pattern ✓ ENV_LOCKadded to index and mcp test modules — fixes a real concurrency issue ✓- Precedence is preserved:
AGENTD_PORT(direct env) overridesbase.port(shared config) ✓ - Backward-compatible:
AGENTD_ORCHESTRATOR_URLfor mcp still works as highest-priority override ✓ mcp/Cargo.tomladdsagentd-commondependency correctly ✓- Only
main.rscall sites updated toload()— correct; the deprecatedfrom_env()wrapper ensures existing callers still compile ✓
Summary: Add a tracing::warn! to the unwrap_or_default() call in every migrated load() function, and add // TODO(#1201) comments on the hardcoded fallbacks. Everything else is clean. Hold for parent chain restack.
…g tests Add ENV_LOCK to mcp and index config test modules so env-var-mutating tests are serialised in the parallel test harness. Also clear AGENTD_PORT (set by agentd at runtime) before asserting default port values. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s; add TODO(#1201) comments Replace silent unwrap_or_default() with unwrap_or_else(|e| tracing::warn\!(...)) in all six migrated load() callers (hook, monitor, mcp, memory::EmbeddingConfig, memory::LanceConfig, index, ui). Operators now see a warning when config.toml is malformed rather than silently falling back to compiled defaults. Add TODO(#1201) comments on every hardcoded fallback that is pending a schema addition: notify_on_failure, notify_on_long_running, long_running_threshold_ms (hook); cpu/memory/disk_alert_threshold, history_size (monitor); embedding_endpoint (index.embedding); lance_table (index.lance). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a ValidateConfig trait to agentd_common with implementations for all 12 shared config sections. Add AgentdConfig::validate() that collects errors from every section. Implement ValidateConfig on all service-level config structs and wire validate() into each service's startup path so misconfigurations fail early with descriptive error messages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Create dedicated config.rs modules for core, orchestrator, wrap, ask, notify, and communicate. Each struct loads from agentd_common::config::load() first, then overlays legacy AGENTD_* env vars for backward compatibility. All inline env::var() calls in main.rs replaced with config struct access. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add impl ValidateConfig for EmbeddingConfig and LanceConfig in crates/memory/src/config.rs (blocking gap: memory was the only service with no validation at startup) - Wire validation in memory/main.rs: call validate() on embedding and lance configs, load shared config for port/host so port=0 is caught before any I/O with a clear error message - Remove validate_inner() indirection in crates/index/src/config.rs; move logic directly into impl ValidateConfig for IndexConfig and import the trait in index/main.rs and the test module - Make validate_url() pub in agentd-common so service crates can reuse it without duplicating the http/https check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ilures
Replace silent unwrap_or_default() with unwrap_or_else(|e| { tracing::warn\!(...) })
in all six new service config loaders (ask, communicate, core, notify, orchestrator,
wrap). Restore diagnostic warn\!() calls for invalid AGENTD_WRAP_HISTORY_BYTES and
AGENTD_WRAP_CHANNEL_CAPACITY env vars with TODO comments referencing #1201.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create docs/configuration.md with quick start, file location, precedence rules, complete key reference table, per-service sections, migration guide, and four example configs (minimal, development, production, Docker) - Fix the env var table in agentd-common/config.rs to include all overrides (AGENTD_INDEX_LANGUAGES, AGENTD_RECONCILE_INTERVAL_SECS, all MCP URLs, AGENTD_NOTIFY_SERVICE_URL, AGENTD_COLLECTION_INTERVAL_SECS) - Add link to configuration reference from README.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
This change is part of the following stack:
Change managed by git-spice. |
docs: add configuration system documentation
feat: add ValidateConfig trait and per-service validation
feat: add config structs for services with inline env var config
Migrates all services that already had a
config.rsto useagentd_common::config::load()as the base layer, with legacy environment variables overlaid on top for backward compatibility.Services migrated:
hook—HookConfig::load()reads fromshared.services.hookmonitor—MonitorConfig::load()reads fromshared.services.monitormcp—AgentdMcpConfig::load()reads fromshared.services.mcpmemory—EmbeddingConfig::load()/LanceConfig::load()read fromshared.services.memoryindex—IndexConfig::load()reads fromshared.services.indexui—UiConfig::load()reads fromshared.services.uiAll
from_env()methods are deprecated in favour ofload(). Test modules updated withENV_LOCKto prevent parallel env-var races and handleAGENTD_PORTset by the agentd runtime.Closes #1196