feat(common): add config module with TOML schema and layered loading#1201
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1201 +/- ##
=======================================
Coverage 63.77% 63.77%
=======================================
Files 173 173
Lines 7733 7733
Branches 2566 2614 +48
=======================================
Hits 4932 4932
Misses 2780 2780
Partials 21 21
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
geoffjay
left a comment
There was a problem hiding this comment.
Review: feat(common): add config module with TOML schema and layered loading
Stack position: Based directly on main — this is the root of the config epic. PRs #1195–#1199 are all stacked on top and cannot merge until this one lands. The architecture and any breaking decisions made here propagate to all five children.
The foundation approach is solid — #[serde(default)] on all structs, load_from_path as the testable entry point, Mutex-serialised env var tests — all correct. Three issues need addressing before merge, followed by schema gaps the migration PRs will need to handle.
Blocking — must fix before merge
1. AGENTD_HISTORY_SIZE naming collision between hook and monitor
Both existing services use the same env var name for different things:
| Service | Variable | Meaning | Default |
|---|---|---|---|
crates/hook/src/lib.rs |
AGENTD_HISTORY_SIZE |
Shell event history | 500 |
crates/monitor/src/lib.rs |
AGENTD_HISTORY_SIZE |
Metric snapshot count | 120 |
The shared config (line 800) maps AGENTD_HISTORY_SIZE → services.hook.history_size only. There is no path for operators to independently tune the monitor's snapshot retention, and the migration PRs cannot fully replace both services with this schema as-is.
This needs a decision now — changing env var names post-migration is an operator-visible breaking change. Options:
# Option A — give both services distinct names
AGENTD_HOOK_HISTORY_SIZE → services.hook.history_size
AGENTD_MONITOR_HISTORY_SIZE → services.monitor.history_size (new field needed)
# Option B — keep hook's legacy name, add new one for monitor
AGENTD_HISTORY_SIZE → services.hook.history_size (backward-compat)
AGENTD_MONITOR_HISTORY_SIZE → services.monitor.history_size (new)
Either option requires adding history_size: usize to MonitorConfig.
2. Incomplete env-var reference table — 11 implemented variables missing from the docs
The # Environment Variable Overrides table is the primary operator reference. It lists 26 variables but 11 that are fully implemented in apply_env_overrides are absent:
| Missing from table | Maps to |
|---|---|
AGENTD_RECONCILE_INTERVAL_SECS |
services.orchestrator.reconcile_interval_secs |
AGENTD_COLLECTION_INTERVAL_SECS |
services.monitor.collection_interval_secs |
AGENTD_INDEX_LANGUAGES |
services.index.languages |
AGENTD_NOTIFY_SERVICE_URL |
services.hook.notify_service_url |
AGENTD_MCP_NOTIFY_URL |
services.mcp.notify_url |
AGENTD_MCP_ASK_URL |
services.mcp.ask_url |
AGENTD_MCP_MEMORY_URL |
services.mcp.memory_url |
AGENTD_MCP_COMMUNICATE_URL |
services.mcp.communicate_url |
AGENTD_MCP_WRAP_URL |
services.mcp.wrap_url |
AGENTD_MCP_MONITOR_URL |
services.mcp.monitor_url |
AGENTD_MCP_HOOK_URL |
services.mcp.hook_url |
3. Orchestrator env-var rename introduces a silent breaking change
The existing orchestrator (crates/orchestrator/src/main.rs, line 351) reads:
let communicate_url = env::var("AGENTD_COMMUNICATE_SERVICE_URL")The shared config introduces AGENTD_ORCHESTRATOR_COMMUNICATE_URL instead. When PR #1197 migrates the orchestrator, any operator who has AGENTD_COMMUNICATE_SERVICE_URL in their environment or systemd unit will silently fall back to the compiled default. The rename is fine — but it needs to be explicit and documented, e.g. a note in the env-var table: AGENTD_ORCHESTRATOR_COMMUNICATE_URL (replaces legacy AGENTD_COMMUNICATE_SERVICE_URL).
Schema gaps — required before migration PRs can complete
These fields exist in the current per-service configs but are absent from the shared structs. The migration PRs (#1196/#1197) cannot fully replace the existing configs without them.
HookConfig is missing (from crates/hook/src/config.rs):
pub notify_on_failure: bool, // AGENTD_NOTIFY_ON_FAILURE, default: true
pub notify_on_long_running: bool, // AGENTD_NOTIFY_ON_LONG_RUNNING, default: true
pub long_running_threshold_ms: u64, // AGENTD_LONG_RUNNING_THRESHOLD_MS, default: 30_000MonitorConfig is missing (from crates/monitor/src/lib.rs):
pub cpu_alert_threshold: f64, // AGENTD_CPU_ALERT_THRESHOLD, default: 90.0
pub memory_alert_threshold: f64, // AGENTD_MEMORY_ALERT_THRESHOLD, default: 90.0
pub disk_alert_threshold: f64, // AGENTD_DISK_ALERT_THRESHOLD, default: 90.0
pub history_size: usize, // see naming-collision note above, default: 120IndexConfig is missing (from crates/index/src/config.rs):
pub embedding_endpoint: String, // AGENTD_INDEX_EMBEDDING_ENDPOINT, default: "http://localhost:11434/v1"
pub embedding_api_key: Option<String>, // AGENTD_INDEX_EMBEDDING_API_KEY
pub lance_table: String, // AGENTD_INDEX_LANCE_TABLE, default: "code_chunks"
pub watch_interval_secs: u64, // AGENTD_INDEX_WATCH_INTERVAL, default: 30
pub ignore_patterns: Vec<String>, // AGENTD_INDEX_IGNORE_PATTERNSIf adding these is intentionally deferred to the migration PRs, document the omissions with // TODO(#1196): add remaining fields before migration so the child PRs know what to add and Clippy can't silently miss it.
Non-blocking suggestions
Silent discard of malformed env var values — parse_port("AGENTD_NOTIFY_PORT=abc") returns None and falls back to the default with no diagnostic. A tracing::warn! would help operators catch typos, though logging before the logger is initialised has its own tradeoffs.
Missing test coverage — no tests for AGENTD_RECONCILE_INTERVAL_SECS or AGENTD_COLLECTION_INTERVAL_SECS env overrides, and no test for AGENTD_INDEX_LANGUAGES with spaces ("go, ruby, kotlin") even though the implementation trims them correctly.
merge() "reset to default" limitation — pick treats file == default as "not set in file", so a config file cannot explicitly reset a field to its compiled default value after a prior layer set it differently. This is harmless in practice since base is always AgentdConfig::default() at the call site, but the function signature implies a more general contract. A brief comment noting the assumption (// base is always AgentdConfig::default() in practice) would prevent confusion.
What's correct
#[serde(default)]on all structs — right approach for partial TOMLload_from_path(None)as the test entry point — avoidsAGENTD_CONFIGracesENV_LOCK: Mutex<()>pattern for env-var test serialisationOption::orfornotify_service_urlinmerge()— correct semantics- Comma-split with
.trim()forAGENTD_INDEX_LANGUAGES config_file_path()correctly guards against emptyAGENTD_CONFIGpub mod configinlib.rs— right re-export leveltoml = { workspace = true }— no version drift risk- XDG path resolution via
directories::ProjectDirsfor lance paths
The layered loading logic is sound. Resolve the three blocking items and add the schema fields above, and this is ready to go.
…ilures
Replace silent unwrap_or_default() with unwrap_or_else(|e| { tracing::warn\!(...) })
in all six new service config loaders (ask, communicate, core, notify, orchestrator,
wrap). Restore diagnostic warn\!() calls for invalid AGENTD_WRAP_HISTORY_BYTES and
AGENTD_WRAP_CHANNEL_CAPACITY env vars with TODO comments referencing #1201.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s; add TODO(#1201) comments Replace silent unwrap_or_default() with unwrap_or_else(|e| tracing::warn\!(...)) in all six migrated load() callers (hook, monitor, mcp, memory::EmbeddingConfig, memory::LanceConfig, index, ui). Operators now see a warning when config.toml is malformed rather than silently falling back to compiled defaults. Add TODO(#1201) comments on every hardcoded fallback that is pending a schema addition: notify_on_failure, notify_on_long_running, long_running_threshold_ms (hook); cpu/memory/disk_alert_threshold, history_size (monitor); embedding_endpoint (index.embedding); lance_table (index.lance). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…docs, add missing schema fields
…_file_path indirection - Make cmd_init pub(crate) so tests can call it directly - Replace boolean-literal overwrite-guard tests with real tests that set AGENTD_CONFIG to a tempdir and call cmd_init; verify error message mentions --force on refusal and that the template is written on force-overwrite - Add ENV_LOCK mutex to serialise tests that mutate AGENTD_CONFIG - Remove the config_file_path() local wrapper; import via use instead - Fix cmd_show None arm to reuse the already-resolved path rather than calling config_file_path() a second time Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g tests Add ENV_LOCK to mcp and index config test modules so env-var-mutating tests are serialised in the parallel test harness. Also clear AGENTD_PORT (set by agentd at runtime) before asserting default port values. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s; add TODO(#1201) comments Replace silent unwrap_or_default() with unwrap_or_else(|e| tracing::warn\!(...)) in all six migrated load() callers (hook, monitor, mcp, memory::EmbeddingConfig, memory::LanceConfig, index, ui). Operators now see a warning when config.toml is malformed rather than silently falling back to compiled defaults. Add TODO(#1201) comments on every hardcoded fallback that is pending a schema addition: notify_on_failure, notify_on_long_running, long_running_threshold_ms (hook); cpu/memory/disk_alert_threshold, history_size (monitor); embedding_endpoint (index.embedding); lance_table (index.lance). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a ValidateConfig trait to agentd_common with implementations for all 12 shared config sections. Add AgentdConfig::validate() that collects errors from every section. Implement ValidateConfig on all service-level config structs and wire validate() into each service's startup path so misconfigurations fail early with descriptive error messages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Create dedicated config.rs modules for core, orchestrator, wrap, ask, notify, and communicate. Each struct loads from agentd_common::config::load() first, then overlays legacy AGENTD_* env vars for backward compatibility. All inline env::var() calls in main.rs replaced with config struct access. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add impl ValidateConfig for EmbeddingConfig and LanceConfig in crates/memory/src/config.rs (blocking gap: memory was the only service with no validation at startup) - Wire validation in memory/main.rs: call validate() on embedding and lance configs, load shared config for port/host so port=0 is caught before any I/O with a clear error message - Remove validate_inner() indirection in crates/index/src/config.rs; move logic directly into impl ValidateConfig for IndexConfig and import the trait in index/main.rs and the test module - Make validate_url() pub in agentd-common so service crates can reuse it without duplicating the http/https check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ilures
Replace silent unwrap_or_default() with unwrap_or_else(|e| { tracing::warn\!(...) })
in all six new service config loaders (ask, communicate, core, notify, orchestrator,
wrap). Restore diagnostic warn\!() calls for invalid AGENTD_WRAP_HISTORY_BYTES and
AGENTD_WRAP_CHANNEL_CAPACITY env vars with TODO comments referencing #1201.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create docs/configuration.md with quick start, file location, precedence rules, complete key reference table, per-service sections, migration guide, and four example configs (minimal, development, production, Docker) - Fix the env var table in agentd-common/config.rs to include all overrides (AGENTD_INDEX_LANGUAGES, AGENTD_RECONCILE_INTERVAL_SECS, all MCP URLs, AGENTD_NOTIFY_SERVICE_URL, AGENTD_COLLECTION_INTERVAL_SECS) - Add link to configuration reference from README.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
This change is part of the following stack:
Change managed by git-spice. |
docs: add configuration system documentation
feat: add ValidateConfig trait and per-service validation
feat: add config structs for services with inline env var config
refactor: migrate services with existing config.rs to shared TOML config
feat(cli): add config init and config show subcommands
test(common): add integration tests for layered config precedence
Adds agentd_common::config with AgentdConfig TOML schema covering all 12 services, XDG-compliant file discovery, and three-layer precedence loading (defaults < file < env vars).
Summary
configmodule inagentd-commonwithAgentdConfigtop-level structGeneralConfigcoveringlog_level,log_format, andhostconfig_file_path(): checksAGENTD_CONFIGenv var then XDG config dirload()/load_from_path(): layered loading (compiled defaults < TOML file < env vars)apply_env_overrides(): mapsAGENTD_*env vars onto config fieldsMutex-serialised env var tests to prevent parallel racesTest plan
cargo test -p agentd-common— all 23 config tests passcargo fmt -p agentd-common— no formatting changescargo clippy -p agentd-common -- -D warnings— cleanCloses #1194