feat: add ValidateConfig trait and per-service validation#1206
Conversation
geoffjay
left a comment
There was a problem hiding this comment.
Review: feat: add ValidateConfig trait and per-service validation
Stack Position
This PR (issue-1198) is stacked on issue-1197 (#1205), which currently carries needs-rework. The chain below:
issue-1198 (this PR)
└── issue-1197 (#1205) [needs-rework]
└── issue-1196 (#1204) [needs-rework]
└── issue-1195 (#1203) [needs-rework]
└── issue-1199 (#1202)
└── issue-1194 (#1201) [needs-rework]
This PR cannot merge until the chain below it is resolved. Review comments below are against this PR's own changes in isolation.
Blocking Issues
1. Memory service entirely absent — the only service with no validation wired
The PR description says cfg.validate()? is called in main() for 11 services and "all 12 service configuration sections" have impls. Memory is the 12th section: impl ValidateConfig for MemoryConfig is present in crates/common/src/config.rs and AgentdConfig::validate() calls self.services.memory.validate() — but crates/memory/src/main.rs is not in this diff at all.
Looking at the current memory/main.rs, it:
- Reads
AGENTD_HOST/AGENTD_PORTdirectly via inlineenv::var()calls (noMemoryConfigload/validate struct) - Loads
EmbeddingConfig::from_env()andLanceConfig::from_env()separately - Has no
validate()call of any kind at startup
Neither EmbeddingConfig nor LanceConfig implement ValidateConfig, and there is no unified MemoryConfig struct in the service crate to hang one on. This means:
- A bad
AGENTD_MEMORY_EMBEDDING_PROVIDERvalue is silently accepted until the first embedding call fails deep inside the request path - Port=0 will produce a kernel-level bind error rather than the clear startup message this PR aims to provide for every other service
Required fix: add ValidateConfig to EmbeddingConfig (and optionally LanceConfig) in crates/memory/src/config.rs, and call validate on both at the top of memory/main() before the vector store is initialised. Alternatively, introduce a MemoryConfig wrapper struct that holds both, similar to the pattern in other services.
Minimum validator for EmbeddingConfig:
impl ValidateConfig for EmbeddingConfig {
fn validate(&self) -> Result<()> {
match self.provider.as_str() {
"none" | "openai" | "ollama" => {}
other => bail!(
"memory.embedding_provider must be one of none, openai, ollama; got: {other}"
),
}
Ok(())
}
}Non-Blocking Issues
2. Unnecessary validate_inner() indirection in crates/index/src/config.rs
The existing IndexConfig::validate() inherent method is kept and both it and the new ValidateConfig trait impl delegate to a private validate_inner():
pub fn validate(&self) -> Result<()> { self.validate_inner() }
fn validate_inner(&self) -> Result<()> { /* actual logic */ }
impl ValidateConfig for IndexConfig {
fn validate(&self) -> Result<()> { self.validate_inner() }
}The indirection exists only to avoid a method name conflict between the inherent validate() and the trait validate(). The simplest resolution would be to remove the inherent method entirely and require callers to use the trait (they already do in main.rs via use ValidateConfig). Or, if keeping the inherent method for backwards compatibility, at least document why the three-layer indirection exists. As-is it reads like accidental duplication.
3. validate_url helper is private; service-local impls inline the same check
validate_url() in crates/common/src/config.rs is a private free function, so service crates cannot import it. As a result, crates/orchestrator/src/config.rs and crates/ask/src/config.rs each inline the same starts_with("http://") || starts_with("https://") check. Consider making it pub to allow reuse, or accept the duplication — but note it will grow as more service-local impls add URL fields. Not a blocker.
What's Working Well
- Collect-all-errors in
AgentdConfig::validate(): building thechecksarray evaluates all 12validate()calls eagerly before any errors are examined — operators see every problem in one startup failure, not one at a time. Excellent UX. validate_port/validate_urlhelpers reduce duplication across the 12 shared impls.- Validation before port binding: every wired
main()callscfg.validate()?beforeTcpListener::bind(), which is the correct ordering. - Monitor threshold boundary tests (
test_validate_boundary_thresholds_pass,test_validate_cpu_threshold_above_100_fails, negative threshold) provide good edge-case coverage. test_validate_all_valid_backends_passloop in wrap tests all four backend values exhaustively.- ENV_LOCK + save/restore/remove-before-assert pattern consistently applied in MCP tests.
Summary
One blocking gap: the memory service has no ValidateConfig implementation on its service-local config structs and no validate()? call at startup, leaving it as the only service where a misconfigured AGENTD_MEMORY_EMBEDDING_PROVIDER or port=0 silently passes the startup check. Fix the memory gap and this PR is in good shape (modulo needing a restack once the parent chain resolves).
geoffjay
left a comment
There was a problem hiding this comment.
Review: feat: add ValidateConfig trait and per-service validation (rework)
Stack Position
This PR (issue-1198) is stacked in the config epic chain:
issue-1200 (#1207)
└── issue-1198 (this PR)
└── issue-1197 (#1205) [needs-rework]
└── issue-1196 (#1204) [needs-rework]
└── issue-1195 (#1203) [needs-rework]
└── issue-1199 (#1202)
└── issue-1194 (#1201) [needs-rework]
This PR cannot merge until the chain below it is resolved. The changes here are approvable in isolation.
Note: PR #1212 (issue-1198-2) was a duplicate rework branch that has been closed. The canonical changes are here on issue-1198.
Feedback Addressed
All three blocking issues from the previous review are resolved:
1. Memory service now fully covered ✓
crates/memory/src/config.rs adds:
impl ValidateConfig for EmbeddingConfig— validatesprovideris one ofnone,openai,ollamaimpl ValidateConfig for LanceConfig— validatespathandtableare non-empty
crates/memory/src/main.rs wires validation correctly:
let shared = agentd_common::config::load()?;
shared.services.memory.validate()?; // validates shared port + embedding_provider
let lance_config = LanceConfig::load();
let embedding_config = EmbeddingConfig::load();
lance_config.validate()?;
embedding_config.validate()?;The host/port binding is now sourced from the shared config (shared.general.host, shared.services.memory.port) rather than inline env::var() calls — consistent with all other services.
2. validate_inner() indirection removed from crates/index/src/config.rs ✓
The inherent validate() method is gone. Logic now lives directly in impl ValidateConfig for IndexConfig. The test module correctly adds use agentd_common::config::ValidateConfig to pull the trait into scope.
3. validate_url() made pub ✓
The helper is now pub fn validate_url(url: &str, field: &str) with a doc comment explaining its intended use by service crates.
Non-Blocking Observation
memory/main.rs uses agentd_common::config::load()? (hard-fail on malformed config file), while the other services in #1204 and #1205 use unwrap_or_default() (silent fallback — flagged in those PRs as needing unwrap_or_else + tracing::warn!). Memory's strict approach is fine and arguably more correct for a production service; just note that there will be a behavioural inconsistency across services until #1204/#1205 are reworked. No change needed here — this is tracked by the parent PR reviews.
Test Coverage
Memory tests are thorough:
EmbeddingConfig: default passes;none/openai/ollamaeach pass;huggingfacefails with message containing the invalid value; empty string failsLanceConfig: default passes; emptypathfails with "path" in the message; emptytablefails with "table"; custom values pass
The error-message assertions (assert!(err.to_string().contains("huggingface"))) are good practice — they verify the diagnostic quality, not just that an error occurred.
Summary
All three blocking issues resolved. The memory gap is fully closed: EmbeddingConfig and LanceConfig both have ValidateConfig impls with tests, and memory/main.rs validates before any I/O. The validate_inner() layering is gone. validate_url() is exported for reuse. This PR is approved pending resolution of the parent chain.
Add a ValidateConfig trait to agentd_common with implementations for all 12 shared config sections. Add AgentdConfig::validate() that collects errors from every section. Implement ValidateConfig on all service-level config structs and wire validate() into each service's startup path so misconfigurations fail early with descriptive error messages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add impl ValidateConfig for EmbeddingConfig and LanceConfig in crates/memory/src/config.rs (blocking gap: memory was the only service with no validation at startup) - Wire validation in memory/main.rs: call validate() on embedding and lance configs, load shared config for port/host so port=0 is caught before any I/O with a clear error message - Remove validate_inner() indirection in crates/index/src/config.rs; move logic directly into impl ValidateConfig for IndexConfig and import the trait in index/main.rs and the test module - Make validate_url() pub in agentd-common so service crates can reuse it without duplicating the http/https check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create docs/configuration.md with quick start, file location, precedence rules, complete key reference table, per-service sections, migration guide, and four example configs (minimal, development, production, Docker) - Fix the env var table in agentd-common/config.rs to include all overrides (AGENTD_INDEX_LANGUAGES, AGENTD_RECONCILE_INTERVAL_SECS, all MCP URLs, AGENTD_NOTIFY_SERVICE_URL, AGENTD_COLLECTION_INTERVAL_SECS) - Add link to configuration reference from README.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs: add configuration system documentation
Adds a
ValidateConfigtrait toagentd_commonand implements it across all 12 service configuration sections. Wirescfg.validate()?into every service's startup so misconfigurations surface immediately with descriptive error messages before any ports are bound.Changes:
agentd_common::config::ValidateConfigtrait withfn validate(&self) -> Result<()>AgentdConfig::validate()that collects errors from all 12 sections (does not short-circuit)impl ValidateConfigon all shared config section structs (port > 0, backend enum, URL format, alert thresholds, etc.)impl ValidateConfigon all 11 service-level config structs in their respective cratescfg.validate()?called inmain()of: core, orchestrator, wrap, ask, notify, communicate, hook, monitor, mcp, ui, indexCloses #1198