Skip to content

feat: add ValidateConfig trait and per-service validation#1206

Merged
geoffjay merged 5 commits into
issue-1197from
issue-1198
May 13, 2026
Merged

feat: add ValidateConfig trait and per-service validation#1206
geoffjay merged 5 commits into
issue-1197from
issue-1198

Conversation

@geoffjay
Copy link
Copy Markdown
Owner

Adds a ValidateConfig trait to agentd_common and implements it across all 12 service configuration sections. Wires cfg.validate()? into every service's startup so misconfigurations surface immediately with descriptive error messages before any ports are bound.

Changes:

  • agentd_common::config::ValidateConfig trait with fn validate(&self) -> Result<()>
  • AgentdConfig::validate() that collects errors from all 12 sections (does not short-circuit)
  • impl ValidateConfig on all shared config section structs (port > 0, backend enum, URL format, alert thresholds, etc.)
  • impl ValidateConfig on all 11 service-level config structs in their respective crates
  • cfg.validate()? called in main() of: core, orchestrator, wrap, ask, notify, communicate, hook, monitor, mcp, ui, index
  • Unit tests for both valid and invalid cases per service

Closes #1198

@geoffjay geoffjay added the review-agent Used to invoke a review by an agent tracking this label label May 12, 2026
Copy link
Copy Markdown
Owner Author

@geoffjay geoffjay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat: add ValidateConfig trait and per-service validation

Stack Position

This PR (issue-1198) is stacked on issue-1197 (#1205), which currently carries needs-rework. The chain below:

issue-1198 (this PR)
  └── issue-1197 (#1205) [needs-rework]
        └── issue-1196 (#1204) [needs-rework]
              └── issue-1195 (#1203) [needs-rework]
                    └── issue-1199 (#1202)
                          └── issue-1194 (#1201) [needs-rework]

This PR cannot merge until the chain below it is resolved. Review comments below are against this PR's own changes in isolation.


Blocking Issues

1. Memory service entirely absent — the only service with no validation wired

The PR description says cfg.validate()? is called in main() for 11 services and "all 12 service configuration sections" have impls. Memory is the 12th section: impl ValidateConfig for MemoryConfig is present in crates/common/src/config.rs and AgentdConfig::validate() calls self.services.memory.validate() — but crates/memory/src/main.rs is not in this diff at all.

Looking at the current memory/main.rs, it:

  • Reads AGENTD_HOST/AGENTD_PORT directly via inline env::var() calls (no MemoryConfig load/validate struct)
  • Loads EmbeddingConfig::from_env() and LanceConfig::from_env() separately
  • Has no validate() call of any kind at startup

Neither EmbeddingConfig nor LanceConfig implement ValidateConfig, and there is no unified MemoryConfig struct in the service crate to hang one on. This means:

  • A bad AGENTD_MEMORY_EMBEDDING_PROVIDER value is silently accepted until the first embedding call fails deep inside the request path
  • Port=0 will produce a kernel-level bind error rather than the clear startup message this PR aims to provide for every other service

Required fix: add ValidateConfig to EmbeddingConfig (and optionally LanceConfig) in crates/memory/src/config.rs, and call validate on both at the top of memory/main() before the vector store is initialised. Alternatively, introduce a MemoryConfig wrapper struct that holds both, similar to the pattern in other services.

Minimum validator for EmbeddingConfig:

impl ValidateConfig for EmbeddingConfig {
    fn validate(&self) -> Result<()> {
        match self.provider.as_str() {
            "none" | "openai" | "ollama" => {}
            other => bail!(
                "memory.embedding_provider must be one of none, openai, ollama; got: {other}"
            ),
        }
        Ok(())
    }
}

Non-Blocking Issues

2. Unnecessary validate_inner() indirection in crates/index/src/config.rs

The existing IndexConfig::validate() inherent method is kept and both it and the new ValidateConfig trait impl delegate to a private validate_inner():

pub fn validate(&self) -> Result<()> { self.validate_inner() }
fn validate_inner(&self) -> Result<()> { /* actual logic */ }
impl ValidateConfig for IndexConfig {
    fn validate(&self) -> Result<()> { self.validate_inner() }
}

The indirection exists only to avoid a method name conflict between the inherent validate() and the trait validate(). The simplest resolution would be to remove the inherent method entirely and require callers to use the trait (they already do in main.rs via use ValidateConfig). Or, if keeping the inherent method for backwards compatibility, at least document why the three-layer indirection exists. As-is it reads like accidental duplication.

3. validate_url helper is private; service-local impls inline the same check

validate_url() in crates/common/src/config.rs is a private free function, so service crates cannot import it. As a result, crates/orchestrator/src/config.rs and crates/ask/src/config.rs each inline the same starts_with("http://") || starts_with("https://") check. Consider making it pub to allow reuse, or accept the duplication — but note it will grow as more service-local impls add URL fields. Not a blocker.


What's Working Well

  • Collect-all-errors in AgentdConfig::validate(): building the checks array evaluates all 12 validate() calls eagerly before any errors are examined — operators see every problem in one startup failure, not one at a time. Excellent UX.
  • validate_port / validate_url helpers reduce duplication across the 12 shared impls.
  • Validation before port binding: every wired main() calls cfg.validate()? before TcpListener::bind(), which is the correct ordering.
  • Monitor threshold boundary tests (test_validate_boundary_thresholds_pass, test_validate_cpu_threshold_above_100_fails, negative threshold) provide good edge-case coverage.
  • test_validate_all_valid_backends_pass loop in wrap tests all four backend values exhaustively.
  • ENV_LOCK + save/restore/remove-before-assert pattern consistently applied in MCP tests.

Summary

One blocking gap: the memory service has no ValidateConfig implementation on its service-local config structs and no validate()? call at startup, leaving it as the only service where a misconfigured AGENTD_MEMORY_EMBEDDING_PROVIDER or port=0 silently passes the startup check. Fix the memory gap and this PR is in good shape (modulo needing a restack once the parent chain resolves).

@geoffjay geoffjay added needs-rework PR has review feedback that must be addressed before merging and removed review-agent Used to invoke a review by an agent tracking this label labels May 12, 2026
@geoffjay geoffjay linked an issue May 12, 2026 that may be closed by this pull request
7 tasks
@geoffjay geoffjay added review-agent Used to invoke a review by an agent tracking this label and removed needs-rework PR has review feedback that must be addressed before merging labels May 12, 2026
Copy link
Copy Markdown
Owner Author

@geoffjay geoffjay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat: add ValidateConfig trait and per-service validation (rework)

Stack Position

This PR (issue-1198) is stacked in the config epic chain:

issue-1200 (#1207)
  └── issue-1198 (this PR)
        └── issue-1197 (#1205) [needs-rework]
              └── issue-1196 (#1204) [needs-rework]
                    └── issue-1195 (#1203) [needs-rework]
                          └── issue-1199 (#1202)
                                └── issue-1194 (#1201) [needs-rework]

This PR cannot merge until the chain below it is resolved. The changes here are approvable in isolation.

Note: PR #1212 (issue-1198-2) was a duplicate rework branch that has been closed. The canonical changes are here on issue-1198.


Feedback Addressed

All three blocking issues from the previous review are resolved:

1. Memory service now fully covered

crates/memory/src/config.rs adds:

  • impl ValidateConfig for EmbeddingConfig — validates provider is one of none, openai, ollama
  • impl ValidateConfig for LanceConfig — validates path and table are non-empty

crates/memory/src/main.rs wires validation correctly:

let shared = agentd_common::config::load()?;
shared.services.memory.validate()?;   // validates shared port + embedding_provider

let lance_config = LanceConfig::load();
let embedding_config = EmbeddingConfig::load();
lance_config.validate()?;
embedding_config.validate()?;

The host/port binding is now sourced from the shared config (shared.general.host, shared.services.memory.port) rather than inline env::var() calls — consistent with all other services.

2. validate_inner() indirection removed from crates/index/src/config.rs

The inherent validate() method is gone. Logic now lives directly in impl ValidateConfig for IndexConfig. The test module correctly adds use agentd_common::config::ValidateConfig to pull the trait into scope.

3. validate_url() made pub

The helper is now pub fn validate_url(url: &str, field: &str) with a doc comment explaining its intended use by service crates.


Non-Blocking Observation

memory/main.rs uses agentd_common::config::load()? (hard-fail on malformed config file), while the other services in #1204 and #1205 use unwrap_or_default() (silent fallback — flagged in those PRs as needing unwrap_or_else + tracing::warn!). Memory's strict approach is fine and arguably more correct for a production service; just note that there will be a behavioural inconsistency across services until #1204/#1205 are reworked. No change needed here — this is tracked by the parent PR reviews.


Test Coverage

Memory tests are thorough:

  • EmbeddingConfig: default passes; none/openai/ollama each pass; huggingface fails with message containing the invalid value; empty string fails
  • LanceConfig: default passes; empty path fails with "path" in the message; empty table fails with "table"; custom values pass

The error-message assertions (assert!(err.to_string().contains("huggingface"))) are good practice — they verify the diagnostic quality, not just that an error occurred.


Summary

All three blocking issues resolved. The memory gap is fully closed: EmbeddingConfig and LanceConfig both have ValidateConfig impls with tests, and memory/main.rs validates before any I/O. The validate_inner() layering is gone. validate_url() is exported for reuse. This PR is approved pending resolution of the parent chain.

@geoffjay geoffjay added merge-queue Approved by reviewer, queued for merge by conductor and removed review-agent Used to invoke a review by an agent tracking this label labels May 12, 2026
geoffjay and others added 4 commits May 12, 2026 16:29
Add a ValidateConfig trait to agentd_common with implementations for all
12 shared config sections. Add AgentdConfig::validate() that collects
errors from every section. Implement ValidateConfig on all service-level
config structs and wire validate() into each service's startup path so
misconfigurations fail early with descriptive error messages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add impl ValidateConfig for EmbeddingConfig and LanceConfig in
  crates/memory/src/config.rs (blocking gap: memory was the only service
  with no validation at startup)
- Wire validation in memory/main.rs: call validate() on embedding and
  lance configs, load shared config for port/host so port=0 is caught
  before any I/O with a clear error message
- Remove validate_inner() indirection in crates/index/src/config.rs;
  move logic directly into impl ValidateConfig for IndexConfig and
  import the trait in index/main.rs and the test module
- Make validate_url() pub in agentd-common so service crates can reuse
  it without duplicating the http/https check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create docs/configuration.md with quick start, file location, precedence
  rules, complete key reference table, per-service sections, migration guide,
  and four example configs (minimal, development, production, Docker)
- Fix the env var table in agentd-common/config.rs to include all overrides
  (AGENTD_INDEX_LANGUAGES, AGENTD_RECONCILE_INTERVAL_SECS, all MCP URLs,
  AGENTD_NOTIFY_SERVICE_URL, AGENTD_COLLECTION_INTERVAL_SECS)
- Add link to configuration reference from README.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs: add configuration system documentation
@geoffjay geoffjay merged commit 0e3bd79 into issue-1197 May 13, 2026
@geoffjay geoffjay deleted the issue-1198 branch May 13, 2026 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-queue Approved by reviewer, queued for merge by conductor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add config validation for all service sections

1 participant