Skip to content

refactor: migrate services with existing config.rs to shared TOML config#1204

Merged
geoffjay merged 12 commits into
issue-1195from
issue-1196
May 13, 2026
Merged

refactor: migrate services with existing config.rs to shared TOML config#1204
geoffjay merged 12 commits into
issue-1195from
issue-1196

Conversation

@geoffjay
Copy link
Copy Markdown
Owner

Migrates all services that already had a config.rs to use agentd_common::config::load() as the base layer, with legacy environment variables overlaid on top for backward compatibility.

Services migrated:

  • hookHookConfig::load() reads from shared.services.hook
  • monitorMonitorConfig::load() reads from shared.services.monitor
  • mcpAgentdMcpConfig::load() reads from shared.services.mcp
  • memoryEmbeddingConfig::load() / LanceConfig::load() read from shared.services.memory
  • indexIndexConfig::load() reads from shared.services.index
  • uiUiConfig::load() reads from shared.services.ui

All from_env() methods are deprecated in favour of load(). Test modules updated with ENV_LOCK to prevent parallel env-var races and handle AGENTD_PORT set by the agentd runtime.

Closes #1196

@geoffjay geoffjay added the review-agent Used to invoke a review by an agent tracking this label label May 12, 2026
Copy link
Copy Markdown
Owner Author

@geoffjay geoffjay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: refactor: migrate services with existing config.rs to shared TOML config

Stack position: issue-1196issue-1195 (needs-rework) ← issue-1199issue-1194 (needs-rework). The chain is broken two levels up. This PR cannot merge until #1201 and #1203 are reworked and the stack restacks. Reviewing now so the feedback is ready to act on immediately.

The migration pattern — load() as the new entry point, from_env() deprecated and delegating to load(), ENV_LOCK added to tests, remove_var moved before assertions — is consistently applied and correct. One blocking issue: silent config-file errors.


Blocking

unwrap_or_default() swallows malformed config-file errors with no diagnostic

Every migrated service uses the same pattern:

let shared = agentd_common::config::load().unwrap_or_default();

agentd_common::config::load() returns Ok(AgentdConfig::default()) for a missing file (intentional), but returns Err(...) for a malformed file (parse error). unwrap_or_default() treats both cases identically — the service starts on compiled defaults with no indication to the operator that their config.toml was ignored.

Consequence: a user with a typo in config.toml (wrong key name, bad value type, stray character) will see the service start on port 17006 instead of their configured 19006, with no log line explaining why. This is a silent misconfiguration failure.

The fix is one line per service. Since init_tracing() is called before Config::load() in every main.rs, tracing is available:

let shared = agentd_common::config::load().unwrap_or_else(|e| {
    tracing::warn!("failed to load config file, using compiled defaults: {e:#}");
    agentd_common::config::AgentdConfig::default()
});

This applies to all six load() / load_with_base() callers: hook, monitor, mcp, memory::EmbeddingConfig, memory::LanceConfig, and ui. The index service inherits via load_with_base which receives a &SharedIndexConfig already extracted from the shared config, so the warning only needs to be in IndexConfig::load() (which is not shown in the diff — presumably it calls EmbeddingConfig::load_with_base and LanceConfig::load_with_base after its own agentd_common::config::load() call).


Partial migrations — expected, but annotate clearly

Several fields in MonitorConfig and HookConfig still fall back to hardcoded literals because the corresponding fields don't exist in the shared schema yet (flagged in the #1201 review):

Service Fields still hardcoded (pending #1201 schema additions)
monitor cpu_alert_threshold (90.0), memory_alert_threshold (90.0), disk_alert_threshold (90.0), history_size (120)
hook notify_on_failure (true), notify_on_long_running (true), long_running_threshold_ms (30_000)
index.embedding endpoint ("http://localhost:11434/v1"), api_key
index.lance table ("code_chunks")

This is the correct approach given the current state of the shared schema. Add a // TODO(#1201): migrate when shared schema adds <field> comment on each hardcoded fallback so the follow-up is obvious when #1201 lands.


Non-blocking observations

1. Config file parsed twice for memory

EmbeddingConfig::load() and LanceConfig::load() each call agentd_common::config::load() independently. From memory/main.rs:

let lance_config = LanceConfig::load();       // → parses config.toml
let embedding_config = EmbeddingConfig::load(); // → parses config.toml again

The file is read and deserialised twice on startup. Consider adding MemoryConfig::load() -> (LanceConfig, EmbeddingConfig) that calls agentd_common::config::load() once and passes the section to both sub-configs (mirroring the load_with_base pattern already used in the index crate).

2. UiConfig derives Debug, Clone — minor scope addition

Not required for the migration, but harmless and likely needed. Fine to keep; just note it in the commit message or PR description so it doesn't get lost.

3. Index test env-var save/restore doesn't protect against panics

env::remove_var("AGENTD_PORT");
env::remove_var("AGENTD_INDEX_PORT");

let config = IndexConfig::from_env();   // ← if this panics, restore code never runs

if let Some(v) = port_saved { env::set_var("AGENTD_PORT", v); }

If from_env() panics, the vars are left unset for subsequent tests. Since ENV_LOCK is held, no other test will run concurrently, but the next test that acquires the lock will see absent vars. The risk is low (from_env is unlikely to panic), and the pattern matches the rest of the test suite. Noting for awareness.


What's correct

  • load()from_env() deprecation chain — correct; delegates cleanly ✓
  • #[deprecated] + #[allow(deprecated)] in tests — applied consistently across all six crates ✓
  • MCP remove_var moved before assertion — correct cleanup-before-assert pattern ✓
  • ENV_LOCK added to index and mcp test modules — fixes a real concurrency issue ✓
  • Precedence is preserved: AGENTD_PORT (direct env) overrides base.port (shared config) ✓
  • Backward-compatible: AGENTD_ORCHESTRATOR_URL for mcp still works as highest-priority override ✓
  • mcp/Cargo.toml adds agentd-common dependency correctly ✓
  • Only main.rs call sites updated to load() — correct; the deprecated from_env() wrapper ensures existing callers still compile ✓

Summary: Add a tracing::warn! to the unwrap_or_default() call in every migrated load() function, and add // TODO(#1201) comments on the hardcoded fallbacks. Everything else is clean. Hold for parent chain restack.

@geoffjay geoffjay added needs-rework PR has review feedback that must be addressed before merging and removed review-agent Used to invoke a review by an agent tracking this label labels May 12, 2026
@geoffjay geoffjay linked an issue May 12, 2026 that may be closed by this pull request
6 tasks
@geoffjay geoffjay added review-agent Used to invoke a review by an agent tracking this label and removed needs-rework PR has review feedback that must be addressed before merging labels May 12, 2026
geoffjay and others added 9 commits May 12, 2026 16:29
…g tests

Add ENV_LOCK to mcp and index config test modules so env-var-mutating
tests are serialised in the parallel test harness. Also clear AGENTD_PORT
(set by agentd at runtime) before asserting default port values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s; add TODO(#1201) comments

Replace silent unwrap_or_default() with unwrap_or_else(|e| tracing::warn\!(...)) in
all six migrated load() callers (hook, monitor, mcp, memory::EmbeddingConfig,
memory::LanceConfig, index, ui). Operators now see a warning when config.toml is
malformed rather than silently falling back to compiled defaults.

Add TODO(#1201) comments on every hardcoded fallback that is pending a schema
addition: notify_on_failure, notify_on_long_running, long_running_threshold_ms
(hook); cpu/memory/disk_alert_threshold, history_size (monitor);
embedding_endpoint (index.embedding); lance_table (index.lance).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a ValidateConfig trait to agentd_common with implementations for all
12 shared config sections. Add AgentdConfig::validate() that collects
errors from every section. Implement ValidateConfig on all service-level
config structs and wire validate() into each service's startup path so
misconfigurations fail early with descriptive error messages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Create dedicated config.rs modules for core, orchestrator, wrap, ask,
notify, and communicate. Each struct loads from agentd_common::config::load()
first, then overlays legacy AGENTD_* env vars for backward compatibility.
All inline env::var() calls in main.rs replaced with config struct access.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add impl ValidateConfig for EmbeddingConfig and LanceConfig in
  crates/memory/src/config.rs (blocking gap: memory was the only service
  with no validation at startup)
- Wire validation in memory/main.rs: call validate() on embedding and
  lance configs, load shared config for port/host so port=0 is caught
  before any I/O with a clear error message
- Remove validate_inner() indirection in crates/index/src/config.rs;
  move logic directly into impl ValidateConfig for IndexConfig and
  import the trait in index/main.rs and the test module
- Make validate_url() pub in agentd-common so service crates can reuse
  it without duplicating the http/https check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ilures

Replace silent unwrap_or_default() with unwrap_or_else(|e| { tracing::warn\!(...) })
in all six new service config loaders (ask, communicate, core, notify, orchestrator,
wrap). Restore diagnostic warn\!() calls for invalid AGENTD_WRAP_HISTORY_BYTES and
AGENTD_WRAP_CHANNEL_CAPACITY env vars with TODO comments referencing #1201.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create docs/configuration.md with quick start, file location, precedence
  rules, complete key reference table, per-service sections, migration guide,
  and four example configs (minimal, development, production, Docker)
- Fix the env var table in agentd-common/config.rs to include all overrides
  (AGENTD_INDEX_LANGUAGES, AGENTD_RECONCILE_INTERVAL_SECS, all MCP URLs,
  AGENTD_NOTIFY_SERVICE_URL, AGENTD_COLLECTION_INTERVAL_SECS)
- Add link to configuration reference from README.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
geoffjay added 3 commits May 12, 2026 16:59
docs: add configuration system documentation
feat: add ValidateConfig trait and per-service validation
feat: add config structs for services with inline env var config
@geoffjay geoffjay merged commit 35d64e4 into issue-1195 May 13, 2026
@geoffjay geoffjay deleted the issue-1196 branch May 13, 2026 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review-agent Used to invoke a review by an agent tracking this label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate services with existing config.rs to shared TOML config

1 participant