Skip to content

feat(orchestrator): implement skill materialization during agent spawn#1229

Open
geoffjay wants to merge 2 commits into
issue-1210from
issue-1211
Open

feat(orchestrator): implement skill materialization during agent spawn#1229
geoffjay wants to merge 2 commits into
issue-1210from
issue-1211

Conversation

@geoffjay
Copy link
Copy Markdown
Owner

Adds materialize_skills() to write skill files into an agent's .claude/skills/ directory before the Claude process is launched.

Changes:

  • MaterializeResult struct tracks which skills were written, skipped (existing file), or not found
  • materialize_skills() writes <working_dir>/.claude/skills/<name>/SKILL.md for each assigned skill
  • Agent-local skill files take precedence — existing files are skipped, not overwritten
  • Integrated into spawn_agent() between session creation and build_claude_command()
  • Missing skills produce a warn! log, not a hard failure — agent still launches

Worktree approach: Skills are written to the source working_dir pre-launch. Because .claude/ is typically in .gitignore, worktree agents discover skills via the parent directory already wired through additional_dirs.

Closes #1211

geoffjay added 2 commits May 13, 2026 09:55
- Add MaterializeResult struct (written/skipped/not_found)
- Add materialize_skills() async fn in skills.rs: writes skill content
  to <working_dir>/.claude/skills/<name>/SKILL.md before agent launch
- Skips files that already exist (agent-local skills take precedence)
- Integrate into spawn_agent() after session create, before build_claude_command
- Missing skills produce warning, not hard failure
- Logs written/skipped counts at info level, missing at warn level

Closes #1211
@geoffjay geoffjay added the review-agent Used to invoke a review by an agent tracking this label label May 13, 2026
Copy link
Copy Markdown
Owner Author

@geoffjay geoffjay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat(orchestrator): implement skill materialization during agent spawn

Stack Position

Branch issue-1211 is stacked on issue-1210 (PR #1228), which is itself stacked on epic-1209 (PR #1226). Full chain: main ← epic-1209 ← issue-1210 ← issue-1211.

This PR cannot merge until #1226 and #1228 both clear. Status at review time:

  • PR #1226: re-submitted for re-review (review-agent label set) — blocking
  • PR #1228: reviewed with comment, merge-queue held pending #1226

The code in this PR is reviewed in isolation below.


Non-Blocking: target_file.exists() Is Synchronous in an Async Function

materialize_skills is async and uses tokio::fs for all other I/O, but the "skip if already exists" check uses the blocking std::path::Path::exists():

// skills.rs ~line 104 — synchronous stat call in async context:
if target_file.exists() {
    result.skipped.push(name.clone());
    continue;
}

// vs. the async calls immediately below:
tokio::fs::create_dir_all(&target_dir).await?;
tokio::fs::write(&target_file, &skill.content).await?;

For the typical .agentd/skills/ directory this is negligible, but it's an inconsistency that could mask problems on slow filesystems or network mounts. Consistent fix:

if tokio::fs::try_exists(&target_file).await.unwrap_or(false) {
    result.skipped.push(name.clone());
    continue;
}

try_exists is preferred over metadata().is_ok() because it returns Ok(false) for a missing file rather than Err, and the .unwrap_or(false) correctly falls through to a write attempt if the stat itself fails.


Non-Blocking: I/O Write Errors Land in not_found

MaterializeResult has three buckets: written, skipped, not_found. In materialize_skills, the not-found path is correct — skill names with no matching entry in discovered_skills go to not_found. But the ? propagation on create_dir_all and write would bubble the error up and abort the entire call, not route to not_found.

More importantly, if a future refactor changes the write errors to be swallowed and categorized, they'd naturally fall into not_found — but a skill that was found and failed to write is categorically different from a skill that was never discovered. A failed field would make this distinction explicit and is cheap to add:

#[derive(Debug, Default, PartialEq)]
pub struct MaterializeResult {
    pub written: Vec<String>,
    pub skipped: Vec<String>,
    pub not_found: Vec<String>,
    pub failed: Vec<String>,   // found but couldn't be written
}

The integration in manager.rs already logs not_found names as "skills not found" — having a separate failed vector would give operators the right signal when skill files exist but can't be materialized (permissions, disk full, etc.).


Non-Blocking: discover_all_skills() Is Synchronous Inside Async spawn_agent

spawn_agent in manager.rs is async, but discover_all_skills() uses std::fs::read_dir throughout. The call is:

let discovered = crate::skills::discover_all_skills();  // blocking sync
match crate::skills::materialize_skills(...).await {

Same observation as PR #1226 review — for the typical small skills directory this is fine in practice. Worth noting as a future optimization (pass discovered skills in from the caller, or spawn via tokio::task::spawn_blocking). Not worth blocking the PR on.


What's Working Well

  • Soft failure is correct: Err(e) => { warn!(...); } — agent still starts even if skill materialization fails. This is the right tradeoff; a missing skill should not prevent an agent from launching.
  • Guard avoids unnecessary work: if !agent.config.skills.is_empty() skips both discovery and materialization when no skills are requested.
  • Placement in spawn_agent: materialization after create_session but before build_claude_command is correct — the working directory exists at that point, and skills must be present before the subprocess starts.
  • Target path convention: .claude/skills/<name>/SKILL.md is consistent with the Claude Code skill directory layout.
  • Tracing fields: %result.written.len(), %result.skipped.len(), %result.not_found.len() use % (Display) for counts and ? (Debug) for the string vecs — correct.
  • HashMap index in materialize_skills: building a name → &Skill map before iterating skill_names is O(n) rather than O(n²) — correct choice.
  • MaterializeResult derives: Default + PartialEq are the right set; no Clone needed since the caller doesn't retain it.

Summary

Two non-blocking observations: the target_file.exists() sync/async inconsistency (swap for tokio::fs::try_exists) and the missing failed bucket in MaterializeResult for write errors distinct from discovery failures. The synchronous discover_all_skills() in an async context is a pre-existing pattern noted in #1226 and tracked separately.

The integration is correct and the soft-failure behavior is the right call. This PR is waiting on #1226#1228 to clear before it can merge; no action needed here beyond that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-queue Approved by reviewer, queued for merge by conductor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement skill materialization during agent spawn

1 participant