Skip to content

Comments

feat(delegate): File-based agent definitions with markdown + YAML frontmatter#2183

Open
VascoSch92 wants to merge 15 commits intomainfrom
vasco/issue-2049
Open

feat(delegate): File-based agent definitions with markdown + YAML frontmatter#2183
VascoSch92 wants to merge 15 commits intomainfrom
vasco/issue-2049

Conversation

@VascoSch92
Copy link
Contributor

@VascoSch92 VascoSch92 commented Feb 23, 2026

Summary

This PR introduces the ability to define and load agents via Markdown files and performs a structural refactor to centralize agent-loading logic. (ref #2049 )

Key Features

  • Markdown Agent Definitions: Added support for loading agent configurations directly from .md files.
  • Loading Hierarchy: Implemented a file-loading priority consistent with the existing Skills hierarchy.

Architectural Refactor
Moved all logic related to agent discovery and instantiation from plugin and tools.delegate into a dedicated subagents module.

Remarks
This PR focuses on the core infrastructure. Several advanced features for Markdown-based agents are currently unsupported and are being tracked in the primary Epic #2054.

docs
OpenHands/docs#358

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:8e214f1-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-8e214f1-python \
  ghcr.io/openhands/agent-server:8e214f1-python

All tags pushed for this build

ghcr.io/openhands/agent-server:8e214f1-golang-amd64
ghcr.io/openhands/agent-server:8e214f1-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:8e214f1-golang-arm64
ghcr.io/openhands/agent-server:8e214f1-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:8e214f1-java-amd64
ghcr.io/openhands/agent-server:8e214f1-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:8e214f1-java-arm64
ghcr.io/openhands/agent-server:8e214f1-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:8e214f1-python-amd64
ghcr.io/openhands/agent-server:8e214f1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:8e214f1-python-arm64
ghcr.io/openhands/agent-server:8e214f1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:8e214f1-golang
ghcr.io/openhands/agent-server:8e214f1-java
ghcr.io/openhands/agent-server:8e214f1-python

About Multi-Architecture Support

  • Each variant tag (e.g., 8e214f1-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 8e214f1-python-amd64) are also available if needed

@github-actions
Copy link
Contributor

github-actions bot commented Feb 23, 2026

API breakage checks (Griffe)

Result: Passed

Action log

all-hands-bot

This comment was marked as outdated.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 23, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk
   __init__.py21290%75–76
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py3432193%283, 288, 316, 359, 377, 390, 455, 604–605, 608, 754, 762, 764, 775, 777–779, 804, 966, 973–974
openhands-sdk/openhands/sdk/plugin
   plugin.py2051692%408–409, 416–417, 434–435, 438–440, 458–460, 476–477, 495–496
   types.py198796%59, 62, 65, 121, 246, 662, 670
openhands-sdk/openhands/sdk/subagent
   load.py37294%154–155
   registry.py101397%79, 188–189
   schema.py41197%88
openhands-tools/openhands/tools
   __init__.py12283%30–31
openhands-tools/openhands/tools/delegate
   definition.py27774%81–82, 85, 88–89, 98, 101
   impl.py12010115%29, 31–32, 41–42, 46, 52–53, 56–59, 61, 73–74, 78–80, 84–85, 92–94, 103–104, 114–118, 122, 124, 128–130, 135–137, 139, 145, 148, 153, 158, 162, 167–169, 186–187, 194–196, 205, 207–209, 212–217, 219, 226–227, 229–230, 233–236, 238–239, 243–246, 249–251, 256–257, 260–261, 264, 266–270, 272, 275–277, 279–280, 283, 285, 290–292
TOTAL18943567470% 

all-hands-bot

This comment was marked as outdated.

Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean refactoring with solid architecture.

Key strengths:

  • Right data structures: clean separation (schema, load, registry)
  • Solves real problem: multi-source agent registration with clear priority rules
  • No unnecessary complexity: straightforward loading and deduplication logic
  • Well-tested: comprehensive coverage of priority logic, edge cases, and error paths

Architecture is sound:

  • Moving agent definitions to SDK makes sense (tools should use SDK, not define core abstractions)
  • File-based loading with .agents/ priority over .openhands/ is pragmatic
  • Error handling is adequate (try/except with logging in load.py:136-139)

One future improvement: Integration test for LocalConversation._ensure_plugins_loaded() would verify the auto-registration wiring, but unit coverage is thorough enough for merge.

Verdict: ✅ Worth merging

Key insight: The 6-level priority system (programmatic > plugin > project .agents > project .openhands > user .agents > user .openhands) is complex but justified - it solves real composition needs without forcing special cases. The deduplication in _load_agents_from_dirs() handles it cleanly.

@VascoSch92 VascoSch92 marked this pull request as ready for review February 23, 2026 13:56
Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean refactoring with solid architecture.

Key strengths:

  • Right data structures: clean separation (schema, load, registry)
  • Solves real problem: multi-source agent registration with clear priority rules
  • No unnecessary complexity: straightforward loading and deduplication logic
  • Well-tested: comprehensive coverage of priority logic, edge cases, and error paths

Architecture is sound:

  • Moving agent definitions to SDK makes sense (tools should use SDK, not define core types)
  • Priority hierarchy is clear and well-documented: programmatic > plugin > file-based (project > user)
  • Thread-safe with minimal locking overhead

Worth merging

@VascoSch92
Copy link
Contributor Author

Note that the docs tests are not passing because of missing documentations for examples/01_standalone_sdk/40_acp_agent_example.py. The docs for this feature should be merged soon.

@enyst

This comment was marked as resolved.

@enyst

This comment was marked as outdated.

@openhands-ai

This comment was marked as outdated.

@openhands-ai

This comment was marked as duplicate.

@enyst
Copy link
Collaborator

enyst commented Feb 23, 2026

(HUMAN: pasting the result here, but please feel free to smack it when it's over the top on nothing really - I feel that might be the case)


Taste Rating: Needs improvement

This PR is trying to solve a real problem (declarative subagents), but it does it with a couple of “global state + first-wins” shortcuts that will absolutely bite you in any long-running process (agent-server), and it breaks existing public imports without a compatibility shim.


[CRITICAL ISSUES] (must fix)

1) You broke userspace: removed public APIs without a compatibility layer

Two concrete breakages:

  • openhands.tools.delegate.register_agent is gone
    Previously exported in openhands-tools/openhands/tools/delegate/__init__.py (and tested). Now the module exports only tool classes (new file shows no register_agent, lines 3–18). Anyone with:

    from openhands.tools.delegate import register_agent

    now gets an ImportError.

  • openhands.sdk.plugin.AgentDefinition is gone
    openhands-sdk/openhands/sdk/plugin/__init__.py no longer re-exports AgentDefinition (current file exports many types, but not that one; see lines ~16–48). That was a documented import surface before.

If you want to refactor internals, fine, but keep the old import paths working (even if they just re-export and warn) or bump versions and formally deprecate. Right now it’s “surprise, your code doesn’t import anymore”.


2) Global registry + register_*_if_absent() destroys your priority model and breaks “reload on new conversation”

You claim a priority order in openhands-sdk/openhands/sdk/subagent/load.py (docstring lines 29–36) and in register_plugin_agents() (registry.py lines 210–236), but the actual implementation is basically:

  • global _agent_factories dict (registry.py line 53)
  • register_plugin_agents() uses register_agent_if_absent() (registry.py lines 225–235)
  • register_file_agents() uses register_agent_if_absent() (registry.py lines 193–206)
  • LocalConversation calls them once per conversation (local_conversation.py lines 374–379)

That means the first thing to register a given name wins forever, across:

  • different workspaces/projects in the same process
  • later conversations
  • later plugin loads
  • edits to .md agent definitions

This directly violates your own stated priority (“plugin agents higher priority than file-based”) in any scenario where:

  1. conversation A starts without plugins → file agent “foo” registered
  2. conversation B starts with plugin providing “foo” → plugin cannot override, because “foo” already exists

It also fails the issue’s “reload on new conversation” requirement for updated definitions: if the .md file changes, a new conversation will still silently keep the old definition because duplicates are skipped.

This is the core design smell: you can’t implement priority and reloading with a single global dict and a no-overwrite registration primitive. The data structure is wrong for the problem.


3) The “default agent” used for delegation changed semantics (likely a regression)

get_agent_factory(None) now resolves to openhands.sdk.subagent.builtins.default.get_default_agent() (registry.py lines 253–256), which builds an Agent via _agent_definition_to_factory() (builtins/default.py lines 24–33).

This default sub-agent is not the same as the old preset default agent (which set condenser, system_prompt_kwargs, etc.). If the intent is “delegation subagents are lightweight”, that’s OK—but it’s a behavioral change that should be explicit and tested, because it will change output quality and context handling.


[IMPROVEMENT OPPORTUNITIES] (should fix)

4) One of the registry tests is structurally wrong (so it doesn’t test what it claims)

tests/sdk/subagent/test_subagent_registry.py::test_register_file_agents_project_priority creates the user agent file here:

  • user_home = tmp_path / "fake_home" / "agents" (line 49)
  • then writes to user_home / ".agents" / "shared-agent.md" (lines 51–55)

…but load_user_agents() searches Path.home() / ".agents/agents" (load.py lines 81–96). That test never places the file under .agents/agents/, so the “user version” likely isn’t loaded at all. The test can pass even if user-loading is broken.

This is exactly how you end up merging something that “has tests” but still doesn’t work.


5) Tool naming / UX mismatch is unresolved

Your schema/tests use tool names like ReadTool, GlobTool, Read, Write (e.g., loader tests lines 30–56; schema tests lines 13–33). Real tool registry names in this repo are typically snake_case derived from class names (e.g., GlobTool.name == "glob" per ToolDefinition naming logic).

So users will write agent files like the Claude docs (Read, Grep, Glob, Bash) and it’ll parse, but may fail at runtime when tool resolution happens. If you want Claude compatibility, you need an alias/normalization layer or very explicit docs.


6) The “example usage” in the new registry module is stale

openhands-sdk/openhands/sdk/subagent/registry.py docstring still says:

from openhands.tools.delegate import register_agent, Skill

(line 5)
That’s now wrong, because you removed that symbol from openhands.tools.delegate. This is basic “don’t ship docs that don’t run”.


7) Loader error handling is “log-and-ignore everything”

_load_agents_from_dir() catches Exception (load.py lines 133–140) and continues. For user config, that’s sometimes fine, but you’ll make debugging miserable. At least consider surfacing which field failed validation (or returning structured errors), especially since this is a brand new feature people will misconfigure.


[TESTING GAPS]

  • You removed plugin-loading tests (tests/sdk/plugin/test_plugin_loading.py deleted) but didn’t replace coverage for “plugin agents get registered and take precedence”.
  • There’s no test that proves your stated precedence order actually holds across multiple conversations in the same process (which is where the global registry bug shows up).
  • There’s no test for “edit agent file → new conversation picks up new content” (currently it won’t).

VERDICT: Needs rework

The feature direction is fine, but the current implementation is built on a global registry that can’t express priority or reloading correctly, and you broke public imports without providing a compatibility path.

Key insight: You’re trying to model layered configuration with precedence using a single “first one wins” global dict. That’s the wrong data structure, and it forces broken semantics (no override, no reload, cross-workspace leakage).

@VascoSch92
Copy link
Contributor Author

Review of the roasted-review

1) You broke userspace: removed public APIs without a compatibility layer
True, but we discussed that it was Ok :-)

2) Global registry + register_*_if_absent() destroys your priority model and breaks “reload on new conversation”
I believe this is not True. If we resume a conversation, it makes sense to use old hooks and plugins and not new one. On the other hands, if we start a new conversation, everything is loaded another time following the hierarcy.

3) The “default agent” used for delegation changed semantics (likely a regression)
This is True. But just because we don't have a field for the condenser now in the Markdown.
There is already an issue to add that in the parsing.

4) One of the registry tests is structurally wrong (so it doesn’t test what it claims)
This is True. Corrected!

5) Tool naming / UX mismatch is unresolved
This is True but a very small nit

6) The “example usage” in the new registry module is stale
This is also True. Corrected!

7) Loader error handling is “log-and-ignore everything”
This is not the case as we have exc_info=True

@VascoSch92

This comment was marked as resolved.

@enyst
Copy link
Collaborator

enyst commented Feb 23, 2026

Do you mean creaiting docs for the file-based agents? Like skills and plugin?

I mean an example script in examples/ directory (and maybe it's worth it to be a directory for that example, so that it has the markdown file separately, easy to understand... maybe)

@enyst
Copy link
Collaborator

enyst commented Feb 23, 2026

1) You broke userspace: removed public APIs without a compatibility layer
True, but we discussed that it was Ok :-)

Can we decide what is public API now, and add it to the SDK's __init__ ?

@VascoSch92
Copy link
Contributor Author

Do you mean creaiting docs for the file-based agents? Like skills and plugin?

I mean an example script in examples/ directory (and maybe it's worth it to be a directory for that example, so that it has the markdown file separately, easy to understand... maybe)

Added :-)

@VascoSch92 VascoSch92 requested a review from enyst February 24, 2026 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants