
Compare ADP and ATIF for possible format unification #243

@neubig


Summary

We are considering whether ADP and Harbor's Agent Trajectory Interchange Format (ATIF) should converge. I compared the current ADP repository schema and documentation with harbor-framework/harbor cloned at commit 1b1dbc43729edff705071a797bb0f98a90cf62f5, focusing on:

  • ADP: README.md, schema/SCHEMA.md, schema/trajectory.py, schema/action/*, schema/observation/*, schema/dataset_metadata.py, representative sample_std.json files, and the raw -> standardized -> SFT pipeline.
  • ATIF/Harbor: rfcs/0001-trajectory-format.md, src/harbor/models/trajectories/*, src/harbor/models/agent/trajectory_config.py, validator tests, and examples of ATIF writers in adapters.

High-level conclusion: the formats overlap strongly in the core concepts of trajectories, user/agent/tool/environment events, structured tool calls, tool-call result links, reasoning text, and multimodal artifacts, but they currently optimize for different layers of the ecosystem.

  • ADP is dataset-normalization and SFT-conversion centered. It turns heterogeneous source datasets into a compact standardized event/action/observation stream, then into agent-specific SFT formats.
  • ATIF is runtime logging, evaluation, debugging, visualization, and RL-rollout centered. It records the agent run as ordered steps/LLM turns with rich agent config, metrics, token IDs, timestamps, subagents, context management, and tool definitions.

This suggests that unification is feasible, but it should probably start with a bidirectional adapter plus a schema-convergence plan rather than an immediate replacement of either representation.

Format snapshots

ADP standardized trajectory

ADP's canonical standardized record is a Pydantic Trajectory:

{
  "schema_version": "1.3.1",
  "id": "example_trajectory_001",
  "content": [
    {"class_": "text_observation", "content": "...", "source": "user"},
    {"class_": "code_action", "tool_call_id": "call_000001", "language": "bash", "content": "ls", "description": "..."},
    {"class_": "text_observation", "tool_call_id": "call_000001", "content": "...", "source": "environment"},
    {"class_": "message_action", "content": "..."}
  ],
  "available_apis": ["optional", "per-instance", "function", "names"],
  "details": {}
}

Core ADP objects:

  • Root: schema_version, id, content, optional available_apis, details.
  • Actions:
    • MessageAction: agent/user-facing text response.
    • CodeAction: executable code or shell command with language, content, description.
    • ApiAction: function/tool invocation with function, kwargs, description.
  • Observations:
    • TextObservation: text from user, agent, or environment.
    • ImageObservation: image path plus optional UI annotations and source.
    • WebObservation: web page state with html, axtree, url, optional screenshot image_observation, and viewport_size.
  • Common action/observation fields: tool_call_id for call/result linkage, reasoning_content on actions, and reward on actions/observations.
  • Validation model: extra fields are forbidden; tool actions/results are linked by matching tool_call_id; adjacent tool-result observations require an action id; MessageAction.tool_call_id is disallowed; tool-result observations must not use source="user".
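The linkage rules above can be sketched as a plain-dict check. This is a hypothetical helper (`check_adp_links`), not ADP's validator. It assumes that actions and observations can be recognized by the `_action` / `_observation` suffix of `class_`, and it treats an action with no later result as an error, which the real Pydantic validators in schema/trajectory.py may handle differently.

```python
from typing import Any


def check_adp_links(content: list[dict[str, Any]]) -> list[str]:
    """Return problems with tool_call_id linkage (empty list if none)."""
    problems: list[str] = []
    open_calls: dict[str, int] = {}  # tool_call_id -> index of the issuing action
    seen_calls: set[str] = set()
    for i, event in enumerate(content):
        kind = event.get("class_", "")
        call_id = event.get("tool_call_id")
        if kind == "message_action":
            if call_id is not None:
                problems.append(f"[{i}] MessageAction must not carry tool_call_id")
        elif kind.endswith("_action"):
            if call_id is None:
                continue  # an action without an id has nothing to link
            if call_id in seen_calls:
                problems.append(f"[{i}] duplicate action tool_call_id {call_id!r}")
            seen_calls.add(call_id)
            open_calls[call_id] = i
        elif kind.endswith("_observation") and call_id is not None:
            if event.get("source") == "user":
                problems.append(f"[{i}] tool result must not use source='user'")
            if call_id not in open_calls:
                problems.append(f"[{i}] no earlier unanswered action with id {call_id!r}")
            else:
                del open_calls[call_id]  # each call gets exactly one result
    for call_id, i in open_calls.items():
        problems.append(f"[{i}] action {call_id!r} has no matching observation")
    return problems
```

On the four-event example trajectory shown earlier, this returns an empty list.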

ADP also has repository-level conventions that are part of the practical format story:

  • Every dataset supplies raw extraction (extract_raw.py), raw schema (schema_raw.py), standardization (raw_to_standardized.py), samples, and SFT samples.
  • metadata.json can describe OpenAI-style custom tools, enabled code languages, and browser availability.
  • Agent-specific converters under agents/*/std_to_sft.py are first-class outputs, e.g. OpenHands v0, SWE-agent, AgentLab.

ATIF trajectory

ATIF's current root object, per the Harbor RFC (v1.7) and Harbor's Pydantic models, is:

{
  "schema_version": "ATIF-v1.7",
  "session_id": "run-scoped-id",
  "trajectory_id": "document-scoped-id",
  "agent": {
    "name": "openhands",
    "version": "...",
    "model_name": "...",
    "tool_definitions": [
      {"type": "function", "function": {"name": "...", "parameters": {}}}
    ],
    "extra": {}
  },
  "steps": [
    {"step_id": 1, "timestamp": "2025-10-11T10:30:00Z", "source": "user", "message": "..."},
    {
      "step_id": 2,
      "source": "agent",
      "model_name": "...",
      "reasoning_content": "...",
      "message": "...",
      "tool_calls": [{"tool_call_id": "call_1", "function_name": "tool", "arguments": {}}],
      "observation": {"results": [{"source_call_id": "call_1", "content": "..."}]},
      "metrics": {}
    }
  ],
  "notes": "...",
  "final_metrics": {},
  "continued_trajectory_ref": "...",
  "extra": {},
  "subagent_trajectories": []
}

Core ATIF objects:

  • Root: schema_version, optional run-scoped session_id, optional document-scoped trajectory_id, required agent, required non-empty steps, optional notes, final_metrics, continued_trajectory_ref, extra, and subagent_trajectories.
  • Agent: required name and version; optional model_name, OpenAI-style tool_definitions, and extra.
  • Step: sequential step_id, optional ISO timestamp, source in system | user | agent, required message as string or multimodal ContentPart[], agent-only model_name, reasoning_effort, reasoning_content, tool_calls, and metrics, plus optional observation, llm_call_count, is_copied_context, and extra.
  • ToolCall: required tool_call_id, function_name, arguments, optional extra.
  • Observation / ObservationResult: results[] with optional source_call_id, content string or multimodal ContentPart[], optional subagent_trajectory_ref[], optional extra.
  • Metrics / FinalMetrics: prompt/completion/cached token counts, cost, prompt/completion token IDs, logprobs, extras.
  • Multimodal support: ContentPart supports text and image, with ImageSource(media_type, path).
  • Subagent support: SubagentTrajectoryRef resolves by embedded trajectory_id or external trajectory_path; session_id is informational.
  • Context management: step.extra.context_management convention for compaction, pruning, injection, and boundary semantics.
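A comparable sketch of the step-level rules, again over plain dicts. `check_atif_steps` is hypothetical: it assumes `step_id` starts at 1 (as in the example above), assumes `message` is always present, and checks only the constraints listed here. Harbor's own validator remains the authority.

```python
from typing import Any

SOURCES = {"system", "user", "agent"}
AGENT_ONLY = ("model_name", "reasoning_effort", "reasoning_content", "tool_calls", "metrics")


def check_atif_steps(steps: list[dict[str, Any]]) -> list[str]:
    """Return problems with ATIF step ordering, sources, and call/result links."""
    if not steps:
        return ["steps must be non-empty"]
    problems: list[str] = []
    for expected_id, step in enumerate(steps, start=1):
        sid = step.get("step_id")
        if sid != expected_id:
            problems.append(f"step_id {sid!r}: expected {expected_id}")
        if "message" not in step:
            problems.append(f"step {sid}: message is required")
        source = step.get("source")
        if source not in SOURCES:
            problems.append(f"step {sid}: unknown source {source!r}")
        elif source != "agent":
            problems.extend(f"step {sid}: {f} is agent-only" for f in AGENT_ONLY if f in step)
        # A result may only point at a tool call issued in the same step.
        call_ids = {call["tool_call_id"] for call in step.get("tool_calls") or []}
        for result in (step.get("observation") or {}).get("results", []):
            ref = result.get("source_call_id")
            if ref is not None and ref not in call_ids:
                problems.append(f"step {sid}: source_call_id {ref!r} not issued in this step")
    return problems
```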

Major similarities

| Area | ADP | ATIF | Notes |
| --- | --- | --- | --- |
| JSON + typed validation | Pydantic models under schema/ | Pydantic models under src/harbor/models/trajectories/ | Both reject unknown fields on core objects. |
| Root trajectory object | Trajectory(id, content, details, ...) | Trajectory(agent, steps, final_metrics, ...) | Both use versioned root records. |
| Ordered interaction history | Flat content[] event/action/observation stream | Sequential steps[] | Both preserve chronology. |
| User/agent/environment concepts | TextObservation.source supports user, agent, environment; actions represent agent outputs | Step.source supports system, user, agent; observations represent environment/system feedback | ATIF has an explicit system source; ADP has an explicit environment source. |
| Structured tool calls | ApiAction(function, kwargs) | ToolCall(function_name, arguments) | Nearly isomorphic at the single-call level. |
| Code/shell execution | First-class CodeAction(language, content) | Usually represented as a tool call, e.g. a shell/execute function | ADP preserves language directly; ATIF depends on tool naming/arguments. |
| Tool-result linkage | Shared tool_call_id across an action and a later observation | ToolCall.tool_call_id linked by ObservationResult.source_call_id within the same step | Same conceptual link; different granularity. |
| Reasoning | Action.reasoning_content; description fields | Step.reasoning_content; reasoning_effort | ADP recently aligned its naming with ATIF here. |
| Rewards / RL signals | reward on actions and observations | Token IDs/logprobs and metrics; no direct reward field in the models today | Complementary rather than identical. |
| Multimodal data | ImageObservation and WebObservation.image_observation | ContentPart(type=image, source=ImageSource) | ADP has UI annotations and web state; ATIF has generic mixed content. |
| Tool availability | available_apis names plus optional OpenAI tool specs in dataset metadata.json | OpenAI-style specs in agent.tool_definitions | ATIF carries definitions in the trajectory; ADP often carries names in the trajectory and specs outside it. |
| Extension space | details root metadata; dataset-specific raw schemas | extra at root, agent, step, tool call, metrics, observation result, and subagent ref | ATIF has more structured extension points. |
| SFT usefulness | Explicit SFT converters are a core repo feature | RFC explicitly targets SFT/RL/debug/visualization | Shared end goal, different current tooling. |

Major differences

1. Event/action stream vs LLM-turn step model

ADP stores a flat sequence where agent actions and environment observations are separate records:

TextObservation(user) -> CodeAction -> TextObservation(environment) -> MessageAction

ATIF stores a turn/step where one agent step may combine the assistant message, one or more tool calls, their observations, metrics, and reasoning:

Step(user) -> Step(agent: message + tool_calls[] + observation.results[] + metrics)

Implication: ATIF can directly preserve "these multiple tool calls came from one LLM inference." ADP preserves exact chronological order, but a single LLM inference that emitted several tool calls currently has to be split into separate ApiAction/CodeAction records, and the grouping is lost unless it is encoded in details.
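To make the grouping loss concrete, here is a sketch that flattens one ATIF agent step into ADP-style events. `flatten_agent_step` and the `details["step_groups"]` key are hypothetical conventions, not part of either schema. Mapping the step's visible message to the first action's `description`, and interleaving each call with its result, are only two of several possible policies.

```python
from typing import Any


def flatten_agent_step(step: dict[str, Any], details: dict[str, Any]) -> list[dict[str, Any]]:
    """Split one ATIF agent step into ADP-style action/observation events."""
    reasoning = step.get("reasoning_content")
    calls = step.get("tool_calls") or []
    if not calls:
        event: dict[str, Any] = {"class_": "message_action", "content": step.get("message", "")}
        if reasoning:
            event["reasoning_content"] = reasoning
        return [event]

    results = {r.get("source_call_id"): r for r in (step.get("observation") or {}).get("results", [])}
    events: list[dict[str, Any]] = []
    for n, call in enumerate(calls):
        action: dict[str, Any] = {
            "class_": "api_action",
            "tool_call_id": call["tool_call_id"],
            "function": call["function_name"],
            "kwargs": call["arguments"],
            # Assumed policy: the visible message becomes the first action's description.
            "description": step.get("message", "") if n == 0 else "",
        }
        if n == 0 and reasoning:
            action["reasoning_content"] = reasoning
        events.append(action)
        result = results.get(call["tool_call_id"])
        if result is not None:
            events.append({
                "class_": "text_observation",
                "tool_call_id": call["tool_call_id"],
                "content": result.get("content", ""),  # assumes string content, not ContentPart[]
                "source": "environment",
            })
    # Hypothetical convention: without this, "one inference, many calls" is lost.
    details.setdefault("step_groups", {})[str(step["step_id"])] = [c["tool_call_id"] for c in calls]
    return events
```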

2. Root metadata and agent identity

ATIF has first-class agent.name, agent.version, agent.model_name, agent.tool_definitions, session_id, trajectory_id, final_metrics, notes, and continued_trajectory_ref.

ADP's root has id, optional available_apis, and details. This keeps standardized dataset samples compact and dataset-agnostic, but it means many run-level fields are either absent or pushed into details.

3. System messages and environment source

ATIF has source="system" for system prompts and system-initiated operations. ADP's TextObservation.source only permits user, agent, and environment.

ADP has an explicit environment source for text/image observations and a dedicated WebObservation; ATIF environment feedback is inside Step.observation.results, not a top-level source.

Consequence: ATIF -> ADP needs a policy for system steps. ADP -> ATIF needs a policy for standalone environment observations that are not tool results.
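Both policies can be made explicit in code. In this sketch the function names are hypothetical; the `name="system"` marker follows the lossy fallback suggested in the ATIF -> ADP mapping and assumes ADP's TextObservation accepts a `name` field; and turning standalone environment text into an ATIF `system` step tagged with `extra.adp_source` is an assumed policy, not a Harbor convention.

```python
from typing import Any, Optional


def atif_system_step_to_adp(step: dict[str, Any], keep: bool = True) -> Optional[dict[str, Any]]:
    """Lossy fallback: ADP has no 'system' source, so reuse 'environment'."""
    if not keep:
        return None  # alternative policy: drop system steps entirely
    return {
        "class_": "text_observation",
        "content": step["message"],  # assumes a plain-string message
        "source": "environment",
        "name": "system",  # assumed field; marks the original ATIF source
    }


def adp_env_text_to_atif(obs: dict[str, Any], step_id: int) -> dict[str, Any]:
    """Standalone ADP environment text (not a tool result) as an ATIF step."""
    if obs.get("tool_call_id") is not None:
        raise ValueError("tool results belong in an agent step's observation.results")
    return {
        "step_id": step_id,
        "source": "system",  # assumed policy: ATIF has no top-level 'environment' source
        "message": obs["content"],
        "extra": {"adp_source": obs.get("source", "environment")},
    }
```

Passing a system step through both functions restores the `system` source and message but adds an `extra.adp_source` tag, so the round trip is not exact.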

4. Tool-call cardinality and result placement

ADP validates that each tool_call_id links one action to exactly one later observation. ATIF validates ObservationResult.source_call_id against the tool calls within the same step, and supports multiple tool calls/results on a single agent step.

This is probably the most important structural mismatch for unification:

  • ADP is call/result event oriented.
  • ATIF is LLM-step oriented.

Both are useful. A unified format may need both a “step/turn” grouping and a normalized “event/action/observation” view.

5. CodeAction and WebObservation are richer in ADP

ADP has first-class objects for:

  • CodeAction(language, content) with many language literals.
  • WebObservation(html, axtree, url, image_observation, viewport_size).
  • ImageObservation(annotations[]) with bounding boxes and element types.

ATIF can store these via tool calls, observation content, image content parts, and extra, but it does not currently have standardized fields for accessibility tree, DOM/HTML, viewport size, or UI annotations.

If ADP and ATIF unify, the combined format should avoid losing ADP's web/UI state fidelity.

6. Metrics, timestamps, and token-level data are richer in ATIF

ATIF standardizes:

  • timestamp on steps.
  • Metrics with prompt/completion/cached tokens, cost, prompt token IDs, completion token IDs, logprobs, extra.
  • FinalMetrics at root.
  • llm_call_count, including 0 for deterministic dispatch and >1 for aggregated steps.

ADP only has reward and free-form details for these concerns. ADP is therefore less immediately useful as an evaluation trace or RL rollout log without conventions for metrics.
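To show what a root FinalMetrics record summarizes, here is a sketch that sums per-step Metrics. `aggregate_final_metrics` is hypothetical, and the field names (`prompt_tokens`, `cost_usd`, `total_prompt_tokens`, `total_steps`, and so on) are assumed from the ATIF RFC; they should be checked against src/harbor/models/trajectories/.

```python
from typing import Any


def aggregate_final_metrics(steps: list[dict[str, Any]]) -> dict[str, Any]:
    """Sum per-step Metrics into a FinalMetrics-shaped dict (field names assumed)."""
    totals: dict[str, Any] = {
        "total_prompt_tokens": 0,
        "total_completion_tokens": 0,
        "total_cached_tokens": 0,
        "total_cost_usd": 0.0,
    }
    for step in steps:
        metrics = step.get("metrics") or {}
        # Missing or null values are treated as zero rather than unknown.
        totals["total_prompt_tokens"] += metrics.get("prompt_tokens") or 0
        totals["total_completion_tokens"] += metrics.get("completion_tokens") or 0
        totals["total_cached_tokens"] += metrics.get("cached_tokens") or 0
        totals["total_cost_usd"] += metrics.get("cost_usd") or 0.0
    totals["total_steps"] = len(steps)
    return totals
```

Treating missing values as zero is a design choice; a converter for dataset-derived ADP trajectories might instead leave a total null when no step reports it.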

7. Context management and continuation

ATIF has continued_trajectory_ref; is_copied_context, which lets SFT pipelines filter copied history; and step.extra.context_management conventions for compaction, pruning, injection, and boundary semantics.

ADP currently lacks first-class equivalents. This matters for long-running agents and SFT correctness: copied context should not be treated as newly produced assistant behavior.
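In practice, an SFT converter would likely keep copied steps as prompt context but exclude them from the loss. A hypothetical per-step loss mask (`sft_loss_mask`), assuming that policy and also excluding `llm_call_count == 0` steps, might look like this:

```python
from typing import Any


def sft_loss_mask(steps: list[dict[str, Any]]) -> list[bool]:
    """True for steps whose content should be an SFT training target."""
    mask: list[bool] = []
    for step in steps:
        is_target = (
            step.get("source") == "agent"
            # Copied history is context, not newly produced behavior.
            and not step.get("is_copied_context", False)
            # llm_call_count == 0 marks deterministic dispatch with no model output;
            # a missing count is treated as a normal LLM step.
            and step.get("llm_call_count") != 0
        )
        mask.append(is_target)
    return mask
```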

8. Subagents and hierarchical workflows

ATIF has first-class subagent_trajectories and SubagentTrajectoryRef in observation results. ADP has no standardized subagent representation. It could carry this in details, but consumers would need dataset-specific logic.
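The resolution rule described for SubagentTrajectoryRef can be sketched as follows; this is the kind of logic ADP consumers would otherwise have to reimplement per dataset. `resolve_subagent` is hypothetical, and it assumes that embedded trajectories are checked before `trajectory_path` and that paths are relative to the parent trajectory's directory.

```python
import json
from pathlib import Path
from typing import Any


def resolve_subagent(ref: dict[str, Any], root: dict[str, Any], base_dir: Path) -> dict[str, Any]:
    """Find the trajectory a SubagentTrajectoryRef points to."""
    trajectory_id = ref.get("trajectory_id")
    if trajectory_id is not None:
        for sub in root.get("subagent_trajectories") or []:
            if sub.get("trajectory_id") == trajectory_id:
                return sub
    path = ref.get("trajectory_path")
    if path is not None:
        return json.loads((base_dir / path).read_text(encoding="utf-8"))
    # session_id is informational only, so it is never used for lookup.
    raise LookupError(f"unresolvable subagent reference: {ref!r}")
```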

9. Extension philosophy

ADP is strict and compact: core fields plus details, with dataset-specific raw schemas outside the standardized model. ATIF is strict on core objects but provides extra at many levels, which makes it easier to preserve producer-specific metadata without schema churn.

10. Repository scope

ADP includes many dataset adapters and agent-specific SFT converters. Its “format” includes a reproducible data pipeline:

raw dataset -> ADP standardized trajectory -> agent-specific SFT

Harbor/ATIF includes benchmark adapters, evaluators, run logs, validators, and trajectory files for agents run in environments. Its “format” is coupled to execution/evaluation trace capture:

agent run -> ATIF trajectory -> validation/debug/visualization/SFT/RL analysis

These are adjacent but not identical use cases.

Conversion sketch

ADP -> ATIF

Likely mapping:

| ADP | ATIF |
| --- | --- |
| Trajectory.schema_version | root extra.adp_schema_version or mapped version metadata |
| Trajectory.id | trajectory_id; maybe also session_id if no better run id exists |
| details | root extra, or notes for human-readable notes |
| dataset metadata.json.custom_tools | agent.tool_definitions |
| available_apis | agent.tool_definitions filtered to names, or agent.extra.available_apis |
| TextObservation(source=user) | Step(source=user, message=content) |
| MessageAction | Step(source=agent, message=content, reasoning_content=...) |
| ApiAction + matched observation | Step(source=agent, message=description or "", tool_calls=[...], observation.results=[...]) |
| CodeAction + matched observation | Step(source=agent, tool_calls=[{function_name: "execute_code"/"execute_bash", arguments:{language, content}}], observation.results=[...]) |
| TextObservation(source=environment) with tool_call_id | ObservationResult(content=..., source_call_id=tool_call_id) |
| ImageObservation | ContentPart(type=image) plus annotations in extra, or an ATIF extension for annotations |
| WebObservation | ObservationResult.content/extra.web containing html, axtree, url, viewport, screenshot ref |
| reward | step.extra.reward, observation.results[].extra.reward, or a future ATIF reward field |

Lossy points:

  • ADP does not always indicate which actions came from the same LLM inference.
  • ADP description vs reasoning_content vs MessageAction.content requires a policy: is description visible assistant text, hidden thought, or action rationale?
  • ADP root does not normally include agent identity/model/version/metrics/timestamps.
  • ADP WebObservation has more structured web state than ATIF currently standardizes.
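The text-only core of the ADP -> ATIF mapping table can be sketched as follows, producing one agent step per action. `adp_to_atif_steps` is hypothetical; the `execute_code` tool name follows the table's suggestion and is not an established ATIF tool. ImageObservation, WebObservation, reward, and standalone environment text are deliberately left unmapped (they raise `NotImplementedError`), since each needs one of the policies listed above.

```python
from typing import Any


def adp_to_atif_steps(content: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map ADP text events to ATIF steps, one agent step per action."""
    steps: list[dict[str, Any]] = []
    pending: dict[str, dict[str, Any]] = {}  # tool_call_id -> step awaiting its result

    def new_step(**fields: Any) -> dict[str, Any]:
        step = {"step_id": len(steps) + 1, **fields}
        steps.append(step)
        return step

    for event in content:
        kind = event["class_"]
        call_id = event.get("tool_call_id")
        if kind == "message_action":
            step = new_step(source="agent", message=event["content"])
        elif kind in ("api_action", "code_action") and call_id is not None:
            if kind == "api_action":
                name, args = event["function"], event["kwargs"]
            else:
                name = "execute_code"  # hypothetical tool shape
                args = {"language": event["language"], "content": event["content"]}
            step = new_step(
                source="agent",
                message=event.get("description") or "",
                tool_calls=[{"tool_call_id": call_id, "function_name": name, "arguments": args}],
            )
            pending[call_id] = step
        elif kind == "text_observation" and call_id in pending:
            result = {"source_call_id": call_id, "content": event["content"]}
            pending.pop(call_id)["observation"] = {"results": [result]}
            continue
        elif kind == "text_observation" and event.get("source") == "user":
            new_step(source="user", message=event["content"])
            continue
        else:
            # Images, web state, id-less actions, and standalone environment text need policies.
            raise NotImplementedError(f"no mapping for {kind!r}")
        # Only agent steps reach this point.
        if event.get("reasoning_content"):
            step["reasoning_content"] = event["reasoning_content"]
    return steps
```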

ATIF -> ADP

Likely mapping:

| ATIF | ADP |
| --- | --- |
| schema_version | details.atif_schema_version |
| trajectory_id or session_id | Trajectory.id |
| agent, final_metrics, notes, continued_trajectory_ref, root extra | details |
| Step(source=user) | TextObservation(source=user, content=message) |
| Step(source=system) | needs a new ADP source/type, or TextObservation(source=environment, name="system") as a lossy fallback |
| Step(source=agent, no tool_calls) | MessageAction(content=message, reasoning_content=reasoning_content) |
| each ToolCall | ApiAction(function=function_name, kwargs=arguments, tool_call_id=...), unless mapped to CodeAction by function name/arguments |
| each ObservationResult | TextObservation / ImageObservation / WebObservation with matching tool_call_id when possible |
| metrics | details or a future ADP metrics field |
| is_copied_context | details or a future ADP field; SFT converters should filter if true |
| subagent_trajectory_ref / subagent_trajectories | details or future ADP subagent fields |

Lossy points:

  • Multiple tool_calls in one ATIF step become multiple ADP actions unless ADP adds grouping.
  • ATIF metrics, timestamps, token IDs, logprobs, and cost have no first-class ADP location.
  • System steps are not representable without overloading environment.
  • Subagent references and embedded trajectories need new ADP constructs or conventions.
  • Steps with llm_call_count=0 (deterministic dispatch) need SFT filtering semantics in ADP converters.

Proposed unification path

Phase 1: Build adapters and loss reports

Add scripts/tests that convert a small set of trajectories both ways:

  1. ADP sample_std.json -> ATIF.
  2. ATIF examples from Harbor RFC -> ADP.
  3. Validate with both ADP Pydantic models and Harbor's ATIF validator.
  4. Emit a machine-readable “loss report” for fields that cannot be represented exactly.
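For step 4, a loss report could start as a list of path/reason records. The field lists below are an assumption drawn from the ATIF -> ADP lossy points in this issue, not an agreed spec, and `atif_to_adp_loss_report` is hypothetical.

```python
from typing import Any

# Assumed from the ATIF -> ADP lossy points above; not an agreed list.
ROOT_FIELDS_TO_DETAILS = ("agent", "final_metrics", "notes", "continued_trajectory_ref", "subagent_trajectories", "extra")
UNMAPPED_STEP_FIELDS = ("timestamp", "model_name", "reasoning_effort", "metrics", "llm_call_count", "is_copied_context", "extra")


def atif_to_adp_loss_report(traj: dict[str, Any]) -> list[dict[str, str]]:
    """List ATIF fields that have no first-class home in ADP."""
    report = [{"path": f, "reason": "stored in details, not first-class"} for f in ROOT_FIELDS_TO_DETAILS if f in traj]
    for i, step in enumerate(traj.get("steps", [])):
        base = f"steps[{i}]"
        for field in UNMAPPED_STEP_FIELDS:
            if field in step:
                report.append({"path": f"{base}.{field}", "reason": "no first-class ADP field"})
        if step.get("source") == "system":
            report.append({"path": f"{base}.source", "reason": "system overloaded as environment"})
        if len(step.get("tool_calls") or []) > 1:
            report.append({"path": f"{base}.tool_calls", "reason": "multi-call grouping lost"})
        for j, result in enumerate((step.get("observation") or {}).get("results", [])):
            if result.get("subagent_trajectory_ref"):
                path = f"{base}.observation.results[{j}].subagent_trajectory_ref"
                report.append({"path": path, "reason": "no ADP subagent construct"})
    return report
```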

Useful test cases:

  • ADP coding trajectory with CodeAction and terminal output.
  • ADP web trajectory with WebObservation.
  • ADP GUI trajectory with ImageObservation.annotations.
  • ATIF multi-tool single-step example.
  • ATIF multimodal image example.
  • ATIF subagent/context-management example.

Phase 2: Decide canonical abstraction: event stream, step stream, or both

The formats can converge in two plausible ways:

  1. Make ADP a profile/projection of ATIF. ADP keeps its dataset pipeline and SFT converters, but standardized samples adopt ATIF steps as canonical.
    • Pro: richer runtime metadata, already designed for evaluation/RL.
    • Con: ADP may lose its simple action/observation stream and web/code specializations unless ATIF grows extensions.
  2. Define a shared core with two views. A trajectory has both a step/turn view for LLM inference, metrics, multi-tool grouping, SFT/RL boundaries, and an event/action/observation view for dataset normalization and replay.
    • Pro: preserves strengths of both.
    • Con: more schema complexity and synchronization invariants.

My recommendation is option 2, or at minimum an ATIF-compatible grouping layer over ADP content, because ADP's current samples and converters are event-oriented while ATIF's strongest additions are turn-level metadata and grouping.

Phase 3: Schema changes to consider in ADP

Potential ADP additions, in rough priority order:

  1. Add optional root agent block compatible with ATIF Agent.
  2. Add optional root session_id and/or trajectory_id while preserving existing id for backward compatibility.
  3. Add optional timestamp on content items or a grouped turn object.
  4. Add optional metrics and final_metrics compatible with ATIF names.
  5. Add system as an allowed source or introduce SystemObservation / SystemMessage.
  6. Add a first-class grouping construct for one LLM inference containing multiple actions/tool calls.
  7. Promote metadata.json.custom_tools / available_apis toward ATIF-compatible tool_definitions in the standardized record.
  8. Add extra at action/observation level, or define structured locations for producer-specific fields.
  9. Add is_copied_context, llm_call_count, continued_trajectory_ref, and context-management conventions for long-context agents.
  10. Add subagent trajectory references if ADP wants to support multi-agent training data directly.
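For item 7, a first step could filter the dataset's OpenAI-style `custom_tools` by `available_apis`. This hypothetical helper (`build_agent_tools`) returns the matching specs for `agent.tool_definitions` plus any names that have no spec, which could go to `agent.extra.available_apis` as the ADP -> ATIF mapping suggests. It assumes each tool has the `{"type": "function", "function": {"name": ...}}` shape.

```python
from typing import Any, Optional


def build_agent_tools(
    custom_tools: list[dict[str, Any]], available_apis: Optional[list[str]] = None
) -> tuple[list[dict[str, Any]], list[str]]:
    """Return (tool_definitions, names_without_specs)."""
    if available_apis is None:
        return list(custom_tools), []  # no per-instance filter: keep every spec
    by_name = {
        tool["function"]["name"]: tool
        for tool in custom_tools
        if tool.get("type") == "function"
    }
    wanted = list(dict.fromkeys(available_apis))  # de-duplicate, keep order
    definitions = [by_name[name] for name in wanted if name in by_name]
    missing = [name for name in wanted if name not in by_name]
    return definitions, missing
```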

Phase 4: Schema changes to consider in ATIF

Potential ATIF additions or conventions to better cover ADP:

  1. Standardize a code execution tool shape, e.g. function_name="execute_code" with {language, content} or a dedicated code action extension.
  2. Standardize web/UI observation fields instead of requiring extra.web conventions: html, axtree, url, viewport_size, screenshot/image reference, and UI annotations / bounding boxes.
  3. Add direct reward fields or an RL subobject if per-step/per-observation rewards are expected.
  4. Clarify SFT conversion semantics for message vs reasoning_content vs tool-call-only agent steps.
  5. Provide an official ATIF profile for dataset-derived trajectories where metrics/timestamps may be absent.
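Until item 2 is standardized, an `extra.web` convention might look like the sketch below. Everything here is an assumption: the function name `web_observation_to_result`, putting the accessibility tree in the text content part, the ContentPart shape (`{"type": "image", "source": {"media_type", "path"}}`), and reading the screenshot path from `image_observation["content"]` (check schema/observation/ for the actual field name).

```python
import mimetypes
from typing import Any, Optional


def web_observation_to_result(obs: dict[str, Any], source_call_id: Optional[str] = None) -> dict[str, Any]:
    """Map an ADP WebObservation dict to an ATIF-style ObservationResult dict."""
    # Assumed policy: the accessibility tree (or HTML) is the model-facing text.
    content: list[dict[str, Any]] = [{"type": "text", "text": obs.get("axtree") or obs.get("html") or ""}]
    image = obs.get("image_observation") or {}
    screenshot = image.get("content")  # assumed field holding the screenshot path
    if screenshot:
        media_type = mimetypes.guess_type(screenshot)[0] or "application/octet-stream"
        content.append({"type": "image", "source": {"media_type": media_type, "path": screenshot}})
    # Structured web state that ATIF does not standardize goes under extra.web.
    web = {k: obs[k] for k in ("url", "html", "viewport_size") if obs.get(k) is not None}
    if image.get("annotations"):
        web["annotations"] = image["annotations"]
    result: dict[str, Any] = {"content": content, "extra": {"web": web}}
    if source_call_id is not None:
        result["source_call_id"] = source_call_id
    return result
```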

Open questions for maintainers

  1. Should ADP aim to become ATIF-compatible at the standardized trajectory layer, or only provide import/export adapters?
  2. Is ADP's content[] event/action stream considered a permanent canonical abstraction, or could it be replaced/grouped by turn-level steps?
  3. Should ADP preserve first-class CodeAction and WebObservation even if ATIF becomes the root envelope?
  4. Should available_apis evolve into full OpenAI-style tool_definitions in the trajectory, matching ATIF agent.tool_definitions?
  5. Where should token IDs, logprobs, cost, timestamps, and is_copied_context live in ADP if we care about RL and SFT filtering?
  6. How should system prompts and context-management summaries be represented in ADP without overloading environment?
  7. Should ADP SFT converters be taught to filter ATIF is_copied_context=true and llm_call_count=0 if importing ATIF?
  8. Should we coordinate versioning with Harbor so future schema changes can be validated cross-repo?

Concrete next step proposal

Create a small interoperability PR that does not change ADP's canonical schema yet:

  • Add scripts/adp_to_atif.py and scripts/atif_to_adp.py as experimental converters.
  • Add fixtures for one ADP coding trajectory, one ADP web trajectory, one ATIF multi-tool trajectory, and one ATIF multimodal/subagent or context-management trajectory.
  • Add tests that validate conversions and record expected lossy fields.
  • Use the loss report to decide whether ADP should adopt ATIF fields or define a shared core.

This issue was created by an AI agent (OpenHands) on behalf of the user after inspecting the ADP repository and cloning/reading harbor-framework/harbor.
