Summary
We are considering whether ADP and Harbor's Agent Trajectory Interchange Format (ATIF) should converge. I compared the current ADP repository schema and documentation with harbor-framework/harbor cloned at commit 1b1dbc43729edff705071a797bb0f98a90cf62f5, focusing on:
- ADP: README.md, schema/SCHEMA.md, schema/trajectory.py, schema/action/*, schema/observation/*, schema/dataset_metadata.py, representative sample_std.json files, and the raw -> standardized -> SFT pipeline.
- ATIF/Harbor: rfcs/0001-trajectory-format.md, src/harbor/models/trajectories/*, src/harbor/models/agent/trajectory_config.py, validator tests, and examples of ATIF writers in adapters.
High-level conclusion: the formats overlap strongly in the core concepts of trajectories, user/agent/tool/environment events, structured tool calls, tool-call result links, reasoning text, and multimodal artifacts, but they currently optimize for different layers of the ecosystem.
- ADP is dataset-normalization and SFT-conversion centered. It turns heterogeneous source datasets into a compact standardized event/action/observation stream, then into agent-specific SFT formats.
- ATIF is runtime logging, evaluation, debugging, visualization, and RL-rollout centered. It records the agent run as ordered steps/LLM turns with rich agent config, metrics, token IDs, timestamps, subagents, context management, and tool definitions.
This suggests unification is feasible, but likely should start with a bidirectional adapter plus schema convergence plan rather than immediately replacing either representation.
Format snapshots
ADP standardized trajectory
ADP's canonical standardized record is a Pydantic Trajectory:
{
  "schema_version": "1.3.1",
  "id": "example_trajectory_001",
  "content": [
    {"class_": "text_observation", "content": "...", "source": "user"},
    {"class_": "code_action", "tool_call_id": "call_000001", "language": "bash", "content": "ls", "description": "..."},
    {"class_": "text_observation", "tool_call_id": "call_000001", "content": "...", "source": "environment"},
    {"class_": "message_action", "content": "..."}
  ],
  "available_apis": ["optional", "per-instance", "function", "names"],
  "details": {}
}
Core ADP objects:
- Root: schema_version, id, content, optional available_apis, details.
- Actions:
  - MessageAction: agent/user-facing text response.
  - CodeAction: executable code or shell command with language, content, description.
  - ApiAction: function/tool invocation with function, kwargs, description.
- Observations:
  - TextObservation: text from user, agent, or environment.
  - ImageObservation: image path plus optional UI annotations and source.
  - WebObservation: web page state with html, axtree, url, optional screenshot image_observation, and viewport_size.
- Common action/observation fields: tool_call_id for call/result linkage, reasoning_content on actions, and reward on actions/observations.
- Validation model: extra fields are forbidden; tool actions/results are linked by matching tool_call_id; adjacent tool-result observations require an action id; MessageAction.tool_call_id is disallowed; tool-result observations must not use source="user".
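The linkage rules in the validation model can be illustrated with a small checker over plain dicts. This is only a sketch, not ADP's actual Pydantic validators: the function name check_tool_links is hypothetical, and the rule that every tool action must eventually receive a result is an assumption based on the "one action to exactly one later observation" reading of the schema.

```python
from typing import Any


def check_tool_links(content: list[dict[str, Any]]) -> list[str]:
    """Return human-readable linkage problems for an ADP-style content list."""
    problems: list[str] = []
    pending: set[str] = set()  # action ids still waiting for their result
    used: set[str] = set()     # every action id seen so far

    for i, item in enumerate(content):
        cls = item.get("class_", "")
        call_id = item.get("tool_call_id")
        if call_id is None:
            continue  # plain messages and standalone observations carry no link
        if cls == "message_action":
            problems.append(f"item {i}: message_action must not carry tool_call_id")
        elif cls.endswith("_action"):
            if call_id in used:
                problems.append(f"item {i}: tool_call_id {call_id!r} reused by a second action")
            used.add(call_id)
            pending.add(call_id)
        elif cls.endswith("_observation"):
            if item.get("source") == "user":
                problems.append(f"item {i}: tool result must not use source='user'")
            if call_id in pending:
                pending.discard(call_id)
            else:
                problems.append(f"item {i}: no pending action with tool_call_id {call_id!r}")

    # Assumption: an action whose id never gets a result counts as a problem.
    problems.extend(f"action {cid!r} has no result observation" for cid in sorted(pending))
    return problems


example = [
    {"class_": "text_observation", "content": "list files", "source": "user"},
    {"class_": "code_action", "tool_call_id": "call_000001", "language": "bash", "content": "ls"},
    {"class_": "text_observation", "tool_call_id": "call_000001", "content": "README.md", "source": "environment"},
    {"class_": "message_action", "content": "Done."},
]
print(check_tool_links(example))  # an empty list means every link resolved
```

Returning a list of problems rather than raising on the first one makes the same function usable in a loss report or a test fixture.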
ADP also has repository-level conventions that are part of the practical format story:
- Every dataset supplies raw extraction (extract_raw.py), raw schema (schema_raw.py), standardization (raw_to_standardized.py), samples, and SFT samples.
- metadata.json can describe OpenAI-style custom tools, enabled code languages, and browser availability.
- Agent-specific converters under agents/*/std_to_sft.py are first-class outputs, e.g. OpenHands v0, SWE-agent, AgentLab.
ATIF trajectory
ATIF's current root object, per Harbor RFC v1.7 and Pydantic models, is:
{
  "schema_version": "ATIF-v1.7",
  "session_id": "run-scoped-id",
  "trajectory_id": "document-scoped-id",
  "agent": {
    "name": "openhands",
    "version": "...",
    "model_name": "...",
    "tool_definitions": [
      {"type": "function", "function": {"name": "...", "parameters": {}}}
    ],
    "extra": {}
  },
  "steps": [
    {"step_id": 1, "timestamp": "2025-10-11T10:30:00Z", "source": "user", "message": "..."},
    {
      "step_id": 2,
      "source": "agent",
      "model_name": "...",
      "reasoning_content": "...",
      "message": "...",
      "tool_calls": [{"tool_call_id": "call_1", "function_name": "tool", "arguments": {}}],
      "observation": {"results": [{"source_call_id": "call_1", "content": "..."}]},
      "metrics": {}
    }
  ],
  "notes": "...",
  "final_metrics": {},
  "continued_trajectory_ref": "...",
  "extra": {},
  "subagent_trajectories": []
}
Core ATIF objects:
- Root: schema_version, optional run-scoped session_id, optional document-scoped trajectory_id, required agent, required non-empty steps, optional notes, final_metrics, continued_trajectory_ref, extra, and subagent_trajectories.
- Agent: required name and version; optional model_name, OpenAI-style tool_definitions, and extra.
- Step: sequential step_id, optional ISO timestamp, source in system | user | agent, required message as string or multimodal ContentPart[], agent-only model_name, reasoning_effort, reasoning_content, tool_calls, and metrics, plus optional observation, llm_call_count, is_copied_context, and extra.
- ToolCall: required tool_call_id, function_name, arguments, optional extra.
- Observation / ObservationResult: results[] with optional source_call_id, content string or multimodal ContentPart[], optional subagent_trajectory_ref[], optional extra.
- Metrics / FinalMetrics: prompt/completion/cached token counts, cost, prompt/completion token IDs, logprobs, extras.
- Multimodal support: ContentPart supports text and image, with ImageSource(media_type, path).
- Subagent support: SubagentTrajectoryRef resolves by embedded trajectory_id or external trajectory_path; session_id is informational.
- Context management: step.extra.context_management convention for compaction, pruning, injection, and boundary semantics.
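To make the step-level constraints concrete, here is a rough sketch of a checker over plain dicts. It is not Harbor's validator: check_atif_steps is a hypothetical name, and the assumption that step_id starts at 1 is taken only from the example record above.

```python
from typing import Any

# Fields the summary above describes as agent-only.
AGENT_ONLY_FIELDS = ("model_name", "reasoning_effort", "reasoning_content", "tool_calls", "metrics")


def check_atif_steps(steps: list[dict[str, Any]]) -> list[str]:
    """Return problems found in a list of ATIF-style step dicts."""
    problems: list[str] = []
    for expected_id, step in enumerate(steps, start=1):
        step_id = step.get("step_id")
        # Assumption: ids are sequential and start at 1, as in the example record.
        if step_id != expected_id:
            problems.append(f"position {expected_id}: step_id is {step_id!r}")
        if step.get("source") != "agent":
            for name in AGENT_ONLY_FIELDS:
                if name in step:
                    problems.append(f"step {step_id}: {name} is only allowed on agent steps")
        # Results may only point at tool calls made in the same step.
        call_ids = {call["tool_call_id"] for call in step.get("tool_calls") or []}
        results = (step.get("observation") or {}).get("results") or []
        for result in results:
            ref = result.get("source_call_id")
            if ref is not None and ref not in call_ids:
                problems.append(f"step {step_id}: source_call_id {ref!r} matches no tool call in this step")
    return problems


steps = [
    {"step_id": 1, "source": "user", "message": "list files"},
    {
        "step_id": 2,
        "source": "agent",
        "message": "",
        "tool_calls": [{"tool_call_id": "call_1", "function_name": "tool", "arguments": {}}],
        "observation": {"results": [{"source_call_id": "call_1", "content": "README.md"}]},
    },
]
print(check_atif_steps(steps))  # an empty list means no problems were found
```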
Major similarities
| Area | ADP | ATIF | Notes |
| --- | --- | --- | --- |
| JSON + typed validation | Pydantic models under schema/ | Pydantic models under src/harbor/models/trajectories/ | Both reject unknown fields on core objects. |
| Root trajectory object | Trajectory(id, content, details, ...) | Trajectory(agent, steps, final_metrics, ...) | Both use versioned root records. |
| Ordered interaction history | Flat content[] event/action/observation stream | Sequential steps[] | Both preserve chronology. |
| User/agent/environment concepts | TextObservation.source supports user, agent, environment; actions represent agent outputs | Step.source supports system, user, agent; observations represent environment/system feedback | ATIF has explicit system; ADP has explicit environment. |
| Structured tool calls | ApiAction(function, kwargs) | ToolCall(function_name, arguments) | Nearly isomorphic at a single-call level. |
| Code/shell execution | First-class CodeAction(language, content) | Usually represented as a tool call, e.g. shell/execute function | ADP preserves language directly; ATIF depends on tool naming/arguments. |
| Tool-result linkage | Shared tool_call_id across action and later observation | ToolCall.tool_call_id linked by ObservationResult.source_call_id within same step | Same conceptual link; different granularity. |
| Reasoning | Action.reasoning_content; description fields | Step.reasoning_content; reasoning_effort | ADP recently aligns naming with ATIF here. |
| Rewards / RL signals | reward on actions and observations | token IDs/logprobs and metrics; no direct model-level reward field today | Complementary rather than identical. |
| Multimodal data | ImageObservation and WebObservation.image_observation | ContentPart(type=image, source=ImageSource) | ADP has UI annotations and web state; ATIF has generic mixed content. |
| Tool availability | available_apis names and optional dataset metadata.json OpenAI tool specs | agent.tool_definitions OpenAI-style specs | ATIF carries definitions in the trajectory; ADP often carries names in trajectory and specs outside. |
| Extension space | details root metadata; dataset-specific raw schemas | extra at root, agent, step, tool call, metrics, observation result, subagent ref | ATIF has more structured extension points. |
| SFT usefulness | Explicit SFT converters are core repo feature | RFC explicitly targets SFT/RL/debug/visualization | Shared end goal, different current tooling. |
Major differences
1. Event/action stream vs LLM-turn step model
ADP stores a flat sequence where agent actions and environment observations are separate records:
TextObservation(user) -> CodeAction -> TextObservation(environment) -> MessageAction
ATIF stores a turn/step where one agent step may combine the assistant message, one or more tool calls, their observations, metrics, and reasoning:
Step(user) -> Step(agent: message + tool_calls[] + observation.results[] + metrics)
Implication: ATIF can preserve “these multiple tool calls came from one LLM inference” directly. ADP can preserve exact chronological events, but currently cannot represent a single LLM inference that emitted multiple tool calls without splitting them into multiple ApiAction/CodeAction records and losing the grouping unless encoded in details.
2. Root metadata and agent identity
ATIF has first-class agent.name, agent.version, agent.model_name, agent.tool_definitions, session_id, trajectory_id, final_metrics, notes, and continued_trajectory_ref.
ADP's root has id, optional available_apis, and details. This keeps standardized dataset samples compact and dataset-agnostic, but means many run-level fields (agent identity, model, session, metrics) are either absent or pushed into details.
3. System messages and environment source
ATIF has source="system" for system prompts and system-initiated operations. ADP's TextObservation.source only permits user, agent, and environment.
ADP has an explicit environment source for text/image observations and a dedicated WebObservation; ATIF environment feedback is inside Step.observation.results, not a top-level source.
Consequence: ATIF -> ADP needs a policy for system steps. ADP -> ATIF needs a policy for standalone environment observations that are not tool results.
4. Tool-call cardinality and result placement
ADP validates a tool_call_id as one action to exactly one later observation. ATIF validates ObservationResult.source_call_id against tool calls within the same step, and supports multiple tool calls/results on a single agent step.
This is probably the most important structural mismatch for unification:
- ADP is call/result event oriented.
- ATIF is LLM-step oriented.
Both are useful. A unified format may need both a “step/turn” grouping and a normalized “event/action/observation” view.
5. CodeAction and WebObservation are richer in ADP
ADP has first-class objects for:
- CodeAction(language, content) with many language literals.
- WebObservation(html, axtree, url, image_observation, viewport_size).
- ImageObservation(annotations[]) with bounding boxes and element types.
ATIF can store these via tool calls, observation content, image content parts, and extra, but it does not currently have standardized fields for accessibility tree, DOM/HTML, viewport size, or UI annotations.
If ADP and ATIF unify, the combined format should avoid losing ADP's web/UI state fidelity.
6. Metrics, timestamps, and token-level data are richer in ATIF
ATIF standardizes:
- timestamp on steps.
- Metrics with prompt/completion/cached tokens, cost, prompt token IDs, completion token IDs, logprobs, extra.
- FinalMetrics at root.
- llm_call_count, including 0 for deterministic dispatch and >1 for aggregated steps.
ADP only has reward and free-form details for these concerns. ADP is therefore less immediately useful as an evaluation trace or RL rollout log without conventions for metrics.
7. Context management and continuation
ATIF has continued_trajectory_ref, is_copied_context so SFT pipelines can filter copied history, and step.extra.context_management conventions for compaction/pruning/injection and boundary semantics.
ADP currently lacks first-class equivalents. This matters for long-running agents and SFT correctness: copied context should not be treated as newly produced assistant behavior.
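A minimal sketch of the filtering this implies for an SFT pipeline reading ATIF-style steps is shown here. The policy is an assumption, not something the RFC is quoted as mandating: it keeps only agent steps, drops steps marked is_copied_context, and drops steps with llm_call_count equal to 0 (deterministic dispatch). The function name select_trainable_steps is hypothetical.

```python
from typing import Any


def select_trainable_steps(steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only agent steps that represent newly produced model behavior."""
    kept: list[dict[str, Any]] = []
    for step in steps:
        if step.get("source") != "agent":
            continue  # user and system steps are context, not training targets
        if step.get("is_copied_context", False):
            continue  # history copied from an earlier trajectory
        if step.get("llm_call_count") == 0:
            continue  # deterministic dispatch, no LLM inference happened
        kept.append(step)
    return kept


steps = [
    {"step_id": 1, "source": "user", "message": "fix the bug"},
    {"step_id": 2, "source": "agent", "message": "earlier summary", "is_copied_context": True},
    {"step_id": 3, "source": "agent", "message": "", "llm_call_count": 0},
    {"step_id": 4, "source": "agent", "message": "I will inspect the file."},
]
print([step["step_id"] for step in select_trainable_steps(steps)])
```

The filtered-out steps would still be needed as prompt context when building training examples; the sketch only decides which steps may serve as targets.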
8. Subagents and hierarchical workflows
ATIF has first-class subagent_trajectories and SubagentTrajectoryRef in observation results. ADP has no standardized subagent representation. It could carry this in details, but consumers would need dataset-specific logic.
9. Extension philosophy
ADP is strict and compact: core fields plus details, with dataset-specific raw schemas outside the standardized model. ATIF is strict at core objects but provides extra at many levels, which makes it easier to preserve producer-specific metadata without schema churn.
10. Repository scope
ADP includes many dataset adapters and agent-specific SFT converters. Its “format” includes a reproducible data pipeline:
raw dataset -> ADP standardized trajectory -> agent-specific SFT
Harbor/ATIF includes benchmark adapters, evaluators, run logs, validators, and trajectory files for agents run in environments. Its “format” is coupled to execution/evaluation trace capture:
agent run -> ATIF trajectory -> validation/debug/visualization/SFT/RL analysis
These are adjacent but not identical use cases.
Conversion sketch
ADP -> ATIF
Likely mapping:
| ADP | ATIF |
| --- | --- |
| Trajectory.schema_version | root extra.adp_schema_version or mapped version metadata |
| Trajectory.id | trajectory_id; maybe also session_id if no better run id exists |
| details | root extra or notes for human-readable notes |
| dataset metadata.json.custom_tools | agent.tool_definitions |
| available_apis | agent.tool_definitions filtered to names, or agent.extra.available_apis |
| TextObservation(source=user) | Step(source=user, message=content) |
| MessageAction | Step(source=agent, message=content, reasoning_content=...) |
| ApiAction + matched observation | Step(source=agent, message=description or "", tool_calls=[...], observation.results=[...]) |
| CodeAction + matched observation | Step(source=agent, tool_calls=[{function_name: "execute_code"/"execute_bash", arguments:{language, content}}], observation.results=[...]) |
| TextObservation(source=environment) with tool_call_id | ObservationResult(content=..., source_call_id=tool_call_id) |
| ImageObservation | ContentPart(type=image) plus annotations in extra, or ATIF extension for annotations |
| WebObservation | ObservationResult.content/extra.web containing html, axtree, url, viewport, screenshot ref |
| reward | step.extra.reward, observation.results[].extra.reward, or future ATIF reward field |
Lossy points:
- ADP does not always indicate which actions came from the same LLM inference.
- ADP description vs reasoning_content vs MessageAction.content requires policy: is description visible assistant text, hidden thought, or action rationale?
- ADP root does not normally include agent identity/model/version/metrics/timestamps.
- ADP WebObservation has more structured web state than ATIF currently standardizes.
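The ADP -> ATIF mapping can be sketched over plain dicts as follows. Several choices here are assumptions made for illustration: the function name adp_to_atif, the tool name "execute_code", the "unknown" agent defaults, the extra keys adp_schema_version and adp_details, and the policy of treating description as the step message. Image and web observations, rewards, and standalone environment text are not converted; they are reported as losses instead.

```python
from typing import Any


def adp_to_atif(adp: dict[str, Any], agent_name: str = "unknown",
                agent_version: str = "unknown") -> tuple[dict[str, Any], list[str]]:
    """Convert an ADP-style dict to an ATIF-style dict plus a list of losses."""
    steps: list[dict[str, Any]] = []
    losses: list[str] = []
    step_for_call: dict[str, dict[str, Any]] = {}  # tool_call_id -> agent step

    def add_step(source: str, message: str, **optional: Any) -> dict[str, Any]:
        step = {"step_id": len(steps) + 1, "source": source, "message": message}
        step.update({k: v for k, v in optional.items() if v is not None})
        steps.append(step)
        return step

    for index, item in enumerate(adp["content"]):
        cls = item["class_"]
        call_id = item.get("tool_call_id")
        reasoning = item.get("reasoning_content")
        if cls == "message_action":
            add_step("agent", item["content"], reasoning_content=reasoning)
        elif cls in ("api_action", "code_action"):
            if cls == "api_action":
                name, arguments = item["function"], item.get("kwargs", {})
            else:
                name = "execute_code"  # hypothetical tool name for code execution
                arguments = {"language": item["language"], "content": item["content"]}
            call_id = call_id or f"adp_call_{index}"  # ATIF requires an id on every call
            call = {"tool_call_id": call_id, "function_name": name, "arguments": arguments}
            # Assumption: description is treated as the visible step message.
            step = add_step("agent", item.get("description") or "",
                            reasoning_content=reasoning, tool_calls=[call])
            step_for_call[call_id] = step
        elif cls == "text_observation" and call_id in step_for_call:
            # Tool result: attach it to the step that issued the call.
            observation = step_for_call[call_id].setdefault("observation", {"results": []})
            observation["results"].append({"source_call_id": call_id, "content": item["content"]})
        elif cls == "text_observation" and item.get("source") == "user":
            add_step("user", item["content"])
        else:
            losses.append(f"content[{index}] ({cls}) has no mapping in this sketch")

    atif = {
        "schema_version": "ATIF-v1.7",
        "trajectory_id": adp["id"],
        "agent": {"name": agent_name, "version": agent_version},
        "steps": steps,
        "extra": {"adp_schema_version": adp.get("schema_version"),
                  "adp_details": adp.get("details", {})},
    }
    return atif, losses


adp_example = {
    "schema_version": "1.3.1",
    "id": "example_trajectory_001",
    "content": [
        {"class_": "text_observation", "content": "list files", "source": "user"},
        {"class_": "code_action", "tool_call_id": "call_000001", "language": "bash", "content": "ls"},
        {"class_": "text_observation", "tool_call_id": "call_000001", "content": "README.md", "source": "environment"},
        {"class_": "message_action", "content": "Done."},
    ],
    "details": {},
}
atif_doc, adp_losses = adp_to_atif(adp_example)
print(len(atif_doc["steps"]), adp_losses)
```

Each ADP action becomes its own agent step because the flat stream does not say which actions shared an LLM inference; merging adjacent actions into one step would need an explicit grouping signal.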
ATIF -> ADP
Likely mapping:
| ATIF | ADP |
| --- | --- |
| schema_version | details.atif_schema_version |
| trajectory_id or session_id | Trajectory.id |
| agent, final_metrics, notes, continued_trajectory_ref, root extra | details |
| Step(source=user) | TextObservation(source=user, content=message) |
| Step(source=system) | needs new ADP source/type, or TextObservation(source=environment, name="system") as a lossy fallback |
| Step(source=agent, no tool_calls) | MessageAction(content=message, reasoning_content=reasoning_content) |
| each ToolCall | ApiAction(function=function_name, kwargs=arguments, tool_call_id=...), unless mapped to CodeAction by function name/arguments |
| each ObservationResult | TextObservation / ImageObservation / WebObservation with matching tool_call_id when possible |
| metrics | details or future ADP metrics field |
| is_copied_context | details or future ADP field; SFT converters should filter if true |
| subagent_trajectory_ref / subagent_trajectories | details or future ADP subagent fields |
Lossy points:
- Multiple tool_calls in one ATIF step become multiple ADP actions unless ADP adds grouping.
- ATIF metrics, timestamps, token IDs, logprobs, and cost have no first-class ADP location.
- System steps are not representable without overloading environment.
- Subagent references and embedded trajectories need new ADP constructs or conventions.
- llm_call_count=0 deterministic dispatch needs SFT filtering semantics in ADP converters.
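The reverse direction can be sketched the same way. Again this is an illustration under assumptions, not a proposed implementation: the name atif_to_adp is hypothetical, the output schema_version is copied from the ADP example record, message and reasoning text are attached to the first tool call only, every tool call becomes an ApiAction (no CodeAction detection), and multimodal content is skipped and reported.

```python
from typing import Any

ROOT_KEYS = ("agent", "final_metrics", "notes", "continued_trajectory_ref", "extra")
STEP_ONLY_KEYS = ("timestamp", "metrics", "llm_call_count", "is_copied_context")


def atif_to_adp(atif: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Convert an ATIF-style dict to an ADP-style dict plus a list of losses."""
    content: list[dict[str, Any]] = []
    losses: list[str] = []

    for step in atif["steps"]:
        sid = step["step_id"]
        message = step.get("message", "")
        if not isinstance(message, str):
            losses.append(f"step {sid}: multimodal message skipped by this sketch")
            continue
        for name in STEP_ONLY_KEYS:
            if name in step:
                losses.append(f"step {sid}: {name} has no first-class ADP field")
        source = step["source"]
        if source == "user":
            content.append({"class_": "text_observation", "content": message, "source": "user"})
        elif source == "system":
            losses.append(f"step {sid}: system step stored as environment text")
            content.append({"class_": "text_observation", "content": message,
                            "source": "environment", "name": "system"})
        else:
            calls = step.get("tool_calls") or []
            reasoning = step.get("reasoning_content")
            if not calls:
                action = {"class_": "message_action", "content": message}
                if reasoning:
                    action["reasoning_content"] = reasoning
                content.append(action)
            if len(calls) > 1:
                losses.append(f"step {sid}: {len(calls)} tool calls lose their single-inference grouping")
            for position, call in enumerate(calls):
                action = {"class_": "api_action", "tool_call_id": call["tool_call_id"],
                          "function": call["function_name"], "kwargs": call["arguments"]}
                if position == 0:  # assumption: step text belongs to the first call
                    if message:
                        action["description"] = message
                    if reasoning:
                        action["reasoning_content"] = reasoning
                content.append(action)
        for result in (step.get("observation") or {}).get("results") or []:
            text = result.get("content")
            if not isinstance(text, str):
                losses.append(f"step {sid}: non-text observation result skipped")
                continue
            obs = {"class_": "text_observation", "content": text, "source": "environment"}
            if result.get("source_call_id") is not None:
                obs["tool_call_id"] = result["source_call_id"]
            content.append(obs)

    details = {"atif_schema_version": atif.get("schema_version")}
    details.update({key: atif[key] for key in ROOT_KEYS if key in atif})
    adp = {
        "schema_version": "1.3.1",  # taken from the ADP example record
        "id": atif.get("trajectory_id") or atif.get("session_id") or "unknown",
        "content": content,
        "details": details,
    }
    return adp, losses
```

A quick way to try it is to pass a two-step record (one user step, one agent step with two tool calls and two results) and inspect both the flat content list and the reported losses.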
Proposed unification path
Phase 1: Build adapters and loss reports
Add scripts/tests that convert a small set of trajectories both ways:
- ADP sample_std.json -> ATIF.
- ATIF examples from Harbor RFC -> ADP.
- Validate with both ADP Pydantic models and Harbor's ATIF validator.
- Emit a machine-readable “loss report” for fields that cannot be represented exactly.
Useful test cases:
- ADP coding trajectory with CodeAction and terminal output.
- ADP web trajectory with WebObservation.
- ADP GUI trajectory with ImageObservation.annotations.
- ATIF multi-tool single-step example.
- ATIF multimodal image example.
- ATIF subagent/context-management example.
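One possible shape for the machine-readable loss report is sketched below. The class names LossEntry and LossReport, the helper report_atif_losses, and the JSON layout are all hypothetical; the checks cover only a few of the lossy points listed earlier (timestamps, metrics, system steps, multi-tool grouping).

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LossEntry:
    path: str    # where the value lived in the source record
    reason: str  # why the target format cannot hold it exactly


@dataclass
class LossReport:
    source_format: str
    target_format: str
    entries: list[LossEntry] = field(default_factory=list)

    def add(self, path: str, reason: str) -> None:
        self.entries.append(LossEntry(path, reason))

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


def report_atif_losses(atif: dict) -> LossReport:
    """Record ATIF fields that an ATIF -> ADP conversion cannot keep exactly."""
    report = LossReport("ATIF", "ADP")
    for i, step in enumerate(atif.get("steps", [])):
        for name in ("timestamp", "metrics"):
            if name in step:
                report.add(f"steps[{i}].{name}", "no first-class ADP field")
        if step.get("source") == "system":
            report.add(f"steps[{i}].source", "ADP has no system source")
        if len(step.get("tool_calls") or []) > 1:
            report.add(f"steps[{i}].tool_calls", "single-inference grouping is flattened")
    return report


sample = {
    "steps": [
        {"step_id": 1, "source": "system", "message": "You are an agent."},
        {"step_id": 2, "source": "agent", "message": "", "timestamp": "2025-10-11T10:30:00Z",
         "tool_calls": [
             {"tool_call_id": "c1", "function_name": "f", "arguments": {}},
             {"tool_call_id": "c2", "function_name": "g", "arguments": {}},
         ]},
    ]
}
print(report_atif_losses(sample).to_json())
```

Keeping path and reason as separate fields lets tests assert on expected lossy paths per fixture while the reasons stay free text.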
Phase 2: Decide canonical abstraction: event stream, step stream, or both
The formats can converge in two plausible ways:
1. Make ADP a profile/projection of ATIF. ADP keeps its dataset pipeline and SFT converters, but standardized samples adopt ATIF steps as canonical.
   - Pro: richer runtime metadata, already designed for evaluation/RL.
   - Con: ADP may lose its simple action/observation stream and web/code specializations unless ATIF grows extensions.
2. Define a shared core with two views. A trajectory has both a step/turn view for LLM inference, metrics, multi-tool grouping, SFT/RL boundaries, and an event/action/observation view for dataset normalization and replay.
   - Pro: preserves strengths of both.
   - Con: more schema complexity and synchronization invariants.
My recommendation is option 2, or at minimum an ATIF-compatible grouping layer over ADP content, because ADP's current samples and converters are event-oriented while ATIF's strongest additions are turn-level metadata and grouping.
Phase 3: Schema changes to consider in ADP
Potential ADP additions, in rough priority order:
- Add optional root agent block compatible with ATIF Agent.
- Add optional root session_id and/or trajectory_id while preserving existing id for backward compatibility.
- Add optional timestamp on content items or a grouped turn object.
- Add optional metrics and final_metrics compatible with ATIF names.
- Add system as an allowed source or introduce SystemObservation / SystemMessage.
- Add a first-class grouping construct for one LLM inference containing multiple actions/tool calls.
- Promote metadata.json.custom_tools / available_apis toward ATIF-compatible tool_definitions in the standardized record.
- Add extra at action/observation level, or define structured locations for producer-specific fields.
- Add is_copied_context, llm_call_count, continued_trajectory_ref, and context-management conventions for long-context agents.
- Add subagent trajectory references if ADP wants to support multi-agent training data directly.
Phase 4: Schema changes to consider in ATIF
Potential ATIF additions or conventions to better cover ADP:
- Standardize a code execution tool shape, e.g. function_name="execute_code" with {language, content} or a dedicated code action extension.
- Standardize web/UI observation fields instead of requiring extra.web conventions: html, axtree, url, viewport_size, screenshot/image reference, and UI annotations / bounding boxes.
- Add direct reward fields or an RL subobject if per-step/per-observation rewards are expected.
- Clarify SFT conversion semantics for message vs reasoning_content vs tool-call-only agent steps.
- Provide an official ATIF profile for dataset-derived trajectories where metrics/timestamps may be absent.
Open questions for maintainers
- Should ADP aim to become ATIF-compatible at the standardized trajectory layer, or only provide import/export adapters?
- Is ADP's content[] event/action stream considered a permanent canonical abstraction, or could it be replaced/grouped by turn-level steps?
- Should ADP preserve first-class CodeAction and WebObservation even if ATIF becomes the root envelope?
- Should available_apis evolve into full OpenAI-style tool_definitions in the trajectory, matching ATIF agent.tool_definitions?
- Where should token IDs, logprobs, cost, timestamps, and is_copied_context live in ADP if we care about RL and SFT filtering?
- How should system prompts and context-management summaries be represented in ADP without overloading environment?
- Should ADP SFT converters be taught to filter ATIF is_copied_context=true and llm_call_count=0 if importing ATIF?
- Should we coordinate versioning with Harbor so future schema changes can be validated cross-repo?
Concrete next step proposal
Create a small interoperability PR that does not change ADP's canonical schema yet:
- Add scripts/adp_to_atif.py and scripts/atif_to_adp.py as experimental converters.
- Add fixtures for one ADP coding trajectory, one ADP web trajectory, one ATIF multi-tool trajectory, and one ATIF multimodal/subagent or context-management trajectory.
- Add tests that validate conversions and record expected lossy fields.
- Use the loss report to decide whether ADP should adopt ATIF fields or define a shared core.
This issue was created by an AI agent (OpenHands) on behalf of the user after inspecting the ADP repository and cloning/reading harbor-framework/harbor.
Summary
We are considering whether ADP and Harbor's Agent Trajectory Interchange Format (ATIF) should converge. I compared the current ADP repository schema and documentation with
harbor-framework/harborcloned at commit1b1dbc43729edff705071a797bb0f98a90cf62f5, focusing on:README.md,schema/SCHEMA.md,schema/trajectory.py,schema/action/*,schema/observation/*,schema/dataset_metadata.py, representativesample_std.jsonfiles, and the raw -> standardized -> SFT pipeline.rfcs/0001-trajectory-format.md,src/harbor/models/trajectories/*,src/harbor/models/agent/trajectory_config.py, validator tests, and examples of ATIF writers in adapters.High-level conclusion: the formats overlap strongly in the core concepts of trajectories, user/agent/tool/environment events, structured tool calls, tool-call result links, reasoning text, and multimodal artifacts, but they currently optimize for different layers of the ecosystem.
This suggests unification is feasible, but likely should start with a bidirectional adapter plus schema convergence plan rather than immediately replacing either representation.
Format snapshots
ADP standardized trajectory
ADP's canonical standardized record is a Pydantic
Trajectory:{ "schema_version": "1.3.1", "id": "example_trajectory_001", "content": [ {"class_": "text_observation", "content": "...", "source": "user"}, {"class_": "code_action", "tool_call_id": "call_000001", "language": "bash", "content": "ls", "description": "..."}, {"class_": "text_observation", "tool_call_id": "call_000001", "content": "...", "source": "environment"}, {"class_": "message_action", "content": "..."} ], "available_apis": ["optional", "per-instance", "function", "names"], "details": {} }Core ADP objects:
schema_version,id,content, optionalavailable_apis,details.MessageAction: agent/user-facing text response.CodeAction: executable code or shell command withlanguage,content,description.ApiAction: function/tool invocation withfunction,kwargs,description.TextObservation: text fromuser,agent, orenvironment.ImageObservation: image path plus optional UI annotations and source.WebObservation: web page state withhtml,axtree,url, optional screenshotimage_observation, andviewport_size.tool_call_idfor call/result linkage,reasoning_contenton actions, andrewardon actions/observations.tool_call_id; adjacent tool-result observations require an action id;MessageAction.tool_call_idis disallowed; tool-result observations must not usesource="user".ADP also has repository-level conventions that are part of the practical format story:
extract_raw.py), raw schema (schema_raw.py), standardization (raw_to_standardized.py), samples, and SFT samples.metadata.jsoncan describe OpenAI-style custom tools, enabled code languages, and browser availability.agents/*/std_to_sft.pyare first-class outputs, e.g. OpenHands v0, SWE-agent, AgentLab.ATIF trajectory
ATIF's current root object, per Harbor RFC v1.7 and Pydantic models, is:
{ "schema_version": "ATIF-v1.7", "session_id": "run-scoped-id", "trajectory_id": "document-scoped-id", "agent": { "name": "openhands", "version": "...", "model_name": "...", "tool_definitions": [ {"type": "function", "function": {"name": "...", "parameters": {}}} ], "extra": {} }, "steps": [ {"step_id": 1, "timestamp": "2025-10-11T10:30:00Z", "source": "user", "message": "..."}, { "step_id": 2, "source": "agent", "model_name": "...", "reasoning_content": "...", "message": "...", "tool_calls": [{"tool_call_id": "call_1", "function_name": "tool", "arguments": {}}], "observation": {"results": [{"source_call_id": "call_1", "content": "..."}]}, "metrics": {} } ], "notes": "...", "final_metrics": {}, "continued_trajectory_ref": "...", "extra": {}, "subagent_trajectories": [] }Core ATIF objects:
schema_version, optional run-scopedsession_id, optional document-scopedtrajectory_id, requiredagent, required non-emptysteps, optionalnotes,final_metrics,continued_trajectory_ref,extra, andsubagent_trajectories.Agent: requirednameandversion; optionalmodel_name, OpenAI-styletool_definitions, andextra.Step: sequentialstep_id, optional ISO timestamp,sourceinsystem | user | agent, requiredmessageas string or multimodalContentPart[], agent-onlymodel_name,reasoning_effort,reasoning_content,tool_calls, andmetrics, plus optionalobservation,llm_call_count,is_copied_context, andextra.ToolCall: requiredtool_call_id,function_name,arguments, optionalextra.Observation/ObservationResult:results[]with optionalsource_call_id,contentstring or multimodalContentPart[], optionalsubagent_trajectory_ref[], optionalextra.Metrics/FinalMetrics: prompt/completion/cached token counts, cost, prompt/completion token IDs, logprobs, extras.ContentPartsupportstextandimage, withImageSource(media_type, path).SubagentTrajectoryRefresolves by embeddedtrajectory_idor externaltrajectory_path;session_idis informational.step.extra.context_managementconvention for compaction, pruning, injection, and boundary semantics.Major similarities
schema/src/harbor/models/trajectories/Trajectory(id, content, details, ...)Trajectory(agent, steps, final_metrics, ...)content[]event/action/observation streamsteps[]TextObservation.sourcesupportsuser,agent,environment; actions represent agent outputsStep.sourcesupportssystem,user,agent; observations represent environment/system feedbacksystem; ADP has explicitenvironment.ApiAction(function, kwargs)ToolCall(function_name, arguments)CodeAction(language, content)tool_call_idacross action and later observationToolCall.tool_call_idlinked byObservationResult.source_call_idwithin same stepAction.reasoning_content;descriptionfieldsStep.reasoning_content;reasoning_effortrewardon actions and observationsrewardfield todayImageObservationandWebObservation.image_observationContentPart(type=image, source=ImageSource)available_apisnames and optional datasetmetadata.jsonOpenAI tool specsagent.tool_definitionsOpenAI-style specsdetailsroot metadata; dataset-specific raw schemasextraat root, agent, step, tool call, metrics, observation result, subagent refMajor differences
1. Event/action stream vs LLM-turn step model
ADP stores a flat sequence where agent actions and environment observations are separate records:
ATIF stores a turn/step where one agent step may combine the assistant message, one or more tool calls, their observations, metrics, and reasoning:
Implication: ATIF can preserve “these multiple tool calls came from one LLM inference” directly. ADP can preserve exact chronological events, but currently cannot represent a single LLM inference that emitted multiple tool calls without splitting them into multiple
ApiAction/CodeActionrecords and losing the grouping unless encoded indetails.2. Root metadata and agent identity
ATIF has first-class
agent.name,agent.version,agent.model_name,agent.tool_definitions,session_id,trajectory_id,final_metrics,notes, andcontinued_trajectory_ref.ADP's root has
id, optionalavailable_apis, anddetails. This keeps standardized dataset samples compact and dataset-agnostic, but means many runtime-run fields are either absent or pushed intodetails.3. System messages and environment source
ATIF has
source="system"for system prompts and system-initiated operations. ADP'sTextObservation.sourceonly permitsuser,agent, andenvironment.ADP has an explicit
environmentsource for text/image observations and a dedicatedWebObservation; ATIF environment feedback is insideStep.observation.results, not a top-level source.Consequence: ATIF -> ADP needs a policy for system steps. ADP -> ATIF needs a policy for standalone environment observations that are not tool results.
4. Tool-call cardinality and result placement
ADP validates a
tool_call_idas one action to exactly one later observation. ATIF validatesObservationResult.source_call_idagainst tool calls within the same step, and supports multiple tool calls/results on a single agent step.This is probably the most important structural mismatch for unification:
Both are useful. A unified format may need both a “step/turn” grouping and a normalized “event/action/observation” view.
5. CodeAction and WebObservation are richer in ADP
ADP has first-class objects for:
CodeAction(language, content)with many language literals.WebObservation(html, axtree, url, image_observation, viewport_size).ImageObservation(annotations[])with bounding boxes and element types.ATIF can store these via tool calls, observation content, image content parts, and
extra, but it does not currently have standardized fields for accessibility tree, DOM/HTML, viewport size, or UI annotations.If ADP and ATIF unify, the combined format should avoid losing ADP's web/UI state fidelity.
6. Metrics, timestamps, and token-level data are richer in ATIF
ATIF standardizes:
timestampon steps.Metricswith prompt/completion/cached tokens, cost, prompt token IDs, completion token IDs, logprobs,extra.FinalMetricsat root.llm_call_count, including0for deterministic dispatch and>1for aggregated steps.ADP only has
rewardand free-formdetailsfor these concerns. ADP is therefore less immediately useful as an evaluation trace or RL rollout log without conventions for metrics.7. Context management and continuation
ATIF has
continued_trajectory_ref,is_copied_contextso SFT pipelines can filter copied history, andstep.extra.context_managementconventions for compaction/pruning/injection and boundary semantics.ADP currently lacks first-class equivalents. This matters for long-running agents and SFT correctness: copied context should not be treated as newly produced assistant behavior.
8. Subagents and hierarchical workflows
ATIF has first-class
subagent_trajectoriesandSubagentTrajectoryRefin observation results. ADP has no standardized subagent representation. It could carry this indetails, but consumers would need dataset-specific logic.9. Extension philosophy
ADP is strict and compact: core fields plus
details, with dataset-specific raw schemas outside the standardized model. ATIF is strict at core objects but providesextraat many levels, which makes it easier to preserve producer-specific metadata without schema churn.10. Repository scope
ADP includes many dataset adapters and agent-specific SFT converters. Its “format” includes a reproducible data pipeline:
Harbor/ATIF includes benchmark adapters, evaluators, run logs, validators, and trajectory files for agents run in environments. Its “format” is coupled to execution/evaluation trace capture:
These are adjacent but not identical use cases.
Conversion sketch
ADP -> ATIF
Likely mapping:
- Trajectory.schema_version -> extra.adp_schema_version or mapped version metadata
- Trajectory.id -> trajectory_id; maybe also session_id if no better run id exists
- details -> extra or notes for human-readable notes
- metadata.json.custom_tools -> agent.tool_definitions
- available_apis -> agent.tool_definitions filtered to names, or agent.extra.available_apis
- TextObservation(source=user) -> Step(source=user, message=content)
- MessageAction -> Step(source=agent, message=content, reasoning_content=...)
- ApiAction + matched observation -> Step(source=agent, message=description or "", tool_calls=[...], observation.results=[...])
- CodeAction + matched observation -> Step(source=agent, tool_calls=[{function_name: "execute_code"/"execute_bash", arguments:{language, content}}], observation.results=[...])
- TextObservation(source=environment) with tool_call_id -> ObservationResult(content=..., source_call_id=tool_call_id)
- ImageObservation -> ContentPart(type=image) plus annotations in extra, or ATIF extension for annotations
- WebObservation -> ObservationResult.content / extra.web containing html, axtree, url, viewport, screenshot ref
- reward -> step.extra.reward, observation.results[].extra.reward, or future ATIF reward field
Lossy points:
- description vs reasoning_content vs MessageAction.content requires policy: is description visible assistant text, hidden thought, or action rationale?
- WebObservation has more structured web state than ATIF currently standardizes.
ATIF -> ADP
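As a rough illustration of this direction, the sketch below flattens ATIF-like step dictionaries into an ADP-like content list, following the mapping listed in this section. It works on plain dictionaries rather than either project's Pydantic models. The function name atif_steps_to_adp, the class_ strings "message_action" and "api_action", and the exact key names on tool calls are assumptions made for illustration, not confirmed schema details.

```python
from typing import Any


def atif_steps_to_adp(steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten ATIF-like steps into a flat ADP-like content list (lossy sketch)."""
    content: list[dict[str, Any]] = []
    for step in steps:
        source = step.get("source")
        message = step.get("message", "")
        tool_calls = step.get("tool_calls") or []

        if source == "user":
            content.append(
                {"class_": "text_observation", "source": "user", "content": message}
            )
        elif source == "system":
            # Lossy fallback: ADP has no system source in this sketch.
            content.append(
                {
                    "class_": "text_observation",
                    "source": "environment",
                    "name": "system",
                    "content": message,
                }
            )
        elif source == "agent" and not tool_calls:
            content.append(
                {
                    "class_": "message_action",  # assumed class_ name
                    "content": message,
                    "reasoning_content": step.get("reasoning_content"),
                }
            )
        elif source == "agent":
            # One ATIF step with N tool calls becomes N ADP actions;
            # the step-level grouping is lost here.
            for call in tool_calls:
                content.append(
                    {
                        "class_": "api_action",  # assumed class_ name
                        "function": call.get("function_name"),
                        "kwargs": call.get("arguments", {}),
                        "tool_call_id": call.get("tool_call_id"),
                        "description": message,
                    }
                )
            results = (step.get("observation") or {}).get("results", [])
            for result in results:
                content.append(
                    {
                        "class_": "text_observation",
                        "source": "environment",
                        "tool_call_id": result.get("source_call_id"),
                        "content": result.get("content"),
                    }
                )
        # Steps with any other source are skipped in this sketch.
    return content
```

Metrics, timestamps, and subagent references are deliberately ignored here; a real converter would need to decide where they go, for example under details.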
Likely mapping:
- schema_version -> details.atif_schema_version
- trajectory_id or session_id -> Trajectory.id
- agent, final_metrics, notes, continued_trajectory_ref, root extra -> details
- Step(source=user) -> TextObservation(source=user, content=message)
- Step(source=system) -> TextObservation(source=environment, name="system") as a lossy fallback
- Step(source=agent, no tool_calls) -> MessageAction(content=message, reasoning_content=reasoning_content)
- ToolCall -> ApiAction(function=function_name, kwargs=arguments, tool_call_id=...), unless mapped to CodeAction by function name/arguments
- ObservationResult -> TextObservation / ImageObservation / WebObservation with matching tool_call_id when possible
- metrics -> details or future ADP metrics field
- is_copied_context -> details or future ADP field; SFT converters should filter if true
- subagent_trajectory_ref / subagent_trajectories -> details or future ADP subagent fields
Lossy points:
- tool_calls in one ATIF step become multiple ADP actions unless ADP adds grouping.
- metrics, timestamps, token IDs, logprobs, and cost have no first-class ADP location.
- environment.llm_call_count=0 deterministic dispatch needs SFT filtering semantics in ADP converters.
Proposed unification path
Phase 1: Build adapters and loss reports
Add scripts/tests that convert a small set of trajectories both ways:
- sample_std.json -> ATIF.
Useful test cases:
- CodeAction and terminal output.
- WebObservation.
- ImageObservation.annotations.
Phase 2: Decide canonical abstraction: event stream, step stream, or both
The formats can converge in two plausible ways:
- steps as canonical.
My recommendation is option 2, or at minimum an ATIF-compatible grouping layer over ADP content, because ADP's current samples and converters are event-oriented while ATIF's strongest additions are turn-level metadata and grouping.
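A grouping layer of this kind could look roughly like the sketch below, which folds a flat ADP content list into ATIF-like step dictionaries by attaching each observation to the action that shares its tool_call_id. The function name group_adp_content, the class_ strings "message_action" and "api_action", and the "execute_code" function name for code actions are assumptions for illustration; the sketch uses plain dictionaries, not either project's models.

```python
from typing import Any


def group_adp_content(content: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group a flat ADP-like content list into ATIF-like step dicts."""
    steps: list[dict[str, Any]] = []
    # Maps tool_call_id -> index of the step that issued the call.
    call_owner: dict[str, int] = {}

    for item in content:
        cls = item.get("class_")
        call_id = item.get("tool_call_id")

        if cls in ("api_action", "code_action"):  # assumed class_ names
            if cls == "code_action":
                call = {
                    "tool_call_id": call_id,
                    "function_name": "execute_code",  # assumed convention
                    "arguments": {
                        "language": item.get("language"),
                        "content": item.get("content"),
                    },
                }
            else:
                call = {
                    "tool_call_id": call_id,
                    "function_name": item.get("function"),
                    "arguments": item.get("kwargs", {}),
                }
            steps.append(
                {
                    "source": "agent",
                    "message": item.get("description") or "",
                    "tool_calls": [call],
                    "observation": {"results": []},
                }
            )
            if call_id is not None:
                call_owner[call_id] = len(steps) - 1
        elif cls == "message_action":  # assumed class_ name
            steps.append(
                {
                    "source": "agent",
                    "message": item.get("content", ""),
                    "reasoning_content": item.get("reasoning_content"),
                }
            )
        elif call_id in call_owner:
            # Observation linked to an earlier tool call: attach it to that step.
            steps[call_owner[call_id]]["observation"]["results"].append(
                {"source_call_id": call_id, "content": item.get("content")}
            )
        else:
            # Unlinked observation (for example a user message): its own step.
            # The ADP source is passed through unchanged; a real converter
            # would map it onto whatever sources ATIF allows.
            steps.append(
                {"source": item.get("source"), "message": item.get("content", "")}
            )
    return steps


example = [
    {"class_": "text_observation", "content": "list files", "source": "user"},
    {"class_": "code_action", "tool_call_id": "call_000001",
     "language": "bash", "content": "ls", "description": "List files"},
    {"class_": "text_observation", "tool_call_id": "call_000001",
     "content": "README.md", "source": "environment"},
]
grouped = group_adp_content(example)
```

Each action becomes its own step here, so consecutive actions from one LLM turn are not merged; deciding when to merge them is exactly the policy question the grouping layer would have to settle.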
Phase 3: Schema changes to consider in ADP
Potential ADP additions, in rough priority order:
- agent block compatible with ATIF Agent.
- session_id and/or trajectory_id while preserving existing id for backward compatibility.
- timestamp on content items or a grouped turn object.
- metrics and final_metrics compatible with ATIF names.
- system as an allowed source or introduce SystemObservation / SystemMessage.
- metadata.json.custom_tools / available_apis toward ATIF-compatible tool_definitions in the standardized record.
- extra at action/observation level, or define structured locations for producer-specific fields.
- is_copied_context, llm_call_count, continued_trajectory_ref, and context-management conventions for long-context agents.
Phase 4: Schema changes to consider in ATIF
Potential ATIF additions or conventions to better cover ADP:
- function_name="execute_code" with {language, content} or a dedicated code action extension.
- extra.web conventions: html, axtree, url, viewport_size, screenshot/image reference, and UI annotations / bounding boxes.
- message vs reasoning_content vs tool-call-only agent steps.
Open questions for maintainers
- content[] event/action stream considered a permanent canonical abstraction, or could it be replaced/grouped by turn-level steps?
- CodeAction and WebObservation even if ATIF becomes the root envelope?
- available_apis evolve into full OpenAI-style tool_definitions in the trajectory, matching ATIF agent.tool_definitions?
- is_copied_context live in ADP if we care about RL and SFT filtering?
- environment?
- is_copied_context=true and llm_call_count=0 if importing ATIF?
Concrete next step proposal
Create a small interoperability PR that does not change ADP's canonical schema yet:
- scripts/adp_to_atif.py and scripts/atif_to_adp.py as experimental converters.
This issue was created by an AI agent (OpenHands) on behalf of the user after inspecting the ADP repository and cloning/reading harbor-framework/harbor.