Audit: align sample stages and complete trajectories #218

Dataset quality audit category: **Sample alignment and completeness issues**

The May 2026 dataset audit found **27** issues in this class:

- `sample_alignment`: 15
- `trajectory_completeness`: 8
- `sample_representativeness`: 4

## Problem

Some datasets do not keep the same records across `sample_raw.json`, `sample_std.json`, and `sample_sft.json`. Others produce trajectories that end mid-action or that combine multiple independent tasks into one trajectory. Both problems make examples hard to inspect, regenerate, and trust.

## Examples

- `android_in_the_wild`: stage counts do not match; `sample_raw.json` has 3 records, but `sample_std.json` and `sample_sft.json` have 1.
- `go-browse-wa`: stage counts do not match; `sample_raw.json` has 100 records, but `sample_std.json` and `sample_sft.json` have 5.
- `nemotron_terminal_corpus`: stage counts do not match; `sample_raw.json` has 5 records, but `sample_std.json` and `sample_sft.json` have 4.
- `androidcontrol`: standardized ids (`0`, `20`, `40`) do not match SFT ids (`androidcontrol-0`, `androidcontrol-1`, `androidcontrol-2`).
- `CharlieDreemur_OpenManus-RL`: three of five standardized trajectories end on an `api_action` such as `perform_action` or a weather API call without a following observation or final answer.
- `agenttuning_os`: a second task is appended after a completed first task, making one ADP trajectory contain multiple independent OS problems.
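
The count and id mismatches listed above could be caught mechanically. The following is a minimal sketch, not the repository's actual test: it assumes each sample file holds a JSON list of dict records, that standardized and SFT records expose their identifier under an `id` key, and that all three files sit in one dataset directory. The function names `load_records` and `check_stage_alignment` are hypothetical.

```python
import json
from pathlib import Path

# The three stage files named in this issue.
STAGES = ["sample_raw.json", "sample_std.json", "sample_sft.json"]


def load_records(path):
    """Load a sample file, assumed (not confirmed) to be a JSON list of records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON list of records")
    return data


def check_stage_alignment(dataset_dir, id_key="id"):
    """Return a list of human-readable problems; an empty list means aligned."""
    records = {name: load_records(Path(dataset_dir) / name) for name in STAGES}
    problems = []

    # Every stage should contain the same number of records.
    counts = {name: len(recs) for name, recs in records.items()}
    if len(set(counts.values())) > 1:
        problems.append(f"stage counts differ: {counts}")

    # Standardized and SFT ids should match position by position.
    # Raw records are skipped here because their id field is dataset-specific.
    std_ids = [str(r.get(id_key)) for r in records["sample_std.json"]]
    sft_ids = [str(r.get(id_key)) for r in records["sample_sft.json"]]
    if std_ids != sft_ids:
        problems.append(f"std ids {std_ids} != sft ids {sft_ids}")

    return problems
```

Comparing ids as ordered lists rather than sets also flags reordering between stages, which matters if converters are expected to preserve record order.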

## Suggested work

- Add or strengthen tests requiring sample stage counts and ids to align across raw, standardized, and SFT files.
- Ensure converters preserve the same record order through the pipeline.
- Avoid dropping raw records silently during standardization or SFT conversion; if filtering is intentional, document the filter and apply it deterministically so that the same records survive on every run.
- Require trajectories to end with a plausible terminal state, final answer, or documented reason for truncation.
- Split multiple independent tasks into separate trajectories where appropriate.
- Keep representative samples small but broad enough to cover important action/observation edge cases.
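
The terminal-state requirement in the list above could be approximated with a check like the one below. This is a sketch under stated assumptions, not the project's schema: it assumes a trajectory is a dict with a `steps` list, that each step carries its kind under a `type` key, and that an intentionally truncated trajectory records why under a `truncation_reason` key. Only the `api_action` step kind comes from the audit; every other name here is hypothetical.

```python
# Step kinds treated as "an action still awaiting a result".
# Only "api_action" is named in the audit; extend this set per dataset.
DANGLING_ACTION_TYPES = frozenset({"api_action"})


def ends_cleanly(trajectory, action_types=DANGLING_ACTION_TYPES):
    """Return True if the trajectory does not stop on an unanswered action."""
    # A documented truncation reason is accepted as an explicit exemption.
    if trajectory.get("truncation_reason"):
        return True
    steps = trajectory.get("steps", [])
    if not steps:
        return False  # an empty trajectory has no terminal state at all
    return steps[-1].get("type") not in action_types


def find_incomplete(trajectories, action_types=DANGLING_ACTION_TYPES):
    """Return the positions of trajectories that end mid-action."""
    return [
        index
        for index, trajectory in enumerate(trajectories)
        if not ends_cleanly(trajectory, action_types)
    ]
```

A check of this shape only detects a dangling final action. It does not detect a second task appended after a completed one, which would need a dataset-specific signal for task boundaries.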

