Dataset quality audit category: Metadata, provenance, and tool-description issues
The May 2026 dataset audit found 28 issues in this class:
metadata_or_provenance: 15
tool_description_quality: 13
Problem
Many datasets underuse the details field and action descriptions, which makes it difficult to trace standardized examples back to their source split, task id, environment, website, tool inventory, or original record metadata. Some tool calls have empty or uninformative descriptions even when the raw data contains richer context.
Examples
- SALT-NLP_SWE-chat: numeric metadata in details is stored as strings, for example tool_call_count, turn_count, prompt_count, and session_success.
- agenttuning_mind2web: details is empty, so source split, website, action id, and original record provenance are not preserved.
- coderforge_preview: the only top-level metadata is details.reward; source repository, split, and tool availability are not consistently exposed.
- codescout: some final assistant messages are JSON patch/localization artifacts embedded as MessageAction text rather than structured patch metadata.
- codescout: sample records have mixed provenance detail; two examples include source instance metadata while others are generic codescout_default_train_* records.
- agenttuning_alfworld: most tool calls have empty descriptions, reducing the usefulness of the action description field for reasoning traces.
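To make the stringified-metadata case concrete, here is a minimal sketch of how a converter might restore native types before writing details. The helper name coerce_scalar and the sample values are hypothetical; only the field names come from the SALT-NLP_SWE-chat example above, and the real converter may need dataset-specific rules.

```python
import json
import re
from typing import Any

_INT_RE = re.compile(r"-?\d+")
_FLOAT_RE = re.compile(r"-?\d+\.\d+")


def coerce_scalar(value: Any) -> Any:
    """Turn strings like "12", "0.5", or "True" into native values.

    Hypothetical helper: anything that is not clearly a bool, int, or
    float is returned unchanged.
    """
    if not isinstance(value, str):
        return value
    text = value.strip()
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    if _INT_RE.fullmatch(text):
        return int(text)
    if _FLOAT_RE.fullmatch(text):
        return float(text)
    return value


# Illustrative values only; not taken from the actual dataset.
raw_details = {
    "tool_call_count": "12",
    "turn_count": "7",
    "session_success": "True",
    "source": "example",
}
typed_details = {key: coerce_scalar(val) for key, val in raw_details.items()}
print(json.dumps(typed_details))
```

Blanket coercion is probably too aggressive for identifier-like fields (an upstream id such as "007" would lose its leading zeros), so a converter would likely apply this only to a known list of numeric or boolean keys.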
Suggested work
- Define minimum recommended provenance fields for dataset samples, such as source dataset, split, upstream id, task/environment, and extraction date where applicable.
- Preserve typed metadata as native JSON values instead of strings when possible.
- Move structured patch, localization, or evaluation metadata out of free-form assistant text when a clearer ADP field exists.
- Ensure action descriptions are populated from meaningful raw thought/tool context, or leave them null rather than empty strings.
- Add documentation/examples for expected details usage in dataset converters.
- Consider tests or lint checks for empty descriptions and obviously stringified numeric metadata.
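As a starting point for the lint checks suggested above, the following sketch flags empty details, stringified numeric or boolean metadata, and empty-string descriptions. The sample shape (a details dict plus a content list of steps carrying an optional description) is an assumption made for illustration and may not match the actual ADP schema; lint_sample is a hypothetical name.

```python
import re
from typing import Any

# Strings that look like ints or floats, e.g. "12" or "-3.5".
_NUMERIC_RE = re.compile(r"-?\d+(\.\d+)?")
_BOOL_STRINGS = {"true", "false"}


def lint_sample(sample: dict[str, Any]) -> list[str]:
    """Return warnings for one standardized sample.

    Assumed (hypothetical) shape:
    {"details": {...}, "content": [{"description": ...}, ...]}
    """
    warnings: list[str] = []

    details = sample.get("details") or {}
    if not details:
        warnings.append("details is empty; provenance is not preserved")

    for key, value in details.items():
        if isinstance(value, str):
            text = value.strip()
            if _NUMERIC_RE.fullmatch(text) or text.lower() in _BOOL_STRINGS:
                warnings.append(
                    f"details.{key} looks like a stringified value: {value!r}"
                )

    for index, step in enumerate(sample.get("content", [])):
        description = step.get("description")
        # None is accepted (null is preferred over ""); blank strings are flagged.
        if isinstance(description, str) and not description.strip():
            warnings.append(f"content[{index}].description is an empty string")

    return warnings


# Illustrative sample only; field values are made up.
sample = {
    "details": {"tool_call_count": "12", "session_success": "True", "split": "train"},
    "content": [
        {"description": ""},
        {"description": None},
        {"description": "open drawer 1"},
    ],
}
for warning in lint_sample(sample):
    print(warning)
```

A check like this could run per converter in CI and report counts rather than fail outright, since some datasets may legitimately have no upstream provenance to preserve.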