Audit: preserve metadata/provenance and improve tool descriptions #219

Dataset quality audit category: **Metadata, provenance, and tool-description issues**

The May 2026 dataset audit found **28** issues in this class:

- `metadata_or_provenance`: 15
- `tool_description_quality`: 13

## Problem

Many datasets underuse `details` and action descriptions, which makes it difficult to trace a standardized example back to its source split, task id, environment, website, tool inventory, or original record metadata. Some tool calls have empty or uninformative descriptions even when the raw data contains richer context.

## Examples

- `SALT-NLP_SWE-chat`: numeric metadata in `details` is stored as strings, for example `tool_call_count`, `turn_count`, `prompt_count`, and `session_success`.
- `agenttuning_mind2web`: `details` is empty, so source split, website, action id, and original record provenance are not preserved.
- `coderforge_preview`: the only top-level metadata is `details.reward`; source repository, split, and tool availability are not consistently exposed.
- `codescout`: some final assistant messages are JSON patch/localization artifacts embedded as `MessageAction` text rather than structured patch metadata.
- `codescout`: sample records have mixed provenance detail; two examples include source instance metadata while others are generic `codescout_default_train_*` records.
- `agenttuning_alfworld`: most tool calls have empty descriptions, reducing the usefulness of the action `description` field for reasoning traces.
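
To illustrate the `SALT-NLP_SWE-chat` case, here is a minimal sketch of how a converter might turn stringified metadata back into native JSON values. The helper names `coerce_value` and `coerce_details` are hypothetical, as is the assumption that `session_success` is meant to be a boolean; coercion is limited to an explicit set of keys so that identifiers which happen to look numeric are left alone.

```python
import re
from typing import Any

# Strict patterns so that only plain decimal literals are converted.
_INT = re.compile(r"-?\d+")
_FLOAT = re.compile(r"-?\d+\.\d+")


def coerce_value(value: Any) -> Any:
    """Return a native value for strings that look like ints, floats, or booleans."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    return value  # leave anything else untouched


def coerce_details(details: dict[str, Any], keys: set[str]) -> dict[str, Any]:
    """Coerce only the listed keys; every other entry is copied unchanged."""
    return {
        key: coerce_value(val) if key in keys else val
        for key, val in details.items()
    }


if __name__ == "__main__":
    raw = {"tool_call_count": "12", "turn_count": "3", "session_success": "true"}
    print(coerce_details(raw, {"tool_call_count", "turn_count", "session_success"}))
```

Restricting coercion to named keys is a deliberate choice in this sketch: a blanket pass over `details` could silently rewrite upstream ids or version strings.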

## Suggested work

- Define minimum recommended provenance fields for dataset samples, such as source dataset, split, upstream id, task/environment, and extraction date where applicable.
- Preserve typed metadata as native JSON values instead of strings when possible.
- Move structured patch, localization, or evaluation metadata out of free-form assistant text when a clearer ADP field exists.
- Populate action descriptions from meaningful raw thought/tool context; when none exists, set the field to null rather than an empty string.
- Add documentation/examples for expected `details` usage in dataset converters.
- Consider tests or lint checks for empty descriptions and obviously stringified numeric metadata.
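
The lint checks suggested above could look roughly like the following sketch. It assumes a sample is a dict with a `details` dict and a `content` list of action dicts that carry a `description` key; that shape, the function name `lint_sample`, and the provenance field names in `RECOMMENDED_PROVENANCE` are illustrative assumptions, not an agreed schema.

```python
import re
from typing import Any

# Hypothetical minimum provenance fields; the real list is still to be defined.
RECOMMENDED_PROVENANCE = ("source_dataset", "split", "upstream_id")
_NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def lint_sample(sample: dict[str, Any]) -> list[str]:
    """Return human-readable issues found in one converted sample."""
    issues: list[str] = []
    details = sample.get("details") or {}

    # 1. Missing provenance fields.
    for field in RECOMMENDED_PROVENANCE:
        if field not in details:
            issues.append(f"details missing provenance field: {field}")

    # 2. Values that are strings but parse as plain numbers.
    for key, value in details.items():
        if isinstance(value, str) and _NUMERIC.fullmatch(value.strip()):
            issues.append(f"details.{key} looks like a stringified number: {value!r}")

    # 3. Empty-string descriptions (null is acceptable, "" is not).
    for index, step in enumerate(sample.get("content") or []):
        if not isinstance(step, dict):
            continue
        description = step.get("description")
        if isinstance(description, str) and not description.strip():
            issues.append(f"content[{index}].description is empty; use null instead")

    return issues


if __name__ == "__main__":
    sample = {
        "details": {"source_dataset": "example", "turn_count": "4"},
        "content": [{"description": ""}, {"description": None}],
    }
    for issue in lint_sample(sample):
        print(issue)
```

The numeric check is a heuristic: an upstream id such as `"123"` would also be flagged, so a real check would probably need an allowlist of fields that are legitimately numeric-looking strings.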

