
Audit: preserve metadata/provenance and improve tool descriptions #219

@neubig

Description

Dataset quality audit category: Metadata, provenance, and tool-description issues

The May 2026 dataset audit found 28 issues in this class:

  • metadata_or_provenance: 15
  • tool_description_quality: 13

Problem

Many datasets underuse the details field and action descriptions, which makes it difficult to trace standardized examples back to their source split, task id, environment, website, tool inventory, or original record metadata. Some tool calls have empty or uninformative descriptions even when the raw data contains richer context.
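As an illustration of what preserved provenance could look like, here is a minimal sketch of a converter helper that assembles details with provenance fields. The function name build_details and the keys source_dataset, split, upstream_id, environment, extraction_date, and upstream_metadata are hypothetical, not part of any existing ADP converter API. Fields that do not apply are omitted rather than written as empty strings.

```python
from datetime import date
from typing import Any, Optional


def build_details(
    source_dataset: str,
    split: Optional[str] = None,
    upstream_id: Optional[str] = None,
    environment: Optional[str] = None,
    extraction_date: Optional[date] = None,
    raw_metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a provenance-bearing details dict (hypothetical field names)."""
    details: dict[str, Any] = {
        "source_dataset": source_dataset,
        "split": split,
        "upstream_id": upstream_id,
        "environment": environment,
        # ISO 8601 string keeps the value JSON-serializable.
        "extraction_date": extraction_date.isoformat() if extraction_date else None,
    }
    # Drop fields that do not apply instead of storing empty placeholders.
    details = {k: v for k, v in details.items() if v is not None}
    if raw_metadata:
        # Keep the original record metadata, untouched, under one namespaced key.
        details["upstream_metadata"] = dict(raw_metadata)
    return details
```

Nesting the raw metadata under a single key keeps the standardized provenance fields predictable while still letting a reader recover anything the upstream record carried (for example, a Mind2Web website or action id).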

Examples

  • SALT-NLP_SWE-chat: numeric metadata in details is stored as strings, for example tool_call_count, turn_count, prompt_count, and session_success.
  • agenttuning_mind2web: details is empty, so source split, website, action id, and original record provenance are not preserved.
  • coderforge_preview: the only top-level metadata is details.reward; source repository, split, and tool availability are not consistently exposed.
  • codescout: some final assistant messages are JSON patch/localization artifacts embedded as MessageAction text rather than structured patch metadata.
  • codescout: sample records have mixed provenance detail; two examples include source instance metadata while others are generic codescout_default_train_* records.
  • agenttuning_alfworld: most tool calls have empty descriptions, reducing the usefulness of the action description field for reasoning traces.
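For the stringified-metadata case (as in SALT-NLP_SWE-chat), one possible fix is schema-driven coercion: each converter declares which keys are typed and how to parse them. This is a sketch under assumptions. The names coerce_details and SWE_CHAT_TYPES are hypothetical, and treating session_success as a boolean serialized as "True"/"False" or "1"/"0" is a guess about the raw data.

```python
from typing import Any, Callable


def _parse_bool(value: str) -> bool:
    """Parse common string spellings of a boolean (assumed formats)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")


# Hypothetical per-dataset schema: key -> parser for its string form.
SWE_CHAT_TYPES: dict[str, Callable[[str], Any]] = {
    "tool_call_count": int,
    "turn_count": int,
    "prompt_count": int,
    "session_success": _parse_bool,
}


def coerce_details(
    details: dict[str, Any], schema: dict[str, Callable[[str], Any]]
) -> dict[str, Any]:
    """Return a copy of details with known string fields parsed to native types."""
    out = dict(details)
    for key, parse in schema.items():
        value = out.get(key)
        if isinstance(value, str):
            # Raise on malformed values rather than silently keeping strings.
            out[key] = parse(value)
    return out
```

An explicit schema is safer than coercing every digit-only string, because identifiers such as zero-padded ids would otherwise lose their leading zeros.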

Suggested work

  • Define minimum recommended provenance fields for dataset samples, such as source dataset, split, upstream id, task/environment, and extraction date where applicable.
  • Preserve typed metadata as native JSON values instead of strings when possible.
  • Move structured patch, localization, or evaluation metadata out of free-form assistant text when a clearer ADP field exists.
  • Ensure action descriptions are populated from meaningful raw thought/tool context, or leave them null rather than empty strings.
  • Add documentation and examples showing the expected use of details in dataset converters.
  • Consider tests or lint checks for empty descriptions and obviously stringified numeric metadata.
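A lint check along the lines of the last item could be as small as the sketch below. It assumes a record shape that may not match the actual ADP schema: a top-level details dict and a content list of step dicts, some of which carry a description. The function name lint_trajectory is hypothetical, and the numeric-string test is a heuristic that skips keys named id or ending in _id.

```python
import re
from typing import Any

# Matches strings that look like plain integers or decimals, e.g. "12" or "-0.5".
_NUMERIC_STRING = re.compile(r"-?\d+(?:\.\d+)?")


def _is_id_key(key: str) -> bool:
    # ID-like keys legitimately hold digit strings, so they are not flagged.
    return key == "id" or key.endswith("_id")


def lint_trajectory(record: dict[str, Any]) -> list[str]:
    """Return human-readable warnings for one standardized record (assumed shape)."""
    warnings: list[str] = []
    for key, value in record.get("details", {}).items():
        if isinstance(value, str) and not _is_id_key(key) and _NUMERIC_STRING.fullmatch(value):
            warnings.append(f"details.{key} looks like a stringified number: {value!r}")
    for i, step in enumerate(record.get("content", [])):
        description = step.get("description")
        # None means "no description"; empty or whitespace-only strings are flagged.
        if isinstance(description, str) and not description.strip():
            warnings.append(f"content[{i}].description is empty; use null or a meaningful value")
    return warnings
```

Running this over a converter's sample output in CI would surface cases like the empty agenttuning_alfworld descriptions without blocking records that legitimately have no description (None).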
