Skip to content

Audit: fix SFT roles and action/source representation issues #217

@neubig

Description

@neubig

Dataset quality audit category: Conversation/action representation issues

The May 2026 dataset audit found 53 issues in this class:

  • sft_format_or_role: 17
  • role_or_source_mapping: 15
  • action_representation: 15
  • sft_placeholder: 6

Problem

Several datasets flatten structured behavior into plain text, map environment observations to the wrong source, or assign SFT roles inconsistently. These issues are especially risky because downstream SFT consumers may train on incorrect assistant/tool boundaries.

Examples

  • agenttuning_alfworld: root sample_sft.json marks plain acknowledgements such as OK. I'll follow... as from: "function_call" even though they contain no function-call syntax.
  • agenttuning_db: root SFT sample marks all assistant messages as function_call messages without function-call syntax.
  • agenttuning_mind2web: root SFT sample uses function_call for final-choice text without an actual function call.
  • agenttuning_alfworld: all 94 standardized text observations are marked source: "user", including environment responses immediately after API actions such as You pick up... and On the shelf....
  • agenttuning_db: SQL operations are not represented as CodeAction or ApiAction; SQL is embedded in assistant text or omitted entirely.
  • androidcontrol: root sample_sft.json is a placeholder conversation and is not derived from the standardized mobile trajectories.

Suggested work

  • Ensure SFT messages containing actual function-call syntax use from: "function_call", and plain assistant text does not.
  • Fix converters rather than hand-editing generated sample JSON.
  • Audit TextObservation.source mapping for user/environment/agent boundaries, especially after tool/API calls.
  • Represent executable commands, SQL, browser actions, and API calls with CodeAction or ApiAction where the raw data supports it.
  • Replace placeholder root sample_sft.json files with pipeline-derived SFT samples.
  • Add tests that detect function-call roles without function-call syntax, and function-call syntax outside from: "function_call".

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions