Dataset quality audit category: Conversation/action representation issues
The May 2026 dataset audit found 53 issues in this class:
sft_format_or_role: 17
role_or_source_mapping: 15
action_representation: 15
sft_placeholder: 6
Problem
Several datasets flatten structured behavior into plain text, map environment observations to the wrong source, or assign SFT roles inconsistently. These issues are especially risky because downstream SFT consumers may train on incorrect assistant/tool boundaries.
Examples
agenttuning_alfworld: root sample_sft.json marks plain acknowledgements such as "OK. I'll follow..." as from: "function_call" even though they contain no function-call syntax.
agenttuning_db: root SFT sample marks all assistant messages as function_call messages without function-call syntax.
agenttuning_mind2web: root SFT sample uses function_call for final-choice text without an actual function call.
agenttuning_alfworld: all 94 standardized text observations are marked source: "user", including environment responses that immediately follow API actions, such as "You pick up..." and "On the shelf...".
agenttuning_db: SQL operations are not represented as CodeAction or ApiAction; SQL is embedded in assistant text or omitted entirely.
androidcontrol: root sample_sft.json is a placeholder conversation and is not derived from the standardized mobile trajectories.
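To make the agenttuning_alfworld source-mapping problem concrete, here is a minimal sketch of the kind of fix a converter could apply. It assumes a simple heuristic: a text observation that directly follows an API action is an environment response. The dataclass fields (content, source, function, kwargs), the "environment" label, and the helper name remap_observation_sources are hypothetical stand-ins for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field, replace


# Hypothetical stand-ins for the standardized schema; field names are assumptions.
@dataclass(frozen=True)
class ApiAction:
    function: str
    kwargs: dict = field(default_factory=dict)


@dataclass(frozen=True)
class TextObservation:
    content: str
    source: str


def remap_observation_sources(trajectory):
    """Return a copy of the trajectory in which any TextObservation that
    directly follows an ApiAction has source set to "environment".
    All other steps are passed through unchanged."""
    fixed = []
    previous = None
    for step in trajectory:
        if isinstance(step, TextObservation) and isinstance(previous, ApiAction):
            step = replace(step, source="environment")
        fixed.append(step)
        previous = step
    return fixed
```

A real converter would likely need more signals than adjacency (for example, raw-data speaker tags), so this heuristic is a starting point for the audit rather than a complete rule.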
Suggested work
- Ensure SFT messages containing actual function-call syntax use from: "function_call", and plain assistant text does not.
- Fix converters rather than hand-editing generated sample JSON.
- Audit TextObservation.source mapping for user/environment/agent boundaries, especially after tool/API calls.
- Represent executable commands, SQL, browser actions, and API calls with CodeAction or ApiAction where the raw data supports it.
- Replace placeholder root sample_sft.json files with pipeline-derived SFT samples.
- Add tests that detect function-call roles without function-call syntax, and function-call syntax outside from: "function_call".
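The role/syntax test in the last item could start from something like the sketch below. It assumes each SFT message is a dict with a "from" role and a "value" text field, and that a function call is recognizable by an opening tag such as <function=name>. Both the "value" key and the regular expression are assumptions for illustration; the pattern should be replaced with whatever syntax the converters actually emit.

```python
import re

# Assumption: function calls contain an opening "<function=name>" tag.
# Swap this pattern for the syntax the converters really produce.
FUNCTION_CALL_PATTERN = re.compile(r"<function=[\w.\-]+>")


def find_role_mismatches(messages, pattern=FUNCTION_CALL_PATTERN):
    """Return (index, reason) pairs for messages whose role and content disagree.

    Flags two cases: a message marked from: "function_call" whose text has no
    function-call syntax, and a message with any other role whose text does.
    """
    problems = []
    for index, message in enumerate(messages):
        role = message.get("from")
        text = message.get("value", "")  # "value" key is an assumed field name
        has_call = bool(pattern.search(text))
        if role == "function_call" and not has_call:
            problems.append((index, "function_call role without function-call syntax"))
        elif role != "function_call" and has_call:
            problems.append((index, "function-call syntax outside function_call role"))
    return problems
```

Running this over converter output rather than over the checked-in sample files keeps the check aligned with the goal of fixing converters instead of hand-editing JSON. Prompts that merely describe the call syntax may trip the second check and would need an allowlist.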