Description
Python: Normalize OpenAI function-call arguments at parse time to prevent unicode escape corruption
Problem
When an LLM-powered agent edits source files containing Python/JavaScript unicode escape sequences like `\u2192`, the OpenAI code path corrupts these sequences due to double JSON parsing.
Root cause
The Anthropic and OpenAI backends handle function-call arguments differently:
- Anthropic: returns `content_block.input` as a parsed dict. Stored directly; `parse_arguments()` returns it as-is. 1 JSON parse total.
- OpenAI: returns `tool.function.arguments` as a raw JSON string. Stored as a string, then `parse_arguments()` calls `json.loads()` again. 2 JSON parses total.
The second `json.loads()` re-interprets `\uXXXX` sequences as JSON unicode escapes, corrupting the original intent:

```python
# A source file contains the Python escape: \u2192
# The model correctly generates \\u2192 in its JSON arguments

# Anthropic path (1 parse):
content_block.input = {"old_string": "\\u2192"}  # SDK parsed → \u2192 ✓

# OpenAI path (2 parses):
tool.function.arguments = '{"old_string": "\\u2192"}'  # stored as string
json.loads(arguments)  # → {"old_string": "→"}  \u2192 interpreted as unicode escape ✗
```

The same model output that works correctly on Anthropic produces a corrupted value on OpenAI. The `\u2192` (a literal 6-character Python escape) becomes `→` (a single Unicode character), causing `edit_file` to either fail to match or write incorrect content.
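The corruption is easy to reproduce with the standard `json` module alone, independent of any framework code. The sketch below shows that one parse round-trips the literal escape correctly, while feeding the decoded text through the parser a second time collapses it:

```python
import json

# What the model sends: a JSON object whose value is the 6-character literal
# \u2192, correctly escaped as \\u2192 in the JSON text.
wire = '{"old_string": "\\\\u2192"}'

once = json.loads(wire)["old_string"]
assert once == "\\u2192"  # one parse: still the 6-character literal escape

# Running the already-decoded text through the JSON parser again re-interprets
# \u2192 as a JSON unicode escape and collapses it to a single character.
twice = json.loads('"' + once + '"')
assert twice == "\u2192"  # '→': a single corrupted character
```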
Impact
This affects any tool that reads/writes source code containing `\uXXXX` escape sequences (Python, JavaScript, Java, C#, JSON). In practice, agents enter retry loops (10+ failed `edit_file` attempts observed) trying different escaping levels, wasting tokens and often ultimately writing corrupted code.
What changed
- Added a `normalize_function_call_arguments()` helper in `_types.py` that eagerly parses JSON-string arguments into dicts at the provider-parsing layer
- Applied normalization in `OpenAIChatClient._parse_tool_calls_from_openai()` and three non-streaming parse sites in `OpenAIResponsesClient`
- Updated `_prepare_content_for_openai()` in the responses client to re-serialize dict arguments back to JSON strings when sending to the API (the chat client already handled this at line 704)
- Updated 2 test assertions that expected raw string arguments to expect parsed dicts
Streaming deltas (`response.function_call_arguments.delta`) are intentionally not normalized, since they contain partial JSON fragments.
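The helper's behavior can be sketched roughly as follows. The name matches the PR description, but the body is an illustrative simplification, not the actual `_types.py` implementation:

```python
import json
from collections.abc import Mapping
from typing import Any

def normalize_function_call_arguments(arguments: Any) -> Any:
    """Illustrative sketch: eagerly parse JSON-string arguments into a dict.

    Dicts (the Anthropic shape) pass through untouched; strings that decode
    to a JSON object (the OpenAI shape) are parsed exactly once; anything
    else, including partial streaming fragments, is left alone.
    """
    if isinstance(arguments, Mapping):
        return arguments
    if isinstance(arguments, str):
        try:
            parsed = json.loads(arguments)
        except json.JSONDecodeError:
            return arguments  # partial or malformed JSON: do not touch
        if isinstance(parsed, Mapping):
            return parsed
    return arguments
```

With something like this in place, `parse_arguments()` always receives a `Mapping` on the OpenAI path and never performs a second `json.loads()`.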
Validation
```shell
uv run python -m pytest packages/core/tests/openai/test_openai_chat_client.py \
  packages/core/tests/openai/test_openai_responses_client.py \
  -m "not integration" -q
```

All 183 tests pass.
Before / After comparison
```python
from agent_framework._types import normalize_function_call_arguments
import json

# Model generates \\u2192 in its JSON output — the correct escaping for literal \u2192
args = '{"old_string": "\\\\u2192"}'

# BEFORE: stored as string, then double-parsed
json.loads(args)["old_string"]  # → '\\u2192' (2 backslashes — wrong)

# AFTER: normalized once at parse time, parse_arguments() returns dict directly
normalize_function_call_arguments(args)["old_string"]  # → '\\u2192' (same parse)
# Then parse_arguments() sees a Mapping and returns it — no second json.loads
```

The fix makes the OpenAI path behave identically to the Anthropic path: arguments are parsed once and stored as a dict. `parse_arguments()` returns the dict directly without a second `json.loads()` call.
PR
Related
- Python: Normalize provider tool-call argument envelopes across chat backends #4740 / Python: Normalize OpenAI tool-call argument envelopes on parse #4741 — Same problem space (OpenAI argument envelope normalization), but focused on the string-vs-dict type inconsistency rather than the unicode escape corruption specifically.
Code Sample
Error Messages / Stack Traces
Package Versions
rc5
Python Version
Python 3.14
Additional Context
No response