fix: extend RequestBodyJsonObject to accept string values and migrate RequestBodyPlainText with JSON content#1008
Conversation
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1777590686-fix-request-body-plain-text-json-routing#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1777590686-fix-request-body-plain-text-json-routingPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
…to request_body_json The Connector Builder UI sometimes generates RequestBodyPlainText for raw JSON string bodies. After CDK v7.17.1 (PR #971), RequestBodyPlainText is correctly routed to request_body_data (form-encoded), but this broke connectors where the Builder had misclassified JSON content as plain text. This adds a new manifest migration that detects RequestBodyPlainText where the value is a JSON-like string and converts it back to string-valued request_body_json, which is handled correctly by InterpolatedNestedRequestInputProvider. Co-Authored-By: bot_apk <apk@cognition.ai>
… RequestBodyPlainText with JSON content - Extend RequestBodyJsonObject schema to accept both dict and string values (anyOf: [object, string]) to support JSON string templates with Jinja - Update Pydantic model to Union[Dict[str, Any], str] - Refactor migration to convert RequestBodyPlainText+JSON -> RequestBodyJsonObject (string value) instead of deprecated request_body_json - No routing changes needed: InterpolatedNestedRequestInputProvider already handles both strings and dicts Resolves airbytehq/oncall#12007 Co-Authored-By: bot_apk <apk@cognition.ai>
c894b81 to
4c6d3b3
Compare
E2E Test Report —
|
| Scenario | Branch | Content-Type sent to upstream |
Body sent | Result |
|---|---|---|---|---|
| JSON body, no booleans | main |
application/x-www-form-urlencoded |
sort=field&sort=order&limit=25 |
Bug — body corrupted into form-encoded keys, all values lost |
| JSON body, no booleans | pr-1008 |
application/json |
{"sort": [{"field": "createdAt", "order": "DESC"}], "limit": 25} |
Fixed |
JSON body containing true |
main |
(none) | raw string { "sort": [...], "filter": [..., "value": true]} |
Bug — no application/json, sent as plain text |
JSON body containing true |
pr-1008 |
request never sent | request never sent | Regression — ValueError: Request body json cannot be a string |
Root cause of the regression on pr-1008
The migration in this PR converts RequestBodyPlainText (string value) → RequestBodyJsonObject with a string value. InterpolatedNestedRequestInputProvider then routes string inputs to InterpolatedString.eval(), which calls _literal_eval → ast.literal_eval. ast.literal_eval accepts Python literals (True/False/None) but rejects JSON literals (true/false/null), so for any JSON body containing them it returns the raw string unchanged. The string then fails the guard at airbyte_cdk/sources/declarative/requesters/http_requester.py:405-406:
if isinstance(options, str):
raise ValueError("Request body json cannot be a string")Reproducer: just ast.literal_eval('{"value": true}') → ValueError: malformed node or string.
Suggestion
Have the migration convert the string value to a parsed dict at migration time using json.loads (with a graceful fallback for Jinja-only fragments), so the resulting request_body_json is a Mapping, not a str. That avoids the ast.literal_eval boolean-parsing limitation entirely and bypasses the http_requester isinstance(options, str) guard. Alternatively, change _literal_eval (or its caller) to try json.loads before/instead of ast.literal_eval for body-json contexts.
Test artifacts (mock server, manifest, raw outputs)
mock_server.py — fictional fixture, identical for both branches:
from http.server import BaseHTTPRequestHandler, HTTPServer
import json
RECORDS = [
{"id": "rec-001", "name": "Widget Alpha", "value": "alpha-value"},
{"id": "rec-002", "name": "Widget Beta", "value": "beta-value"},
{"id": "rec-003", "name": "Widget Gamma", "value": "gamma-value"},
]
class MockHandler(BaseHTTPRequestHandler):
def _handle(self):
length = int(self.headers.get("Content-Length", 0))
body = self.rfile.read(length).decode("utf-8", errors="replace") if length else ""
print(f"=== {self.command} {self.path} ===")
for h, v in self.headers.items():
print(f" {h}: {v}")
print(f"Body ({len(body)} bytes): {body!r}")
payload = json.dumps({"data": RECORDS}).encode()
self.send_response(200)
self.send_header("Content-Type", "application/json")
self.send_header("Content-Length", str(len(payload)))
self.end_headers()
self.wfile.write(payload)
do_GET = _handle
do_POST = _handle
HTTPServer(("127.0.0.1", 9999), MockHandler).serve_forever()Test manifest A — JSON body with true (the realistic Shopware-shaped case):
{
"manifest": {
"version": "7.18.1",
"type": "DeclarativeSource",
"__should_migrate": true,
"check": {"type": "CheckStream", "stream_names": ["widgets"]},
"spec": {"type": "Spec", "connection_specification": {"type": "object", "properties": {}, "additionalProperties": false}},
"streams": [{
"type": "DeclarativeStream",
"name": "widgets",
"primary_key": ["id"],
"retriever": {
"type": "SimpleRetriever",
"requester": {
"type": "HttpRequester",
"url_base": "http://127.0.0.1:9999",
"path": "/api/search/widget",
"http_method": "POST",
"request_body": {
"type": "RequestBodyPlainText",
"value": "{ \"sort\": [ { \"field\": \"createdAt\", \"order\": \"DESC\" } ], \"filter\": [ { \"type\": \"equals\", \"field\": \"active\", \"value\": true } ] }"
}
},
"record_selector": {"type": "RecordSelector", "extractor": {"type": "DpathExtractor", "field_path": ["data"]}}
},
"schema_loader": {"type": "InlineSchemaLoader", "schema": {"type": "object", "properties": {"id": {"type": "string"}, "name": {"type": "string"}, "value": {"type": "string"}}}}
}]
},
"config": {},
"stream_name": "widgets",
"record_limit": 10,
"page_limit": 1,
"slice_limit": 1,
"state": []
}Test manifest B is identical except the request_body.value is "{ \"sort\": [ { \"field\": \"createdAt\", \"order\": \"DESC\" } ], \"limit\": 25 }" (no booleans).
Outgoing request observed by the mock server
main + manifest B (no booleans):
=== POST /api/search/widget ===
Content-Type: application/x-www-form-urlencoded
Content-Length: 30
Body (30 bytes): 'sort=field&sort=order&limit=25'
pr-1008 + manifest B (no booleans):
=== POST /api/search/widget ===
Content-Type: application/json
Content-Length: 64
Body (64 bytes): '{"sort": [{"field": "createdAt", "order": "DESC"}], "limit": 25}'
main + manifest A (with true):
=== POST /api/search/widget ===
(no Content-Type header)
Body (131 bytes): '{ "sort": [ { "field": "createdAt", "order": "DESC" } ], "filter": [ { "type": "equals", "field": "active", "value": true } ] }'
pr-1008 + manifest A (with true): no upstream request — fails before send with:
{
"logs": [{
"level": "ERROR",
"message": "An unexpected error occurred in stream widgets: ValueError",
"internal_message": "Request body json cannot be a string"
}],
"slices": [],
...
}Stack trace (truncated) ends at:
File "airbyte_cdk/sources/declarative/requesters/http_requester.py", line 406, in _request_body_json
raise ValueError("Request body json cannot be a string")
Skill validation note (for ai-skills#208 reviewers)
The skill's procedure surfaced this regression cleanly — the integrity rule "bug must reproduce on main, fix must work on PR branch with the SAME mock server and SAME payload" caught the issue immediately when the PR branch returned ValueError instead of the expected JSON-encoded request. The PII guardrails were exercised: no Shopware-specific names, customer URLs, or filter values from oncall airbytehq/oncall#12007 leaked into mock data or the test manifest. Recommended micro-improvement to the skill: an explicit step in the matrix to vary one input dimension (here, JSON-boolean presence) to catch partial fixes that pass on the simple case but fail on the realistic one.
Summary
Extends
RequestBodyJsonObjectto accept both dict and string values, filling a gap in therequest_bodytype system where JSON string templates (Jinja-interpolated strings that evaluate to JSON at runtime) had no valid non-deprecated type.Adds a manifest migration that converts
RequestBodyPlainTextwith JSON-like content toRequestBodyJsonObjectwith a string value, fixing the regression introduced in CDK v7.17.1 where such bodies were incorrectly routed to form-encodedrequest_body_data.Changes:
declarative_component_schema.yaml):RequestBodyJsonObject.valuenow acceptsanyOf: [object, string]declarative_component_schema.py):valueupdated toUnion[Dict[str, Any], str]RequestBodyPlainTextwith JSON-like string values and converts toRequestBodyJsonObject(only the type tag changes, string value preserved)No routing or runtime changes needed —
InterpolatedNestedRequestInputProvideralready handles both strings and dicts.Resolves https://github.com/airbytehq/oncall/issues/12007:
Review & Testing Checklist for Human
RequestBodyJsonObjectschema change (anyOf: [object, string]) is acceptable — this is a schema expansion, not a breaking change, but it changes what the Builder UI schema accepts_is_json_like()— it usesstartswith('{')excluding{{and{%— confirm edge cases are coveredRequestBodyJsonObject+ string value and confirm the request body is sent as JSON (not form-encoded)declarative_component_schema.py) will be regenerated correctly by CI's codegen step — I manually edited the auto-generated file to match the schema changeNotes
Background: The deprecated
request_body_jsonacceptedstring | dict. Its replacementRequestBodyJsonObjectonly accepted dicts. The Builder UI filled the gap by generatingRequestBodyPlainTextfor JSON strings, which worked before v7.17.1 by accident and broke after the routing fix in #971.Follow-up work:
RequestBodyJsonObjectinstead ofRequestBodyPlainTextfor JSON string bodiesHttpRequesterRequestBodyJsonDataToRequestBody) to also migrate string-valuedrequest_body_jsonnow thatRequestBodyJsonObjectaccepts stringsLink to Devin session: https://app.devin.ai/sessions/711687e098424d269c188290dd8cc86d