Skip to content

fix: extend RequestBodyJsonObject to accept string values and migrate RequestBodyPlainText with JSON content#1008

Draft
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
devin/1777590686-fix-request-body-plain-text-json-routing
Draft

fix: extend RequestBodyJsonObject to accept string values and migrate RequestBodyPlainText with JSON content#1008
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
devin/1777590686-fix-request-body-plain-text-json-routing

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot commented Apr 30, 2026

Summary

Extends RequestBodyJsonObject to accept both dict and string values, filling a gap in the request_body type system where JSON string templates (Jinja-interpolated strings that evaluate to JSON at runtime) had no valid non-deprecated type.

Adds a manifest migration that converts RequestBodyPlainText with JSON-like content to RequestBodyJsonObject with a string value, fixing the regression introduced in CDK v7.17.1 where such bodies were incorrectly routed to form-encoded request_body_data.

Changes:

  1. Schema (declarative_component_schema.yaml): RequestBodyJsonObject.value now accepts anyOf: [object, string]
  2. Pydantic model (declarative_component_schema.py): value updated to Union[Dict[str, Any], str]
  3. Migration: Detects RequestBodyPlainText with JSON-like string values and converts to RequestBodyJsonObject (only the type tag changes, string value preserved)
  4. Tests: 17 unit tests + integration test

No routing or runtime changes needed — InterpolatedNestedRequestInputProvider already handles both strings and dicts.

Resolves https://github.com/airbytehq/oncall/issues/12007:

Review & Testing Checklist for Human

  • Verify the RequestBodyJsonObject schema change (anyOf: [object, string]) is acceptable — this is a schema expansion, not a breaking change, but it changes what the Builder UI schema accepts
  • Verify the JSON detection heuristic in _is_json_like() — it uses startswith('{') excluding {{ and {% — confirm edge cases are covered
  • Test end-to-end with a Builder connector that uses a JSON string body: create a manifest with RequestBodyJsonObject + string value and confirm the request body is sent as JSON (not form-encoded)
  • Confirm the Pydantic model file (declarative_component_schema.py) will be regenerated correctly by CI's codegen step — I manually edited the auto-generated file to match the schema change

Notes

Background: The deprecated request_body_json accepted string | dict. Its replacement RequestBodyJsonObject only accepted dicts. The Builder UI filled the gap by generating RequestBodyPlainText for JSON strings, which worked before v7.17.1 by accident and broke after the routing fix in #971.

Follow-up work:

  • Builder UI fix (platform-internal): generate RequestBodyJsonObject instead of RequestBodyPlainText for JSON string bodies
  • Optional: update migration 3 (HttpRequesterRequestBodyJsonDataToRequestBody) to also migrate string-valued request_body_json now that RequestBodyJsonObject accepts strings

Link to Devin session: https://app.devin.ai/sessions/711687e098424d269c188290dd8cc86d

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1777590686-fix-request-body-plain-text-json-routing#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1777590686-fix-request-body-plain-text-json-routing

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 30, 2026

PyTest Results (Fast)

4 061 tests  +18   4 050 ✅ +18   7m 46s ⏱️ -7s
    1 suites ± 0      11 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit 4c6d3b3. ± Comparison against base commit 8e92512.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 30, 2026

PyTest Results (Full)

4 064 tests  +18   4 052 ✅ +18   10m 56s ⏱️ -20s
    1 suites ± 0      12 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit 4c6d3b3. ± Comparison against base commit 8e92512.

♻️ This comment has been updated with latest results.

devin-ai-integration Bot and others added 2 commits May 5, 2026 16:50
…to request_body_json

The Connector Builder UI sometimes generates RequestBodyPlainText for raw
JSON string bodies. After CDK v7.17.1 (PR #971), RequestBodyPlainText is
correctly routed to request_body_data (form-encoded), but this broke
connectors where the Builder had misclassified JSON content as plain text.

This adds a new manifest migration that detects RequestBodyPlainText where
the value is a JSON-like string and converts it back to string-valued
request_body_json, which is handled correctly by
InterpolatedNestedRequestInputProvider.

Co-Authored-By: bot_apk <apk@cognition.ai>
… RequestBodyPlainText with JSON content

- Extend RequestBodyJsonObject schema to accept both dict and string values
  (anyOf: [object, string]) to support JSON string templates with Jinja
- Update Pydantic model to Union[Dict[str, Any], str]
- Refactor migration to convert RequestBodyPlainText+JSON -> RequestBodyJsonObject
  (string value) instead of deprecated request_body_json
- No routing changes needed: InterpolatedNestedRequestInputProvider already
  handles both strings and dicts

Resolves airbytehq/oncall#12007

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration devin-ai-integration Bot force-pushed the devin/1777590686-fix-request-body-plain-text-json-routing branch from c894b81 to 4c6d3b3 Compare May 5, 2026 16:56
@devin-ai-integration devin-ai-integration Bot changed the title fix: add migration to convert RequestBodyPlainText with JSON content to request_body_json fix: extend RequestBodyJsonObject to accept string values and migrate RequestBodyPlainText with JSON content May 5, 2026
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

E2E Test Report — source-connector-builder-ui-e2e-testing skill (from airbytehq/ai-skills#208)

Triggered from oncall airbytehq/oncall#12007 (Shopware form-encoded body). Ran the full 6-step procedure against this PR using a fictional mock server on 127.0.0.1:9999 and the Manifest Server's /v1/manifest/test_read endpoint.

TL;DR — fix is partial. This PR resolves the form-encoding bug for JSON bodies that contain only Python-literal-compatible values, but regresses with a ValueError when the JSON body contains JSON booleans or null (true, false, null). Real Shopware filter bodies (and any non-trivial JSON request) commonly contain these.

Test matrix

Scenario Branch Content-Type sent to upstream Body sent Result
JSON body, no booleans main application/x-www-form-urlencoded sort=field&sort=order&limit=25 Bug — body corrupted into form-encoded keys, all values lost
JSON body, no booleans pr-1008 application/json {"sort": [{"field": "createdAt", "order": "DESC"}], "limit": 25} Fixed
JSON body containing true main (none) raw string { "sort": [...], "filter": [..., "value": true]} Bug — no application/json, sent as plain text
JSON body containing true pr-1008 request never sent request never sent Regression — ValueError: Request body json cannot be a string

Root cause of the regression on pr-1008

The migration in this PR converts RequestBodyPlainText (string value) → RequestBodyJsonObject with a string value. InterpolatedNestedRequestInputProvider then routes string inputs to InterpolatedString.eval(), which calls _literal_evalast.literal_eval. ast.literal_eval accepts Python literals (True/False/None) but rejects JSON literals (true/false/null), so for any JSON body containing them it returns the raw string unchanged. The string then fails the guard at airbyte_cdk/sources/declarative/requesters/http_requester.py:405-406:

if isinstance(options, str):
    raise ValueError("Request body json cannot be a string")

Reproducer: just ast.literal_eval('{"value": true}')ValueError: malformed node or string.

Suggestion

Have the migration convert the string value to a parsed dict at migration time using json.loads (with a graceful fallback for Jinja-only fragments), so the resulting request_body_json is a Mapping, not a str. That avoids the ast.literal_eval boolean-parsing limitation entirely and bypasses the http_requester isinstance(options, str) guard. Alternatively, change _literal_eval (or its caller) to try json.loads before/instead of ast.literal_eval for body-json contexts.

Test artifacts (mock server, manifest, raw outputs)

mock_server.py — fictional fixture, identical for both branches:

from http.server import BaseHTTPRequestHandler, HTTPServer
import json

RECORDS = [
    {"id": "rec-001", "name": "Widget Alpha", "value": "alpha-value"},
    {"id": "rec-002", "name": "Widget Beta",  "value": "beta-value"},
    {"id": "rec-003", "name": "Widget Gamma", "value": "gamma-value"},
]

class MockHandler(BaseHTTPRequestHandler):
    def _handle(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace") if length else ""
        print(f"=== {self.command} {self.path} ===")
        for h, v in self.headers.items():
            print(f"  {h}: {v}")
        print(f"Body ({len(body)} bytes): {body!r}")
        payload = json.dumps({"data": RECORDS}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
    do_GET = _handle
    do_POST = _handle

HTTPServer(("127.0.0.1", 9999), MockHandler).serve_forever()

Test manifest A — JSON body with true (the realistic Shopware-shaped case):

{
  "manifest": {
    "version": "7.18.1",
    "type": "DeclarativeSource",
    "__should_migrate": true,
    "check": {"type": "CheckStream", "stream_names": ["widgets"]},
    "spec": {"type": "Spec", "connection_specification": {"type": "object", "properties": {}, "additionalProperties": false}},
    "streams": [{
      "type": "DeclarativeStream",
      "name": "widgets",
      "primary_key": ["id"],
      "retriever": {
        "type": "SimpleRetriever",
        "requester": {
          "type": "HttpRequester",
          "url_base": "http://127.0.0.1:9999",
          "path": "/api/search/widget",
          "http_method": "POST",
          "request_body": {
            "type": "RequestBodyPlainText",
            "value": "{   \"sort\": [ { \"field\": \"createdAt\", \"order\": \"DESC\" } ],   \"filter\": [ { \"type\": \"equals\", \"field\": \"active\", \"value\": true } ] }"
          }
        },
        "record_selector": {"type": "RecordSelector", "extractor": {"type": "DpathExtractor", "field_path": ["data"]}}
      },
      "schema_loader": {"type": "InlineSchemaLoader", "schema": {"type": "object", "properties": {"id": {"type": "string"}, "name": {"type": "string"}, "value": {"type": "string"}}}}
    }]
  },
  "config": {},
  "stream_name": "widgets",
  "record_limit": 10,
  "page_limit": 1,
  "slice_limit": 1,
  "state": []
}

Test manifest B is identical except the request_body.value is "{ \"sort\": [ { \"field\": \"createdAt\", \"order\": \"DESC\" } ], \"limit\": 25 }" (no booleans).

Outgoing request observed by the mock server

main + manifest B (no booleans):

=== POST /api/search/widget ===
  Content-Type: application/x-www-form-urlencoded
  Content-Length: 30
Body (30 bytes): 'sort=field&sort=order&limit=25'

pr-1008 + manifest B (no booleans):

=== POST /api/search/widget ===
  Content-Type: application/json
  Content-Length: 64
Body (64 bytes): '{"sort": [{"field": "createdAt", "order": "DESC"}], "limit": 25}'

main + manifest A (with true):

=== POST /api/search/widget ===
  (no Content-Type header)
Body (131 bytes): '{   "sort": [ { "field": "createdAt", "order": "DESC" } ],   "filter": [ { "type": "equals", "field": "active", "value": true } ] }'

pr-1008 + manifest A (with true): no upstream request — fails before send with:

{
  "logs": [{
    "level": "ERROR",
    "message": "An unexpected error occurred in stream widgets: ValueError",
    "internal_message": "Request body json cannot be a string"
  }],
  "slices": [],
  ...
}

Stack trace (truncated) ends at:

File "airbyte_cdk/sources/declarative/requesters/http_requester.py", line 406, in _request_body_json
    raise ValueError("Request body json cannot be a string")

Skill validation note (for ai-skills#208 reviewers)

The skill's procedure surfaced this regression cleanly — the integrity rule "bug must reproduce on main, fix must work on PR branch with the SAME mock server and SAME payload" caught the issue immediately when the PR branch returned ValueError instead of the expected JSON-encoded request. The PII guardrails were exercised: no Shopware-specific names, customer URLs, or filter values from oncall airbytehq/oncall#12007 leaked into mock data or the test manifest. Recommended micro-improvement to the skill: an explicit step in the matrix to vary one input dimension (here, JSON-boolean presence) to catch partial fixes that pass on the simple case but fail on the realistic one.


Devin session

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants