Skip to content

feat(low-code-cdk): allow user-configurable check streams via hidden spec field on CheckStream#1013

Draft
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1777936380-check-stream-config-override
Draft

feat(low-code-cdk): allow user-configurable check streams via hidden spec field on CheckStream#1013
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1777936380-check-stream-config-override

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Summary

Implements airbytehq/airbyte-internal-issues#16299 (per the author's plan): adds an opt-in config_check_streams_path field to the declarative CheckStream component that lets a connection override the streams exercised during check via a hidden array-of-strings property the CDK injects into the connector's spec. Also converts the previous raise ValueError(...) for unknown stream names into (False, message) so users see a clean FAILED connection-status with a clear list of unknowns and their source (config path vs. manifest) instead of a Python traceback.

What changed:

  • YAML schema (source of truth). airbyte_cdk/sources/declarative/declarative_component_schema.yaml — added optional config_check_streams_path: string to the CheckStream definition. Not in required — purely additive.
  • Pydantic model. airbyte_cdk/sources/declarative/models/declarative_component_schema.py — mirrored the new field on the CheckStream Pydantic class. The Dagger regen succeeded locally but pulled in a lot of unrelated drift (e.g. RecordExpander, ScopesJoinStrategy removal, conint rewrite). To keep the diff minimal and reviewable, I reverted the regen and applied the field by hand, mirroring the existing Optional[str] = Field(...) style used throughout the file. Same approach PR feat: add any_of strategy to CheckStream for dynamic check stream selection #997 used for the same reason.
  • Runtime override + bug fix. airbyte_cdk/sources/declarative/checks/check_stream.py
    • New config_check_streams_path: Optional[str] = None field on the dataclass.
    • New _resolve_effective_stream_names helper resolves the override via dpath.get(config, path, separator=".", default=None); falls back to self.stream_names when the value is missing or an empty list; returns (False, "Config field '<path>' must be a list of stream names.") when the value is present but not a list.
    • Replaced the raise ValueError(...) with (False, message) that lists all unknown stream names at once, plus their source (config path '<path>' vs manifest) and the available stream list.
    • Iterates effective_stream_names for the availability-check loop instead of self.stream_names.
  • Factory wiring. airbyte_cdk/sources/declarative/parsers/model_to_component_factory.pycreate_check_stream now passes config_check_streams_path=model.config_check_streams_path through to the CheckStream constructor.
  • Spec injection. airbyte_cdk/sources/declarative/concurrent_declarative_source.py — new _inject_config_check_streams_property() runs after _post_process_manifest() and before Spec creation. When the manifest's check block is CheckStream with config_check_streams_path set and a spec block exists, it injects a hidden array-of-strings property at that key into spec.connection_specification.properties. Not added to required. If the manifest already declares the property, its shape is preserved and airbyte_hidden: true is forced (lenient policy). v1 only supports single-level paths; dotted paths raise a clear ValueError at construction time. Manifests without a spec block are left alone (the runtime override still works because it reads directly from config).
  • Tests.
    • Updated unit_tests/sources/declarative/checks/test_check_stream.py::test_check_stream_with_slices_as_list[test_try_to_check_invalid_stream] to assert the new (False, msg) return shape with the unknown name and the available list, instead of pytest.raises(ValueError).
    • Added 7 unit tests covering: path-not-set (unchanged behavior), path set + key missing → fallback, path set + empty list → fallback, path set + valid override → exercises override (and verifies the manifest's stream is not exercised), path set + one unknown → (False, msg), path set + valid + unknown mix → (False, msg) listing the unknowns, path set + non-list value → (False, "must be a list" msg).
    • Updated the existing parametrized integration test test_non_existent_static_stream to expect Status.FAILED (no longer pytest.raises(ValueError)).
    • Added 7 new tests in unit_tests/sources/declarative/test_concurrent_declarative_source.py: spec-injection happy path, spec-injection with additionalProperties: false, spec-injection preserving manifest-author-declared shape and forcing airbyte_hidden: true, dotted-path raises, end-to-end check with config_check_streams_path exercising s2 instead of s1, end-to-end with unknown override returns Status.FAILED (no raise), and end-to-end fallback to manifest when config key is missing.
  • Changelog. Added an Unreleased entry to CHANGELOG.md covering the opt-in and the ValueError(False, msg) conversion.

Coordination with #997 (still OPEN as draft)

PR #997 ("feat: add any_of strategy to CheckStream") modifies the same files for a related but different feature (issue airbytehq/airbyte-internal-issues#16231). Two important differences for reviewers to settle:

  1. raise ValueError vs (False, msg). PR feat: add any_of strategy to CheckStream for dynamic check stream selection #997 explicitly keeps the existing raise ValueError(...) for stream-name-not-in-catalog, on the rationale that an unknown name in the manifest is a manifest-author bug. Issue airbytehq/airbyte-internal-issues#16299 explicitly proposes converting this to (False, msg) for cleaner UX, distinguishing manifest-source vs config-source unknown-stream cases. This PR implements the proposed conversion and surfaces the source in the error message. If reviewers prefer to keep the raise for the manifest-author case and only return (False, msg) for the config-source case, that's a small targeted change to make.
  2. Branched from origin/main, not feat: add any_of strategy to CheckStream for dynamic check stream selection #997's head. Whichever lands first will create conflicts for the other; happy to rebase once a direction is picked.

Open design questions from issue airbytehq/airbyte-internal-issues#16299 (for reviewers)

  1. Conflict policy when the spec already declares the property. Implemented as lenient (preserve shape, force airbyte_hidden: true). Alternative: raise at construction time. Test test_check_stream_config_override_path_preserves_pre_declared_property_and_forces_hidden covers the lenient behavior.
  2. Single-level vs. dot-path in v1. Implemented as single-level only on the injection side (raises a clear error on dots). Read-side dpath already supports nesting. Future expansion is non-breaking since the Pydantic field stays a str.
  3. airbyte_hidden: true on array-typed fields. Worth a separate sanity check by someone with Connector Builder UI / Cloud UI access — if either of those surfaces doesn't honor airbyte_hidden for arrays, the override field would be visible to end users. The CDK side is correct; the question is purely platform behavior.
  4. additionalProperties: false specs. Test test_check_stream_config_override_path_injects_into_spec_with_additional_properties_false covers this path.
  5. ManifestNormalizer interaction. Injection happens after _post_process_manifest(); the injected fragment is hand-crafted and known-safe so I did not re-run validation against it. If reviewers prefer re-validation, easy to add.
  6. CheckDynamicStream parity. Out of scope for v1, as called out in the issue.

Declarative-First Evaluation

This change is declarative-first — it adds a new field to the CheckStream declarative component's schema and wires it through the existing factory and runtime. The dpath read in the runtime is the same pattern already used by ConfigComponentsResolver, HttpComponentsResolver, and other declarative components. No custom Python component was added.

Test Coverage

Unit + integration tests in two files (15 new test cases plus updates to two existing tests). All 110 tests in the targeted scope pass:

poetry run pytest unit_tests/sources/declarative/checks/test_check_stream.py unit_tests/sources/declarative/test_concurrent_declarative_source.py

Lint, format, and typecheck pass:

poetry run ruff format .          # 778 files left unchanged
poetry run ruff check .           # All checks passed!
poetry run mypy --config-file mypy.ini airbyte_cdk
# Found 1 error in 1 file (pre-existing pytz import-untyped, present on origin/main)

Two test_concurrent_declarative_source.py tests fail flakily on origin/main (test_read_with_concurrent_and_synchronous_streams_with_concurrent_state and test_read_concurrent_declarative_source[test_no_pagination_with_partition_router-...]); confirmed they fail on plain git stash'd origin/main as well, so they are not caused by this PR.

Breaking Change Evaluation

Not a breaking change. New field is optional and additive; runtime fallback preserves existing behavior. The ValueError(False, msg) conversion is a strictly more lenient surface — code that previously caught ValueError from CheckStream.check_connection will simply not see the exception anymore, but the public contract of check_connection (returning Tuple[bool, Any]) is unchanged and the result for the unknown-stream case becomes a normal failed-status rather than an unhandled exception.

Review & Testing Checklist for Human

  • Confirm whether the ValueError(False, msg) conversion is the desired direction given PR feat: add any_of strategy to CheckStream for dynamic check stream selection #997's opposite stance, and whether the source-of-error message format (config path '<path>' vs manifest) is helpful or noisy.
  • Verify Connector Builder UI and Airbyte Cloud UI honor airbyte_hidden: true for type: array properties — if not, the injected field would be visible to end users.
  • Sanity-check the lenient conflict policy in _inject_config_check_streams_property (preserve manifest-author shape, force airbyte_hidden: true); switch to raise-on-conflict if you'd rather be strict.

Notes

Resolves airbytehq/airbyte-internal-issues#16299:

Related: #997 (overlapping file scope, divergent on ValueError handling).

Triage session: https://app.devin.ai/sessions/ecc8d961bb1a4dcc84e429b0343cc646

Link to Devin session: https://app.devin.ai/sessions/b46161d268dc4a61840535521a64d966

…spec field on CheckStream

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1777936380-check-stream-config-override#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1777936380-check-stream-config-override

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

PyTest Results (Fast)

4 057 tests  +14   4 046 ✅ +14   7m 28s ⏱️ -25s
    1 suites ± 0      11 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit bb5e701. ± Comparison against base commit 8e92512.

This pull request removes 2 and adds 16 tests. Note that renamed tests count towards both.
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_slices_as_list[False-test_try_to_check_invalid stream-record3-streams_to_check3-stream_slice3-None]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_slices_as_list[True-test_try_to_check_invalid stream-record3-streams_to_check3-stream_slice3-None]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_mixed_valid_and_unknown_returns_message
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_non_list_returns_must_be_a_list_message
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_unknown_stream_returns_message
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_uses_expected_streams[path_set_but_config_missing_key_falls_back_to_manifest]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_uses_expected_streams[path_set_but_config_value_empty_falls_back_to_manifest]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_uses_expected_streams[path_set_with_valid_override_uses_config_value]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_slices_as_list[False-test_try_to_check_invalid_stream-record3-streams_to_check3-stream_slice3-expectation3]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_slices_as_list[True-test_try_to_check_invalid_stream-record3-streams_to_check3-stream_slice3-expectation3]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_unconfigured_path_keeps_original_behavior
unit_tests.sources.declarative.test_concurrent_declarative_source ‑ test_check_stream_config_override_falls_back_to_manifest_when_config_missing
…

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

PyTest Results (Full)

4 060 tests  +14   4 048 ✅ +14   11m 34s ⏱️ +18s
    1 suites ± 0      12 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit bb5e701. ± Comparison against base commit 8e92512.

This pull request removes 2 and adds 16 tests. Note that renamed tests count towards both.
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_slices_as_list[False-test_try_to_check_invalid stream-record3-streams_to_check3-stream_slice3-None]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_slices_as_list[True-test_try_to_check_invalid stream-record3-streams_to_check3-stream_slice3-None]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_mixed_valid_and_unknown_returns_message
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_non_list_returns_must_be_a_list_message
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_unknown_stream_returns_message
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_uses_expected_streams[path_set_but_config_missing_key_falls_back_to_manifest]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_uses_expected_streams[path_set_but_config_value_empty_falls_back_to_manifest]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_config_override_uses_expected_streams[path_set_with_valid_override_uses_config_value]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_slices_as_list[False-test_try_to_check_invalid_stream-record3-streams_to_check3-stream_slice3-expectation3]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_slices_as_list[True-test_try_to_check_invalid_stream-record3-streams_to_check3-stream_slice3-expectation3]
unit_tests.sources.declarative.checks.test_check_stream ‑ test_check_stream_with_unconfigured_path_keeps_original_behavior
unit_tests.sources.declarative.test_concurrent_declarative_source ‑ test_check_stream_config_override_falls_back_to_manifest_when_config_missing
…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants