feat(low-code-cdk): allow user-configurable check streams via hidden spec field on CheckStream#1013
Conversation
…spec field on CheckStream Co-Authored-By: bot_apk <apk@cognition.ai>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1777936380-check-stream-config-override#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1777936380-check-stream-config-overridePR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
PyTest Results (Fast)4 057 tests +14 4 046 ✅ +14 7m 28s ⏱️ -25s Results for commit bb5e701. ± Comparison against base commit 8e92512. This pull request removes 2 and adds 16 tests. Note that renamed tests count towards both. |
PyTest Results (Full)4 060 tests +14 4 048 ✅ +14 11m 34s ⏱️ +18s Results for commit bb5e701. ± Comparison against base commit 8e92512. This pull request removes 2 and adds 16 tests. Note that renamed tests count towards both. |
Summary
Implements airbytehq/airbyte-internal-issues#16299 (per the author's plan): adds an opt-in
config_check_streams_pathfield to the declarativeCheckStreamcomponent that lets a connection override the streams exercised duringcheckvia a hidden array-of-strings property the CDK injects into the connector'sspec. Also converts the previousraise ValueError(...)for unknown stream names into(False, message)so users see a cleanFAILEDconnection-status with a clear list of unknowns and their source (config path vs. manifest) instead of a Python traceback.What changed:
airbyte_cdk/sources/declarative/declarative_component_schema.yaml— added optionalconfig_check_streams_path: stringto theCheckStreamdefinition. Not inrequired— purely additive.airbyte_cdk/sources/declarative/models/declarative_component_schema.py— mirrored the new field on theCheckStreamPydantic class. The Dagger regen succeeded locally but pulled in a lot of unrelated drift (e.g.RecordExpander,ScopesJoinStrategyremoval,conintrewrite). To keep the diff minimal and reviewable, I reverted the regen and applied the field by hand, mirroring the existingOptional[str] = Field(...)style used throughout the file. Same approach PR feat: addany_ofstrategy toCheckStreamfor dynamic check stream selection #997 used for the same reason.airbyte_cdk/sources/declarative/checks/check_stream.pyconfig_check_streams_path: Optional[str] = Nonefield on the dataclass._resolve_effective_stream_nameshelper resolves the override viadpath.get(config, path, separator=".", default=None); falls back toself.stream_nameswhen the value is missing or an empty list; returns(False, "Config field '<path>' must be a list of stream names.")when the value is present but not a list.raise ValueError(...)with(False, message)that lists all unknown stream names at once, plus their source (config path '<path>'vsmanifest) and the available stream list.effective_stream_namesfor the availability-check loop instead ofself.stream_names.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py—create_check_streamnow passesconfig_check_streams_path=model.config_check_streams_paththrough to theCheckStreamconstructor.airbyte_cdk/sources/declarative/concurrent_declarative_source.py— new_inject_config_check_streams_property()runs after_post_process_manifest()and beforeSpeccreation. When the manifest'scheckblock isCheckStreamwithconfig_check_streams_pathset and aspecblock exists, it injects a hidden array-of-strings property at that key intospec.connection_specification.properties. Not added torequired. If the manifest already declares the property, its shape is preserved andairbyte_hidden: trueis forced (lenient policy). v1 only supports single-level paths; dotted paths raise a clearValueErrorat construction time. Manifests without aspecblock are left alone (the runtime override still works because it reads directly from config).unit_tests/sources/declarative/checks/test_check_stream.py::test_check_stream_with_slices_as_list[test_try_to_check_invalid_stream]to assert the new(False, msg)return shape with the unknown name and the available list, instead ofpytest.raises(ValueError).(False, msg), path set + valid + unknown mix →(False, msg)listing the unknowns, path set + non-list value →(False, "must be a list" msg).test_non_existent_static_streamto expectStatus.FAILED(no longerpytest.raises(ValueError)).unit_tests/sources/declarative/test_concurrent_declarative_source.py: spec-injection happy path, spec-injection withadditionalProperties: false, spec-injection preserving manifest-author-declared shape and forcingairbyte_hidden: true, dotted-path raises, end-to-end check withconfig_check_streams_pathexercisings2instead ofs1, end-to-end with unknown override returnsStatus.FAILED(no raise), and end-to-end fallback to manifest when config key is missing.Unreleasedentry toCHANGELOG.mdcovering the opt-in and theValueError→(False, msg)conversion.Coordination with #997 (still OPEN as draft)
PR #997 ("feat: add
any_ofstrategy toCheckStream") modifies the same files for a related but different feature (issue airbytehq/airbyte-internal-issues#16231). Two important differences for reviewers to settle:raise ValueErrorvs(False, msg). PR feat: addany_ofstrategy toCheckStreamfor dynamic check stream selection #997 explicitly keeps the existingraise ValueError(...)for stream-name-not-in-catalog, on the rationale that an unknown name in the manifest is a manifest-author bug. Issue airbytehq/airbyte-internal-issues#16299 explicitly proposes converting this to(False, msg)for cleaner UX, distinguishing manifest-source vs config-source unknown-stream cases. This PR implements the proposed conversion and surfaces the source in the error message. If reviewers prefer to keep the raise for the manifest-author case and only return(False, msg)for the config-source case, that's a small targeted change to make.origin/main, not feat: addany_ofstrategy toCheckStreamfor dynamic check stream selection #997's head. Whichever lands first will create conflicts for the other; happy to rebase once a direction is picked.Open design questions from issue airbytehq/airbyte-internal-issues#16299 (for reviewers)
airbyte_hidden: true). Alternative: raise at construction time. Testtest_check_stream_config_override_path_preserves_pre_declared_property_and_forces_hiddencovers the lenient behavior.dpathalready supports nesting. Future expansion is non-breaking since the Pydantic field stays astr.airbyte_hidden: trueon array-typed fields. Worth a separate sanity check by someone with Connector Builder UI / Cloud UI access — if either of those surfaces doesn't honorairbyte_hiddenfor arrays, the override field would be visible to end users. The CDK side is correct; the question is purely platform behavior.additionalProperties: falsespecs. Testtest_check_stream_config_override_path_injects_into_spec_with_additional_properties_falsecovers this path.ManifestNormalizerinteraction. Injection happens after_post_process_manifest(); the injected fragment is hand-crafted and known-safe so I did not re-run validation against it. If reviewers prefer re-validation, easy to add.CheckDynamicStreamparity. Out of scope for v1, as called out in the issue.Declarative-First Evaluation
This change is declarative-first — it adds a new field to the
CheckStreamdeclarative component's schema and wires it through the existing factory and runtime. The dpath read in the runtime is the same pattern already used byConfigComponentsResolver,HttpComponentsResolver, and other declarative components. No custom Python component was added.Test Coverage
Unit + integration tests in two files (15 new test cases plus updates to two existing tests). All 110 tests in the targeted scope pass:
Lint, format, and typecheck pass:
Two
test_concurrent_declarative_source.pytests fail flakily on origin/main (test_read_with_concurrent_and_synchronous_streams_with_concurrent_stateandtest_read_concurrent_declarative_source[test_no_pagination_with_partition_router-...]); confirmed they fail on plaingit stash'd origin/main as well, so they are not caused by this PR.Breaking Change Evaluation
Not a breaking change. New field is optional and additive; runtime fallback preserves existing behavior. The
ValueError→(False, msg)conversion is a strictly more lenient surface — code that previously caughtValueErrorfromCheckStream.check_connectionwill simply not see the exception anymore, but the public contract ofcheck_connection(returningTuple[bool, Any]) is unchanged and the result for the unknown-stream case becomes a normal failed-status rather than an unhandled exception.Review & Testing Checklist for Human
ValueError→(False, msg)conversion is the desired direction given PR feat: addany_ofstrategy toCheckStreamfor dynamic check stream selection #997's opposite stance, and whether the source-of-error message format (config path '<path>'vsmanifest) is helpful or noisy.airbyte_hidden: truefortype: arrayproperties — if not, the injected field would be visible to end users._inject_config_check_streams_property(preserve manifest-author shape, forceairbyte_hidden: true); switch to raise-on-conflict if you'd rather be strict.Notes
Resolves airbytehq/airbyte-internal-issues#16299:
Related: #997 (overlapping file scope, divergent on
ValueErrorhandling).Triage session: https://app.devin.ai/sessions/ecc8d961bb1a4dcc84e429b0343cc646
Link to Devin session: https://app.devin.ai/sessions/b46161d268dc4a61840535521a64d966