feat(test): auto-discover smoke-test scenarios from metadata.yaml#999
feat(test): auto-discover smoke-test scenarios from metadata.yaml#999Aaron ("AJ") Steers (aaronsteers) wants to merge 1 commit intomainfrom
Conversation
Adds a 'scenario_loader' module that parses
'data.connectorTestSuitesOptions[].suite==smokeTests' blocks from a
connector's 'metadata.yaml' and converts each scenario into a
'ConnectorTestScenario'. 'configSettings' values support two sigils:
- '${secret-ref:<item>/<section>/<field>}' resolves a credential
written to 'secrets/.env.<section>' by an out-of-band fetcher (e.g.
airbyte-ops-mcp 'secrets fetch'). All three segments are required;
the section segment is the literal 1Password section name (no prefix
stripping). Env vars inside each file are flat: 'ITEM_FIELD'.
- '${relative-date:-P<duration>}' renders an ISO-8601 duration
anchored at today midnight UTC as RFC 3339 ('YYYY-MM-DDTHH:MM:SSZ').
'AIRBYTE_TEST_CREDS_PROVIDER=1pass' and
'AIRBYTE_TEST_CREDS_1PASS_VAULT_NAME' are required only when a
'${secret-ref:...}' sigil is actually present in a scenario. Pure
literal or '${relative-date:...}'-only scenarios resolve without any
provider env vars (source-faker case).
Wires the new loader into 'DockerConnectorTestSuite.get_scenarios()'
so smoke-test scenarios run alongside CAT-derived scenarios, with no
behavior change for connectors that have not opted in.
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1777094203-scenario-loader#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1777094203-scenario-loaderPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
📝 WalkthroughWalkthroughThis change extends the test suite infrastructure to load smoke-test scenarios from a connector's Changes
Sequence Diagram(s)sequenceDiagram
participant TestRunner as Test Runner
participant DockerSuite as DockerConnectorTestSuite
participant ScenarioLoader as scenario_loader
participant MetadataFile as metadata.yaml
participant EnvFile as secrets/.env.*
TestRunner->>DockerSuite: get_scenarios()
DockerSuite->>DockerSuite: _get_acceptance_test_scenarios()
Note over DockerSuite: Load from acceptance-test-config.yml
DockerSuite->>ScenarioLoader: load_metadata_smoke_test_scenarios(...)
ScenarioLoader->>MetadataFile: Read metadata.yaml
MetadataFile-->>ScenarioLoader: YAML content
ScenarioLoader->>ScenarioLoader: Extract smokeTests suite scenarios
ScenarioLoader->>ScenarioLoader: Walk configSettings for sigils
alt ${secret-ref:...} detected
ScenarioLoader->>EnvFile: Read secrets/.env.<section>
EnvFile-->>ScenarioLoader: KEY=value pairs
ScenarioLoader->>ScenarioLoader: Resolve to env var name
end
alt ${relative-date:...} detected
ScenarioLoader->>ScenarioLoader: Parse ISO-8601 duration
ScenarioLoader->>ScenarioLoader: Calculate from UTC midnight
ScenarioLoader->>ScenarioLoader: Format as RFC3339
end
ScenarioLoader-->>DockerSuite: list[ConnectorTestScenario]
DockerSuite->>DockerSuite: Deduplicate & concatenate
DockerSuite-->>TestRunner: Combined scenario list
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (5)
airbyte_cdk/test/standard_tests/scenario_loader.py (3)
110-115:object.__setattr__on a frozen Pydantic model — comment is helpful.The
_idworkaround for the frozenConnectorTestScenariois documented inline, which is great. One small thought: since this is the second place in the codebase that needs to override_idafter construction (the CAT path usesidderived fromconfig_path), would a small classmethod likeConnectorTestScenario.with_id(scenario_dict, id_)on the model make this less likely to silently break if the field is ever renamed or migrated toPrivateAttr? Totally optional. wdyt?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte_cdk/test/standard_tests/scenario_loader.py` around lines 110 - 115, The code currently uses object.__setattr__ to set the private frozen attribute _id on ConnectorTestScenario after construction; add a classmethod on ConnectorTestScenario (e.g., with_id or create_with_id) that accepts the scenario dict (or validated fields) and an id_ string, constructs the model (using model_validate or standard constructor), sets the private/frozen _id via object.__setattr__ inside that method, and returns the instance; then update scenario_loader.py to call ConnectorTestScenario.with_id(resolved_settings, f"smoke-{name}") instead of directly calling object.__setattr__ so the override logic is centralized and resilient to future renames/migrations of the private attribute.
218-221: Defensivetzinfore-attach looks unreachable — keep or drop?
today_midnightis constructed withtzinfo=timezone.utc, and subtracting either atimedeltaorisodate.Durationfrom it preservestzinfo. Sorendered.tzinfo is Noneshould never be true here. Happy to keep it as belt-and-suspenders, but if you'd prefer to slim down the path, dropping lines 219–220 reads a bit cleaner. wdyt?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte_cdk/test/standard_tests/scenario_loader.py` around lines 218 - 221, The conditional that re-attaches tzinfo to rendered is unreachable because today_midnight is created with tzinfo=timezone.utc and subtracting duration preserves tzinfo, so remove the defensive check and replace the block that sets rendered.tzinfo and conditionally replaces it with a direct return of rendered.strftime("%Y-%m-%dT%H:%M:%SZ"); update the code around the variables today_midnight and rendered (and any references to timezone.utc) accordingly to eliminate the dead branch.
224-257: Consider leveragingpython-dotenvor documenting format constraintsThe hand-rolled parser handles the common cases (
KEY=value,KEY="value", comments, blanks) well for the OPS-CLI-controlled file format. Two edge cases worth considering:
- Inline comments:
KEY=value # trailing commentwould storevalue # trailing commentrather thanvalue. Probably fine if OPS CLI never writes those, but a foot-gun if anyone hand-edits.- Escapes inside quoted values:
KEY="ab\"c"would slice toab\"c(backslash preserved). Again, unlikely but possible if a 1Password secret contains a quote.Since
python-dotenv(>=0.21.0) is already a dependency in the project, would you consider delegating to it for robustness, or alternatively adding a comment documenting the expected format constraints? wdyt?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte_cdk/test/standard_tests/scenario_loader.py` around lines 224 - 257, The hand-rolled parser in _load_env_file misses inline comments and escaped quotes; replace the manual parse loop with python-dotenv’s parser (e.g., use dotenv_values from dotenv) to load the file into a dict, preserve the existing cache logic (cache_key, cache lookup/store) and the SigilResolutionError when the path does not exist, and ensure the returned mapping replaces the current parsed dict; if you prefer not to switch, alternatively add a clear comment above _load_env_file documenting that inline comments and backslash escapes in quoted values are not supported.unit_tests/test/test_scenario_loader.py (1)
1-438: Coverage looks solid — one suggestion if the embedded-sigil behavior is intentional.The test surface is comprehensive: missing-metadata, missing-suite, literal-only, relative-date, secret-ref happy + 4 failure modes, unknown sigils, two-segment rejection, multi-scenario, and recursion. Nicely done.
If you decide (on the related comment in
scenario_loader.py) that embedded sigils like"prefix-${secret-ref:a/b/c}"should also raiseSigilResolutionError, would you mind adding a test for that case so the contract is locked? Otherwise the silent-passthrough behavior risks regressing later. wdyt?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unit_tests/test/test_scenario_loader.py` around lines 1 - 438, Add a new unit test that asserts embedded sigils (e.g., a value like "prefix-${secret-ref:ashby/test-credentials-api_key/api_key}") raise SigilResolutionError when passed through load_metadata_smoke_test_scenarios; create a test function (e.g., test_embedded_sigil_fails) in test_scenario_loader.py that writes metadata with an embedded "${secret-ref:...}" in configSettings, then calls load_metadata_smoke_test_scenarios and expects pytest.raises(SigilResolutionError) to lock the contract; reference the existing SigilResolutionError and load_metadata_smoke_test_scenarios symbols so the test aligns with the current resolution behavior.airbyte_cdk/test/standard_tests/docker_base.py (1)
140-190: Nicely split — quick question on dedup scope.The refactor cleanly separates CAT and smoke scenario loading. Just a thought:
_dedup_scenariosruns only inside_get_acceptance_test_scenariosand dedups byconfig_path. Smoke scenarios are keyed byconfig_dict+_id(smoke-<name>) so they bypass dedup entirely — meaning a connector that accidentally declares two smoke scenarios with the samenameinmetadata.yamlwould yield two scenarios sharingid=smoke-<name>, which pytest would parametrize as duplicates.Would it be worth either (a) raising on duplicate smoke names in
scenario_loader._build_scenario-callers or (b) deduping byidat theget_scenarioslevel, wdyt?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte_cdk/test/standard_tests/docker_base.py` around lines 140 - 190, get_scenarios currently only dedups acceptance tests by config_path in _get_acceptance_test_scenarios, so smoke scenarios from _get_smoke_test_scenarios can produce duplicate ids (e.g., same "smoke-<name>"). After combining cat_scenarios and smoke_scenarios in get_scenarios, detect duplicate ConnectorTestScenario.id values (use the .id attribute on each scenario) and either raise a clear exception listing the duplicated ids or remove duplicates; implement a raise to fail fast so callers fix metadata.yaml, referencing get_scenarios, _get_acceptance_test_scenarios, _get_smoke_test_scenarios and ConnectorTestScenario.id.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@airbyte_cdk/test/standard_tests/scenario_loader.py`:
- Around line 41-44: The current anchored regexes _SECRET_REF_PATTERN and
_RELATIVE_DATE_PATTERN only match whole-string sigils and the existing guard
using raw.startswith("${") and raw.endswith("}") lets embedded sigils (e.g.
"prefix-${secret-ref:...}" or URLs with ${...}) pass through unresolved; update
scenario_loader.py to either (preferred) support embedded sigils by applying
re.sub with _SECRET_REF_PATTERN and _RELATIVE_DATE_PATTERN so embedded
occurrences are replaced, or (alternative) tighten the unrecognized-sigil guard
(replace the raw.startswith/raw.endswith check) to search for any r"\$\{[^}]+\}"
and raise an error when any such pattern is found but not matched by the known
regexes; reference _SECRET_REF_PATTERN, _RELATIVE_DATE_PATTERN, and the current
raw.startswith("${") and raw.endswith("}") guard when making the change.
- Around line 65-66: The variable `suites` can be None if
`connectorTestSuitesOptions:` is present but empty in the YAML; change the
assignment of `suites` so it defensively coalesces None to an empty list (e.g.,
use the existing pattern of `...get("connectorTestSuitesOptions") or []`) so
`for suite in suites:` never fails; update the line that defines `suites` in
scenario_loader.py accordingly.
- Around line 196-205: The env var name construction can produce invalid POSIX
names when segments contain dashes; update the code around var_name in
scenario_loader.py (the block using ENV_FILE_PREFIX, SECRETS_DIRNAME,
_load_env_file and raising SigilResolutionError) to normalize dashes to
underscores before uppercasing and joining: e.g., replace "-" with "_" on both
item and field prior to building var_name, then use that normalized var_name to
look up env_vars; also add a brief comment and (optional) a clearer
SigilResolutionError message that shows the normalized var name when not found
so debugging is easier.
In `@unit_tests/test/test_scenario_loader.py`:
- Around line 120-123: The assertion is flaky around UTC midnight because the
test captures "today" after the loader's _resolve_relative_date produced
rendered/parsed; fix by making the test tolerant to that boundary: either freeze
time (use freezegun/time_machine around the call that produces rendered) or
compute now_before = datetime.now(tz=timezone.utc) immediately before invoking
the code that yields rendered and use now_before when computing expected, or
relax the assertion to accept delta.days in {30, 31} (e.g., assert delta.days in
(30, 31)). Reference symbols: rendered, parsed, today, and
_resolve_relative_date to locate the code to change.
---
Nitpick comments:
In `@airbyte_cdk/test/standard_tests/docker_base.py`:
- Around line 140-190: get_scenarios currently only dedups acceptance tests by
config_path in _get_acceptance_test_scenarios, so smoke scenarios from
_get_smoke_test_scenarios can produce duplicate ids (e.g., same "smoke-<name>").
After combining cat_scenarios and smoke_scenarios in get_scenarios, detect
duplicate ConnectorTestScenario.id values (use the .id attribute on each
scenario) and either raise a clear exception listing the duplicated ids or
remove duplicates; implement a raise to fail fast so callers fix metadata.yaml,
referencing get_scenarios, _get_acceptance_test_scenarios,
_get_smoke_test_scenarios and ConnectorTestScenario.id.
In `@airbyte_cdk/test/standard_tests/scenario_loader.py`:
- Around line 110-115: The code currently uses object.__setattr__ to set the
private frozen attribute _id on ConnectorTestScenario after construction; add a
classmethod on ConnectorTestScenario (e.g., with_id or create_with_id) that
accepts the scenario dict (or validated fields) and an id_ string, constructs
the model (using model_validate or standard constructor), sets the
private/frozen _id via object.__setattr__ inside that method, and returns the
instance; then update scenario_loader.py to call
ConnectorTestScenario.with_id(resolved_settings, f"smoke-{name}") instead of
directly calling object.__setattr__ so the override logic is centralized and
resilient to future renames/migrations of the private attribute.
- Around line 218-221: The conditional that re-attaches tzinfo to rendered is
unreachable because today_midnight is created with tzinfo=timezone.utc and
subtracting duration preserves tzinfo, so remove the defensive check and replace
the block that sets rendered.tzinfo and conditionally replaces it with a direct
return of rendered.strftime("%Y-%m-%dT%H:%M:%SZ"); update the code around the
variables today_midnight and rendered (and any references to timezone.utc)
accordingly to eliminate the dead branch.
- Around line 224-257: The hand-rolled parser in _load_env_file misses inline
comments and escaped quotes; replace the manual parse loop with python-dotenv’s
parser (e.g., use dotenv_values from dotenv) to load the file into a dict,
preserve the existing cache logic (cache_key, cache lookup/store) and the
SigilResolutionError when the path does not exist, and ensure the returned
mapping replaces the current parsed dict; if you prefer not to switch,
alternatively add a clear comment above _load_env_file documenting that inline
comments and backslash escapes in quoted values are not supported.
In `@unit_tests/test/test_scenario_loader.py`:
- Around line 1-438: Add a new unit test that asserts embedded sigils (e.g., a
value like "prefix-${secret-ref:ashby/test-credentials-api_key/api_key}") raise
SigilResolutionError when passed through load_metadata_smoke_test_scenarios;
create a test function (e.g., test_embedded_sigil_fails) in
test_scenario_loader.py that writes metadata with an embedded
"${secret-ref:...}" in configSettings, then calls
load_metadata_smoke_test_scenarios and expects
pytest.raises(SigilResolutionError) to lock the contract; reference the existing
SigilResolutionError and load_metadata_smoke_test_scenarios symbols so the test
aligns with the current resolution behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4df6fbae-9bc8-4016-b3e8-b55f1f13b759
📒 Files selected for processing (3)
airbyte_cdk/test/standard_tests/docker_base.pyairbyte_cdk/test/standard_tests/scenario_loader.pyunit_tests/test/test_scenario_loader.py
| _SECRET_REF_PATTERN = re.compile( | ||
| r"^\$\{secret-ref:(?P<item>[^/}]+)/(?P<section>[^/}]+)/(?P<field>[^/}]+)\}$" | ||
| ) | ||
| _RELATIVE_DATE_PATTERN = re.compile(r"^\$\{relative-date:-(?P<duration>P[^}]+)\}$") |
There was a problem hiding this comment.
Embedded sigils slip through silently — intentional?
Both _SECRET_REF_PATTERN and _RELATIVE_DATE_PATTERN are anchored with ^...$, so a value like "prefix-${secret-ref:a/b/c}" or "https://api.example.com/${relative-date:-P30D}/data" would not match either regex. It would also fall through the raw.startswith("${") and raw.endswith("}") guard on line 172, so it gets returned unchanged with the literal ${...} text intact — no error, no resolution.
Given that scenarios are config dicts which often embed timestamps or tokens into URLs, it might be worth either (a) supporting embedded sigils via re.sub, or (b) tightening the unrecognized-sigil check to fail on any string containing ${...} regardless of position. wdyt?
🛡️ Sketch of a stricter "any-${...}" guard
- if raw.startswith("${") and raw.endswith("}"):
+ # Catch embedded or malformed sigils too — they would otherwise pass through
+ # unresolved and silently corrupt scenario configs.
+ if "${" in raw:
raise SigilResolutionError(
f"Unrecognized smoke-test sigil: `{raw}`. Supported forms are "
"`${secret-ref:<item>/<section>/<field>}` and "
"`${relative-date:-P<duration>}`."
)Also applies to: 158-179
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@airbyte_cdk/test/standard_tests/scenario_loader.py` around lines 41 - 44, The
current anchored regexes _SECRET_REF_PATTERN and _RELATIVE_DATE_PATTERN only
match whole-string sigils and the existing guard using raw.startswith("${") and
raw.endswith("}") lets embedded sigils (e.g. "prefix-${secret-ref:...}" or URLs
with ${...}) pass through unresolved; update scenario_loader.py to either
(preferred) support embedded sigils by applying re.sub with _SECRET_REF_PATTERN
and _RELATIVE_DATE_PATTERN so embedded occurrences are replaced, or
(alternative) tighten the unrecognized-sigil guard (replace the
raw.startswith/raw.endswith check) to search for any r"\$\{[^}]+\}" and raise an
error when any such pattern is found but not matched by the known regexes;
reference _SECRET_REF_PATTERN, _RELATIVE_DATE_PATTERN, and the current
raw.startswith("${") and raw.endswith("}") guard when making the change.
| metadata = yaml.safe_load(metadata_path.read_text()) or {} | ||
| suites = metadata.get("data", {}).get("connectorTestSuitesOptions", []) |
There was a problem hiding this comment.
Minor null-safety on connectorTestSuitesOptions.
If a metadata.yaml author writes connectorTestSuitesOptions: with nothing under it, yaml.safe_load returns None for that key, and for suite in suites: will raise TypeError: 'NoneType' object is not iterable. Cheap fix — same pattern you already use on line 73. wdyt?
🛡️ Defensive `or []`
- suites = metadata.get("data", {}).get("connectorTestSuitesOptions", [])
+ suites = metadata.get("data", {}).get("connectorTestSuitesOptions") or []🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@airbyte_cdk/test/standard_tests/scenario_loader.py` around lines 65 - 66, The
variable `suites` can be None if `connectorTestSuitesOptions:` is present but
empty in the YAML; change the assignment of `suites` so it defensively coalesces
None to an empty list (e.g., use the existing pattern of
`...get("connectorTestSuitesOptions") or []`) so `for suite in suites:` never
fails; update the line that defines `suites` in scenario_loader.py accordingly.
| env_path = connector_root / SECRETS_DIRNAME / f"{ENV_FILE_PREFIX}{section}" | ||
| env_vars = _load_env_file(path=env_path, cache=env_cache) | ||
|
|
||
| var_name = f"{item.upper()}_{field.upper()}" | ||
| if var_name not in env_vars: | ||
| raise SigilResolutionError( | ||
| f"Env var `{var_name}` not found in `{env_path}` for sigil " | ||
| f"`${{secret-ref:{item}/{section}/{field}}}`." | ||
| ) | ||
| return env_vars[var_name] |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What naming convention does the airbyte-ops-mcp secrets fetchcommand use for env var names written to.env.
files (dashes vs underscores)?
💡 Result:
The airbyte-ops-mcp secrets fetch command writes environment variable names to .env.
Citations:
- 1: https://docs.airbyte.com/developers/pyairbyte/reference/airbyte/mcp/
- 2: https://airbytehq.github.io/PyAirbyte/airbyte/secrets.html
- 3: https://airbytehq.github.io/PyAirbyte/airbyte/mcp.html
- 4: https://pypi.org/project/airbyte-internal-ops/0.28.0/
- 5: https://pypi.org/project/airbyte-internal-ops/0.1.5/
🏁 Script executed:
# First, let's look at the code context around lines 196-205
cd airbyte_cdk && head -220 test/standard_tests/scenario_loader.py | tail -40Repository: airbytehq/airbyte-python-cdk
Length of output: 1449
🏁 Script executed:
# Search for sigil parsing logic to understand how item/field are extracted
rg "secret-ref" -A 3 -B 3 --type pyRepository: airbytehq/airbyte-python-cdk
Length of output: 11451
🏁 Script executed:
# Look for any validation of item/field values
rg "item|field" test/standard_tests/scenario_loader.py -B 2 -A 2 -nRepository: airbytehq/airbyte-python-cdk
Length of output: 151
🏁 Script executed:
# Check if there are any tests that use dashed segments
rg "secret-ref.*-" test/ --type pyRepository: airbytehq/airbyte-python-cdk
Length of output: 118
Normalize dashes in env var name construction or validate at parse time.
The regex pattern (?P<item>[^/}]+) allows dashes in item and field segments, but the env var construction var_name = f"{item.upper()}_{field.upper()}" doesn't normalize them. If a field like api-key is used, it would create API-KEY (invalid for POSIX env var names). While airbyte-ops-mcp itself follows UPPERCASE_WITH_UNDERSCORES convention with no documented dash usage, the code would silently break if someone manually creates .env files with dashed field names or if upstream changes.
Would you prefer to normalize dashes to underscores during construction, or reject dashed segments at parse time with a clear error message? The former is more forgiving; the latter explicitly guides users to the expected format.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@airbyte_cdk/test/standard_tests/scenario_loader.py` around lines 196 - 205,
The env var name construction can produce invalid POSIX names when segments
contain dashes; update the code around var_name in scenario_loader.py (the block
using ENV_FILE_PREFIX, SECRETS_DIRNAME, _load_env_file and raising
SigilResolutionError) to normalize dashes to underscores before uppercasing and
joining: e.g., replace "-" with "_" on both item and field prior to building
var_name, then use that normalized var_name to look up env_vars; also add a
brief comment and (optional) a clearer SigilResolutionError message that shows
the normalized var name when not found so debugging is easier.
| today = datetime.now(tz=timezone.utc) | ||
| parsed = datetime.strptime(rendered, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc) | ||
| delta = today.replace(hour=0, minute=0, second=0, microsecond=0) - parsed | ||
| assert delta.days == 30 |
There was a problem hiding this comment.
Potential midnight-UTC flake in relative-date assertion.
The test calls datetime.now(tz=timezone.utc) after the loader already captured its own "today midnight UTC" inside _resolve_relative_date. If this test happens to run with the loader stamping at e.g. 23:59:59.999Z and the assertion executing at 00:00:00.001Z the next UTC day, today.replace(hour=0,...) will roll to the new day's midnight while parsed still anchors to the previous day, making delta.days == 31 and the assertion fail.
Since CI machines and overnight runs do hit those boundaries, would you consider one of:
- Asserting the rendered timestamp matches
(today_at_call_time - 30d)or(today_at_call_time - 30d - 1d)to absorb the boundary, or - Freezing time with
freezegun/time-machinefor deterministic assertion?
wdyt?
🛡️ Sketch of a midnight-tolerant assertion
- today = datetime.now(tz=timezone.utc)
- parsed = datetime.strptime(rendered, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
- delta = today.replace(hour=0, minute=0, second=0, microsecond=0) - parsed
- assert delta.days == 30
+ parsed = datetime.strptime(rendered, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
+ today_midnight = datetime.now(tz=timezone.utc).replace(
+ hour=0, minute=0, second=0, microsecond=0
+ )
+ # Allow a 1-day tolerance to absorb a UTC-midnight rollover between the
+ # loader's now() and the test's now().
+ assert (today_midnight - parsed).days in (30, 31)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@unit_tests/test/test_scenario_loader.py` around lines 120 - 123, The
assertion is flaky around UTC midnight because the test captures "today" after
the loader's _resolve_relative_date produced rendered/parsed; fix by making the
test tolerant to that boundary: either freeze time (use freezegun/time_machine
around the call that produces rendered) or compute now_before =
datetime.now(tz=timezone.utc) immediately before invoking the code that yields
rendered and use now_before when computing expected, or relax the assertion to
accept delta.days in {30, 31} (e.g., assert delta.days in (30, 31)). Reference
symbols: rendered, parsed, today, and _resolve_relative_date to locate the code
to change.
There was a problem hiding this comment.
Pull request overview
Adds metadata-driven smoke-test scenario discovery to the CDK standard-tests suite, enabling connectors to opt into 1Password-backed smoke scenarios via metadata.yaml without per-connector CI YAML changes.
Changes:
- Introduces
airbyte_cdk.test.standard_tests.scenario_loaderto parsemetadata.yamlsmoke-test suites and resolve${secret-ref:...}/${relative-date:...}sigils. - Extends
DockerConnectorTestSuite.get_scenarios()to return CAT-derived scenarios plus metadata-derived smoke scenarios. - Adds a comprehensive unit test suite for scenario parsing and sigil resolution behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
airbyte_cdk/test/standard_tests/scenario_loader.py |
New loader that reads smoke-test scenarios from metadata.yaml and resolves supported sigils. |
airbyte_cdk/test/standard_tests/docker_base.py |
Updates scenario discovery to include metadata-derived smoke scenarios in addition to CAT scenarios. |
unit_tests/test/test_scenario_loader.py |
Unit tests covering smoke-test scenario discovery and sigil resolution/error cases. |
Comments suppressed due to low confidence (1)
airbyte_cdk/test/standard_tests/docker_base.py:156
- The
FileNotFoundErrorwarning in_get_acceptance_test_scenariossays "No scenarios will be loaded.", butget_scenarios()now also loads smoke-test scenarios frommetadata.yaml. This message can be misleading for connectors that have smoke scenarios but no CAT config; consider updating the warning text to clarify that only CAT-style scenarios are skipped (and smoke scenarios may still be loaded).
except FileNotFoundError as e:
# Destinations sometimes do not have an acceptance tests file.
warnings.warn(
f"Acceptance test config file not found: {e!s}. No scenarios will be loaded.",
category=UserWarning,
stacklevel=1,
)
return []
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for suite in suites: | ||
| if not isinstance(suite, dict) or suite.get("suite") != SMOKE_TEST_SUITE_NAME: | ||
| continue | ||
|
|
||
| for raw_scenario in suite.get("scenarios", []) or []: | ||
| scenarios.append( | ||
| _build_scenario( | ||
| raw_scenario=raw_scenario, | ||
| connector_root=connector_root, | ||
| ) | ||
| ) |
There was a problem hiding this comment.
suite.get("scenarios") items are passed directly into _build_scenario, but if a connector's metadata.yaml contains a non-mapping entry (e.g. a string/number), _build_scenario will crash with AttributeError on raw_scenario.get(...). Consider validating raw_scenario is a dict in the loop (or inside _build_scenario) and raising SigilResolutionError with a clear message instead of an unhandled exception.
| env_cache: dict[str, dict[str, str]] = {} | ||
| has_secret_ref = _contains_secret_ref(raw_settings) | ||
| if has_secret_ref: | ||
| _validate_provider_env() | ||
|
|
||
| resolved_settings = _resolve_sigils( | ||
| value=raw_settings, | ||
| connector_root=connector_root, | ||
| env_cache=env_cache, | ||
| ) |
There was a problem hiding this comment.
Right now secret-ref scenarios validate provider env vars and resolve secrets during scenario discovery/collection. This will make airbyte-cdk ... --no-creds (which relies on the requires_creds marker) fail during pytest collection for connectors that opt into smoke tests, because the exception is raised before pytest can filter out requires_creds tests. Consider marking these scenarios as requires_creds (e.g., by setting a config_path under secrets/ or adding an explicit flag on ConnectorTestScenario) and deferring ${secret-ref:...} resolution/validation until the scenario is actually executed, so no-creds runs can still collect and skip cleanly.
| env_cache: dict[str, dict[str, str]] = {} | ||
| has_secret_ref = _contains_secret_ref(raw_settings) | ||
| if has_secret_ref: | ||
| _validate_provider_env() |
There was a problem hiding this comment.
🔴 Smoke-test scenarios with ${secret-ref:...} crash test collection, breaking the --no-creds workflow
When a connector's metadata.yaml declares smoke-test scenarios containing ${secret-ref:...} sigils, get_scenarios() eagerly resolves them via _get_smoke_test_scenarios() → load_metadata_smoke_test_scenarios() → _build_scenario() → _validate_provider_env(). If the credentials environment is not configured, this raises SigilResolutionError during pytest_generate_tests (pytest_hooks.py:173), crashing all test collection — not just the tests that need credentials.
The existing --no-creds workflow (-m "not requires_creds") filters tests after collection, but collection itself now fails before filtering can run. Additionally, requires_creds (scenario.py:192-194) checks only self.config_path, so smoke-test scenarios (which use config_dict, not config_path) always return False — meaning even if collection succeeded, these scenarios would not be filtered by --no-creds.
The net effect: any connector that adds a ${secret-ref:...} smoke-test scenario to metadata.yaml will break all test runs for developers/CI without credentials, even for tests that don't require credentials (e.g., test_docker_image_build_and_spec).
Prompt for agents
The problem: _build_scenario eagerly validates provider env and resolves secret-ref sigils. This happens during get_scenarios() which is called during pytest_generate_tests (test collection). If credentials are unavailable, a SigilResolutionError crashes ALL test collection, preventing even non-credentials tests from running. The --no-creds CLI flag cannot help because it applies mark-based filtering after collection.
Additionally, ConnectorTestScenario.requires_creds (in airbyte_cdk/test/models/scenario.py:192-194) only checks config_path for 'secrets' in its path parts. Smoke-test scenarios use config_dict (not config_path), so requires_creds always returns False for them.
Two things need to happen:
1. In load_metadata_smoke_test_scenarios (or _get_smoke_test_scenarios in docker_base.py), catch SigilResolutionError for individual scenarios and either skip them with a warning or defer resolution. This prevents one unresolvable scenario from blocking all test collection.
2. Smoke-test scenarios that were resolved from secret-ref sigils should be marked as requiring credentials, either by extending the requires_creds property in ConnectorTestScenario to also check for a flag set during resolution, or by adding a marker on the scenario during _build_scenario.
Was this helpful? React with 👍 or 👎 to provide feedback.
| return rendered.strftime("%Y-%m-%dT%H:%M:%SZ") | ||
|
|
||
|
|
||
| def _load_env_file( |
There was a problem hiding this comment.
Shouldn't we use the python dotenv library rather than rebuild this from scratch??
There was a problem hiding this comment.
Fair — you're right. Originally deferred for "fewer escape-quirk surprises" but airbyte-ops-mcp already uses python-dotenv to write these files, so the CDK reading them with anything else is asymmetric and a future bug factory. Switching _load_env_file to dotenv.dotenv_values() in the next push.
Summary
Teaches the standard-tests suite to read smoke-test scenarios declared in a connector's
metadata.yaml, so connectors can opt into 1Password-backed scenarios without per-connector CI YAML edits.New module
airbyte_cdk.test.standard_tests.scenario_loader:data.connectorTestSuitesOptions[].suite == 'smokeTests'frommetadata.yamlConnectorTestScenario(withconfig_dictpopulated and a stablesmoke-<name>id)configSettingsrecursively:${secret-ref:<item>/<section>/<field>}— 3-segment-only. Readssecrets/.env.<section>(literal 1P section name, no prefix stripping) and looks up<ITEM_UPPER>_<FIELD_UPPER>. The OPS CLI is responsible for writing those.env.<section>files ahead of test execution.${relative-date:-P<duration>}— ISO-8601 duration anchored at today midnight UTC, rendered as RFC 3339 (YYYY-MM-DDTHH:MM:SSZ).AIRBYTE_TEST_CREDS_PROVIDER=1pass,AIRBYTE_TEST_CREDS_1PASS_VAULT_NAME) only when a${secret-ref:...}sigil is actually present. Pure literal or${relative-date:...}-only scenarios (source-faker style) resolve with no provider env vars.DockerConnectorTestSuite.get_scenarios()now returns CAT-derived scenarios + smoke-test scenarios. Smoke-test scenarios are additive — connectors that have not opted in see no behavior change.Companion to:
airbytehq/airbyte-connector-models#42— schema for the metadata.yaml smoke-test dialect (literal section names, 3-segment sigils only)airbytehq/airbyte-ops-mcp#733— currentsecrets fetchflow; a follow-up will teach it to write per-section.env.<section>files when a connector has a smoke-test blockReview & Testing Checklist for Human
test-credentials-api_key), no.env.defaultfallback.AIRBYTE_TEST_CREDS_PROVIDER+AIRBYTE_TEST_CREDS_1PASS_VAULT_NAMErequired only when secret-ref sigils are present. The vault env var is read by the OPS CLI; CDK only checks it is set, since CDK reads pre-fetched.env.<section>files rather than callingopitself._load_env_fileparsing — supportsKEY=valueandKEY="value"/KEY='value', ignores comments and blank lines. Multi-line values inside quotes are NOT supported here (the OPS CLI multi-line writer in fix: replace flaky setup-poetry action with install-poetry #733 produces them, but at parse time the test runner only consumes single-line values; flag if this is wrong).secrets/.env.test-credentials-api_key, set the two env vars, and confirmpytestparametrizes asmoke-defaultscenario.Notes
.env.<section>file, missing var inside file, unknown sigil, 2-segment sigil rejected, multi-scenario, recursive nested resolution..env.<section>files, monorepo CI workflow updates to add the env vars, and the source-ashbymetadata.yamlblock. Each will follow as its own PR after this one is reviewed.Link to Devin session: https://app.devin.ai/sessions/5c2ccf537b70486b9e6fbc974d26063e
Requested by: Aaron ("AJ") Steers (@aaronsteers)
Summary by CodeRabbit
New Features
Tests