Allow bypassing data quality checks via a config key #1598

Open
Dev-iL wants to merge 2 commits into apache:main from SummitSG-LLC:2605/yolo_mode

Collaborator

@Dev-iL Dev-iL commented May 24, 2026

Context

@check_output and @check_output_custom decorators let users attach data validators to function outputs. At graph-construction time, each decorated node is expanded into a subgraph:

  • {name}_raw — runs the actual function
  • {name}_{validator} — one node per validator (tagged hamilton.data_quality.contains_dq_results)
  • {name} — aggregates validation results and returns the raw output
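As a rough illustration of the naming scheme above, the following sketch lists the node names one decorated function would expand into. It is a simplified stand-in, not Hamilton's implementation; the helper `expanded_node_names` and the validator names are hypothetical.

```python
def expanded_node_names(name: str, validators: list[str]) -> list[str]:
    """Return the node names produced for one decorated function (toy model)."""
    raw = f"{name}_raw"                            # runs the actual function
    checks = [f"{name}_{v}" for v in validators]   # one node per validator
    final = name                                   # aggregates results, returns the raw output
    return [raw, *checks, final]


# Hypothetical function name and validator names, for illustration only.
print(expanded_node_names("spend", ["range_validator", "dtype_validator"]))
```

Under this model, a function with two validators contributes four nodes to the graph instead of one.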

This PR adds a disable_data_quality_checks config flag that skips this expansion. It is useful for two main reasons:

  1. Validation may be crucial during development, but it carries a real runtime cost in production, where the pipeline is already trusted. Previously, the only way to turn it off was to remove the decorators from the source code.
  2. Validation nodes crowd graph visualizations. Passing disable_data_quality_checks=True at construction time keeps the _raw and validator nodes out of the graph, so any visualization of that driver's graph shows only the "business logic" nodes.

What was changed

hamilton/function_modifiers/validation.py

BaseDataValidationDecorator (parent of both check_output and check_output_custom) now:

  1. Overrides optional_config() to declare {"disable_data_quality_checks": False}. This registers the key with Hamilton's config-filtering pipeline so it is threaded through to transform_node automatically — no driver plumbing needed.

  2. Adds an early-return guard at the top of transform_node:

    if config.get("disable_data_quality_checks", False):
        return [node_]

    When the flag is set, the node is returned unchanged: no expansion happens, so no _raw node and no validator nodes are created. Validation then adds no runtime cost, because the extra nodes never exist.
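To make the effect of the guard concrete, here is a minimal sketch using a toy node type. `ToyNode`, the standalone `transform_node` function, and the tag value `True` are assumptions made for illustration; Hamilton's real node class and method signature differ.

```python
from dataclasses import dataclass, field

CONFIG_KEY = "disable_data_quality_checks"
DQ_TAG = "hamilton.data_quality.contains_dq_results"


@dataclass
class ToyNode:
    """Stand-in for a graph node: a name plus optional tags."""
    name: str
    tags: dict = field(default_factory=dict)


def transform_node(node_: ToyNode, config: dict, validators: list[str]) -> list[ToyNode]:
    # Early-return guard: with the flag set, the node passes through untouched.
    if config.get(CONFIG_KEY, False):
        return [node_]
    # Otherwise expand into raw node, one tagged node per validator, and the final node.
    raw = ToyNode(f"{node_.name}_raw")
    checks = [ToyNode(f"{node_.name}_{v}", {DQ_TAG: True}) for v in validators]
    final = ToyNode(node_.name)
    return [raw, *checks, final]


node = ToyNode("spend")
print([n.name for n in transform_node(node, {}, ["range_validator"])])
print([n.name for n in transform_node(node, {CONFIG_KEY: True}, ["range_validator"])])
```

The second call returns a single-element list holding the original node, which is why no DQ-tagged nodes can appear downstream when the flag is set.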

hamilton/driver.py

Builder gains a convenience method:

def with_data_quality_disabled(self) -> "Builder":
    return self.with_config({"disable_data_quality_checks": True})

This is a thin wrapper over with_config — discoverable by IDE autocomplete and explicit about intent. Because with_config does a dict .update(), a later call like .with_config({"disable_data_quality_checks": False}) will re-enable validation, which is the expected last-write-wins behavior.
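The last-write-wins behavior can be sketched with a toy builder that stores its config in a plain dict. `ToyBuilder` is a hypothetical stand-in and not Hamilton's Builder; it only mirrors the .update() semantics described above.

```python
class ToyBuilder:
    """Minimal stand-in for a builder that accumulates config via dict.update()."""

    def __init__(self) -> None:
        self.config: dict = {}

    def with_config(self, config: dict) -> "ToyBuilder":
        self.config.update(config)  # later keys overwrite earlier ones
        return self

    def with_data_quality_disabled(self) -> "ToyBuilder":
        return self.with_config({"disable_data_quality_checks": True})


builder = (
    ToyBuilder()
    .with_data_quality_disabled()
    .with_config({"disable_data_quality_checks": False})  # re-enables validation
)
print(builder.config)  # {'disable_data_quality_checks': False}
```

Reversing the order of the two calls would leave the flag set to True, so call order matters when config is assembled from several sources.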

Usage

Via the Builder convenience method (recommended):

dr = (
    hamilton.driver.Builder()
    .with_modules(my_pipeline)
    .with_data_quality_disabled()
    .build()
)

Via with_config directly (equivalent, useful when config is assembled dynamically):

dr = (
    hamilton.driver.Builder()
    .with_modules(my_pipeline)
    .with_config({"disable_data_quality_checks": True})
    .build()
)

Legacy Driver constructor:

import hamilton.driver
from hamilton.base import DefaultAdapter

dr = hamilton.driver.Driver(
    {"disable_data_quality_checks": True},
    my_pipeline,
    adapter=DefaultAdapter(),
)

Tests added

| File | Test | What it covers |
|---|---|---|
| tests/function_modifiers/test_validation.py | test_check_output_disabled_via_config_returns_original_node | check_output_custom returns the original node unchanged when the flag is set |
| tests/function_modifiers/test_validation.py | test_check_output_builtin_disabled_via_config_returns_original_node | check_output (built-in validators) also respects the flag |
| tests/test_end_to_end.py | test_builder_with_data_quality_disabled_removes_validator_nodes | No DQ-tagged nodes appear in list_available_variables() when disabled |
| tests/test_end_to_end.py | test_builder_with_data_quality_disabled_still_executes_correctly | Driver executes correctly and returns the function's real output when disabled |
| tests/test_end_to_end.py | test_disable_data_quality_checks_config_key_works_directly | Raw with_config path (no convenience method) also suppresses validator nodes |

All 13 pre-existing validation tests continue to pass.

Design notes

  • Graph-construction time, not execution time. Disabling at construction eliminates the extra nodes entirely. An execution-time approach (e.g., a lifecycle adapter) would still pay graph construction and scheduling overhead, and would be harder to reason about.

  • Config key, not a subclass. A NoValidationBuilder subclass was considered. It was rejected: this is a single boolean flag, not a fundamentally different execution model. The existing Builder method pattern (with_config, with_adapters, …) is the right weight here. The Builder.with_data_quality_disabled() convenience method is offered as a discoverable alias — if maintainers prefer, the method alone (without the raw config key) or the raw key alone (without the method) is equally viable.

  • optional_config() is the correct hook. Hamilton's resolve_config / filter_config machinery passes only declared config keys to each decorator. Registering the key via optional_config() means the flag is silently ignored (defaulting to False) by all existing drivers that never set it — no breaking change.
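The backward-compatibility argument in the last note can be sketched as follows. `toy_filter_config` is a hypothetical simplification of the filtering idea, not Hamilton's actual resolve_config / filter_config code, whose signatures may differ.

```python
# What the decorator declares via optional_config(): key -> default value.
OPTIONAL_CONFIG = {"disable_data_quality_checks": False}


def toy_filter_config(config: dict, optional: dict) -> dict:
    """Keep only declared keys, falling back to each declared default."""
    return {key: config.get(key, default) for key, default in optional.items()}


# An existing driver that never sets the flag sees the default (validation stays on).
legacy = toy_filter_config({"env": "prod"}, OPTIONAL_CONFIG)
# A driver that opts out passes the flag through; unrelated keys are dropped.
opted_out = toy_filter_config(
    {"env": "prod", "disable_data_quality_checks": True}, OPTIONAL_CONFIG
)
print(legacy)     # {'disable_data_quality_checks': False}
print(opted_out)  # {'disable_data_quality_checks': True}
```

In this model the decorator never sees undeclared keys such as "env", and a missing flag is indistinguishable from an explicit False.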

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@Dev-iL Dev-iL requested review from elijahbenizzy and skrawcz May 24, 2026 14:53
@Dev-iL Dev-iL force-pushed the 2605/yolo_mode branch 2 times, most recently from 564d9f6 to 6bf80c9 May 24, 2026 15:52
Contributor

@skrawcz skrawcz left a comment

Nice feature! Clean implementation — the one-liner in Builder delegating to with_config is the right approach, and the 2-line check in transform_node is minimal and correct. Tests cover both decorators plus end-to-end via the driver.

One nit: with_data_quality_disabled() returns -> "Builder" (string literal) — should be -> Self to match the convention established in #1560. Otherwise LGTM.

The sklearn test fix is unrelated but correct — fine to include here.

Contributor

@skrawcz skrawcz left a comment

Contributor

@skrawcz skrawcz left a comment

Comment thread docs/how-tos/run-data-quality-checks.rst Outdated
Comment thread hamilton/function_modifiers/validation.py Outdated
Contributor

@skrawcz skrawcz left a comment

let's just make a constant

@Dev-iL
Collaborator Author

Dev-iL commented May 25, 2026

One nit: with_data_quality_disabled() returns -> "Builder" (string literal) — should be -> Self to match the convention established in #1560.

Done.

@Dev-iL Dev-iL requested a review from skrawcz May 25, 2026 05:29