Allow bypassing data quality checks via a config key #1598
Open
Dev-iL wants to merge 2 commits into
564d9f6 to 6bf80c9
skrawcz (Contributor) reviewed on May 24, 2026 and left a comment:
Nice feature! Clean implementation — the one-liner in Builder delegating to with_config is the right approach, and the 2-line check in transform_node is minimal and correct. Tests cover both decorators plus end-to-end via the driver.
One nit: with_data_quality_disabled() returns -> "Builder" (string literal) — should be -> Self to match the convention established in #1560. Otherwise LGTM.
The sklearn test fix is unrelated but correct — fine to include here.
skrawcz (Contributor) requested changes on May 24, 2026 and left a comment:
let's just make a constant
Author (Collaborator): Done.
Context
@check_output and @check_output_custom decorators let users attach data validators to function outputs. At graph-construction time, each decorated node is expanded into a subgraph:
- {name}_raw: runs the actual function
- {name}_{validator}: one node per validator (tagged hamilton.data_quality.contains_dq_results)
- {name}: aggregates validation results and returns the raw output
This flag is useful for two main reasons:
- disable_data_quality_checks=True at construction time causes the _raw and validator nodes not to be added to the graph, so any visualization of that driver's graph will only show the "business logic" nodes.
What was changed
hamilton/function_modifiers/validation.py
BaseDataValidationDecorator (parent of both check_output and check_output_custom) now:
- Overrides optional_config() to declare {"disable_data_quality_checks": False}. This registers the key with Hamilton's config-filtering pipeline so it is threaded through to transform_node automatically, with no driver plumbing needed.
- Adds an early-return guard at the top of transform_node. When the flag is set, the node is returned unchanged: no _raw node, no validator nodes, and no expansion. The cost is zero because the extra nodes are never created.
hamilton/driver.py
Builder gains a convenience method, with_data_quality_disabled(). This is a thin wrapper over with_config that is discoverable by IDE autocomplete and explicit about intent. Because with_config does a dict.update(), a later call like .with_config({"disable_data_quality_checks": False}) will re-enable validation, which is the expected last-write-wins behavior.
Usage
- Via the Builder convenience method (recommended)
- Via with_config directly (equivalent, useful when config is assembled dynamically)
- Via the legacy Driver constructor
Tests added
- tests/function_modifiers/test_validation.py::test_check_output_disabled_via_config_returns_original_node: check_output_custom returns the original node unchanged when flag is set
- tests/function_modifiers/test_validation.py::test_check_output_builtin_disabled_via_config_returns_original_node: check_output (built-in validators) also respects the flag
- tests/test_end_to_end.py::test_builder_with_data_quality_disabled_removes_validator_nodes: list_available_variables() when disabled
- tests/test_end_to_end.py::test_builder_with_data_quality_disabled_still_executes_correctly
- tests/test_end_to_end.py::test_disable_data_quality_checks_config_key_works_directly: with_config path (no convenience method) also suppresses validator nodes
All 13 pre-existing validation tests continue to pass.
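The behavior these tests exercise can be illustrated with a small stand-alone model. This is only a sketch and does not import Hamilton: ToyNode, ToyValidationDecorator, ToyBuilder, this simplified filter_config, and the range_check validator name are all hypothetical stand-ins. Only the config key name and the method names optional_config, transform_node, with_config, and with_data_quality_disabled come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Key name taken from the PR description.
CONFIG_KEY = "disable_data_quality_checks"


@dataclass(frozen=True)
class ToyNode:
    """Toy stand-in for a graph node (not Hamilton's Node class)."""
    name: str
    fn: Callable[..., Any]


class ToyValidationDecorator:
    """Hypothetical, simplified model of a validation decorator."""

    def __init__(self, validator_names: List[str]) -> None:
        self.validator_names = validator_names

    def optional_config(self) -> Dict[str, Any]:
        # Declare the key with its default so config filtering lets it through.
        return {CONFIG_KEY: False}

    def transform_node(self, node: ToyNode, config: Dict[str, Any]) -> List[ToyNode]:
        # Early-return guard: hand back the original node when checks are off.
        if config.get(CONFIG_KEY, False):
            return [node]
        raw = ToyNode(f"{node.name}_raw", node.fn)
        checks = [
            ToyNode(f"{node.name}_{v}", lambda value: True)
            for v in self.validator_names
        ]
        # The real final node aggregates results; here it only reuses the name.
        final = ToyNode(node.name, node.fn)
        return [raw, *checks, final]


def filter_config(full: Dict[str, Any], decorator: ToyValidationDecorator) -> Dict[str, Any]:
    """Pass along only the declared keys, falling back to their defaults."""
    return {k: full.get(k, d) for k, d in decorator.optional_config().items()}


class ToyBuilder:
    """Toy builder showing last-write-wins config merging."""

    def __init__(self) -> None:
        self.config: Dict[str, Any] = {}

    def with_config(self, config: Dict[str, Any]) -> ToyBuilder:
        self.config.update(config)  # later calls overwrite earlier values
        return self

    def with_data_quality_disabled(self) -> ToyBuilder:
        return self.with_config({CONFIG_KEY: True})


decorator = ToyValidationDecorator(["range_check"])
node = ToyNode("spend", lambda: 42)

enabled_cfg = filter_config(ToyBuilder().config, decorator)
disabled_cfg = filter_config(
    ToyBuilder().with_data_quality_disabled().with_config({"unrelated": 1}).config,
    decorator,
)
re_enabled_cfg = filter_config(
    ToyBuilder().with_data_quality_disabled().with_config({CONFIG_KEY: False}).config,
    decorator,
)

enabled_names = [n.name for n in decorator.transform_node(node, enabled_cfg)]
disabled_nodes = decorator.transform_node(node, disabled_cfg)
re_enabled_names = [n.name for n in decorator.transform_node(node, re_enabled_cfg)]
```

In this model the disabled path returns the very same node object, undeclared keys never reach the decorator, and a later with_config call that sets the key back to False restores the expanded subgraph.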
Design notes
Graph-construction time, not execution time. Disabling at construction eliminates the extra nodes entirely. An execution-time approach (e.g., a lifecycle adapter) would still pay graph construction and scheduling overhead, and would be harder to reason about.
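The trade-off in this note can be made concrete with a toy count of scheduled versus executed nodes. This is a rough sketch under stated assumptions, not Hamilton code: the VALIDATORS mapping, the function names, and the run helper are hypothetical, and real scheduling involves more than counting names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical validators attached to one function; names are illustrative only.
VALIDATORS: Dict[str, List[str]] = {"spend": ["range", "dtype"]}


def build_node_names(functions: List[str], disable_checks: bool) -> List[str]:
    """Return the node names a toy graph would contain after expansion."""
    names: List[str] = []
    for fn in functions:
        checks = [] if disable_checks else VALIDATORS.get(fn, [])
        if checks:
            names.append(f"{fn}_raw")
            names.extend(f"{fn}_{c}" for c in checks)
        names.append(fn)
    return names


def run(names: List[str], skip_validation_at_runtime: bool) -> Tuple[int, int]:
    """Return (nodes scheduled, nodes actually executed)."""
    validator_nodes: Set[str] = {
        f"{fn}_{c}" for fn, checks in VALIDATORS.items() for c in checks
    }
    scheduled = len(names)
    executed = sum(
        1 for n in names if not (skip_validation_at_runtime and n in validator_nodes)
    )
    return scheduled, executed


functions = ["spend", "signups"]

# Construction-time disabling: the extra nodes never exist.
construction_time = run(build_node_names(functions, disable_checks=True), False)

# Execution-time skipping: the graph is fully expanded, validators are skipped later.
execution_time = run(build_node_names(functions, disable_checks=False), True)
```

Under these assumptions, the construction-time path schedules only the two business-logic nodes, while the execution-time path still schedules all five nodes and merely skips running the two validators.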
Config key, not a subclass. A NoValidationBuilder subclass was considered. It was rejected: this is a single boolean flag, not a fundamentally different execution model. The existing Builder method pattern (with_config, with_adapters, …) is the right weight here. The Builder.with_data_quality_disabled() convenience method is offered as a discoverable alias; if maintainers prefer, the method alone (without the raw config key) or the raw key alone (without the method) is equally viable.
optional_config() is the correct hook. Hamilton's resolve_config/filter_config machinery passes only declared config keys to each decorator. Registering the key via optional_config() means the flag is silently ignored (defaulting to False) by all existing drivers that never set it, so there is no breaking change.
Checklist