feat: Add example rows to ValidationError for all rule failures #286
…and examples Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
borchero
left a comment
- Adjust the `format_rule_failures` method to optionally include examples. Possibly add a helper method to `FailureInfo` if that is helpful.
- Do not only provide examples for primary key violations but for any type of validation
- Adjust the Rust plugin s.t. the same information is generated during lazy execution
…paths Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
Addressed in commit 462f8ce:
Example output: Note: string quotes differ slightly between eager (Python |
Validation failures previously reported only a row count, so identifying which values caused a failure required a debugger session, which is painful in long-running pipelines.
Changes
- `src/polars_plugin/validation_error.rs`: Extended `format_rule_failures` to accept an optional `examples` parameter (`dict[str, list[str]]`). Updated `RuleValidationError::to_string` to include example rows for both schema-level and column-level rules.
- `src/polars_plugin/mod.rs`: Updated the `all_rules_required` polars plugin to accept data columns as additional inputs after the rule boolean columns. For each failing rule, it computes up to 5 distinct example rows using `AnyValue::Display` and includes them in the lazy-execution error message.
- `dataframely/filter_result.py`: Added a public `FailureInfo.examples(max_examples=5)` helper method that returns distinct example rows (as formatted strings) for each failing rule.
- `dataframely/_plugin.py`: Added a `data_columns` parameter to `all_rules_required`; passes data columns as additional args alongside the `num_rule_columns` kwarg.
- `dataframely/schema.py`: The eager path calls `failure.examples()` and passes the results to `format_rule_failures`. The lazy path passes `cls.column_names()` as `data_columns` to `all_rules_required`.
- `dataframely/collection/collection.py`: Updated the `format_rule_failures` call to pass `failure.examples()`.
- `dataframely/_native.pyi`: Updated the type stub for `format_rule_failures`.
- `tests/schema/test_validate.py`: Updated `test_invalid_primary_key` to assert that examples appear in the error message for both eager and lazy paths.

Example
Examples are also included in lazy execution errors raised by the Rust plugin. Note that string values use double quotes in lazy errors (Rust `AnyValue::Display` format) versus single quotes in eager errors (Python `str(dict)` format).
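
To make the idea concrete, here is a minimal pure-Python sketch of collecting up to 5 distinct example rows per failing rule and appending them to a failure summary. It is not the code from this PR: `collect_examples`, `format_failures`, the plain list-of-dicts input, and the message layout are all hypothetical stand-ins chosen for illustration, and the real helpers operate on polars data. It does show where the single quotes in the eager path come from, since rows are formatted with Python's `str(dict)`.

```python
from typing import Any, Dict, List, Optional


def collect_examples(
    rows: List[Dict[str, Any]],
    rule_results: Dict[str, List[bool]],
    max_examples: int = 5,
) -> Dict[str, List[str]]:
    """Return up to `max_examples` distinct failing rows per rule, as strings.

    Hypothetical helper: `rule_results` maps a rule name to one boolean per
    row, where False means the row violated the rule.
    """
    examples: Dict[str, List[str]] = {}
    for rule, passed in rule_results.items():
        seen: List[str] = []
        for row, ok in zip(rows, passed):
            if len(seen) >= max_examples:
                break
            if ok:
                continue
            # str(dict) quotes string values with single quotes.
            formatted = str(row)
            if formatted not in seen:
                seen.append(formatted)
        if seen:
            examples[rule] = seen
    return examples


def format_failures(
    counts: Dict[str, int],
    examples: Optional[Dict[str, List[str]]] = None,
) -> str:
    """Render one line per failing rule, optionally followed by examples.

    The message layout here is illustrative only, not the library's format.
    """
    lines: List[str] = []
    for rule, count in counts.items():
        lines.append(f" - '{rule}' failed for {count:,} rows")
        for example in (examples or {}).get(rule, []):
            lines.append(f"   > {example}")
    return "\n".join(lines)


rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
rule_results = {"primary_key": [False, False, True]}

examples = collect_examples(rows, rule_results)
message = format_failures({"primary_key": 2}, examples)
print(message)
```

Because `examples` is optional in this sketch, callers that only have counts can still use the same formatter, which mirrors the "optionally include examples" request from the review.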