Fixes 21329: exclude temporal table period columns from autoClassification sampling#27960
Fixes 21329: exclude temporal table period columns from autoClassification sampling#27960
Conversation
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
🟡 Playwright Results — all passed (15 flaky)✅ 4013 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 86 skipped
🟡 15 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
Query sys.columns for generated_always_type to detect SYSTEM_TIME period columns (ValidFrom/ValidTo) and skip them in both schema reflection (mssql/utils.py) and sample data fetching (AzureSQLSampler). Also moves the catalog round-trip inside the `if columns` guard to avoid the query when column filtering is not in use. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds sampler unit tests covering period-column filtering and NOT_COMPUTE_PYODBC exclusion. Adds a PII processor test case for temporal tables using single first-names to avoid non-deterministic NER matches. Corrects customers_sensitive expected tags to include address→PII.NonSensitive, which the classifier now correctly detects. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the isolated sampler unit test with an end-to-end integration test that registers the AzureSQL service, creates a system-versioned table, runs MetadataWorkflow then AutoClassificationWorkflow, and asserts that sample data excludes ValidFrom/ValidTo. Includes SQL permission prerequisites and troubleshooting guide in the module docstring. Teardown controlled by AZURE_SQL_CLEANUP env var (default: true). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review ✅ Approved 2 resolved / 2 findingsExcludes temporal table period columns from autoClassification sampling to resolve SQL Server compatibility issues and pins spacy versioning. Addresses bypass logic for empty column lists and redundant database queries. ✅ 2 resolved✅ Bug: Filtered columns bypass when all are excluded (empty list falsy)
✅ Performance: Unnecessary DB query when columns is None
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|



Summary
list index out of rangeerror in autoClassification ingestion for AzureSQL tables with Temporal Tables enabled (issue autoClassification ingestion fails due to AzureSQL Temporal Table fields #21329)GENERATED ALWAYS AS ROW START/ENDperiod columns (ValidFrom/ValidTo) that break the autoClassification sampler in two ways:TABLESAMPLEis not supported on temporal tables (SQL Server error 13541), and pyodbc cannot reliably handle period column typesmssql/utils.py): filtersgenerated_always_type IN (1, 2)inget_columnsso temporal period columns are never stored in entity metadataazuresql/sampler.py): adds_get_temporal_column_names()queryingsys.columnsto detect and exclude temporal period columns infetch_sample_data, as a defense-in-depth guardTest plan
test_azuresql_sampling.py: verifiesfetch_sample_datafilters out columns returned by_get_temporal_column_names(mocked)test_cases/azuresql_temporal_table.py): table entity has 5 columns includingValidFrom/ValidTo, sample data has only the 3 regular columns — verifies the processor handles this correctly and detects PII onemailtest_azuresql_temporal_table.py): skipped unlessAZURE_SQL_HOST,AZURE_SQL_DATABASE,AZURE_SQL_USERNAME,AZURE_SQL_PASSWORDenv vars are set; creates a real temporal table and asserts temporal columns are absent from sample dataCloses #21329
🤖 Generated with Claude Code
Summary by Gitar
spacytoingestion/setup.pyto pin compatibility and resolve issues withspacy<3.8.mssql/utils.pyto filter out temporal period columns usinggenerated_always_type._get_temporal_column_namestoAzureSQLSamplerto identify and exclude period columns fromfetch_sample_data.AzureSQLSampler.This will update automatically on new commits.