feat(tests): add HETA 1.2.0 parquet size checks and GeoJSON parity validation by ari-nz · Pull Request #640 · aignostics/python-sdk

ari-nz · 2026-05-12T08:36:51Z

Summary

Adds validation for the 3 new parquet outputs introduced in HETA 1.2.0 (tissue_qc, tissue_segmentation, cell_classification). cell_detection parquet outputs are intentionally excluded as they are being removed from the pipeline.

- Updates SPOT_0_EXPECTED_RESULT_FILES and SPOT_1_EXPECTED_RESULT_FILES to include the 3 new parquet entries (12 files total)
- Updates cli_test.py and gui_test.py to assert 12 result files instead of 9
- Adds parquet↔GeoJSON parity checks: len(pd.read_parquet(...)) must equal len(geojson["features"]) for each paired output

Test plan

- Long-running e2e tests download all 12 output files and assert sizes within ±10%
- Parity check validates row counts match GeoJSON feature counts for all 3 paired outputs on both staging and production
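
For readers unfamiliar with the size assertion, here is a minimal sketch of a ±10% file size check. The shape of `EXPECTED_RESULT_FILES` (filename, expected bytes, tolerance in percent), the filenames, and the helper names are assumptions made for illustration; the actual constants in `tests/constants_test.py` may be structured differently.

```python
from pathlib import Path

# Hypothetical shape: (filename, expected size in bytes, tolerance in percent).
# The filenames and sizes are placeholders, not values from the PR.
EXPECTED_RESULT_FILES = [
    ("tissue_qc_polygons.parquet", 1000, 10),
    ("tissue_qc_polygons.json", 4000, 10),
]


def size_within_tolerance(actual: int, expected: int, tolerance_percent: float) -> bool:
    """Return True when `actual` lies within expected ± tolerance_percent."""
    allowed_deviation = expected * tolerance_percent / 100
    return abs(actual - expected) <= allowed_deviation


def check_result_files(results_dir: Path, expected=EXPECTED_RESULT_FILES) -> list[str]:
    """Collect one message per missing or out-of-tolerance file (empty list means OK)."""
    problems = []
    for filename, expected_size, tolerance in expected:
        path = results_dir / filename
        if not path.is_file():
            problems.append(f"{filename}: missing")
            continue
        actual_size = path.stat().st_size
        if not size_within_tolerance(actual_size, expected_size, tolerance):
            problems.append(
                f"{filename}: {actual_size} bytes, expected {expected_size} ±{tolerance}%"
            )
    return problems
```

Collecting all problems before asserting (for example `assert not problems, problems`) reports every deviating file in one failure instead of stopping at the first.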

Copilot

Pull request overview

Adds end-to-end test updates for HETA 1.2.0 outputs by expanding expected result artifacts to include the new parquet polygon exports and validating parquet↔GeoJSON feature parity.

Changes:

- Extend SPOT_0_EXPECTED_RESULT_FILES / SPOT_1_EXPECTED_RESULT_FILES to include tissue_qc, tissue_segmentation, and cell_classification parquet outputs (now 12 expected files).
- Update GUI/CLI e2e tests to assert 12 downloaded result files instead of 9.
- Add parquet↔GeoJSON parity assertions by comparing parquet row counts to GeoJSON features counts for the three paired outputs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`tests/constants_test.py`	Updates expected output file lists and byte-size tolerances to include the three new parquet outputs for both production and staging.
`tests/aignostics/application/gui_test.py`	Adjusts expected result file count to 12 and adds parquet↔GeoJSON parity validation after download.
`tests/aignostics/application/cli_test.py`	Adjusts expected result file count to 12 and adds parquet↔GeoJSON parity validation after execution/download.

+        assert len(files_in_results_dir) == 12, (
+            f"Expected 12 files in {results_dir}, but found {len(files_in_results_dir)}: "


Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

- test-app: 0.0.6 → 1.0.0 (new version uses same he-tme input schema)
- he-tme: 1.1.0 → 1.1.1 on staging
- Remove SPECIAL_APPLICATION_ID/VERSION from staging (no longer needed)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…alization artifact

- Re-add SPECIAL_APPLICATION_ID/VERSION to staging pointing to test-app 1.0.0 so e2e_test.py imports resolve on staging
- Remove normalization:wsi input artifact from _get_spots_payload_for_special; test-app 1.0.0 only requires whole_slide_image, matching the he-tme schema

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Remove SPECIAL_APPLICATION_ID/VERSION from staging constants entirely
- Guard the import in e2e_test.py with try/except so staging doesn't NameError
- Add skipif(SPECIAL_APPLICATION_ID is None) to both special-app tests so they are silently skipped on staging but still run on production (0.99.0)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Simpler than a try/except guard: staging defines SPECIAL_APPLICATION_ID and SPECIAL_APPLICATION_VERSION as None, the regular import works, and the existing skipif(SPECIAL_APPLICATION_ID is None) handles the rest.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
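
The None-plus-skipif approach described in this commit can be sketched as follows. This is an illustration only: the constant is defined inline here rather than imported from the environment-specific constants module, and the marker alias, test name, and reason string are hypothetical.

```python
import pytest

# Assumption: on staging the constants module defines the ID as None, while
# production assigns a real application ID. Defined inline to stay self-contained.
SPECIAL_APPLICATION_ID = None

# Reusable marker: the condition is evaluated once, when the module is imported.
requires_special_app = pytest.mark.skipif(
    SPECIAL_APPLICATION_ID is None,
    reason="special application is not configured for this environment",
)


@requires_special_app
def test_special_application_submit() -> None:
    """Runs only where SPECIAL_APPLICATION_ID is set; skipped elsewhere."""
    assert SPECIAL_APPLICATION_ID is not None
```

Because the name always exists, the plain import never fails, and pytest reports the test as skipped rather than erroring during collection.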

…e-tme 1.2.0

- Replace SPOT_1 with breast cancer slide 1603ba4c (BREAST/BREAST_CANCER, 6649×6578 at 0.25 MPP); preserve old 9375e3ed data as SPOT_4
- Add VIPS 10x resolution ambiguity note for SPOT_2, SPOT_3, SPOT_4
- Bump HETA_APPLICATION_VERSION to 1.2.0, TEST_APPLICATION_VERSION to 1.0.0
- Remove SPECIAL_APPLICATION concept; restore stress tests against test-app 1.0.0
- Unify payload builders via _build_wsi_input_item / _build_minimal_wsi_input_item
- Update SPOT_1_EXPECTED_RESULT_FILES sizes from staging run 43a3bcd2
- Reduce PIPELINE_NODE_ACQUISITION_TIMEOUT_MINUTES to 25

…lidation

Copilot

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

+def _build_minimal_wsi_input_item(gs_url: str, crc32c: str, expires_seconds: int) -> platform.InputItem:
+    """Build a minimal WSI InputItem supplying only the CRC32C and image URL."""
+    return platform.InputItem(
+        external_id=gs_url,
+        input_artifacts=[
+            platform.InputArtifact(
+                name="whole_slide_image",
+                download_url=platform.generate_signed_url(url=gs_url, expires_seconds=expires_seconds),
+                metadata={
+                    "checksum_base64_crc32c": crc32c,
+                    "media_type": "image/tiff",
+                },
+            )
+        ],
+    )




- Use pyarrow.parquet.read_metadata() instead of pd.read_parquet() to get row count from Parquet footer without loading polygon data
- Use ijson streaming to count GeoJSON features without loading the full feature array into memory
- Replace hard-coded file counts with len(SPOT_x_EXPECTED_RESULT_FILES) to avoid drift when the constants change
- Sync qupath/gui_test.py to use len(SPOT_0_EXPECTED_RESULT_FILES) instead of the stale literal 9
- Remove unused _build_minimal_wsi_input_item dead code from e2e_test.py
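
A rough sketch of the memory-friendly parity check this commit describes, under a few assumptions: the helper names are hypothetical, the GeoJSON file is a FeatureCollection with a top-level `features` array, and a plain `json` fallback is included only so the snippet runs where ijson is not installed (the PR itself relies on ijson).

```python
import json
from pathlib import Path

try:
    import ijson  # streaming JSON parser; treated as optional in this sketch
except ImportError:
    ijson = None


def count_geojson_features(path: Path) -> int:
    """Count entries of the top-level "features" array of a GeoJSON file."""
    if ijson is not None:
        # Stream one feature at a time instead of materialising the whole array.
        with path.open("rb") as fh:
            return sum(1 for _ in ijson.items(fh, "features.item"))
    # Fallback: loads the complete document into memory.
    with path.open(encoding="utf-8") as fh:
        return len(json.load(fh)["features"])


def parquet_row_count(path: Path) -> int:
    """Read the row count from the Parquet footer without loading any columns."""
    import pyarrow.parquet as pq  # lazy import, mirroring the tests in this PR

    return pq.read_metadata(path).num_rows


def assert_parity(parquet_path: Path, geojson_path: Path) -> None:
    """Fail when the Parquet row count differs from the GeoJSON feature count."""
    rows = parquet_row_count(parquet_path)
    features = count_geojson_features(geojson_path)
    assert rows == features, (
        f"{parquet_path.name} has {rows} rows but {geojson_path.name} has {features} features"
    )
```

Reading only the footer keeps the check cheap even for large polygon exports, since no geometry column is decoded.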

codecov · 2026-05-19T18:29:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.
see 6 files with indirect coverage changes

…compliance

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

 SPOT_1_GS_URL = (
-    "gs://aignostics-platform-ext-a4f7e9/python-sdk-tests/he-tme/slides/9375e3ed-28d2-4cf3-9fb9-8df9d11a6627.tiff"
+    "gs://aignostics-platform-ext-a4f7e9/python-sdk-tests/he-tme/slides/1603ba4c-398a-49db-926b-c14d8f17dc83.tiff"
 )
-SPOT_1_FILENAME = "9375e3ed-28d2-4cf3-9fb9-8df9d11a6627.tiff"
-SPOT_1_CRC32C = "9l3NNQ=="
-SPOT_1_FILESIZE = 14681750
-SPOT_1_RESOLUTION_MPP = 0.46499982
-SPOT_1_WIDTH = 3728
-SPOT_1_HEIGHT = 3640
-
+SPOT_1_FILENAME = "1603ba4c-398a-49db-926b-c14d8f17dc83.tiff"
+SPOT_1_CRC32C = "MKWV1g=="
+SPOT_1_FILESIZE = 8942460
+SPOT_1_RESOLUTION_MPP = 0.25
+SPOT_1_WIDTH = 6649
+SPOT_1_HEIGHT = 6578
+SPOT_1_TISSUE = "BREAST"
+SPOT_1_DISEASE = "BREAST_CANCER"


+        import pyarrow.parquet as pq
+
+        for parquet_filename, geojson_filename in parquet_geojson_pairs:
+            parquet_path = results_dir / parquet_filename
+            geojson_path = results_dir / geojson_filename
+            parquet_row_count = pq.read_metadata(parquet_path).num_rows


+    import pyarrow.parquet as pq
+
+    for parquet_filename, geojson_filename in parquet_geojson_pairs:
+        parquet_path = results_dir / parquet_filename
+        geojson_path = results_dir / geojson_filename
+        parquet_row_count = pq.read_metadata(parquet_path).num_rows


…le size constant

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

+        import pyarrow.parquet as pq
+
+        for parquet_filename, geojson_filename in parquet_geojson_pairs:
+            parquet_path = results_dir / parquet_filename
+            geojson_path = results_dir / geojson_filename
+            parquet_row_count = pq.read_metadata(parquet_path).num_rows


+    import pyarrow.parquet as pq
+
+    for parquet_filename, geojson_filename in parquet_geojson_pairs:
+        parquet_path = results_dir / parquet_filename
+        geojson_path = results_dir / geojson_filename
+        parquet_row_count = pq.read_metadata(parquet_path).num_rows


 # Plan to have 100.000 slides processed in total, with 100 slides per application run,
 # one application run starting every 5 minutes, with a throughput of 1 slide per minute,
 # given no GPU.
-SPECIAL_APPLICATION_SLIDE_PER_RUN_COUNT = 100
-SPECIAL_APPLICATION_SLIDE_PER_RUN_COUNT_ON_00 = 2000  # Minute 0..9
-SPECIAL_APPLICATION_SLIDE_PER_RUN_COUNT_ON_20 = 2000  # Minute 20..29
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_DUE_DATE_SECONDS = 60 * 60 * 20  # 20 hours
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_DEADLINE_SECONDS = 60 * 60 * 24  # 24 hours
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_DUE_DATE_SECONDS_ON_40 = 60 * 60 * 2  # 2 hours
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_DEADLINE_SECONDS_ON_40 = 60 * 60 * 3  # 3 hours
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_SUBMIT_TIMEOUT_SECONDS = 60 * 30  # 30 minutes
-SPECIAL_APPLICATION_FIND_AND_VALIDATE_TIMEOUT_SECONDS = 60 * 60  # 60 minutes
+TEST_APP_STRESS_SLIDE_PER_RUN_COUNT = 100
+TEST_APP_STRESS_SLIDE_PER_RUN_COUNT_ON_00 = 2000  # Minute 0..9
+TEST_APP_STRESS_SLIDE_PER_RUN_COUNT_ON_20 = 2000  # Minute 20..29
+TEST_APP_STRESS_SUBMIT_AND_FIND_DUE_DATE_SECONDS = 60 * 60 * 20  # 20 hours
+TEST_APP_STRESS_SUBMIT_AND_FIND_DEADLINE_SECONDS = 60 * 60 * 24  # 24 hours
+TEST_APP_STRESS_SUBMIT_AND_FIND_DUE_DATE_SECONDS_ON_40 = 60 * 60 * 2  # 2 hours
+TEST_APP_STRESS_SUBMIT_AND_FIND_DEADLINE_SECONDS_ON_40 = 60 * 60 * 3  # 3 hours
+TEST_APP_STRESS_SUBMIT_AND_FIND_SUBMIT_TIMEOUT_SECONDS = 60 * 30  # 30 minutes
+TEST_APP_STRESS_FIND_AND_VALIDATE_TIMEOUT_SECONDS = 60 * 60  # 60 minutes
+
+
+def _build_wsi_input_item(  # noqa: PLR0913, PLR0917


sonarqubecloud · 2026-05-21T07:17:22Z

Quality Gate failed

Failed conditions
32.2% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

blanca-pablos · 2026-05-21T10:23:03Z

+@pytest.mark.stress_only
+@pytest.mark.long_running
+@pytest.mark.timeout(timeout=TEST_APP_STRESS_SUBMIT_AND_FIND_SUBMIT_TIMEOUT_SECONDS)
+def test_platform_test_app_stress_submit() -> None:


When do these run? They could get very expensive if they run to completion. Should we consider cancelling the runs once we have confirmed they were submitted?

blanca-pablos · 2026-05-21T10:24:44Z

+    ]
+    import pyarrow.parquet as pq
+
+    for parquet_filename, geojson_filename in parquet_geojson_pairs:


Maybe too complicated, but a rough area check could be nice for the segmentation outputs.
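
One way such a rough area check might look, sketched with the shoelace formula over GeoJSON polygons. This is not part of the PR: the function names are hypothetical, and it assumes planar coordinates with closed rings (first point repeated as the last), as GeoJSON requires.

```python
def ring_area(ring) -> float:
    """Absolute area of a closed ring of (x, y, ...) positions via the shoelace formula."""
    total = 0.0
    for a, b in zip(ring, ring[1:]):
        total += a[0] * b[1] - b[0] * a[1]
    return abs(total) / 2.0


def polygon_area(coordinates) -> float:
    """Area of a GeoJSON Polygon: exterior ring minus any interior rings (holes)."""
    exterior, *holes = coordinates
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


def total_polygon_area(feature_collection: dict) -> float:
    """Sum the areas of all Polygon and MultiPolygon features; other types are ignored."""
    total = 0.0
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            total += polygon_area(geometry["coordinates"])
        elif geometry["type"] == "MultiPolygon":
            total += sum(polygon_area(polygon) for polygon in geometry["coordinates"])
    return total
```

A test could then compare the total against a recorded reference value with a generous tolerance, in the same spirit as the existing file size checks.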

Copilot AI review requested due to automatic review settings May 12, 2026 08:36

ari-nz requested review from a team and helmut-hoffer-von-ankershoffen as code owners May 12, 2026 08:36

ari-nz added the skip:test:long_running Skip long-running tests (≥5min) label May 12, 2026

Copilot started reviewing on behalf of ari-nz May 12, 2026 08:37

Copilot AI reviewed May 12, 2026

ari-nz force-pushed the chore/app-version-bumps branch from bd5f44a to 1a3e050 May 12, 2026 13:41

ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from 47de64d to 4bf84bb May 12, 2026 13:42

ari-nz removed the skip:test:long_running Skip long-running tests (≥5min) label May 19, 2026

Copilot AI review requested due to automatic review settings May 19, 2026 11:27

Copilot started reviewing on behalf of ari-nz May 19, 2026 11:27

Copilot AI reviewed May 19, 2026

ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from c65c3fc to afc60d1 May 19, 2026 14:19

ari-nz and others added 6 commits May 19, 2026 19:06

feat(tests): add HETA 1.2.0 parquet size checks and GeoJSON parity validation

8896207

Copilot AI review requested due to automatic review settings May 19, 2026 17:07

ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from afc60d1 to 8896207 May 19, 2026 17:07

Copilot started reviewing on behalf of ari-nz May 19, 2026 17:08

Copilot AI reviewed May 19, 2026

ari-nz changed the base branch from chore/app-version-bumps to main May 19, 2026 17:41

fix(tests): add blank line after lazy pyarrow import for ruff format compliance

8d1ddfe

Copilot AI review requested due to automatic review settings May 20, 2026 09:23

Copilot started reviewing on behalf of ari-nz May 20, 2026 09:23

Copilot AI reviewed May 20, 2026

fix(tests): update stale assertions — 16 schemata files and SPOT_1 file size constant

b8261e0

ari-nz requested a review from alexa-ca May 20, 2026 15:16

Merge branch 'main' into feat/heta-1.2.0-parquet-validation

af964b0

Copilot AI review requested due to automatic review settings May 20, 2026 15:18

Copilot started reviewing on behalf of ari-nz May 20, 2026 15:18

ari-nz enabled auto-merge May 20, 2026 15:19

alexa-ca approved these changes May 20, 2026

Copilot AI reviewed May 20, 2026

Merge branch 'main' into feat/heta-1.2.0-parquet-validation

03e47d1

blanca-pablos reviewed May 21, 2026

ari-nz wants to merge 11 commits into main from feat/heta-1.2.0-parquet-validation

4 participants
