
feat(tests): add HETA 1.2.0 parquet size checks and GeoJSON parity validation#640

Open
ari-nz wants to merge 11 commits into main from feat/heta-1.2.0-parquet-validation

Conversation

@ari-nz
Collaborator

@ari-nz ari-nz commented May 12, 2026

Summary

Adds validation for the 3 new parquet outputs introduced in HETA 1.2.0 (tissue_qc, tissue_segmentation, cell_classification). cell_detection parquet outputs are intentionally excluded as they are being removed from the pipeline.

  • Updates SPOT_0_EXPECTED_RESULT_FILES and SPOT_1_EXPECTED_RESULT_FILES to include the 3 new parquet entries (12 files total)
  • Updates cli_test.py and gui_test.py to assert 12 result files instead of 9
  • Adds parquet↔GeoJSON parity checks: len(pd.read_parquet(...)) must equal len(geojson["features"]) for each paired output
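As a rough illustration of the file-count assertion in the second bullet, the expected count can be derived from the constant instead of being written as a literal, which is what a later commit in this thread does. The sketch below assumes the constant is a plain list of filenames. The real SPOT_0_EXPECTED_RESULT_FILES in tests/constants_test.py also carries size information, and both the filenames and the helper name here are hypothetical.

```python
from pathlib import Path

# Hypothetical stand-in for SPOT_0_EXPECTED_RESULT_FILES; the real constant's shape may differ.
EXPECTED_RESULT_FILES = [
    "tissue_qc.geojson",
    "tissue_qc.parquet",
    "tissue_segmentation.geojson",
    "tissue_segmentation.parquet",
]


def assert_result_file_count(results_dir: Path, expected: list[str]) -> None:
    """Fail with a readable message when the download count drifts from the constant."""
    found = sorted(p.name for p in results_dir.iterdir() if p.is_file())
    # The expected number comes from the constant, so adding an entry there
    # updates this check without touching the test body.
    assert len(found) == len(expected), (
        f"Expected {len(expected)} files in {results_dir}, but found {len(found)}: {found}"
    )
```

Deriving the number from the constant means the test and the expected-files list cannot disagree about how many outputs there are.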

Test plan

  • Long-running e2e tests download all 12 output files and assert sizes within ±10%
  • Parity check validates row counts match GeoJSON feature counts for all 3 paired outputs on both staging and production
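The ±10% size assertion from the test plan can be sketched as a small helper. This is only an illustration of the tolerance arithmetic: the function name and the byte counts in the usage lines are made up, and the real tests may express the tolerance differently.

```python
def size_within_tolerance(actual_bytes: int, expected_bytes: int, tolerance: float = 0.10) -> bool:
    """Return True when actual_bytes deviates from expected_bytes by at most `tolerance` (a fraction)."""
    return abs(actual_bytes - expected_bytes) <= tolerance * expected_bytes


# Illustrative values only, not taken from the real expected-files constants.
print(size_within_tolerance(1_050_000, 1_000_000))  # 5% larger than expected
print(size_within_tolerance(1_200_000, 1_000_000))  # 20% larger than expected
```

A relative tolerance keeps the check stable across small, harmless changes in output size while still catching empty or truncated files.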

Copilot AI review requested due to automatic review settings May 12, 2026 08:36
@ari-nz ari-nz added the skip:test:long_running Skip long-running tests (≥5min) label May 12, 2026

Copilot AI left a comment


Pull request overview

Adds end-to-end test updates for HETA 1.2.0 outputs by expanding expected result artifacts to include the new parquet polygon exports and validating parquet↔GeoJSON feature parity.

Changes:

  • Extend SPOT_0_EXPECTED_RESULT_FILES / SPOT_1_EXPECTED_RESULT_FILES to include tissue_qc, tissue_segmentation, and cell_classification parquet outputs (now 12 expected files).
  • Update GUI/CLI e2e tests to assert 12 downloaded result files instead of 9.
  • Add parquet↔GeoJSON parity assertions by comparing parquet row counts to GeoJSON features counts for the three paired outputs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/constants_test.py | Updates expected output file lists and byte-size tolerances to include the three new parquet outputs for both production and staging. |
| tests/aignostics/application/gui_test.py | Adjusts expected result file count to 12 and adds parquet↔GeoJSON parity validation after download. |
| tests/aignostics/application/cli_test.py | Adjusts expected result file count to 12 and adds parquet↔GeoJSON parity validation after execution/download. |

Comment thread tests/constants_test.py
Comment on lines +443 to +444
assert len(files_in_results_dir) == 12, (
f"Expected 12 files in {results_dir}, but found {len(files_in_results_dir)}: "
Comment thread tests/aignostics/application/cli_test.py Outdated
@ari-nz ari-nz force-pushed the chore/app-version-bumps branch from bd5f44a to 1a3e050 Compare May 12, 2026 13:41
@ari-nz ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from 47de64d to 4bf84bb Compare May 12, 2026 13:42
@ari-nz ari-nz removed the skip:test:long_running Skip long-running tests (≥5min) label May 19, 2026
Copilot AI review requested due to automatic review settings May 19, 2026 11:27

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@ari-nz ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from c65c3fc to afc60d1 Compare May 19, 2026 14:19
ari-nz and others added 6 commits May 19, 2026 19:06
- test-app: 0.0.6 → 1.0.0 (new version uses same he-tme input schema)
- he-tme: 1.1.0 → 1.1.1 on staging
- Remove SPECIAL_APPLICATION_ID/VERSION from staging (no longer needed)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…alization artifact

- Re-add SPECIAL_APPLICATION_ID/VERSION to staging pointing to test-app 1.0.0
  so e2e_test.py imports resolve on staging
- Remove normalization:wsi input artifact from _get_spots_payload_for_special;
  test-app 1.0.0 only requires whole_slide_image, matching the he-tme schema

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Remove SPECIAL_APPLICATION_ID/VERSION from staging constants entirely
- Guard the import in e2e_test.py with try/except so staging doesn't NameError
- Add skipif(SPECIAL_APPLICATION_ID is None) to both special-app tests
  so they are silently skipped on staging but still run on production (0.99.0)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Simpler than a try/except guard: staging defines SPECIAL_APPLICATION_ID
and SPECIAL_APPLICATION_VERSION as None, the regular import works, and
the existing skipif(SPECIAL_APPLICATION_ID is None) handles the rest.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…e-tme 1.2.0

- Replace SPOT_1 with breast cancer slide 1603ba4c (BREAST/BREAST_CANCER,
  6649×6578 at 0.25 MPP); preserve old 9375e3ed data as SPOT_4
- Add VIPS 10x resolution ambiguity note for SPOT_2, SPOT_3, SPOT_4
- Bump HETA_APPLICATION_VERSION to 1.2.0, TEST_APPLICATION_VERSION to 1.0.0
- Remove SPECIAL_APPLICATION concept; restore stress tests against test-app 1.0.0
- Unify payload builders via _build_wsi_input_item / _build_minimal_wsi_input_item
- Update SPOT_1_EXPECTED_RESULT_FILES sizes from staging run 43a3bcd2
- Reduce PIPELINE_NODE_ACQUISITION_TIMEOUT_MINUTES to 25
Copilot AI review requested due to automatic review settings May 19, 2026 17:07
@ari-nz ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from afc60d1 to 8896207 Compare May 19, 2026 17:07

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

Comment thread tests/constants_test.py
Comment thread tests/aignostics/platform/e2e_test.py Outdated
Comment on lines 139 to 155
def _build_minimal_wsi_input_item(gs_url: str, crc32c: str, expires_seconds: int) -> platform.InputItem:
    """Build a minimal WSI InputItem supplying only the CRC32C and image URL."""
    return platform.InputItem(
        external_id=gs_url,
        input_artifacts=[
            platform.InputArtifact(
                name="whole_slide_image",
                download_url=platform.generate_signed_url(url=gs_url, expires_seconds=expires_seconds),
                metadata={
                    "checksum_base64_crc32c": crc32c,
                    "media_type": "image/tiff",
                },
            )
        ],
    )
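For readers unfamiliar with the `checksum_base64_crc32c` field, the sketch below shows one way such a value could be produced. It assumes the Google Cloud Storage convention (CRC32C digest as four big-endian bytes, then base64), which matches the shape of values such as "9l3NNQ==" in this thread but is not confirmed here as the platform's requirement. The helper names are hypothetical, and the bitwise loop is for illustration only: for multi-megabyte slides a compiled implementation such as the google-crc32c package would be far faster.

```python
import base64


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial when the dropped bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data: bytes) -> str:
    """Encode the digest as base64 over four big-endian bytes (assumed GCS-style convention)."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")


# Standard CRC check input; 0xE3069283 is the published CRC-32C check value for it.
print(hex(crc32c(b"123456789")))
print(crc32c_base64(b"123456789"))
```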


Comment thread VERSION
@ari-nz ari-nz changed the base branch from chore/app-version-bumps to main May 19, 2026 17:41
- Use pyarrow.parquet.read_metadata() instead of pd.read_parquet() to
  get row count from Parquet footer without loading polygon data
- Use ijson streaming to count GeoJSON features without loading the
  full feature array into memory
- Replace hard-coded file counts with len(SPOT_x_EXPECTED_RESULT_FILES)
  to avoid drift when the constants change
- Sync qupath/gui_test.py to use len(SPOT_0_EXPECTED_RESULT_FILES)
  instead of the stale literal 9
- Remove unused _build_minimal_wsi_input_item dead code from e2e_test.py
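The first two points of this commit message can be sketched roughly as follows. `pq.read_metadata(...).num_rows` reads the row count from the Parquet footer, and `ijson.items(handle, "features.item")` yields one feature at a time. The helper names are hypothetical, and the `json` fallback for environments without ijson is an addition for this sketch, not something the PR is known to contain. It also assumes the GeoJSON files are FeatureCollections with a top-level "features" array.

```python
import json
from pathlib import Path


def count_parquet_rows(path: Path) -> int:
    """Read the row count from the Parquet footer without loading any column data."""
    import pyarrow.parquet as pq  # imported lazily, as in the PR's test snippet

    return pq.read_metadata(str(path)).num_rows


def count_geojson_features(path: Path) -> int:
    """Count top-level features, streaming with ijson when it is installed."""
    with path.open("rb") as handle:
        try:
            import ijson  # optional dependency in this sketch
        except ImportError:
            # Fallback for this sketch only: loads the whole document into memory.
            return len(json.load(handle)["features"])
        # Each item under "features" is yielded one at a time, so the full
        # feature array is never materialised.
        return sum(1 for _ in ijson.items(handle, "features.item"))


def assert_parity(parquet_path: Path, geojson_path: Path) -> None:
    """Fail when the Parquet row count and the GeoJSON feature count differ."""
    rows = count_parquet_rows(parquet_path)
    features = count_geojson_features(geojson_path)
    assert rows == features, f"{parquet_path.name}: {rows} rows vs {features} features"
```

Reading only the footer and streaming the features keeps memory flat even when the polygon outputs are large, which is the motivation given in the commit message.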
@codecov

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.
see 6 files with indirect coverage changes

Copilot AI review requested due to automatic review settings May 20, 2026 09:23

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread tests/constants_test.py
Comment on lines 21 to +31
 SPOT_1_GS_URL = (
-    "gs://aignostics-platform-ext-a4f7e9/python-sdk-tests/he-tme/slides/9375e3ed-28d2-4cf3-9fb9-8df9d11a6627.tiff"
+    "gs://aignostics-platform-ext-a4f7e9/python-sdk-tests/he-tme/slides/1603ba4c-398a-49db-926b-c14d8f17dc83.tiff"
 )
-SPOT_1_FILENAME = "9375e3ed-28d2-4cf3-9fb9-8df9d11a6627.tiff"
-SPOT_1_CRC32C = "9l3NNQ=="
-SPOT_1_FILESIZE = 14681750
-SPOT_1_RESOLUTION_MPP = 0.46499982
-SPOT_1_WIDTH = 3728
-SPOT_1_HEIGHT = 3640
+SPOT_1_FILENAME = "1603ba4c-398a-49db-926b-c14d8f17dc83.tiff"
+SPOT_1_CRC32C = "MKWV1g=="
+SPOT_1_FILESIZE = 8942460
+SPOT_1_RESOLUTION_MPP = 0.25
+SPOT_1_WIDTH = 6649
+SPOT_1_HEIGHT = 6578
+SPOT_1_TISSUE = "BREAST"
+SPOT_1_DISEASE = "BREAST_CANCER"
Comment on lines +475 to +480
import pyarrow.parquet as pq

for parquet_filename, geojson_filename in parquet_geojson_pairs:
    parquet_path = results_dir / parquet_filename
    geojson_path = results_dir / geojson_filename
    parquet_row_count = pq.read_metadata(parquet_path).num_rows
Comment on lines +1143 to +1148
import pyarrow.parquet as pq

for parquet_filename, geojson_filename in parquet_geojson_pairs:
    parquet_path = results_dir / parquet_filename
    geojson_path = results_dir / geojson_filename
    parquet_row_count = pq.read_metadata(parquet_path).num_rows
@ari-nz ari-nz requested a review from alexa-ca May 20, 2026 15:16
Copilot AI review requested due to automatic review settings May 20, 2026 15:18
@ari-nz ari-nz enabled auto-merge May 20, 2026 15:19

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment on lines +476 to +481
import pyarrow.parquet as pq

for parquet_filename, geojson_filename in parquet_geojson_pairs:
    parquet_path = results_dir / parquet_filename
    geojson_path = results_dir / geojson_filename
    parquet_row_count = pq.read_metadata(parquet_path).num_rows
Comment on lines +1143 to +1148
import pyarrow.parquet as pq

for parquet_filename, geojson_filename in parquet_geojson_pairs:
    parquet_path = results_dir / parquet_filename
    geojson_path = results_dir / geojson_filename
    parquet_row_count = pq.read_metadata(parquet_path).num_rows
Comment on lines 87 to +101
 # Plan to have 100.000 slides processed in total, with 100 slides per application run,
 # one application run starting every 5 minutes, with a throughput of 1 slide per minute,
 # given no GPU.
-SPECIAL_APPLICATION_SLIDE_PER_RUN_COUNT = 100
-SPECIAL_APPLICATION_SLIDE_PER_RUN_COUNT_ON_00 = 2000  # Minute 0..9
-SPECIAL_APPLICATION_SLIDE_PER_RUN_COUNT_ON_20 = 2000  # Minute 20..29
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_DUE_DATE_SECONDS = 60 * 60 * 20  # 20 hours
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_DEADLINE_SECONDS = 60 * 60 * 24  # 24 hours
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_DUE_DATE_SECONDS_ON_40 = 60 * 60 * 2  # 2 hours
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_DEADLINE_SECONDS_ON_40 = 60 * 60 * 3  # 3 hours
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_SUBMIT_TIMEOUT_SECONDS = 60 * 30  # 30 minutes
-SPECIAL_APPLICATION_FIND_AND_VALIDATE_TIMEOUT_SECONDS = 60 * 60  # 60 minutes
+TEST_APP_STRESS_SLIDE_PER_RUN_COUNT = 100
+TEST_APP_STRESS_SLIDE_PER_RUN_COUNT_ON_00 = 2000  # Minute 0..9
+TEST_APP_STRESS_SLIDE_PER_RUN_COUNT_ON_20 = 2000  # Minute 20..29
+TEST_APP_STRESS_SUBMIT_AND_FIND_DUE_DATE_SECONDS = 60 * 60 * 20  # 20 hours
+TEST_APP_STRESS_SUBMIT_AND_FIND_DEADLINE_SECONDS = 60 * 60 * 24  # 24 hours
+TEST_APP_STRESS_SUBMIT_AND_FIND_DUE_DATE_SECONDS_ON_40 = 60 * 60 * 2  # 2 hours
+TEST_APP_STRESS_SUBMIT_AND_FIND_DEADLINE_SECONDS_ON_40 = 60 * 60 * 3  # 3 hours
+TEST_APP_STRESS_SUBMIT_AND_FIND_SUBMIT_TIMEOUT_SECONDS = 60 * 30  # 30 minutes
+TEST_APP_STRESS_FIND_AND_VALIDATE_TIMEOUT_SECONDS = 60 * 60  # 60 minutes


 def _build_wsi_input_item(  # noqa: PLR0913, PLR0917
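The plan described in the comment at the top of this snippet implies some simple arithmetic, worked through below. The variable names are illustrative only, and the sketch assumes the quoted throughput of 1 slide per minute applies to each run individually, which the comment does not state explicitly.

```python
# Figures quoted in the code comment; names are illustrative, not from the repository.
TOTAL_SLIDES = 100_000
SLIDES_PER_RUN = 100
RUN_START_INTERVAL_MINUTES = 5
SLIDES_PER_MINUTE_PER_RUN = 1  # assumption: throughput is per run

runs_needed = TOTAL_SLIDES // SLIDES_PER_RUN
# Approximate span over which runs are submitted, at one run per interval.
submission_window_minutes = runs_needed * RUN_START_INTERVAL_MINUTES
minutes_per_run = SLIDES_PER_RUN // SLIDES_PER_MINUTE_PER_RUN
# Runs still processing when a new one starts, in steady state.
overlapping_runs = minutes_per_run // RUN_START_INTERVAL_MINUTES

print(f"runs needed: {runs_needed}")
print(f"submission window: {submission_window_minutes} min (~{submission_window_minutes / 60:.1f} h)")
print(f"minutes per run: {minutes_per_run}")
print(f"overlapping runs: {overlapping_runs}")
```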
@sonarqubecloud

Quality Gate failed

Failed conditions
32.2% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@pytest.mark.stress_only
@pytest.mark.long_running
@pytest.mark.timeout(timeout=TEST_APP_STRESS_SUBMIT_AND_FIND_SUBMIT_TIMEOUT_SECONDS)
def test_platform_test_app_stress_submit() -> None:
Collaborator


When do these run? They could get very expensive if they run to completion. Should we consider cancelling the runs once we have confirmed that they were submitted?

]
import pyarrow.parquet as pq

for parquet_filename, geojson_filename in parquet_geojson_pairs:
Collaborator


This may be too complicated, but a rough area check could be nice for the segmentation outputs.
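One possible shape for such a rough area check, sketched for the GeoJSON side only: sum polygon areas with the shoelace formula and compare totals within a relative tolerance. All names here are hypothetical. How geometries are encoded in the parquet outputs is not stated in this thread, so the parquet-side total is assumed to be computed separately and passed in as a number.

```python
import math
from typing import Any, Sequence

Ring = Sequence[Sequence[float]]


def ring_area(ring: Ring) -> float:
    """Unsigned area of a closed ring (first point repeated last) via the shoelace formula."""
    total = 0.0
    for p, q in zip(ring, ring[1:]):
        total += p[0] * q[1] - q[0] * p[1]
    return abs(total) / 2.0


def polygon_area(rings: Sequence[Ring]) -> float:
    """Exterior ring area minus the area of any holes."""
    if not rings:
        return 0.0
    return ring_area(rings[0]) - sum(ring_area(hole) for hole in rings[1:])


def total_feature_area(features: Sequence[dict[str, Any]]) -> float:
    """Sum the areas of all Polygon and MultiPolygon geometries; other types are ignored."""
    area = 0.0
    for feature in features:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Polygon":
            area += polygon_area(geometry["coordinates"])
        elif geometry.get("type") == "MultiPolygon":
            area += sum(polygon_area(poly) for poly in geometry["coordinates"])
    return area


def areas_roughly_equal(a: float, b: float, rel_tol: float = 0.05) -> bool:
    """True when the two totals agree within rel_tol (5% by default, an arbitrary choice)."""
    return math.isclose(a, b, rel_tol=rel_tol)
```

A total-area comparison would catch cases where the row counts match but the geometries themselves are truncated or mis-scaled, at the cost of reading the polygon data rather than just the metadata.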



4 participants