Skip to content

e2e: hermetic ADK cassettes, matchFileSnapshot migration, seinfeld gzip fix#1966

Open
Stephen Belanger (Qard) wants to merge 11 commits intomainfrom
t3code/e2e-followup
Open

e2e: hermetic ADK cassettes, matchFileSnapshot migration, seinfeld gzip fix#1966
Stephen Belanger (Qard) wants to merge 11 commits intomainfrom
t3code/e2e-followup

Conversation

@Qard
Copy link
Copy Markdown
Contributor

Summary

  • fix(seinfeld): buildResponse was preserving content-encoding: gzip while serving already-decoded body bytes (undici decompresses at the HTTP layer). Callers like Google ADK would attempt a second gunzip and get incorrect header check. Fixed by stripping content-encoding, transfer-encoding, and content-length from replayed/recorded responses. Also switches handleRecord to return buildResponse() instead of realResponse.clone() for non-binary-draft bodies, avoiding empty-body issues from double-tee'd streams.

  • feat(e2e/google-adk): Records hermetic cassettes for both ADK variants (0.6.1 and 1.0.0). Each cassette has two Gemini entries — call 0 returns a functionCall for get_weather, call 1 returns the final answer. A per-scenario cassette-filter.mjs ignores the ?key= query param and all body fields (volatile functionCallId UUIDs), so matching relies solely on callIndex.

  • refactor(e2e): Adds a matchFileSnapshot wrapper in helpers/file-snapshot.ts that is a no-op in canary mode. All scenario test files and assertions modules are migrated from toMatchFileSnapshot to the new helper, so canary runs skip snapshot comparison for non-deterministic live API responses.

  • chore(e2e): Restores DRAIN_DELAY_MS to 2000ms and removes the temporary onRecord debug callback from cassette-preload.mjs. Also adds the installRecordModeGuard function that prevents premature cassette flush during multi-step ADK tool-call flows.

Test plan

  • pnpm run test:e2e:hermetic -t "google adk" passes (32/32) without any Google API key
  • Running hermetic twice produces identical output (no snapshot drift)
  • All wiped cassettes restored from git — hermetic suite for other scenarios unaffected
  • matchFileSnapshot migration verified: zero remaining toMatchFileSnapshot calls in e2e/scenarios/

🤖 Generated with Claude Code

Stephen Belanger and others added 11 commits May 8, 2026 18:21
Node.js undici decompresses gzip/deflate at the HTTP layer before
passing the body to MSW handlers. The stored body bytes are therefore
already plain JSON/text. buildResponse was preserving the original
content-encoding header, which caused callers (e.g. Google ADK) to
attempt a second gunzip of already-decoded bytes, producing a zlib
"incorrect header check" error and making the response unreadable.

Fix: strip content-encoding, transfer-encoding, and content-length
from the Response built by buildResponse (both replay and record
return paths).

Also switch handleRecord to return buildResponse() instead of
realResponse.clone() for non-binary-draft bodies. After
recordResponseDraft() tees the body stream, clone() can return an
empty body on some Node versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DRAIN_DELAY_MS was temporarily raised to 15000ms during ADK cassette
debugging. The root cause (gzip content-encoding bug in seinfeld) is
now fixed, so restore the original 2-second drain delay.

Also remove the temporary onRecord stderr callback that was added for
diagnostics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Record cassettes for both ADK versions (0.6.1 and 1.0.0) and update
snapshots to match. The cassette filter ignores query params (Google
API key) and all body fields (volatile functionCall IDs), relying on
callIndex alone for stable matching.

Both variants now produce two cassette entries: call 0 returns a
functionCall for get_weather, call 1 returns the final answer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The new matchFileSnapshot wrapper in helpers/file-snapshot.ts is a
no-op in canary mode (where snapshot comparison is skipped because live
API responses are non-deterministic). All scenario test files and
assertions modules are migrated to use the new helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- resolveFileSnapshotPath now routes canary-mode tests to
  __snapshots__/canary/ so pinned and canary baselines diverge cleanly
- matchFileSnapshot no longer skips in canary mode — canary tests now
  compare against the canary snapshot set instead of doing nothing
- run-canary-tests.mjs: detect --update flag and pass to vitest so
  snapshot files can be refreshed programmatically
- run-canary-tests-docker.mjs: add COPILOT_API_KEY to ALLOWED_ENV_KEYS
  so the GitHub Copilot scenario receives the token inside the container
- Add update-canary-snapshots.yaml: weekly scheduled workflow that runs
  canary tests with --update and opens a PR if any snapshots changed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…th headers

- RedactionConfig gains omitRequestHeaders?: boolean; when true the
  entire request header map is cleared before other header processing
- PARANOID_REDACTION preset now sets omitRequestHeaders: true so
  cassette files never contain raw credentials by accident
- x-goog-api-key added to AUTH_HEADERS / CREDENTIAL_HEADERS so it is
  recognised as a credential even when not omitted outright
- Update tests to reflect that paranoid preset now drops all headers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Applies the new omitRequestHeaders: true paranoid preset across all
recorded cassettes: every request now has "headers": {} (empty). This
removes the leaked x-goog-api-key value that was committed in
google-adk-v061 and google-adk-v1000, and normalises the format
consistently across all scenarios.

Key order within each entry is also updated to the alphabetical order
produced by sortKeys() so future re-recordings produce clean diffs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Run pnpm run fix:formatting to resolve prettier failures that were
introduced without a pre-commit formatting pass in the previous commits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Commits previously untracked fixture files:

- github-copilot-instrumentation/__snapshots__: span event snapshots for
  the pinned copilot test suite (v0-auto and v0-wrapped)
- claude-agent-sdk-instrumentation/__cassettes__: hermetic cassettes for
  4 claude-agent-sdk versions
- cohere-instrumentation/__cassettes__: additional cassettes for cohere
  v7, v7-20-0, v7-21-0, and v8
- google-genai-instrumentation/__cassettes__: cassettes + binary blobs
  for google-genai v1300, v1440, v1450, v1460
- mistral-instrumentation/__cassettes__: cassettes for 6 mistral versions

All request headers are empty (omitRequestHeaders: true) — no credentials.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-genai)

fix(e2e): bootstrap canary snapshots on first run instead of failing
fix(e2e): skip github-copilot scenario when COPILOT_API_KEY is not set

- Delete 4 cohere stub cassettes (entries: []) that caused replay "Failed to fetch"
- Delete 6 mistral stub cassettes (entries: []) that caused replay "Failed to fetch"
- Delete 4 incomplete google-genai cassettes (only 2 of ~10 requests recorded)
  and their blob directories — also fixes e2e-hermetic timeout since the retry
  delays on ~8 missing requests were consuming the 30-min budget
- matchFileSnapshot now bootstraps __snapshots__/canary/ on first run (write +
  pass) so e2e-canary CI doesn't fail before update-canary-snapshots runs
- github-copilot scenario skips gracefully (describe.skipIf) when COPILOT_API_KEY
  is absent rather than erroring out the whole e2e job

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The bootstrap approach (write-then-compare on re-run) breaks when multiple
test variants share a snapshot path: the first variant writes the file, the
second compares against it and fails because live-API runs naturally diverge.

Canary tests exist to catch live API failures and track snapshot drift over
time. Content comparison within the same CI run is the wrong layer for that:
- The e2e-canary job should pass as long as the instrumentation works end-to-end
- Snapshot drift is surfaced by the update-canary-snapshots PR workflow, which
  runs weekly and opens a PR showing what changed

In canary mode, always write and pass. The pinned hermetic suite retains full
snapshot comparison as before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant