feat(core): cross-platform Maestro view-hierarchy resolver (Android gRPC + iOS HTTP) #2217

Open
Sriram567 wants to merge 32 commits into master from feat/maestro-ios-http-resolver

Sriram567 commented May 7, 2026

Summary

Replaces the per-snapshot ~9s JVM-cold-start maestro hierarchy CLI shell-out with direct transport to Maestro's view-hierarchy services, on both platforms. Element-region resolution now runs as a stateless RPC against Maestro's existing channel rather than spawning a second Maestro flow context — fixing the production gRPC-session-collision failure mode that drops snapshots whenever a Maestro flow is in progress (which is always, during element-region resolution).

| Platform | Primary (env-conditional) | Fallback chain |
|---|---|---|
| Android | gRPC POST MaestroDriver/viewHierarchy on 127.0.0.1:$PERCY_ANDROID_GRPC_PORT | maestro CLI shell-out → adb uiautomator dump |
| iOS | HTTP POST /viewHierarchy on 127.0.0.1:$PERCY_IOS_DRIVER_HOST_PORT | maestro CLI shell-out (--driver-host-port) |

Both transports report into the same two-slot maestroHierarchyDrift envelope on /percy/healthcheck (one schema-drift slot per platform) and follow the same three-class error taxonomy: schema-class → no fallback, drift bit set; channel-broken → fallback, cache evicted; contention-class → fallback that skips the CLI, cache preserved.

Self-hosted customers without the env var injection see zero behavior change — the env var presence is the deployment-shape signal; absence routes to the existing maestro CLI primary + adb fallback.

This PR consolidates work originally scoped across three branches (feat/ios-element-regions-maestro-hierarchy PR #2202 closed, feat/grpc-element-region-resolver PR #2210 to be closed, and the iOS HTTP work on this branch).

Architecture

iOS HTTP path (already shipped on the prior 14 commits of this branch)

runIosHttpDump POSTs {appIds: [], excludeKeyboardElements: false} to Maestro's iOS XCTestRunner /viewHierarchy endpoint. At cli-2.0.7 the runner detects the AUT itself via RunningApp.getForegroundApp() (Maestro PR #2365) — no bundleId discovery, no SDK changes, no realmobile control-plane changes. SpringBoard-only responses (older Maestro) route to maestro-CLI fallback.

Android gRPC path (newly absorbed from PR #2210)

runAndroidGrpcDump calls MaestroDriver/viewHierarchy directly via @grpc/grpc-js over the same transport Maestro CLI uses internally. Vendored proto at packages/core/src/proto/maestro_android.proto (upstream SHA bc8bde1b, cli-2.5.1).

Three-class error taxonomy (refined from PR #2210's two-class scheme during deepen-plan):

| Class | gRPC status codes | Fallback | Cache eviction | Drift bit |
|---|---|---|---|---|
| Schema-class | INVALID_ARGUMENT, FAILED_PRECONDITION, OUT_OF_RANGE, UNIMPLEMENTED, DATA_LOSS, decoder-failure | None | No | Yes |
| Channel-broken | UNAVAILABLE, INTERNAL, CANCELLED | maestro CLI → adb | Yes | No |
| Contention-class | DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED | adb only (skip CLI; it would queue behind the same flow) | No | No |

The contention-class refinement is the key correctness win: timeout under live Maestro flow load is backpressure evidence, not channel-breakage. Evicting the channel on every timeout would force a TCP+HTTP/2+TLS reconnect (~50-200ms cost) that buys nothing, because the underlying agent is still busy. Keeping the channel cached lets the next call reuse the queue position.
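
A minimal sketch of how the three-class table could map onto a classifier. The numeric values are the standard gRPC status codes; the return shape, the `decoderFailure` flag, and the choice to treat unlisted codes as channel-broken are assumptions for illustration and are not taken from the actual classifyGrpcFailure implementation.

```javascript
// Standard gRPC status codes by numeric value.
const STATUS = {
  CANCELLED: 1, INVALID_ARGUMENT: 3, DEADLINE_EXCEEDED: 4,
  RESOURCE_EXHAUSTED: 8, FAILED_PRECONDITION: 9, ABORTED: 10,
  OUT_OF_RANGE: 11, UNIMPLEMENTED: 12, INTERNAL: 13,
  UNAVAILABLE: 14, DATA_LOSS: 15
};

const SCHEMA_CODES = new Set([
  STATUS.INVALID_ARGUMENT, STATUS.FAILED_PRECONDITION, STATUS.OUT_OF_RANGE,
  STATUS.UNIMPLEMENTED, STATUS.DATA_LOSS
]);
const CONTENTION_CODES = new Set([
  STATUS.DEADLINE_EXCEEDED, STATUS.RESOURCE_EXHAUSTED, STATUS.ABORTED
]);

// Hypothetical shape: `err.code` is the gRPC status, `err.decoderFailure`
// marks a response that could not be decoded against the vendored proto.
function classifyGrpcFailure(err) {
  const code = err && err.code;
  if ((err && err.decoderFailure) || SCHEMA_CODES.has(code)) {
    return { class: 'schema', fallback: [], evictCache: false, setDriftBit: true };
  }
  if (CONTENTION_CODES.has(code)) {
    // Backpressure, not breakage: keep the cached channel, skip the CLI.
    return { class: 'contention', fallback: ['adb'], evictCache: false, setDriftBit: false };
  }
  // UNAVAILABLE / INTERNAL / CANCELLED. Assumption: unlisted codes land here too.
  return { class: 'channel-broken', fallback: ['maestro-cli', 'adb'], evictCache: true, setDriftBit: false };
}
```

With this shape, a DEADLINE_EXCEEDED (code 4) leaves the channel cached, while UNAVAILABLE (code 14) evicts it before the fallback chain runs.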

Symmetric timeouts: GRPC_HEALTHY_DEADLINE_MS = 1500 + GRPC_CIRCUIT_BREAKER_MS = 5000, parity with the iOS HTTP path's existing values. Outer Promise.race is defense-in-depth against historical grpc-node#2620 (closed in 1.9.11).

Per-Percy cache scope: the grpcClientCache Map is constructed on the Percy instance and disposed in stop() — matches @percy/core's established ownership pattern for every other long-lived resource (server, browser, queues, client). Module-global state would leak channels between concurrent Percy instances and create shutdown races.

Shutdown race handling: percy.stop() sets cache.shutdownInProgress = true before closing channels. Any in-flight runAndroidGrpcDump that hits CANCELLED returns {kind:'unavailable', reason:'shutdown'} — no fallback chain on a tearing-down process.
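
The ownership and shutdown ordering described above could look roughly like the sketch below. The class name `GrpcClientCache`, the `create` factory argument, and the helper `resultForCancelled` are hypothetical; only the `shutdownInProgress` flag and the `{kind:'unavailable', reason:'shutdown'}` result come from this description.

```javascript
// Per-instance cache: one of these would live on each Percy instance,
// never at module scope, so concurrent instances cannot share channels.
class GrpcClientCache {
  constructor() {
    this.clients = new Map(); // address -> client with a close() method
    this.shutdownInProgress = false;
  }

  // `create` is a hypothetical factory that builds a client for an address.
  get(address, create) {
    if (this.shutdownInProgress) return null;
    let client = this.clients.get(address);
    if (!client) {
      client = create(address);
      this.clients.set(address, client);
    }
    return client;
  }

  // Used for channel-broken failures only; contention keeps the entry.
  evict(address) {
    const client = this.clients.get(address);
    if (client) {
      client.close();
      this.clients.delete(address);
    }
  }

  // Called from stop(): set the flag BEFORE closing channels so that an
  // in-flight call observing CANCELLED can tell shutdown from breakage.
  close() {
    this.shutdownInProgress = true;
    for (const client of this.clients.values()) client.close();
    this.clients.clear();
  }
}

// What an in-flight dump would return when it sees CANCELLED.
function resultForCancelled(cache) {
  return cache.shutdownInProgress
    ? { kind: 'unavailable', reason: 'shutdown' }
    : null; // null: treat as an ordinary channel-broken failure
}
```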

Two rollback knobs (deliberate)

  • Slow rollback (deploy-shape change): remove PERCY_ANDROID_GRPC_PORT (or PERCY_IOS_DRIVER_HOST_PORT) injection in mobile/realmobile.
  • Fast rollback (in-process kill switch): set PERCY_MAESTRO_GRPC=0 in the BS appPercy env. Skips gRPC on next CLI restart without coordinated mobile-side deploy. The 3am-page response.
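
As an illustration of how the two knobs interact, the sketch below picks the Android primary from the environment. The function name `selectAndroidPrimary` and its return shape are hypothetical; the env var names and the "absent port means CLI primary" rule are from this description.

```javascript
// Decide the Android primary transport from the process environment.
function selectAndroidPrimary(env = process.env) {
  // Fast rollback: in-process kill switch wins over everything else.
  if (env.PERCY_MAESTRO_GRPC === '0') return { primary: 'maestro-cli' };

  // Slow rollback / self-hosted: no injected port means no gRPC.
  const port = Number(env.PERCY_ANDROID_GRPC_PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    return { primary: 'maestro-cli' };
  }
  return { primary: 'grpc', address: `127.0.0.1:${port}` };
}
```

Because the kill switch is checked first, flipping PERCY_MAESTRO_GRPC=0 takes effect even while the port is still being injected.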

What's in this bundle

The iOS HTTP path commits already on this branch plus 5 new commits absorbing PR #2210:

| Unit / origin | Commit | Description |
|---|---|---|
| 2026-05-07-002 plan Unit 1 | 68e67db5 | vendor maestro_android.proto + add @grpc/grpc-js deps |
| 2026-05-07-002 plan Units 2/3/7 | 73459be6 | direct gRPC Android resolver with three-class taxonomy |
| 2026-05-07-002 plan Unit 5 | 135f0ea3 | per-Percy gRPC client cache lifecycle + shutdown race |
| 2026-05-07-002 plan Units 2/3/4/5 | bf5d1f55 | 28 new test specs covering all dispatch + classification paths |
| 2026-05-07-002 plan Unit 6 | e8583bdf | concurrent-access merge gate harness with p95<1200ms / p99<2000ms gate |

(Commits a1bd69da and earlier are the iOS HTTP path documented in the prior PR description history.)

Testing

  • 807 of 807 specs run, 28 added by this absorption. All 28 new specs pass:
    • Unit / maestro-hierarchy / Android gRPC primary path — 28 specs across classifyGrpcFailure, runAndroidGrpcDump, cache reuse + per-instance isolation, closeGrpcClientCache, and dump({platform:'android'}) dispatch.
  • 27 pre-existing failures unchanged from master (Install Chromium spy setup × 21, runDoctorOnFailure × 5, IPv4/IPv6 AggregateError flake on api.test.js:655 — none related to this branch).
  • All 80+ existing maestro-hierarchy specs pass (Android adb path, iOS HTTP path, parity tests, two-slot drift envelope tests).

BrowserStack App Automate validation (2026-05-07)

End-to-end validation of the iOS HTTP path on real BS realmobile/mobile hosts (full plan + results in local docs/plans/2026-05-07-001-feat-bs-validation-maestro-ios-http-resolver-plan.md):

| Platform | BS build | Percy build | Snapshots | Resolver outcome |
|---|---|---|---|---|
| Android | 5439a79e (passed) | #49501406 | 2 | unavailable / multi-device-no-serial (resolved by mobile PR #13206 commit ddac377) |
| Android (post-mobile-fix) | 8ed1a6c8 (passed) | #49502511 | 2 | dump-error / fallback-dump-exit-137 (the gRPC-contention root cause this PR's gRPC-direct path fixes) |
| iOS | 41ebf750 (passed) | #49501844 | 2 | dump-error / http-non-json-content-type. iOS HTTP primary path was exercised in production; schema-drift envelope correctly classified the response. |

The validation surfaced the BS-side env-injection gaps (mobile and realmobile didn't pass ANDROID_SERIAL / PERCY_IOS_DRIVER_HOST_PORT to the Percy CLI process). Companion commits landed on the BS-side PRs:

Post-Deploy Monitoring & Validation

  • What to monitor/search
    • Logs (Percy CLI per-session): [percy] iOS HTTP schema-drift: ... / [percy] gRPC viewHierarchy schema-class failure (...) — proto drift signals.
    • Metrics: /percy/healthcheck maestroHierarchyDrift envelope. Alert when .android or .ios slot populates.
  • Validation checks (queries/commands)
    • curl http://127.0.0.1:<cli_port>/percy/healthcheck mid-session → expect maestroHierarchyDrift: { android: null, ios: null }.
    • Run the merge-gate harness against a real Android device after BS-side PERCY_ANDROID_GRPC_PORT injection lands: MAESTRO_ANDROID_TEST_DEVICE=<serial> PERCY_ANDROID_GRPC_PORT=<port> node packages/core/test/integration/maestro-hierarchy-concurrent.harness.js — gate is p95 < 1200ms AND p99 < 2000ms across 100 iterations.
  • Expected healthy signal(s)
    • [percy] dump took Nms via grpc (M nodes) (Android, env-set) or [percy] dump took Nms via maestro-http (iOS, env-set).
    • Both drift slots stay null.
  • Failure signal(s) / rollback trigger
    • Either drift slot populates → ops investigates Maestro version + considers proto refresh PR.
    • Sustained Android contention (p99 > 2000ms across N sessions) → file open-circuit backoff follow-up; meanwhile flip PERCY_MAESTRO_GRPC=0 for fast rollback.
  • Validation window & owner
    • Window: 7 days post-merge of this PR + the BS-side env-injection PRs.
    • Owner: percy-cli oncall.
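
For reference, the p95 < 1200ms AND p99 < 2000ms merge gate mentioned above can be evaluated as in the sketch below. It uses the nearest-rank percentile method, which is an assumption; the real harness may compute percentiles differently. The function names are hypothetical.

```javascript
// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Apply the gate thresholds quoted in the validation checks.
function evaluateMergeGate(latenciesMs) {
  const p95 = percentile(latenciesMs, 95);
  const p99 = percentile(latenciesMs, 99);
  return { p95, p99, pass: p95 < 1200 && p99 < 2000 };
}
```

With 100 iterations, nearest-rank p95 is the 95th-smallest sample, so six or more samples at or above 1200ms fail the gate.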

Pending follow-ups (out of scope for this PR)

  • Companion browserstack/mobile PR injecting PERCY_ANDROID_GRPC_PORT (analog of realmobile commit 62a0f7e for iOS). Without this, the Android gRPC primary stays dormant in production.
  • Maestro version compatibility: iOS validation surfaced http-non-json-content-type from this realmobile deployment's Maestro version. Needs follow-up to determine missing content-type header vs endpoint shape vs version mismatch.
  • Open-circuit backoff if sustained contention shows p99 > 2000ms in production (deferred per plan; not needed unless the harness/production reveals it).
  • Close percy/cli#2210 and delete feat/grpc-element-region-resolver after this PR merges to master.

Compound Engineering v2.54.0
🤖 Generated with Claude Opus 4.7 (1M context, ultrathink) via Claude Code

Sriram567 and others added 25 commits March 26, 2026 21:01
- Add empty body guard (400 instead of TypeError)
- Add Busboy fileSize limit to reject oversized uploads during parsing
- Use Object.create(null) and field allowlist to prevent prototype pollution
- Add stream error handler on Readable source
- Use HTTP 413 for oversized files
Reject immediately on Busboy 'limit' event with 413 instead of
setting fileBuffer to null which produced 'Missing required file part'.
Accepts {name, sessionId} as JSON, finds the screenshot file on disk
at /tmp/<sessionId>_test_suite/logs/*/screenshots/<name>.png,
base64-encodes it, and processes as a standard comparison.

This enables real-time Percy uploads from Maestro flows where the
JS sandbox cannot access screenshot files directly.
… metadata

Accept statusBarHeight, navBarHeight, fullscreen from request instead of
hardcoding 0/false. Transform coordinate-based regions to CLI boundingBox
format. Add sync mode support via percy.syncMode() + handleSyncJob().
Forward thTestCaseExecutionId to comparison pipeline.

Element-based regions log a warning and are skipped — ADB uiautomator
resolution will be added as a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Platform-aware screenshot discovery:
- Accept platform field with strict whitelist (ios/android); 400 on unknown
- iOS glob: /tmp/{sessionId}/*_maestro_debug_*/{name}.png
- Android glob unchanged; backward compat with SDK v0.2.0 (no platform → Android)

Path-safety hardening:
- Tighten name/sessionId from blocklist to strict character-class allowlist
- fs.realpath canonicalization + session-root prefix check defeats symlink swap
- Handles macOS /tmp → /private/tmp symlink transparently

Pick most recently modified file when multiple match (iOS same-name-across-flows).
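
A minimal sketch of the path-safety checks described above, assuming a character class of letters, digits, underscore, and hyphen (the actual allowlist is not spelled out in this commit). The names `isSafeSegment` and `resolveInsideRoot` are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed allowlist; the real character class may differ.
const SAFE_SEGMENT = /^[A-Za-z0-9_-]+$/;

function isSafeSegment(value) {
  return typeof value === 'string' && SAFE_SEGMENT.test(value);
}

// Canonicalize both paths, then require the file to sit under the root.
// realpath resolves symlinks on both sides, so a symlink swapped in under
// the session directory is rejected, and /tmp → /private/tmp on macOS is
// handled because the root is canonicalized the same way.
function resolveInsideRoot(sessionRoot, candidate) {
  const realRoot = fs.realpathSync(sessionRoot);
  const realFile = fs.realpathSync(candidate);
  if (!realFile.startsWith(realRoot + path.sep)) {
    throw new Error('path-escapes-session-root');
  }
  return realFile;
}
```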
Introduces packages/core/src/adb-hierarchy.js with two plain exports:
dump() and firstMatch(nodes, selector). The resolver:

- Reads process.env.ANDROID_SERIAL; falls back to one adb devices probe
  (requires exactly one attached device to avoid wrong-device dumps under
  multi-session CLI concurrency). Never accepts serial from request input.
- Shells out via cross-spawn with a 2s hard timeout (mirrors the
  browser.js:256-297 spawn+cleanup pattern).
- Classifies results into one of three shapes — unavailable, dump-error,
  hierarchy — so the relay can distinguish environmental failures from
  transient dump failures.
- Streams primary via adb exec-out uiautomator dump /dev/tty; falls back
  to file-based dump + cat only on wrong-mechanism signals (exit≠0 or
  missing <?xml prefix). Terminal signals (oversize / parse-error) do not
  retry — prevents attack amplification on adversarial payloads.
- Slices the XML envelope to the first </hierarchy> (strips uiautomator's
  trailer line and defends against embedded adversarial XML blocks).
- Enforces a 5MB stdout cap before parse.
- Parses with fast-xml-parser configured for defense-in-depth
  (processEntities: false, allowBooleanAttributes: false).
- Exposes firstMatch with pre-order DFS + strictly-anchored bounds regex;
  zero-area nodes are non-matches, negative coordinates (clipped views)
  are allowed.

Adds fast-xml-parser ^4.4.1 as a new dependency of @percy/core.

27 unit tests cover the parser + selector logic and all classification
branches via a parameter-injected execAdb seam. No real ADB calls; no
filesystem or network access.
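
The selector matching could be sketched as follows. It assumes nodes arrive as a flat array already in pre-order, that each node carries a uiautomator-style `bounds` string such as `[0,0][100,50]`, and that a zero-area node is skipped so the search continues; all three are assumptions about details this commit message does not state.

```javascript
// Strictly anchored: the whole string must be exactly two [x,y] pairs.
const BOUNDS_RE = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/;

function parseBounds(bounds) {
  const m = BOUNDS_RE.exec(String(bounds || ''));
  if (!m) return null;
  const [x1, y1, x2, y2] = m.slice(1).map(Number);
  // Zero-area (or inverted) boxes are non-matches; negative origins are fine.
  if (x2 <= x1 || y2 <= y1) return null;
  return { x: x1, y: y1, width: x2 - x1, height: y2 - y1 };
}

// Returns the first node (in array order) whose attribute equals the
// selector value and whose bounds parse to a non-empty box.
function firstMatch(nodes, selector) {
  const [key, value] = Object.entries(selector)[0] || [];
  if (key === undefined) return null;
  for (const node of nodes) {
    if (node[key] !== value) continue;
    const box = parseBounds(node.bounds);
    if (box) return { node, box };
  }
  return null;
}
```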
Wires the /percy/maestro-screenshot relay to the new adb-hierarchy
resolver. Replaces the existing element-region warn-and-skip stub with
actual resolution via ADB + uiautomator dump on Android.

Handler changes:
- Early 400 validation on region shape before file I/O or ADB work:
  whitelist selector keys (resource-id/text/content-desc/class), require
  exactly one selector key per region, string-typed value, length ≤512,
  total regions per request ≤50.
- Android element regions: lazy dump on first element region, memoize
  the result (including error classes) for the whole request. Pre-scan
  element-region count so the skip warning reports N regions accurately.
  Both unavailable and dump-error poison the rest of the request with
  one warning — bounds worst-case per-request ADB time to one 2s timeout
  regardless of element-region count (closes the timeout-accumulation
  DoS vector).
- iOS element regions: preserve existing warn-and-skip semantics. Not
  a 400. Avoids a breaking change for any iOS caller today.
- Coordinate regions: unchanged; still transform {top,bottom,left,right}
  to elementSelector.boundingBox.
- Miss on element resolution: per-element warning, region skipped,
  request still uploads.

First-ever /percy/maestro-screenshot handler tests cover input
validation (9 × 400 paths), coordinate-only flow regression, iOS
warn-and-skip behavior, end-to-end forwarding of testCase/labels/
thTestCaseExecutionId/tile-metadata/sync, and the missing-screenshot
404 path. ADB-integration paths (element resolution against a real
device) are covered by the adb-hierarchy unit tests and Unit 7 E2E
validation on BrowserStack Maestro.
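
The early region-shape validation listed above can be sketched like this. It assumes element regions arrive as `{ element: { <key>: <value> } }` and that anything without an `element` property is a coordinate region; the function name and the reason strings are hypothetical.

```javascript
const ALLOWED_KEYS = new Set(['resource-id', 'text', 'content-desc', 'class']);
const MAX_REGIONS = 50;
const MAX_VALUE_LENGTH = 512;

// Returns null when the regions are acceptable, otherwise a short reason
// that the handler could turn into a 400 before any file I/O or ADB work.
function validateElementRegions(regions) {
  if (!Array.isArray(regions)) return 'regions-not-array';
  if (regions.length > MAX_REGIONS) return 'too-many-regions';
  for (const region of regions) {
    if (region === null || typeof region !== 'object') return 'invalid-region';
    const element = region.element;
    if (element === undefined) continue; // coordinate region, checked elsewhere
    if (element === null || typeof element !== 'object' || Array.isArray(element)) {
      return 'invalid-element';
    }
    const keys = Object.keys(element);
    if (keys.length !== 1) return 'exactly-one-selector-key-required';
    if (!ALLOWED_KEYS.has(keys[0])) return 'selector-key-not-allowed';
    const value = element[keys[0]];
    if (typeof value !== 'string') return 'selector-value-not-string';
    if (value.length > MAX_VALUE_LENGTH) return 'selector-value-too-long';
  }
  return null;
}
```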
E2E validation on BrowserStack Maestro against host 31.6.63.33 / Pixel 7 Pro
showed the primary exec-out path intermittently returning no-xml-envelope
and the file-dump fallback exiting 137 (SIGKILL of uiautomator on the
device). The kill is triggered by concurrent uiautomator/automation
activity on the device during a live Maestro session — not a device-wide
or permissions issue (manual dumps from the shell return 44KB XML fine).

A single 300ms-delayed retry of the fallback dump command recovers the
common case without masking genuine device unavailability. If the second
attempt also fails, we still fall through to the existing dump-error
classification.

Test: the adb-hierarchy spec adds a retry test where the first fallback
exec returns 137 and the second returns the fixture XML; resolver returns
hierarchy and fileDumpCalls == 2.
Strengthens the SIGKILL retry from a single 300ms attempt to three retries
at 500ms/1s/2s (3.5s total window). Exits early as soon as a dump succeeds.

Rationale: single short retry wasn't enough against persistent device
contention observed during BrowserStack Maestro sessions. The wider budget
catches transient uiautomator kills on less-contended devices while still
failing fast on genuinely unavailable devices. Captured limitation: when
Maestro holds uiautomator throughout a flow (its observed behavior on real
devices), no reasonable retry count recovers — the mechanism itself needs
to change (e.g., Maestro API integration or an accessibility-service
sidecar). That's a Phase 2 follow-up, not part of this patch.

Tests cover both the "succeeds on Nth retry" case and the "all retries
exhausted" case.
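
A sketch of the 500ms/1s/2s retry window with early exit. It assumes the dump result exposes an `exitCode` and that only exit 137 is retried; `runDump` and the injectable `sleep` are hypothetical seams, not the actual function signatures.

```javascript
const RETRY_DELAYS_MS = [500, 1000, 2000]; // 3.5s total window

async function dumpWithRetry(runDump, options = {}) {
  const sleep = options.sleep || ((ms) => new Promise((r) => setTimeout(r, ms)));
  let result = await runDump();
  for (const delay of RETRY_DELAYS_MS) {
    // Exit as soon as the dump is no longer being SIGKILLed.
    if (result.exitCode !== 137) break;
    await sleep(delay);
    result = await runDump();
  }
  // Either a success, a non-137 failure, or the last 137 after all retries;
  // the caller classifies it as before.
  return result;
}
```

Injecting `sleep` keeps the tests fast: a fake can record the requested delays instead of waiting.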
E2E on BrowserStack Maestro showed `adb exec-out uiautomator dump` is
fundamentally incompatible with live Maestro flows — Maestro holds the
uiautomator lock throughout a flow and competing dumps get SIGKILLed.
The `maestro --udid <serial> hierarchy` CLI command reuses Maestro's
existing gRPC connection to dev.mobile.maestro on the device and works
reliably during live sessions (verified by probing twice mid-flow —
both probes returned valid JSON while the flow was running).

Changes in packages/core/src/adb-hierarchy.js:

- Primary dump mechanism is now `maestro --udid <serial> hierarchy`.
- Parse the resulting JSON (slice from the first `{` to tolerate banner
  lines), flatten the tree into the existing node shape.
- Map `accessibilityText` → `content-desc` at flatten time so `firstMatch`
  still uses the SDK's selector vocabulary unchanged.
- Maestro CLI timeout: 15s (JVM cold start ~9s + headroom).
- Honor `MAESTRO_BIN` env var for alternate paths; default `maestro`
  on PATH.
- New `spawnWithTimeout` helper shared between maestro and adb code paths.
- Classification extended with maestro-specific reasons (`maestro-not-found`,
  `maestro-timeout`, `maestro-no-device`, `maestro-no-json`,
  `maestro-parse-error:*`, `maestro-spawn-error:*`, `maestro-exit-*`,
  `maestro-oversize`).

Fallback: when maestro returns anything other than `hierarchy`, fall
through to the existing `adb exec-out uiautomator dump` flow (including
SIGKILL retry/backoff and file-dump fallback). Useful when the maestro
binary isn't installed on the CLI host.

Cost: 9s JVM cold start per screenshot that uses element regions.
Acceptable today because the alternative is 100% skip. Phase 2.2 follow-up:
replace the CLI invocation with a direct gRPC client against device port
6790 (typical latency <100ms) — infrastructure already in place (adb
forwards tcp:8206 → 6790 per device on BrowserStack hosts).

Tests: 36 specs total. New `dump (maestro hierarchy primary)` describe
block adds 7 scenarios (happy path, content-desc mapping, ENOENT→adb
fallback, unavailable propagation when both fail, timeout → adb recovery,
banner prefix tolerance, no-json). Existing 29 tests now inject an
execMaestro stub that reports ENOENT so they exercise the adb fallback
path exactly as before.
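
The banner-tolerant parse and flatten step might look like the sketch below. It assumes the CLI emits a tree of `{ attributes, children }` objects and that parse failures are reported with kind `dump-error`; both are assumptions, as is the function name. The reason tags and the `accessibilityText` → `content-desc` mapping are from this commit.

```javascript
function parseMaestroHierarchy(stdout) {
  // Slice from the first "{" so banner lines before the JSON are ignored.
  const start = stdout.indexOf('{');
  if (start === -1) return { kind: 'dump-error', reason: 'maestro-no-json' };

  let tree;
  try {
    tree = JSON.parse(stdout.slice(start));
  } catch (err) {
    return { kind: 'dump-error', reason: `maestro-parse-error:${err.name}` };
  }

  // Pre-order walk into the flat node shape the selector matcher expects.
  const nodes = [];
  (function walk(node) {
    if (!node || typeof node !== 'object') return;
    const attrs = node.attributes || {};
    nodes.push({
      'resource-id': attrs['resource-id'],
      text: attrs.text,
      'content-desc': attrs.accessibilityText, // SDK vocabulary stays unchanged
      class: attrs.class,
      bounds: attrs.bounds
    });
    if (Array.isArray(node.children)) node.children.forEach(walk);
  })(tree);

  return { kind: 'hierarchy', nodes };
}
```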
New module png-dimensions.js serves both:
- existing /percy/comparison/upload signature check (api.js import)
- upcoming /percy/maestro-screenshot iOS path (scale factor via
  pngWidth / wda_window_logical_width + aspect-ratio landscape fallback)

Exports:
- PNG_MAGIC_BYTES (moved from api.js route-local scope)
- parsePngDimensions(buf) → {width, height} via IHDR hand-parse
  (24-byte prefix read, no new dependency)
- isPortrait / isLandscape with default threshold 1.25
  (iPad portrait ratio 1.334; margin empirically confirmable via A1 Probe 6)
- DEFAULT_ORIENTATION_THRESHOLD exported for override in tests / A1 Probe 6

Test-first: 17 specs covering happy path iPhone/iPad portrait+landscape,
dimensions > 65535, truncated buffer, bad signature, zero width/height,
non-Buffer input, threshold override, near-square ambiguity. All pass.

api.js: removes inline PNG_MAGIC_BYTES declaration from the upload route
handler; imports the shared constant. Upload signature-check behavior
unchanged.

Unit B1 of the iOS Maestro element-regions plan (v1.0); serves as the
Phase 1 CI coverage preflight per plan.
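
For readers unfamiliar with the IHDR hand-parse: a PNG starts with an 8-byte signature, followed by the IHDR chunk whose width and height are big-endian 32-bit integers at byte offsets 16 and 20. The sketch below follows that layout; returning null on failure and treating height/width ≥ threshold as portrait are assumptions about the module's behavior.

```javascript
const PNG_MAGIC_BYTES = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const DEFAULT_ORIENTATION_THRESHOLD = 1.25;

// Reads only the first 24 bytes: signature (8) + chunk length (4) +
// chunk type "IHDR" (4) + width (4) + height (4).
function parsePngDimensions(buf) {
  if (!Buffer.isBuffer(buf) || buf.length < 24) return null;
  if (!buf.subarray(0, 8).equals(PNG_MAGIC_BYTES)) return null;
  if (buf.toString('latin1', 12, 16) !== 'IHDR') return null;
  const width = buf.readUInt32BE(16);
  const height = buf.readUInt32BE(20);
  if (width === 0 || height === 0) return null;
  return { width, height };
}

// Assumed semantics: portrait when the image is clearly taller than wide.
function isPortrait({ width, height }, threshold = DEFAULT_ORIENTATION_THRESHOLD) {
  return height / width >= threshold;
}
```

Reading the dimensions as unsigned 32-bit values is what lets sizes above 65535 parse correctly.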
…-meta.json

Reader side of the realmobile ↔ Percy CLI contract v1.0.0 for iOS
element-region resolution on shared BS iOS hosts. Given a Maestro sessionId,
resolves /tmp/<sid>/wda-meta.json → { ok: true, port } or { ok: false, reason }
with TOCTOU-safe validation per contract §8:

File-level (SEI CERT POS35-C ordering, no lstat prefix):
- openSync(path, O_RDONLY | O_NOFOLLOW | O_NONBLOCK) — atomic symlink refusal;
  ELOOP → 'symlink', ENOENT → 'missing', else → 'read-error'
- fstatSync on the opened fd — authoritative mode/uid/nlink check:
  - st.mode mismatch 0o100600 → 'wrong-mode'
  - st.uid mismatch getuid() → 'wrong-owner'
  - st.nlink != 1 → 'multi-link' (hardlink attack per Apple Secure Coding
    Guide, CVE-2005-2519 class)
  - !st.isFile() → 'not-regular-file'

Content validation (contract §2):
- JSON.parse → 'malformed-json'
- schema_version semver-major != 1 → 'schema-version-unsupported'
  (accepts 1.x minor bumps; unknown fields ignored for forward-compat)
- wdaPort out of 8400-8410 integer range → 'out-of-range-port'
- sessionId mismatch vs request → 'session-mismatch'
- flowStartTimestamp < getStartupTimestamp() - 5min → 'stale-timestamp'

Input guardrails:
- sessionId regex [A-Za-z0-9_-]{16,64} + null-byte/slash rejection →
  'invalid-session-id' (path-traversal defense before any fs touch)

Log scrubbing (contract §5):
- All failure paths emit only the reason tag via logger.debug()
- No selector values, sessionIds, port numbers, paths, or uids in logs
- Verified by a cross-scenario scrub-assertion test

DI: { getuid, getStartupTimestamp } injected for deterministic tests.

22 specs pass. Tests use real fs tmpdirs (bypass memfs) because the module
relies on POSIX O_NOFOLLOW / hardlink semantics memfs doesn't implement.
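
The open-then-fstat ordering can be sketched as below (POSIX only, since O_NOFOLLOW is not defined on Windows). The function name `openMetaFile` and the `content` field on success are hypothetical; the flags, check order, and reason tags follow the list above. Content validation is left out.

```javascript
const fs = require('fs');

function openMetaFile(filePath, deps = {}) {
  const getuid = deps.getuid || (() => process.getuid());
  const { O_RDONLY, O_NOFOLLOW, O_NONBLOCK } = fs.constants;

  let fd;
  try {
    // O_NOFOLLOW makes the open itself fail on a symlink, so there is no
    // lstat-then-open window for an attacker to swap the path.
    fd = fs.openSync(filePath, O_RDONLY | O_NOFOLLOW | O_NONBLOCK);
  } catch (err) {
    if (err.code === 'ELOOP') return { ok: false, reason: 'symlink' };
    if (err.code === 'ENOENT') return { ok: false, reason: 'missing' };
    return { ok: false, reason: 'read-error' };
  }

  try {
    // fstat on the open descriptor describes the file actually opened.
    const st = fs.fstatSync(fd);
    if (st.mode !== 0o100600) return { ok: false, reason: 'wrong-mode' };
    if (st.uid !== getuid()) return { ok: false, reason: 'wrong-owner' };
    if (st.nlink !== 1) return { ok: false, reason: 'multi-link' };
    if (!st.isFile()) return { ok: false, reason: 'not-regular-file' };
    return { ok: true, content: fs.readFileSync(fd, 'utf8') };
  } finally {
    fs.closeSync(fd);
  }
}
```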
… (B3)

Core iOS element-region resolver for /percy/maestro-screenshot. Single
GET /session/:sid/source per screenshot, parsed locally via fast-xml-parser,
mirrors the Android adb-hierarchy.js architecture.

Exports:
- resolveIosRegions({regions, sessionId, pngWidth, pngHeight, isPortrait, deps})
  → {resolvedRegions: [{elementSelector, boundingBox, algorithm}], warnings: []}
- shutdown() — aborts all in-flight WDA HTTP AbortControllers (wired to
  percy.stop() by B4)
- XCUI_ALLOWLIST — exported Set of ~80 XCUIElement.ElementType values from
  the Xcode 16 SDK (Apple XCUIElement.ElementType docs); serves as DoS
  guardrail per WDA issue #292

Resolution path (A1-chosen):
1. Landscape gate (isPortrait arg)
2. Kill-switch gate (process.env.PERCY_DISABLE_IOS_ELEMENT_REGIONS from
   startup env only; NOT tenant-forwarded via appPercy.env)
3. readWdaMeta dep returns port from realmobile-written wda-meta.json; port
   validated in 8400-8410 range
4. GET /wda/screen (loopback-only) → scale from integer `scale` field;
   fallback to width-ratio (pngWidth / logical_w) snapped to {2, 3};
   fail-closed on raw ratio outside [1.9, 3.1]; LRU cache cap 64 per-session
5. GET /session/:sid/source (loopback-only):
   - 20 MB response cap enforced BEFORE parse
   - Pre-parse DOCTYPE/ENTITY regex rejection (primary XXE defense)
   - fast-xml-parser with processEntities:false (defense-in-depth)
   - Cached per screenshot; all regions reuse single fetch
6. Per region:
   - Only `id` and `class` accepted in V1; `text`/`xpath` → selector-key-not-in-v1
   - class short-form (Button) normalized to long-form (XCUIElementTypeButton);
     rejected if normalized form not in allowlist → class-not-allowlisted
   - selector > 256 chars → selector-too-long
   - tree pre-order first match (zero-match on no-match)
   - scale points → pixels, validate in-bounds + non-trivial area (≥4×4) →
     bbox-out-of-bounds / bbox-too-small
   - outbound elementSelector.class uses normalized long-form (canonical form
     on Percy dashboard regardless of customer input style)

HTTP: @percy/client/utils#request via injectable httpClient dep; 500 ms
AbortController timeout per call; retries: 0 to keep timeout honest.
inflight Set tracks active controllers; shutdown() aborts all.

Log scrubbing (contract §5): reason tags only. Verified across all paths —
no selector values, sessionIds, ports, or coords in logs.

23 specs pass. Tests use an injectable fake httpClient + in-memory
handlers; no real network required.
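
Steps 4 and 6 of the resolution path (scale detection, then points → pixels with bounds checks) could be sketched as follows. Snapping to the nearer of 2 and 3, rounding with Math.round, and the function names are assumptions; the [1.9, 3.1] window, the 4×4 minimum, and the reason tags are from the list above.

```javascript
// Prefer the integer scale reported by the device; otherwise derive it from
// the screenshot width and snap to 2 or 3. Returns null to fail closed.
function resolveScale({ reportedScale, pngWidth, logicalWidth }) {
  if (Number.isInteger(reportedScale) && reportedScale > 0) return reportedScale;
  const ratio = pngWidth / logicalWidth;
  if (!Number.isFinite(ratio) || ratio < 1.9 || ratio > 3.1) return null;
  return ratio < 2.5 ? 2 : 3;
}

// Convert a rect in points to pixels and validate it against the PNG size.
function toPixelBox(rect, scale, png) {
  const box = {
    x: Math.round(rect.x * scale),
    y: Math.round(rect.y * scale),
    width: Math.round(rect.width * scale),
    height: Math.round(rect.height * scale)
  };
  const inBounds = box.x >= 0 && box.y >= 0 &&
    box.x + box.width <= png.width && box.y + box.height <= png.height;
  if (!inBounds) return { ok: false, reason: 'bbox-out-of-bounds' };
  if (box.width < 4 || box.height < 4) return { ok: false, reason: 'bbox-too-small' };
  return { ok: true, box };
}
```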
…lay (B4)

Wires B1/B2/B3 into api.js's /percy/maestro-screenshot handler. For iOS
requests with element regions:

1. Parse IHDR from the already-read fileContent (one buffer read total —
   no extra fs hit). Failure → warn-skip all iOS element regions with
   png-unparseable; coord regions + screenshot upload continue.
2. Call resolveIosRegions() once per request with a real @percy/client/utils
   #request httpClient and a resolveWdaSession-wrapped readWdaMeta dep.
3. Surface each warning to percy.log.warn so support runbook tags are
   visible in Maestro stdout.
4. Walk the original regions array in input order; positional index into
   the sparse resolvedRegions produced by the resolver keeps coord and
   element regions interleaved correctly in the outbound Percy payload.

wda-hierarchy now returns a SPARSE array (one entry per input element
region; null = skipped) instead of a dense array. Preserves input ordering
when element and coord regions are interleaved. All B3 unit tests updated
accordingly (22 still pass).
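
One way to read the sparse-array contract: the resolver returns one slot per element region, in input order, so the handler can walk the original regions and consume slots positionally. The sketch below assumes coordinate regions map to an `{x, y, width, height}` bounding box and that resolved entries are forwarded untouched; the function name is hypothetical.

```javascript
// inputRegions: original request regions, element and coordinate interleaved.
// sparseResolved: one entry per ELEMENT region, null where it was skipped.
function mergeRegions(inputRegions, sparseResolved) {
  const out = [];
  let elementIndex = 0;
  for (const region of inputRegions) {
    if (region.element) {
      const resolved = sparseResolved[elementIndex++];
      if (resolved) out.push(resolved); // null slot: warned and skipped earlier
    } else {
      // Assumed bounding-box shape for coordinate regions.
      out.push({
        elementSelector: {
          boundingBox: {
            x: region.left,
            y: region.top,
            width: region.right - region.left,
            height: region.bottom - region.top
          }
        }
      });
    }
  }
  return out;
}
```

A dense array would lose track of which element region was skipped, which is why a skipped region keeps its slot as null.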

percy.js stop() invokes wdaHierarchyShutdown() before server.close() to
abort in-flight WDA HTTP calls — http.request has no SIGKILL analog, so
a slow /source fetch could otherwise keep the event loop alive past
graceful-shutdown timeout.

api.test.js: replaced the pre-V1 iOS stub test (which asserted
"Element-based region selectors are not yet supported on iOS") with a V1
behavioral test that exercises the full iOS element-region pipeline with
a real PNG IHDR header fixture (1170×2532 iPhone 14) and asserts V1
warn-skip semantics for an Android-style `resource-id` selector on iOS
(not-in-V1 key).

Test suite baseline: 28 pre-existing failures (chromium/doctor download
tests unrelated to this change). After B4: 27 failures — same chromium/
doctor failures, plus the iOS stub test now passes with its updated V1
assertions. Zero iOS/wda-hierarchy/maestro-screenshot regressions.

Kill-switch (PERCY_DISABLE_IOS_ELEMENT_REGIONS=1) read from Percy CLI
process startup env inside wda-hierarchy.js per plan — host-level only,
NOT forwarded from tenant appPercy.env (A0.3 property: pending staging
verification).
…retries

Implements the three layered fixes documented in
percy-maestro/docs/solutions/integration-issues/ios-wda-session-id-and-node14-abortcontroller-2026-04-23.md.
Each addresses a distinct iOS-region failure mode that surfaced during
2026-04-23 BrowserStack live validation on host 52:

Fix C — Node 14 AbortController feature-detect (callWda):
  BS iOS hosts pin to Node 14.17.3 (Nix). AbortController became a global
  in Node 15. Without feature detection, the timeout path threw
  ReferenceError caught by generic error handling and surfaced as the
  same 'wda-error' tag as legitimate WDA failures, masking the other two
  fixes during diagnosis. Now: typeof globalThis.AbortController guard +
  Promise.race fallback. Adds diagnostic logging on /wda/screen failures
  showing err.name/message/code/status/aborted/body.
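
  A sketch of the feature-detect pattern, assuming a hypothetical
  `doRequest(signal)` callback in place of the real WDA HTTP call. The
  'TimeoutError' name is also an assumption. On runtimes without a global
  AbortController the request is not cancelled; the race only stops waiting.

```javascript
function callWithTimeout(doRequest, ms) {
  // Feature-detect instead of referencing the bare global, which would
  // throw ReferenceError on runtimes that do not define it.
  const hasAbort = typeof globalThis.AbortController === 'function';
  const controller = hasAbort ? new globalThis.AbortController() : null;

  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => {
      if (controller) controller.abort();
      reject(Object.assign(new Error(`timed out after ${ms}ms`), { name: 'TimeoutError' }));
    }, ms);
  });

  // Promise.race is the fallback path: it bounds the wait even when the
  // underlying request cannot be aborted.
  const request = Promise.resolve(doRequest(controller ? controller.signal : undefined));
  return Promise.race([request, timeout]).finally(() => clearTimeout(timer));
}
```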

Fix B — Stale WDA sessionId retry via error-envelope extraction:
  WDA's session-scoped routes (/session/:sid/source) reject any sid that
  isn't the currently-active session. Maestro spawns its own WDA session
  per xctest run, so realmobile's write-time sid capture goes stale
  during the test. Refactored fetchAndParseSource into tryFetchSource +
  retry coordinator. On staleSession (`{ value: { error: 'invalid session
  id' } }`), extracts the top-level `sessionId` from the error envelope
  (authoritative for "currently active") and retries once. Falls back to
  /status probe if the error body lacks a usable sid.
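
  The retry coordinator could be sketched as below. `fetchSource(sid)` and
  `probeStatus()` are hypothetical injected helpers that resolve to
  `{ status, body }` and to a parsed /status body respectively; the shape of
  the /status body (a top-level `sessionId`) is an assumption.

```javascript
function isStaleSession(body) {
  return Boolean(body && body.value && body.value.error === 'invalid session id');
}

async function fetchSourceWithRetry(fetchSource, sessionId, probeStatus) {
  const first = await fetchSource(sessionId);
  if (!isStaleSession(first.body)) return first;

  // The error envelope's top-level sessionId names the active session.
  let activeSid = first.body.sessionId;
  if (typeof activeSid !== 'string' || activeSid === '') {
    const status = await probeStatus();
    activeSid = status && status.sessionId;
  }

  // Nothing better to try: surface the original failure.
  if (typeof activeSid !== 'string' || activeSid === '' || activeSid === sessionId) {
    return first;
  }
  return fetchSource(activeSid); // exactly one retry
}
```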

Fix A (reader side) — wdaSessionId surfacing per contract v1.1.0:
  realmobile contract v1.1.0+ probes /status at write_wda_meta time and
  surfaces the WDA UUID under wda-meta.json's optional `wdaSessionId`
  field. wda-session-resolver now validates the field against
  /^[A-Fa-f0-9-]{16,64}$/ (generous bounds for cross-version tolerance)
  and surfaces it on the {ok, port, wdaSessionId?} return shape. v1.0.0
  writers that omit the field cause callers to fall back to SDK
  sessionId (the fast path 404s, then Fix B's retry recovers).

Tests cover all three paths: feature-detected timeout, staleSession
retry from error envelope, /status fallback when error body lacks sid,
v1.1.0 wdaSessionId pass-through, v1.0.0 absence handling, malformed
wdaSessionId rejection.

Note for downstream: this WDA-direct path is gated for deletion by the
2026-04-27 plan (percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md)
once Phase 0.5 empirical probe passes. Until then, this is the
production iOS resolver path.
The Android view-hierarchy resolver is becoming the cross-platform Maestro
resolver (per percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md
Unit 1). Rename + shim is purely additive — no behavior change.

- Move src/adb-hierarchy.js → src/maestro-hierarchy.js (git mv preserves history).
- Move test/unit/adb-hierarchy.test.js → test/unit/maestro-hierarchy.test.js.
- Move test/fixtures/adb-hierarchy/ → test/fixtures/maestro-hierarchy/.
- Replace src/adb-hierarchy.js with a 5-line re-export shim. Removed in V1.1
  per the plan's deprecation guidance.
- Update api.js import to ./maestro-hierarchy.js.
- Update logger namespace from core:adb-hierarchy → core:maestro-hierarchy.
- Update file header to reflect cross-platform intent (the file body has been
  maestro-first for some time; the previous file name was always misleading).
- Update test describe block + import + fixture path.

Behavior unchanged. Subsequent units in Phase 1 will add the iOS branch and
api.js dispatch logic; this commit is just the rename so the diffs in those
units stay focused.
…parity

Phase 1 Unit 2a per percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md.
Lands the platform-dispatch scaffolding and the cross-platform selector
vocabulary alias. Real iOS resolver implementation deferred to Unit 2b
post Phase 0.5 fixture capture (FIXME-PHASE-0.5 in code).

Platform dispatch:
- dump({ platform }) accepts 'android' (default — backwards compatible) or
  'ios'. iOS branch reads PERCY_IOS_DEVICE_UDID + PERCY_IOS_DRIVER_HOST_PORT
  from env (realmobile-injected per Unit 10a; the wda_port + 2700 formula
  is realmobile-owned per maestro_session.rb:831). Warn-skip with
  reason='env-missing' if either var is unset. Otherwise calls
  runMaestroIosDump which currently returns
  { kind: 'unavailable', reason: 'not-implemented' } as the FIXME-PHASE-0.5
  stub.
- iOS path never invokes adb (verified by test).

R1 vocabulary parity (Android `id` alias):
- flattenMaestroNodes (Android branch) now surfaces resource-id under both
  `resource-id` AND `id` canonical keys on each node. Customer selectors
  `{element: {id: "submit-btn"}}` and `{element: {resource-id: "submit-btn"}}`
  resolve the same node. iOS users writing `{id: ...}` and Android users
  writing the same yaml hit the same code path. Full unified-key migration
  (deprecating `resource-id`) deferred to V1.1.
- SELECTOR_KEYS_UNION = [resource-id, text, content-desc, class, id]
  drives firstMatch validation. ANDROID_SELECTOR_KEYS_WHITELIST and
  IOS_SELECTOR_KEYS_WHITELIST exported separately for callers that want
  per-platform validation.

Tests added:
- Android `id` alias resolves same bbox as `resource-id` (3 tests).
- iOS env-missing path (3 tests covering each env-var combination).
- iOS env-set returns 'not-implemented' (FIXME stub).
- iOS dispatch never invokes adb.
- Default (no platform arg) preserves Android behavior.

Smoke-tested via direct node import; full @percy/core test suite has 27
pre-existing failures in Unit / Install in executable Chromium (unrelated
infrastructure issue), but no regressions in the resolver tests.
Phase 1 Unit 3 per percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md.

Wires the maestro-hierarchy resolver into the /percy/maestro-screenshot
relay's iOS element-region dispatch, gated by an env switch so default
(unset) behavior is unchanged. Phase 0.5 empirical probe gates the
default flip to the new path; Phase 4 deletes the legacy iOS branch.

- New: read PERCY_IOS_RESOLVER from process.env. When equal to
  'maestro-hierarchy', iOS element regions flow through the same
  lazy maestroDump({ platform: 'ios' }) + per-region firstMatch
  pattern Android already uses. When unset (or any other value),
  legacy WDA-direct path remains active — no behavior change for
  customers in production today.
- Refactor: the up-front PNG-parse + resolveIosRegions block now only
  fires when the env switch is OFF. With the switch on, that work is
  unnecessary (the resolver is engineered to be lazy + per-region).
- The cross-platform branch in the per-region loop now also covers iOS
  when the switch is on. Same shape as Android: cachedDump lazy memo,
  warn-skip on hierarchy-unavailable, firstMatch + bbox forward on
  success.

Today (env switch unset): only the cross-platform Android path is
exercised. The iOS branch with the switch on is exercised by the
maestro-hierarchy unit tests landed in Unit 2a (which cover the
'env-missing' and 'not-implemented' stub paths). Unit 4 adds the
parity test that exercises both platforms via the same handler.

A real production rollout flips the default to 'maestro-hierarchy'
in Phase 4 (Unit 9) after Phase 0.5 PASSes; until then, keep the
default off.
Phase 1 Unit 4 per percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md.

New test file: test/unit/maestro-hierarchy.parity.test.js. Locks in the
contract that both platform branches return the same { kind, ... } envelope,
that the public API surface (SELECTOR_KEYS_WHITELIST + per-platform
whitelists) is consistent, and that platform dispatch isolates the env-var
reads (Android never reads PERCY_IOS_*; iOS never reads ANDROID_SERIAL).

Bug fix discovered during smoke test:
- flattenNodes (the XML/uiautomator code path) was missing the R1 `id`
  alias surface that flattenMaestroNodes (the maestro CLI JSON path)
  already had. So `firstMatch(nodes, { id: 'X' })` worked when nodes came
  from the maestro path but returned null when nodes came from the adb
  fallback path. Now both code paths surface resource-id under both
  `resource-id` and `id` keys consistently.

iOS-side parity assertions in this test are scoped to what Unit 2a's stub
can actually cover — envelope shape, whitelist exports, dispatch isolation.
The Phase 4 follow-up (post Phase 0.5 + Unit 2b) extends this file with
real iOS attribute-mapping assertions backed by a captured iOS hierarchy
fixture.

Smoke-tested via direct node import. The full @percy/core test suite has
27 pre-existing Chromium-installer failures unrelated to this work.
…source

Source-synthesized fixtures for the new HTTP-XCTest path Unit 2 will build,
plus the maestro CLI iOS stdout shape for the fallback path. All shapes
verified against mobile-dev-inc/Maestro at ref=cli-2.0.7 (realmobile production
default per /usr/local/.browserstack/realmobile/config/constants.yml).

Notable findings recorded in capture-notes.md:
- PR #2365 has landed: server detects AUT itself; appIds is wire-vestigial.
  Percy CLI can send {"appIds": [], "excludeKeyboardElements": false}.
  YAML-based bundleId discovery is no longer required for the realmobile
  fast path.
- PR #2402 has landed but with a different wrap from cli-1.39.13: response
  is now {axElement: {children: [appHierarchy, statusBarsContainer]}, depth}
  rather than [springboard, AUT]. The deepening pass's parser rule
  ('first elementType == 1 whose identifier != com.apple.springboard')
  remains correct because the statusBars wrapper has elementType == 0.
- iOS Maestro's TreeNode does NOT carry a 'class' attribute. iOS selector
  vocabulary is 'id' only (maps to attributes['resource-id']). The
  originally absorbed Unit 2b XCUI elementType integer-to-name table is
  not needed for selector matching.

Wire-bytes confidence boost is deferred to Unit 5/6/7 BS validation rather
than blocking on a Unit-1 BS session capture (see plan Viability Gate 2).
…(Unit 2)

Adds runIosHttpDump as the iOS primary path for /percy/maestro-screenshot
element regions: POST {appIds: [], excludeKeyboardElements: false} to Maestro's
iOS XCTestRunner /viewHierarchy endpoint at http://127.0.0.1:<wda_port + 2700>. Server
detects AUT itself at cli-2.0.7+ (PR #2365 landed); empty appIds returns the
foreground AUT directly.

Replaces the iOS-WIP runMaestroIosDump stub with a real maestro-CLI shell-out
parser (the connection-class fallback path). Maestro's iOS CLI stdout is its
normalized TreeNode shape; existing flattenMaestroNodes consumes it without
iOS-specific code.

Adds flattenIosAxElement adapter for the HTTP path's raw AXElement shape:
walks to first elementType==1 with identifier!='com.apple.springboard' (skips
SpringBoard sibling on cli-1.39.13 wrap; works for both v1.39.13 [springboard,AUT]
and post-PR-2402 single-AUT-root shapes). Frame keys converted from PascalCase
{X,Y,Width,Height} to bracket-format bounds string.
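
The root walk and frame conversion described above could look roughly like the sketch below. Field names (`elementType`, `identifier`, `children`, `frame`) follow the description; the helper names and the exact `[x1,y1][x2,y2]` bounds format (the Android uiautomator convention) are assumptions, not a copy of flattenIosAxElement.

```javascript
// Hypothetical sketch; helper names and the bounds format are assumptions.
const SPRINGBOARD = 'com.apple.springboard';

// Depth-first search for the first application element (elementType === 1)
// whose identifier is not SpringBoard.
function findAutRoot(node) {
  if (!node || typeof node !== 'object') return null;
  if (node.elementType === 1 && node.identifier !== SPRINGBOARD) return node;
  for (const child of node.children || []) {
    const found = findAutRoot(child);
    if (found) return found;
  }
  return null;
}

// Convert a PascalCase frame {X, Y, Width, Height} into "[x1,y1][x2,y2]".
function frameToBounds(frame) {
  const { X, Y, Width, Height } = frame;
  return `[${X},${Y}][${X + Width},${Y + Height}]`;
}

// A wrapper (elementType 0) holding SpringBoard and the app under test.
const wrapped = {
  elementType: 0,
  children: [
    { elementType: 1, identifier: SPRINGBOARD, children: [] },
    {
      elementType: 1,
      identifier: 'com.example.app',
      frame: { X: 0, Y: 0, Width: 390, Height: 844 },
      children: []
    }
  ]
};

const aut = findAutRoot(wrapped);
const bounds = frameToBounds(aut.frame);
```

Because the walk skips any wrapper that is not an application element, the same rule covers a response with a SpringBoard sibling and one with a single app root.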

Narrows IOS_SELECTOR_KEYS_WHITELIST from ['id', 'class'] to ['id'].
IOSDriver.mapViewHierarchy at cli-2.0.7 does not populate 'class' on iOS
TreeNode (only 'resource-id' from identifier), so Percy keeps iOS selector
vocabulary aligned with Maestro's actual capability.

Schema-class failures (missing root, missing frame, malformed JSON, 4xx,
non-JSON content-type) return dump-error without falling back. Connection-class
failures (ECONNREFUSED, ETIMEDOUT, ECONNRESET, 5xx) and no-aut-tree responses
(SpringBoard-only) fall back to the maestro-CLI path.
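
As an illustration of the split above, here is a small classifier sketch. The function name, its input shape, and the returned labels are hypothetical; the no-aut-tree case and the missing-frame check are omitted for brevity, and socket error codes not listed above are reported as 'unknown' because the text does not say how they are treated.

```javascript
// Hypothetical sketch of the schema-class vs connection-class split.
const CONNECTION_CODES = new Set(['ECONNREFUSED', 'ETIMEDOUT', 'ECONNRESET']);

function classifyIosHttpOutcome({ errorCode, status, contentType = '', body = '' }) {
  if (errorCode) {
    // Only the listed socket errors are known to be connection-class.
    return CONNECTION_CODES.has(errorCode) ? 'connection' : 'unknown';
  }
  if (status >= 500) return 'connection'; // 5xx: fall back to the CLI
  if (status >= 400) return 'schema';     // 4xx: dump-error, no fallback
  if (!contentType.includes('application/json')) return 'schema';

  let parsed;
  try {
    parsed = JSON.parse(body);
  } catch (err) {
    return 'schema'; // malformed JSON
  }
  if (!parsed || typeof parsed.axElement !== 'object' || parsed.axElement === null) {
    return 'schema'; // missing root
  }
  return 'ok';
}

// Only connection-class outcomes hand over to the maestro-CLI path.
const shouldFallBack = outcome => outcome === 'connection';
```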

Two-tier deadline mirrors PR #2210's pattern: 1500ms healthy + 5000ms circuit-
breaker. Out-of-range PERCY_IOS_DRIVER_HOST_PORT (outside 11100-11110) skips
the HTTP path entirely. Loopback-only URL guard.
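
The port-range check and loopback guard can be sketched as follows. The function names are hypothetical, and returning null to mean "skip the HTTP path" is an assumption about the calling convention; only the 11100-11110 range and the 127.0.0.1 host come from the text above.

```javascript
// Hypothetical sketch of the port-range and loopback guards.
const IOS_PORT_MIN = 11100;
const IOS_PORT_MAX = 11110;

// Accept only URLs that point at the IPv4 loopback address.
function isLoopbackUrl(href) {
  try {
    return new URL(href).hostname === '127.0.0.1';
  } catch (err) {
    return false; // not a parseable URL
  }
}

// Build the /viewHierarchy URL, or return null when the HTTP path
// should be skipped (unset, non-numeric, or out-of-range port).
function iosViewHierarchyUrl(rawPort) {
  if (!/^\d+$/.test(String(rawPort || ''))) return null;
  const port = Number(rawPort);
  if (port < IOS_PORT_MIN || port > IOS_PORT_MAX) return null;
  const href = `http://127.0.0.1:${port}/viewHierarchy`;
  return isLoopbackUrl(href) ? href : null;
}
```

For example, `iosViewHierarchyUrl(process.env.PERCY_IOS_DRIVER_HOST_PORT)` would yield a URL only for a port inside the expected range.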

Drift-bit handling deferred to plan Unit 4 (cross-PR coordination with #2210);
schema-class failures currently log-only.

77 of 77 specs pass: 26 new iOS-path scenarios (HTTP primary, CLI fallback,
env handling, parity), and all existing Android tests unchanged.

Plan: percy-maestro/docs/plans/2026-05-06-004-feat-cross-platform-maestro-resolver-unification-plan.md
Fixtures: 65e54b9 (Unit 1)
…shot (Unit 3a)

Adds a three-tier cascade for iOS element-region resolver selection:
1. Per-snapshot override: `request.body.resolver` (validated against
   ['wda-direct', 'maestro-hierarchy']; HTTP 400 on unknown values).
2. Process env: `PERCY_IOS_RESOLVER` (same allowlist; unknown values
   warn + fall through).
3. Default: 'wda-direct' (Unit 3a is opt-in only — Unit 3b's env-conditional
   flip is a separate follow-up PR after the validation window).
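
The three tiers above can be sketched as a single pure function. `chooseIosResolver` and its return shape are hypothetical, chosen only to make the precedence order and the different handling of unknown values (HTTP 400 for the body field, warn and fall through for the env var) easy to see.

```javascript
// Hypothetical sketch of the body -> env -> default cascade.
const ALLOWED_RESOLVERS = ['wda-direct', 'maestro-hierarchy'];

function chooseIosResolver(body, env, warn = () => {}) {
  // Tier 1: per-snapshot override; unknown values are a client error.
  if (body && body.resolver !== undefined) {
    if (!ALLOWED_RESOLVERS.includes(body.resolver)) {
      return { status: 400, error: `unknown resolver: ${body.resolver}` };
    }
    return { status: 200, resolver: body.resolver };
  }
  // Tier 2: process env; unknown values warn and fall through.
  const fromEnv = env.PERCY_IOS_RESOLVER;
  if (fromEnv !== undefined) {
    if (ALLOWED_RESOLVERS.includes(fromEnv)) {
      return { status: 200, resolver: fromEnv };
    }
    warn(`ignoring unknown PERCY_IOS_RESOLVER=${fromEnv}`);
  }
  // Tier 3: default.
  return { status: 200, resolver: 'wda-direct' };
}

const warnings = [];
const fromGarbageEnv = chooseIosResolver({}, { PERCY_IOS_RESOLVER: 'bogus' }, m => warnings.push(m));
```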

The per-snapshot `resolver` body field is the ops escape valve documented
in the plan: lets operators `curl` a single snapshot with a specific
resolver for diagnostics without redeploying the CLI. SDK does not set
this today (R8 unchanged).

When the cascade chooses 'maestro-hierarchy', api.js calls the unified
`maestroDump({platform: 'ios', sessionId})` from Unit 2 (HTTP primary
+ CLI fallback). When 'wda-direct', the legacy `resolveIosRegions`
(WDA source-dump) path runs unchanged.

Threads sessionId from the relay request through to maestroDump for
log-scrubbed correlation tagging (Unit 2's `runIosHttpDump` uses sid
prefix in debug logs).

Tests: 6 new Unit 3a scenarios in api.test.js — body.resolver validation,
default-unchanged behavior, env-only path, per-snapshot override (both
directions), graceful fallback on garbage env values. All 6 pass; existing
tests unchanged.

Plan: percy-maestro/docs/plans/2026-05-06-004-feat-cross-platform-maestro-resolver-unification-plan.md
…heck (Unit 4)

Adds module-level maestroHierarchyDrift state with {android, ios} slots that
record the first schema-class failure per platform, plus the setter/getter
exported for cross-cutting wiring. Both slots are null in steady state.

Wires Unit 2's iOS HTTP schema-class failures (missing axElement root,
missing frame, malformed JSON, 4xx, non-JSON content-type) to call
setMaestroHierarchyDrift({platform: 'ios', ...}). Connection-class failures
and no-aut-tree responses do NOT flip the bit (only schema-class — those
are the genuine 'Maestro upstream wire-format drifted' signals that need
ops attention).

Extends /percy/healthcheck to always emit:
  maestroHierarchyDrift: { android: ... | null, ios: ... | null }
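
A minimal sketch of the two-slot, first-seen-wins state, under the assumption that each slot stores a small `{ code, reason }` record. The `...Sketch` function names and the record fields are illustrative; the real setter and getter may carry more detail.

```javascript
// Hypothetical sketch of the module-level drift envelope.
const driftState = { android: null, ios: null };

function setMaestroHierarchyDriftSketch({ platform, code, reason }) {
  if (!Object.prototype.hasOwnProperty.call(driftState, platform)) {
    throw new Error(`unknown platform: ${platform}`);
  }
  // First schema-class failure per platform wins; later ones are ignored.
  if (driftState[platform] === null) {
    driftState[platform] = { code, reason };
  }
}

function getMaestroHierarchyDriftSketch() {
  return { android: driftState.android, ios: driftState.ios };
}

// Two iOS schema-class failures: only the first is recorded.
setMaestroHierarchyDriftSketch({ platform: 'ios', code: 'missing-root', reason: 'axElement absent' });
setMaestroHierarchyDriftSketch({ platform: 'ios', code: 'http-4xx', reason: 'later failure' });

// Shape of the field the healthcheck response would carry.
const healthcheckFragment = { maestroHierarchyDrift: getMaestroHierarchyDriftSketch() };
```

Keeping one slot per platform means an iOS drift record never overwrites or hides an Android one.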

The android slot is unwritten in this branch — PR #2210's gRPC drift
surface (recordSchemaDrift) sits on a sibling branch. When #2210 merges
and this PR rebases, #2210's Android schema-class call sites retrofit to
use the setter exported here. Companion artifact:
percy-maestro/docs/plans/2026-05-06-004-pr2210-coordination-comment.md.

Tests: 6 new Unit 4 scenarios in maestro-hierarchy.test.js — initial state,
iOS schema-class flips ios slot only, first-seen-per-platform wins,
connection-class doesn't flip, SpringBoard-only doesn't flip, reset helper.
Plus existing /healthcheck test updated to include the new field. Per-platform
slot independence is the central invariant — locks in the design rationale
that simultaneous-drift signal on both platforms is preserved (the
single-field-with-discriminator design rejected during document-review would
have lost that).

Plan: percy-maestro/docs/plans/2026-05-06-004-feat-cross-platform-maestro-resolver-unification-plan.md
…5/6/7)

Three env-gated harnesses + supporting fixtures, all skipped in CI and run
manually during BS validation. Paste-output-into-PR pattern matches the
gRPC harness shape from the originally-planned PR #2210.

Unit 7 — maestro-hierarchy-ios-http-concurrent.harness.js (V4.2):
  Concurrent-access regression. While a real Maestro flow holds the iOS
  device active via extendedWaitUntil + impossible-selector polling
  (fixtures/pause-30s-flow-ios.yaml), the harness calls runIosHttpDump
  N=100 times and records p50/p95/p99 timings. KTD threshold check warns
  when p95 is within 10% of IOS_HTTP_HEALTHY_DEADLINE_MS (1500ms) so the
  deadline can be bumped before Unit 3b's flip.
  Env: MAESTRO_IOS_TEST_DEVICE, PERCY_IOS_DRIVER_HOST_PORT.
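
  For reference, the timing summary and threshold warning could be computed along these lines. This sketch uses the nearest-rank percentile method and reads "within 10%" as p95 >= 90% of the deadline; both are assumptions, and the harness may compute them differently.

```javascript
// Hypothetical sketch of the p50/p95/p99 summary and threshold check.
const IOS_HTTP_HEALTHY_DEADLINE_MS = 1500;

// Nearest-rank percentile over a copy of the samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(samples) {
  const p95 = percentile(samples, 95);
  return {
    p50: percentile(samples, 50),
    p95,
    p99: percentile(samples, 99),
    // Warn when p95 is within 10% of the healthy deadline.
    nearDeadline: p95 >= 0.9 * IOS_HTTP_HEALTHY_DEADLINE_MS
  };
}

// 100 synthetic timings: 10ms, 20ms, ..., 1000ms.
const timings = Array.from({ length: 100 }, (_, i) => 10 * (i + 1));
const summary = summarize(timings);
```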

Unit 6 — maestro-ios-hierarchy-regression.harness.js (V3):
  WDA failure-class regression. Runs ios-aut-crash-regions.yaml twice:
  once with PERCY_IOS_RESOLVER=wda-direct (legacy WDA path — element
  regions silently skip when AUT bundleId isn't running, the production
  failure mode), and once with =maestro-hierarchy (HTTP path — regions
  resolve via Maestro's runner which walks system UI without bundleId
  binding). Output is logged for human verification of the two Percy
  build URLs.
  Env: MAESTRO_IOS_TEST_DEVICE, PERCY_SERVER, PERCY_IOS_DRIVER_HOST_PORT.

Unit 5 — cross-platform-parity.harness.js (V2):
  R6 cross-platform parity check. Runs parity-flow-android.yaml +
  parity-flow-ios.yaml against their respective devices, both resolving
  {id: 'submitBtn'} through Percy's relay. V1 is log-only — manual eyeball
  of the side-by-side Percy snapshots — because DPI normalization between
  Android pixels and iOS logical points is non-trivial without a documented
  example-app dimension table. V1.1 can tighten to programmatic ±2px
  assertion later.
  Env: MAESTRO_PARITY_DEVICES (format: <android-serial>:<ios-udid>),
       PERCY_SERVER, PERCY_IOS_DRIVER_HOST_PORT.

Plan: percy-maestro/docs/plans/2026-05-06-004-feat-cross-platform-maestro-resolver-unification-plan.md
@Sriram567 Sriram567 requested a review from a team as a code owner May 7, 2026 09:54
Comment thread packages/core/src/api.js
let entries;
try { entries = await fs.promises.readdir(dir, { withFileTypes: true }); } catch { return; }
for (let entry of entries) {
let full = path.join(dir, entry.name);
Comment thread packages/core/src/api.js
let baseDir = `/tmp/${sessionId}_test_suite/logs`;
let logDirs = await fs.promises.readdir(baseDir);
for (let dir of logDirs) {
let screenshotPath = path.join(baseDir, dir, 'screenshots', `${name}.png`);

describe('iOS HTTP dump (runIosHttpDump primary path)', () => {
const iosFixtureDir = path.resolve(url.fileURLToPath(import.meta.url), '../../fixtures/maestro-ios-hierarchy');
const loadIosFixture = name => fs.readFileSync(path.join(iosFixtureDir, name), 'utf8');

describe('iOS maestro-CLI fallback (runMaestroIosDump replacement)', () => {
const iosFixtureDir = path.resolve(url.fileURLToPath(import.meta.url), '../../fixtures/maestro-ios-hierarchy');
const loadIosFixture = name => fs.readFileSync(path.join(iosFixtureDir, name), 'utf8');
Comment thread packages/core/test/unit/wda-session-resolver.test.js Fixed
…de (Units 3b + 8 consolidated)

Removes the legacy iOS WDA-direct resolver and the now-dead resolver-choice
machinery that selected between WDA and the new HTTP/CLI path. The unified
maestro-hierarchy resolver becomes the only iOS path; element regions resolve
via runIosHttpDump → maestro-CLI shell-out fallback.

Deleted (8 files, ~1700 lines net):
- packages/core/src/wda-hierarchy.js (legacy WDA /source resolver)
- packages/core/src/wda-session-resolver.js (TOCTOU-safe wda-meta.json reader,
  consumed only by wda-hierarchy)
- packages/core/src/png-dimensions.js (PNG IHDR parser, used only by
  wda-hierarchy for scale-factor derivation)
- packages/core/test/unit/wda-hierarchy.test.js
- packages/core/test/unit/wda-session-resolver.test.js
- packages/core/test/unit/png-dimensions.test.js
- packages/core/test/integration/maestro-ios-hierarchy-regression.harness.js
  (Unit 6 — was a wda-direct vs maestro-hierarchy comparator; meaningless
  with wda-direct gone)
- packages/core/test/integration/fixtures/ios-aut-crash-regions.yaml
  (paired with the regression harness)

Stripped from api.js:
- import resolveIosRegions / resolveWdaSession / parsePngDimensions
- body.resolver field validation (single path → meaningless)
- PERCY_IOS_RESOLVER env handling (single path → meaningless)
- iOS WDA-direct branch + iosResult / iosIndex bookkeeping
- The resolver-choice cascade comment block

Stripped from percy.js: wdaHierarchyShutdown import + shutdown call. The
maestro-hierarchy HTTP path uses a stateless http.Agent that closes when
the process exits; no explicit shutdown needed.

Stripped from api.test.js: the 6 Unit-3a resolver-cascade tests (validated
behavior that no longer exists). The 'iOS element region with Android-style
selector key' test consolidated into a single combined test that exercises
the unified iOS path with a mix of element + coord regions.

Plan implications (Plan: percy-maestro/docs/plans/2026-05-06-004-...):
- Unit 3a's per-snapshot resolver override and PERCY_IOS_RESOLVER env: REMOVED.
- Unit 3b's telemetry-gated default flip: CONSOLIDATED. The default IS now
  maestro-hierarchy because there is no other path; no flip pending.
- Unit 8's wda-hierarchy.js retirement: SHIPPED HERE rather than ≥1 week
  post-Unit-3b.

Regression risk acknowledged (the P0 from document review): self-hosted
iOS Percy customers without realmobile-injected PERCY_IOS_DRIVER_HOST_PORT
AND without a working maestro CLI installed lose element-region support
on this code path. Their previously-working WDA-direct happy path is
gone. Coord regions still work; element regions skip gracefully with a
'[percy] Element-region resolver unavailable' warn. Customers in that
situation should switch to coord regions or wait for a future Android-style
gRPC-direct path. Customer-side rollback: pin to a CLI version before this
PR.

Test status: 148 of 148 specs run; 6 pre-existing failures unchanged
(Jest .toHaveProperty matcher in Jasmine context; AggregateError vs
ECONNREFUSED network-stack flake — all unrelated to this work).
Sriram567 added 6 commits May 7, 2026 15:51
…+ stale comments

Final cleanup pass:

- Delete packages/core/src/adb-hierarchy.js — deprecated compat shim that
  re-exports from maestro-hierarchy.js. Comment said 'removed in V1.1'; no
  internal code imports from it. External consumers should import from
  maestro-hierarchy.js directly.

- Rename adbDump/adbFirstMatch import aliases to maestroDump/maestroFirstMatch
  in api.js. The old names predate the cross-platform rename and were
  misleading (the function dispatches to platform-appropriate transports,
  not just adb).

- Drop unused 'percyRequest' import in api.js. It was only consumed by the
  resolveIosRegions call site, which the WDA retirement commit removed.

- Update stale comments in maestro-hierarchy.js that referenced 'Unit 2a/2b'
  scaffolding terminology and the 'Phase 0.5 stub' that's no longer present.
…26-05-07-002 Unit 1)

Adds @grpc/grpc-js@^1.14.3 and @grpc/proto-loader@^0.8.0 as @percy/core
dependencies, plus the vendored maestro_android.proto from upstream
mobile-dev-inc/Maestro at SHA bc8bde1b (cli-2.5.1, 2025-05-26). Both deps
declare engines.node floors well below @percy/cli's >=14 (grpc-js >=12.10.0,
proto-loader >=6) — no min-Node bump required.

Only MaestroDriver/viewHierarchy(ViewHierarchyRequest) returns
(ViewHierarchyResponse) and the `string hierarchy = 1` field on the response
are consumed at runtime; the rest of the proto is preserved verbatim so
future updates can be a clean upstream re-copy without surgical edits.

The proto/ dir is preserved into dist/ via Babel CLI's existing
copyFiles: true setting (no build-system change needed).

Refs:
- 2026-05-07-002 plan Unit 1 + D6 + D12
- Replaces in-flight PR #2210 (will be closed once this PR merges)
…omy (Units 2/3/7)

Adds runAndroidGrpcDump as the Android primary path in maestro-hierarchy.js's
dump({platform:'android'}) dispatch, talking the same gRPC transport Maestro
CLI uses but as a stateless RPC that doesn't open a parallel flow context.
Avoids the session-collision failure mode the maestro CLI shell-out hits
during a live Maestro flow (root cause of the 2026-05-07 BS validation
fallback-dump-exit-137 result).

Three-class error taxonomy (D10) — splits PR #2210's two-class scheme:
  - schema-class (INVALID_ARGUMENT, FAILED_PRECONDITION, OUT_OF_RANGE,
    UNIMPLEMENTED, DATA_LOSS, decoder-failure) → drift bit, no fallback,
    cache PRESERVED
  - channel-broken (UNAVAILABLE, INTERNAL, CANCELLED, unmapped codes) →
    fallback chain runs, cache EVICTED (channel actually broke)
  - contention-class (DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED) →
    skip CLI fallback (would queue behind same flow), straight to adb,
    cache PRESERVED (timeout = backpressure, not channel-breakage; reconnect
    would waste TCP+HTTP/2+TLS for nothing)
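
The taxonomy above maps naturally onto a lookup by status code. The sketch below inlines the standard gRPC status numbers so it has no dependencies; the function name, the class labels, and the `POLICY` table are illustrative and are not the actual classifyGrpcFailure implementation.

```javascript
// Standard gRPC status codes, inlined so the sketch has no dependencies.
const GRPC_STATUS = {
  CANCELLED: 1, INVALID_ARGUMENT: 3, DEADLINE_EXCEEDED: 4,
  RESOURCE_EXHAUSTED: 8, FAILED_PRECONDITION: 9, ABORTED: 10,
  OUT_OF_RANGE: 11, UNIMPLEMENTED: 12, INTERNAL: 13,
  UNAVAILABLE: 14, DATA_LOSS: 15
};

const SCHEMA_CODES = new Set([
  GRPC_STATUS.INVALID_ARGUMENT, GRPC_STATUS.FAILED_PRECONDITION,
  GRPC_STATUS.OUT_OF_RANGE, GRPC_STATUS.UNIMPLEMENTED, GRPC_STATUS.DATA_LOSS
]);
const CONTENTION_CODES = new Set([
  GRPC_STATUS.DEADLINE_EXCEEDED, GRPC_STATUS.RESOURCE_EXHAUSTED, GRPC_STATUS.ABORTED
]);

// Hypothetical classifier: returns a class label, or null for no error.
function classifyGrpcFailureSketch(err) {
  if (!err) return null;
  if (typeof err.code !== 'number') return 'schema'; // decoder failure
  if (SCHEMA_CODES.has(err.code)) return 'schema';
  if (CONTENTION_CODES.has(err.code)) return 'contention';
  return 'channel-broken'; // UNAVAILABLE, INTERNAL, CANCELLED, unmapped
}

// What each class implies for fallback, cache, and the drift bit.
const POLICY = {
  schema: { fallback: 'none', evictCache: false, drift: true },
  contention: { fallback: 'adb', evictCache: false, drift: false },
  'channel-broken': { fallback: 'cli-then-adb', evictCache: true, drift: false }
};
```

Treating every unmapped code as channel-broken is the conservative default here: it runs the full fallback chain and rebuilds the channel rather than guessing.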

Symmetric timeout architecture (D11): GRPC_HEALTHY_DEADLINE_MS=1500 +
GRPC_CIRCUIT_BREAKER_MS=5000 — parity with iOS HTTP path. Outer Promise.race
is defense-in-depth against grpc-node#2620-style channel sticking (closed
in 1.9.11 but cheap insurance).

Dispatch (D3 + D9):
  PERCY_MAESTRO_GRPC=0 kill switch  → skip gRPC entirely (in-process rollback)
  PERCY_ANDROID_GRPC_PORT set+valid → runAndroidGrpcDump → branch by class
  env absent / invalid              → maestro CLI primary → adb (current behavior)
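
The dispatch table above reduces to a small decision function. In this sketch `chooseAndroidPrimary` is a hypothetical name, and "valid" is assumed to mean a decimal TCP port between 1 and 65535; the real validation may be stricter.

```javascript
// Hypothetical sketch of the Android primary-transport decision.
function chooseAndroidPrimary(env) {
  // Kill switch: skip gRPC entirely.
  if (env.PERCY_MAESTRO_GRPC === '0') return 'maestro-cli';

  const raw = env.PERCY_ANDROID_GRPC_PORT;
  // Absent or malformed env: keep the existing CLI-primary behavior.
  if (raw === undefined || !/^\d+$/.test(raw)) return 'maestro-cli';

  const port = Number(raw);
  if (port < 1 || port > 65535) return 'maestro-cli'; // assumed validity range
  return 'grpc';
}
```

Calling `chooseAndroidPrimary(process.env)` once per dump keeps the kill switch effective without a restart-time configuration step.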

R-7 (shutdown race): runAndroidGrpcDump accepts shutdownInProgress flag
sourced from grpcClientCache.shutdownInProgress (set by percy.stop()
before close). CANCELLED-during-shutdown returns {kind:'unavailable',
reason:'shutdown'} — no fallback chain on a tearing-down process.

Drift recording uses this branch's existing two-slot
setMaestroHierarchyDrift({platform:'android', code, reason}) — drops
PR #2210's separate single-slot recordSchemaDrift.

Cleanup: removes the stale "Single-author note about PR #2210" comment
and a stale grpc-node#2620 framing reference (issue is closed).

Refs:
- 2026-05-07-002 plan Units 2, 3, 7 + D1, D3, D9, D10, D11
- Replaces PR #2210's runGrpcDump
Constructs grpcClientCache as a per-Percy-instance Map in the constructor
(matching the established ownership pattern for browser, server, queues,
client, monitoring). Disposes via closeGrpcClientCache(this.grpcClientCache)
in stop()'s teardown block.

Module-global state would leak channels between concurrent Percy instances
in a single process (programmatic-API users, test harnesses) and create
shutdown races where one instance's stop() invalidates another's pending
RPCs. Per-instance ownership matches every other long-lived resource on
Percy.

Asymmetry with maestroHierarchyDrift (deliberate, documented in
maestro-hierarchy.js header): drift envelope stays module-scoped because
drift is observability state — surfaced process-wide on /percy/healthcheck.
Channels are transport state — per-instance lifecycle.

R-7 (shutdown race): sets cache.shutdownInProgress = true BEFORE closing
channels so any in-flight runAndroidGrpcDump() that hits CANCELLED returns
{kind:'unavailable', reason:'shutdown'} instead of triggering the maestro
CLI + adb fallback chain on a tearing-down process.
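
The ordering matters here: the flag is set first, then channels are closed. A rough sketch with fake clients follows; `stopSketch`, `closeGrpcClientCacheSketch`, and the fake client shape are hypothetical, and only `close()` is assumed of a cached client.

```javascript
// Hypothetical sketch of per-instance cache teardown.
function closeGrpcClientCacheSketch(cache) {
  if (!cache) return; // tolerate undefined / null
  for (const client of cache.values()) {
    try {
      client.close();
    } catch (err) {
      // Best-effort teardown: one failing close must not block the rest.
    }
  }
  cache.clear();
}

function stopSketch(percyLike) {
  const cache = percyLike.grpcClientCache;
  // Set the flag BEFORE closing so in-flight calls that see CANCELLED
  // can report 'shutdown' instead of starting the fallback chain.
  if (cache) cache.shutdownInProgress = true;
  closeGrpcClientCacheSketch(cache);
}

// Fake clients record whether the flag was already set when they closed.
const closeLog = [];
const cache = new Map();
const fakeClient = address => ({
  close() { closeLog.push([address, cache.shutdownInProgress === true]); }
});
cache.set('127.0.0.1:7001', fakeClient('127.0.0.1:7001'));
cache.set('127.0.0.1:7002', fakeClient('127.0.0.1:7002'));

stopSketch({ grpcClientCache: cache });
```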

api.js relay handler now threads percy.grpcClientCache through to
maestroDump() so the Android gRPC primary can reuse channels across
snapshots in the same session.

Mitigates grpc-node#2964 (open) — ChannelzTrace memory leak when Client
is not explicitly .close()-d.

Refs:
- 2026-05-07-002 plan Unit 5 + D9
- R-7 (shutdown race), R-3 (cache scope) resolved by this commit
… dispatch (Units 2/3/4/5)

28 new specs covering the absorbed gRPC primary path:

classifyGrpcFailure (D10 three classes):
  - schema-class: missing code → grpc-decode; INVALID_ARGUMENT,
    FAILED_PRECONDITION, OUT_OF_RANGE, UNIMPLEMENTED, DATA_LOSS
  - contention-class: DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED
  - channel-broken: UNAVAILABLE, INTERNAL, CANCELLED, unmapped codes
  - returns null for falsy errors

runAndroidGrpcDump (success + failure paths):
  - hierarchy parsed from gRPC response.hierarchy XML envelope
  - schema-class UNIMPLEMENTED → drift bit set on android slot,
    no eviction, ios slot stays null
  - contention-class DEADLINE_EXCEEDED → cache PRESERVED (D10)
  - channel-broken UNAVAILABLE → cache evicted, client.close() called
  - CANCELLED-during-shutdown → unavailable/shutdown (R-7), no fallback
  - CANCELLED outside shutdown → channel-broken (cache evicted)
  - empty hierarchy field → grpc-no-xml-envelope drift

Cache reuse + per-instance isolation (D9):
  - reuses same client across calls to same address
  - two independent caches do not share clients
  - connection-fail in cache A does not invalidate cache B

closeGrpcClientCache (Unit 5):
  - closes every cached client, clears map
  - idempotent on empty cache
  - handles undefined / null gracefully

dump({platform:'android'}) dispatch (Unit 3):
  - env set + gRPC success: gRPC primary, no CLI/adb
  - env set + schema-class: returns immediately, no fallback
  - env set + contention-class: SKIPS CLI, goes straight to adb
  - env set + channel-broken: falls through to maestro CLI
  - PERCY_MAESTRO_GRPC=0 kill switch: skips gRPC entirely
  - env absent: gRPC NOT attempted, maestro CLI primary
  - malformed env: falls through cleanly

Test mocking pattern: factory injection (makeFakeFactory + makeFixedClient)
matches the iOS HTTP path's makeFakeHttpRequest. Inlined GRPC_STATUS enum
isolates classifier coverage from @grpc/grpc-js runtime drift.

Test count: 779 → 807 specs total. All 28 new tests pass green; the 27
pre-existing failures (Install Chromium, runDoctorOnFailure, env-flake) are
unchanged from master.

Refs:
- 2026-05-07-002 plan Units 2, 3, 4, 5
…it 6)

Env-gated integration harness that exercises the gRPC primary path under
realistic Maestro flow contention. Spawns a real Maestro flow that holds
the device's gRPC agent active via extendedWaitUntil, then runs N=100
parallel runAndroidGrpcDump() iterations against the same agent. Asserts
{kind:'hierarchy'} on every iteration and records p50/p95/p99 timings.

Pre-merge gate (D11): p95 < 1200ms AND p99 < 2000ms across 100 iterations
under live tapOn flow load. Failure means D11's 1500ms healthy / 5000ms
breaker budget is wrong OR the device-side agent is contention-fragile —
investigate before relaxing the threshold.

Required env (skips cleanly when absent):
  - MAESTRO_ANDROID_TEST_DEVICE: connected Android serial
  - PERCY_ANDROID_GRPC_PORT: realmobile/mobile-injected gRPC port (or
    manual `adb forward tcp:<host_port> tcp:7001` for local validation)
  - MAESTRO_BIN: optional, defaults to `maestro` on PATH

Fixtures:
  - test/integration/fixtures/pause-30s-flow.yaml — Maestro flow that
    parks the device in a known-busy state for 30s
  - test/fixtures/maestro-hierarchy/grpc-response.xml — captured response
    against cli-2.5.1 for fixture-driven unit tests
  - test/fixtures/maestro-hierarchy/grpc-capture-notes.md — recapture
    procedure when Maestro version drifts

Per-Percy cache equivalent: harness instantiates a fresh Map() shared
across iterations so it exercises real channel reuse + the
contention-vs-channel-broken eviction policy from D10.

Refs:
- 2026-05-07-002 plan Unit 6
@Sriram567 Sriram567 changed the title feat(core): iOS element-region resolution via Maestro HTTP transport feat(core): cross-platform Maestro view-hierarchy resolver (Android gRPC + iOS HTTP) May 8, 2026