feat(core): cross-platform Maestro view-hierarchy resolver (Android gRPC + iOS HTTP)#2217
Open
- Add empty body guard (400 instead of TypeError)
- Add Busboy fileSize limit to reject oversized uploads during parsing
- Use Object.create(null) and field allowlist to prevent prototype pollution
- Add stream error handler on Readable source
- Use HTTP 413 for oversized files
Reject immediately on Busboy 'limit' event with 413 instead of setting fileBuffer to null which produced 'Missing required file part'.
Accepts {name, sessionId} as JSON, finds the screenshot file on disk
at /tmp/<sessionId>_test_suite/logs/*/screenshots/<name>.png,
base64-encodes it, and processes as a standard comparison.
This enables real-time Percy uploads from Maestro flows where the
JS sandbox cannot access screenshot files directly.
… metadata
Accept statusBarHeight, navBarHeight, fullscreen from request instead of hardcoding 0/false.
Transform coordinate-based regions to CLI boundingBox format.
Add sync mode support via percy.syncMode() + handleSyncJob().
Forward thTestCaseExecutionId to comparison pipeline.
Element-based regions log a warning and are skipped; ADB uiautomator resolution will be added as a follow-up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Platform-aware screenshot discovery:
- Accept platform field with strict whitelist (ios/android); 400 on unknown
- iOS glob: /tmp/{sessionId}/*_maestro_debug_*/{name}.png
- Android glob unchanged; backward compat with SDK v0.2.0 (no platform → Android)
Path-safety hardening:
- Tighten name/sessionId from blocklist to strict character-class allowlist
- fs.realpath canonicalization + session-root prefix check defeats symlink swap
- Handles macOS /tmp → /private/tmp symlink transparently
Pick most recently modified file when multiple match (iOS same-name-across-flows).
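The path-safety steps above can be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: `isSafeToken`, `isInsideRoot`, `pickNewest`, and the exact allowlist regex are hypothetical, and both paths passed to `isInsideRoot` are assumed to be absolute and already canonicalized (for example via `fs.realpath`).

```javascript
const path = require('path');

// Hypothetical allowlist: the source only says "strict character-class allowlist".
const SAFE_TOKEN = /^[A-Za-z0-9_-]+$/;

function isSafeToken(value) {
  return typeof value === 'string' && SAFE_TOKEN.test(value);
}

// Both arguments must already be canonical absolute paths, so that a
// /tmp -> /private/tmp symlink has been resolved on both sides.
function isInsideRoot(realRoot, realFile) {
  const rel = path.posix.relative(realRoot, realFile);
  return rel !== '' && rel !== '..' && !rel.startsWith('../') && !path.posix.isAbsolute(rel);
}

// candidates: [{ file, mtimeMs }]; returns the most recently modified, or null.
function pickNewest(candidates) {
  return candidates.reduce(
    (best, c) => (best === null || c.mtimeMs > best.mtimeMs ? c : best),
    null
  );
}
```

Comparing canonical paths with `path.relative` rather than a raw string prefix avoids treating `/private/tmp/s1x` as being inside `/private/tmp/s1`.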
Introduces packages/core/src/adb-hierarchy.js with two plain exports: dump() and firstMatch(nodes, selector).
The resolver:
- Reads process.env.ANDROID_SERIAL; falls back to one adb devices probe (requires exactly one attached device to avoid wrong-device dumps under multi-session CLI concurrency). Never accepts serial from request input.
- Shells out via cross-spawn with a 2s hard timeout (mirrors the browser.js:256-297 spawn+cleanup pattern).
- Classifies results into one of three shapes (unavailable, dump-error, hierarchy) so the relay can distinguish environmental failures from transient dump failures.
- Streams primary via adb exec-out uiautomator dump /dev/tty; falls back to file-based dump + cat only on wrong-mechanism signals (exit≠0 or missing <?xml prefix). Terminal signals (oversize / parse-error) do not retry, which prevents attack amplification on adversarial payloads.
- Slices the XML envelope to the first </hierarchy> (strips uiautomator's trailer line and defends against embedded adversarial XML blocks).
- Enforces a 5MB stdout cap before parse.
- Parses with fast-xml-parser configured for defense-in-depth (processEntities: false, allowBooleanAttributes: false).
- Exposes firstMatch with pre-order DFS + strictly-anchored bounds regex; zero-area nodes are non-matches, negative coordinates (clipped views) are allowed.
Adds fast-xml-parser ^4.4.1 as a new dependency of @percy/core.
27 unit tests cover the parser + selector logic and all classification branches via a parameter-injected execAdb seam. No real ADB calls; no filesystem or network access.
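A rough sketch of the selector-matching rule described above (anchored bounds regex, zero-area nodes rejected, negative coordinates accepted). The flat pre-order `nodes` array and the `{ node, box }` return shape are assumptions for illustration; the real firstMatch may differ.

```javascript
// Strictly anchored: the whole string must be "[x1,y1][x2,y2]"; negatives allowed.
const BOUNDS_RE = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/;

function parseBounds(bounds) {
  const m = BOUNDS_RE.exec(String(bounds));
  if (!m) return null;
  const [x1, y1, x2, y2] = m.slice(1).map(Number);
  if (x2 <= x1 || y2 <= y1) return null; // zero-area (or inverted) nodes never match
  return { x1, y1, x2, y2 };
}

// `nodes` is assumed to be a flat array already in pre-order (document order),
// and `selector` an object with exactly one key, e.g. { text: 'Submit' }.
function firstMatch(nodes, selector) {
  const [key, value] = Object.entries(selector)[0];
  for (const node of nodes) {
    if (node[key] !== value) continue;
    const box = parseBounds(node.bounds);
    if (box) return { node, box };
  }
  return null;
}
```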
Wires the /percy/maestro-screenshot relay to the new adb-hierarchy
resolver. Replaces the existing element-region warn-and-skip stub with
actual resolution via ADB + uiautomator dump on Android.
Handler changes:
- Early 400 validation on region shape before file I/O or ADB work:
whitelist selector keys (resource-id/text/content-desc/class), require
exactly one selector key per region, string-typed value, length ≤512,
total regions per request ≤50.
- Android element regions: lazy dump on first element region, memoize
the result (including error classes) for the whole request. Pre-scan
element-region count so the skip warning reports N regions accurately.
Both unavailable and dump-error poison the rest of the request with
one warning — bounds worst-case per-request ADB time to one 2s timeout
regardless of element-region count (closes the timeout-accumulation
DoS vector).
- iOS element regions: preserve existing warn-and-skip semantics. Not
a 400. Avoids a breaking change for any iOS caller today.
- Coordinate regions: unchanged; still transform {top,bottom,left,right}
to elementSelector.boundingBox.
- Miss on element resolution: per-element warning, region skipped,
request still uploads.
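The early 400 validation could look something like the sketch below. The limits and selector keys come from the list above; the function name, the `{ element: {...} }` region shape, and the error strings are illustrative assumptions.

```javascript
const SELECTOR_KEYS = ['resource-id', 'text', 'content-desc', 'class'];
const MAX_SELECTOR_LENGTH = 512;
const MAX_REGIONS = 50;

// Returns null when the regions are acceptable, otherwise a message for a 400 response.
// Error strings are hypothetical; only the rules themselves come from the commit message.
function validateRegions(regions) {
  if (!Array.isArray(regions)) return 'regions must be an array';
  if (regions.length > MAX_REGIONS) return `too many regions (max ${MAX_REGIONS})`;
  for (const region of regions) {
    // Coordinate regions are out of scope for this sketch.
    if (!region || typeof region !== 'object' || !('element' in region)) continue;
    const el = region.element;
    if (!el || typeof el !== 'object') return 'element must be an object';
    const keys = Object.keys(el);
    if (keys.length !== 1) return 'element must have exactly one selector key';
    if (!SELECTOR_KEYS.includes(keys[0])) return `unsupported selector key: ${keys[0]}`;
    const value = el[keys[0]];
    if (typeof value !== 'string') return 'selector value must be a string';
    if (value.length > MAX_SELECTOR_LENGTH) return 'selector value too long';
  }
  return null;
}
```

Running this before any file I/O or ADB work keeps malformed requests cheap to reject.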
First-ever /percy/maestro-screenshot handler tests cover input
validation (9 × 400 paths), coordinate-only flow regression, iOS
warn-and-skip behavior, end-to-end forwarding of testCase/labels/
thTestCaseExecutionId/tile-metadata/sync, and the missing-screenshot
404 path. ADB-integration paths (element resolution against a real
device) are covered by the adb-hierarchy unit tests and Unit 7 E2E
validation on BrowserStack Maestro.
E2E validation on BrowserStack Maestro against host 31.6.63.33 / Pixel 7 Pro showed the primary exec-out path intermittently returning no-xml-envelope and the file-dump fallback exiting 137 (SIGKILL of uiautomator on the device). The kill is triggered by concurrent uiautomator/automation activity on the device during a live Maestro session — not a device-wide or permissions issue (manual dumps from the shell return 44KB XML fine). A single 300ms-delayed retry of the fallback dump command recovers the common case without masking genuine device unavailability. If the second attempt also fails, we still fall through to the existing dump-error classification. Test: the adb-hierarchy spec adds a retry test where the first fallback exec returns 137 and the second returns the fixture XML; resolver returns hierarchy and fileDumpCalls == 2.
Strengthens the SIGKILL retry from a single 300ms attempt to three retries at 500ms/1s/2s (3.5s total window). Exits early as soon as a dump succeeds. Rationale: single short retry wasn't enough against persistent device contention observed during BrowserStack Maestro sessions. The wider budget catches transient uiautomator kills on less-contended devices while still failing fast on genuinely unavailable devices. Captured limitation: when Maestro holds uiautomator throughout a flow (its observed behavior on real devices), no reasonable retry count recovers — the mechanism itself needs to change (e.g., Maestro API integration or an accessibility-service sidecar). That's a Phase 2 follow-up, not part of this patch. Tests cover both the "succeeds on Nth retry" case and the "all retries exhausted" case.
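The backoff schedule can be sketched as below. `runDump` and `sleep` are injected, hypothetical seams, and this sketch assumes only exit code 137 triggers a retry; the real resolver's retry condition may be broader.

```javascript
const RETRY_DELAYS_MS = [500, 1000, 2000]; // 3.5s total window

// runDump: async () => ({ exitCode, stdout }); sleep is injectable so tests need not wait.
async function dumpWithRetry(runDump, sleep = ms => new Promise(resolve => setTimeout(resolve, ms))) {
  let result = await runDump();
  for (const delay of RETRY_DELAYS_MS) {
    if (result.exitCode !== 137) break; // success or a non-SIGKILL failure: stop early
    await sleep(delay);
    result = await runDump();
  }
  return result; // still exitCode 137 here means all retries were exhausted
}
```

With this shape the worst case is one initial attempt plus three retries, and the caller still sees the final failing result so it can fall through to the dump-error classification.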
E2E on BrowserStack Maestro showed `adb exec-out uiautomator dump` is
fundamentally incompatible with live Maestro flows — Maestro holds the
uiautomator lock throughout a flow and competing dumps get SIGKILLed.
The `maestro --udid <serial> hierarchy` CLI command reuses Maestro's
existing gRPC connection to dev.mobile.maestro on the device and works
reliably during live sessions (verified by probing twice mid-flow —
both probes returned valid JSON while the flow was running).
Changes in packages/core/src/adb-hierarchy.js:
- Primary dump mechanism is now `maestro --udid <serial> hierarchy`.
- Parse the resulting JSON (slice from the first `{` to tolerate banner
lines), flatten the tree into the existing node shape.
- Map `accessibilityText` → `content-desc` at flatten time so `firstMatch`
still uses the SDK's selector vocabulary unchanged.
- Maestro CLI timeout: 15s (JVM cold start ~9s + headroom).
- Honor `MAESTRO_BIN` env var for alternate paths; default `maestro`
on PATH.
- New `spawnWithTimeout` helper shared between maestro and adb code paths.
- Classification extended with maestro-specific reasons (`maestro-not-found`,
`maestro-timeout`, `maestro-no-device`, `maestro-no-json`,
`maestro-parse-error:*`, `maestro-spawn-error:*`, `maestro-exit-*`,
`maestro-oversize`).
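A minimal sketch of the parse-and-flatten step, under assumptions: the Maestro JSON is taken to be a tree of `{ attributes, children }` nodes, the failure reasons are mapped to the `dump-error` kind, and the `maestro-parse-error:` suffix uses the error name. None of these details are confirmed by the commit message.

```javascript
// Assumed input shape: { attributes: {...}, children: [...] } per node.
function parseMaestroHierarchy(stdout) {
  const start = stdout.indexOf('{'); // tolerate banner lines before the JSON
  if (start === -1) return { kind: 'dump-error', reason: 'maestro-no-json' };
  let root;
  try {
    root = JSON.parse(stdout.slice(start));
  } catch (err) {
    return { kind: 'dump-error', reason: `maestro-parse-error:${err.name}` };
  }
  const nodes = [];
  (function walk(node) {
    const attrs = (node && node.attributes) || {};
    nodes.push({
      'resource-id': attrs['resource-id'],
      text: attrs.text,
      'content-desc': attrs.accessibilityText, // mapped at flatten time
      class: attrs.class,
      bounds: attrs.bounds
    });
    for (const child of (node && node.children) || []) walk(child);
  })(root);
  return { kind: 'hierarchy', nodes };
}
```

Mapping `accessibilityText` during flattening means the selector matcher never needs to know which mechanism produced the nodes.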
Fallback: when maestro returns anything other than `hierarchy`, fall
through to the existing `adb exec-out uiautomator dump` flow (including
SIGKILL retry/backoff and file-dump fallback). Useful when the maestro
binary isn't installed on the CLI host.
Cost: 9s JVM cold start per screenshot that uses element regions.
Acceptable today because the alternative is 100% skip. Phase 2.2 follow-up:
replace the CLI invocation with a direct gRPC client against device port
6790 (typical latency <100ms) — infrastructure already in place (adb
forwards tcp:8206 → 6790 per device on BrowserStack hosts).
Tests: 36 specs total. New `dump (maestro hierarchy primary)` describe
block adds 7 scenarios (happy path, content-desc mapping, ENOENT→adb
fallback, unavailable propagation when both fail, timeout → adb recovery,
banner prefix tolerance, no-json). Existing 29 tests now inject an
execMaestro stub that reports ENOENT so they exercise the adb fallback
path exactly as before.
New module png-dimensions.js serves both:
- existing /percy/comparison/upload signature check (api.js import)
- upcoming /percy/maestro-screenshot iOS path (scale factor via
pngWidth / wda_window_logical_width + aspect-ratio landscape fallback)
Exports:
- PNG_MAGIC_BYTES (moved from api.js route-local scope)
- parsePngDimensions(buf) → {width, height} via IHDR hand-parse
(24-byte prefix read, no new dependency)
- isPortrait / isLandscape with default threshold 1.25
(iPad portrait ratio 1.334; margin empirically confirmable via A1 Probe 6)
- DEFAULT_ORIENTATION_THRESHOLD exported for override in tests / A1 Probe 6
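For illustration, an IHDR hand-parse along these lines reads only the first 24 bytes of the file. Returning `null` on bad input and the exact comparison direction in `isPortrait` / `isLandscape` are assumptions of this sketch; the module's actual error behavior is not stated above.

```javascript
const PNG_MAGIC_BYTES = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const DEFAULT_ORIENTATION_THRESHOLD = 1.25;

// Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR", then width and
// height as big-endian uint32 at offsets 16 and 20.
function parsePngDimensions(buf) {
  if (!Buffer.isBuffer(buf) || buf.length < 24) return null;
  if (!buf.subarray(0, 8).equals(PNG_MAGIC_BYTES)) return null;
  if (buf.toString('latin1', 12, 16) !== 'IHDR') return null;
  const width = buf.readUInt32BE(16);
  const height = buf.readUInt32BE(20);
  if (width === 0 || height === 0) return null;
  return { width, height };
}

function isPortrait({ width, height }, threshold = DEFAULT_ORIENTATION_THRESHOLD) {
  return height / width >= threshold;
}

function isLandscape({ width, height }, threshold = DEFAULT_ORIENTATION_THRESHOLD) {
  return width / height >= threshold;
}
```

A near-square image satisfies neither predicate, which is how the ambiguity case stays detectable.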
Test-first: 17 specs covering happy path iPhone/iPad portrait+landscape,
dimensions > 65535, truncated buffer, bad signature, zero width/height,
non-Buffer input, threshold override, near-square ambiguity. All pass.
api.js: removes inline PNG_MAGIC_BYTES declaration from the upload route
handler; imports the shared constant. Upload signature-check behavior
unchanged.
Unit B1 of the iOS Maestro element-regions plan (v1.0); serves as the
Phase 1 CI coverage preflight per plan.
…-meta.json
Reader side of the realmobile ↔ Percy CLI contract v1.0.0 for iOS
element-region resolution on shared BS iOS hosts. Given a Maestro sessionId,
resolves /tmp/<sid>/wda-meta.json → { ok: true, port } or { ok: false, reason }
with TOCTOU-safe validation per contract §8:
File-level (SEI CERT POS35-C ordering, no lstat prefix):
- openSync(path, O_RDONLY | O_NOFOLLOW | O_NONBLOCK) — atomic symlink refusal;
ELOOP → 'symlink', ENOENT → 'missing', else → 'read-error'
- fstatSync on the opened fd — authoritative mode/uid/nlink check:
- st.mode mismatch 0o100600 → 'wrong-mode'
- st.uid mismatch getuid() → 'wrong-owner'
- st.nlink != 1 → 'multi-link' (hardlink attack per Apple Secure Coding
Guide, CVE-2005-2519 class)
- !st.isFile() → 'not-regular-file'
Content validation (contract §2):
- JSON.parse → 'malformed-json'
- schema_version semver-major != 1 → 'schema-version-unsupported'
(accepts 1.x minor bumps; unknown fields ignored for forward-compat)
- wdaPort out of 8400-8410 integer range → 'out-of-range-port'
- sessionId mismatch vs request → 'session-mismatch'
- flowStartTimestamp < getStartupTimestamp() - 5min → 'stale-timestamp'
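The content checks above can be sketched as a pure function applied after the file-level checks pass. The function name and argument shape are hypothetical, and timestamps are assumed to be epoch milliseconds, which the commit message does not specify.

```javascript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// raw: file contents as a string. Timestamps are assumed to be epoch milliseconds.
function validateWdaMeta(raw, { sessionId, startupTimestamp }) {
  let meta;
  try {
    meta = JSON.parse(raw);
  } catch {
    return { ok: false, reason: 'malformed-json' };
  }
  if (meta === null || typeof meta !== 'object') return { ok: false, reason: 'malformed-json' };
  // Accept any 1.x version; unknown fields are simply ignored.
  const major = Number(String(meta.schema_version).split('.')[0]);
  if (major !== 1) return { ok: false, reason: 'schema-version-unsupported' };
  if (!Number.isInteger(meta.wdaPort) || meta.wdaPort < 8400 || meta.wdaPort > 8410) {
    return { ok: false, reason: 'out-of-range-port' };
  }
  if (meta.sessionId !== sessionId) return { ok: false, reason: 'session-mismatch' };
  if (!(meta.flowStartTimestamp >= startupTimestamp - FIVE_MINUTES_MS)) {
    return { ok: false, reason: 'stale-timestamp' };
  }
  return { ok: true, port: meta.wdaPort };
}
```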
Input guardrails:
- sessionId regex [A-Za-z0-9_-]{16,64} + null-byte/slash rejection →
'invalid-session-id' (path-traversal defense before any fs touch)
Log scrubbing (contract §5):
- All failure paths emit only the reason tag via logger.debug()
- No selector values, sessionIds, port numbers, paths, or uids in logs
- Verified by a cross-scenario scrub-assertion test
DI: { getuid, getStartupTimestamp } injected for deterministic tests.
22 specs pass. Tests use real fs tmpdirs (bypass memfs) because the module
relies on POSIX O_NOFOLLOW / hardlink semantics memfs doesn't implement.
… (B3)
Core iOS element-region resolver for /percy/maestro-screenshot. Single
GET /session/:sid/source per screenshot, parsed locally via fast-xml-parser,
mirrors the Android adb-hierarchy.js architecture.
Exports:
- resolveIosRegions({regions, sessionId, pngWidth, pngHeight, isPortrait, deps})
→ {resolvedRegions: [{elementSelector, boundingBox, algorithm}], warnings: []}
- shutdown() — aborts all in-flight WDA HTTP AbortControllers (wired to
percy.stop() by B4)
- XCUI_ALLOWLIST — exported Set of ~80 XCUIElement.ElementType values from
the Xcode 16 SDK (Apple XCUIElement.ElementType docs); serves as DoS
guardrail per WDA issue #292
Resolution path (A1-chosen):
1. Landscape gate (isPortrait arg)
2. Kill-switch gate (process.env.PERCY_DISABLE_IOS_ELEMENT_REGIONS from
startup env only; NOT tenant-forwarded via appPercy.env)
3. readWdaMeta dep returns port from realmobile-written wda-meta.json; port
validated in 8400-8410 range
4. GET /wda/screen (loopback-only) → scale from integer `scale` field;
fallback to width-ratio (pngWidth / logical_w) snapped to {2, 3};
fail-closed on raw ratio outside [1.9, 3.1]; LRU cache cap 64 per-session
5. GET /session/:sid/source (loopback-only):
- 20 MB response cap enforced BEFORE parse
- Pre-parse DOCTYPE/ENTITY regex rejection (primary XXE defense)
- fast-xml-parser with processEntities:false (defense-in-depth)
- Cached per screenshot; all regions reuse single fetch
6. Per region:
- Only `id` and `class` accepted in V1; `text`/`xpath` → selector-key-not-in-v1
- class short-form (Button) normalized to long-form (XCUIElementTypeButton);
rejected if normalized form not in allowlist → class-not-allowlisted
- selector > 256 chars → selector-too-long
- tree pre-order first match (zero-match on no-match)
- scale points → pixels, validate in-bounds + non-trivial area (≥4×4) →
bbox-out-of-bounds / bbox-too-small
- outbound elementSelector.class uses normalized long-form (canonical form
on Percy dashboard regardless of customer input style)
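Steps 4 and 6 above can be illustrated with three small helpers. All names are hypothetical, `logicalWidth` stands in for whatever field carries the logical screen width, and the `{x, y, width, height}` rect and `{top, bottom, left, right}` box shapes are assumptions of this sketch.

```javascript
// Prefer the integer `scale` field; otherwise derive it from the width ratio.
function deriveScale(scaleField, pngWidth, logicalWidth) {
  if (Number.isInteger(scaleField) && scaleField > 0) return scaleField;
  const ratio = pngWidth / logicalWidth;
  if (!(ratio >= 1.9 && ratio <= 3.1)) return null; // fail closed
  return ratio < 2.5 ? 2 : 3; // snap to {2, 3}
}

// "Button" -> "XCUIElementTypeButton"; long-form names pass through unchanged.
function normalizeClass(name) {
  return name.startsWith('XCUIElementType') ? name : `XCUIElementType${name}`;
}

// rect is in logical points; the result is in screenshot pixels.
function toPixelBox(rect, scale, pngWidth, pngHeight) {
  const left = Math.round(rect.x * scale);
  const top = Math.round(rect.y * scale);
  const right = Math.round((rect.x + rect.width) * scale);
  const bottom = Math.round((rect.y + rect.height) * scale);
  if (left < 0 || top < 0 || right > pngWidth || bottom > pngHeight) {
    return { reason: 'bbox-out-of-bounds' };
  }
  if (right - left < 4 || bottom - top < 4) return { reason: 'bbox-too-small' };
  return { box: { top, bottom, left, right } };
}
```

The normalized class name would still need to be checked against the allowlist before any tree walk.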
HTTP: @percy/client/utils#request via injectable httpClient dep; 500 ms
AbortController timeout per call; retries: 0 to keep timeout honest.
inflight Set tracks active controllers; shutdown() aborts all.
Log scrubbing (contract §5): reason tags only. Verified across all paths —
no selector values, sessionIds, ports, or coords in logs.
23 specs pass. Tests use an injectable fake httpClient + in-memory
handlers; no real network required.
…lay (B4)
Wires B1/B2/B3 into api.js's /percy/maestro-screenshot handler. For iOS requests with element regions:
1. Parse IHDR from the already-read fileContent (one buffer read total, no extra fs hit). Failure → warn-skip all iOS element regions with png-unparseable; coord regions + screenshot upload continue.
2. Call resolveIosRegions() once per request with a real @percy/client/utils#request httpClient and a resolveWdaSession-wrapped readWdaMeta dep.
3. Surface each warning to percy.log.warn so support runbook tags are visible in Maestro stdout.
4. Walk the original regions array in input order; positional index into the sparse resolvedRegions produced by the resolver keeps coord and element regions interleaved correctly in the outbound Percy payload.
wda-hierarchy now returns a SPARSE array (one entry per input element region; null = skipped) instead of a dense array. Preserves input ordering when element and coord regions are interleaved. All B3 unit tests updated accordingly (22 still pass).
percy.js stop() invokes wdaHierarchyShutdown() before server.close() to abort in-flight WDA HTTP calls. http.request has no SIGKILL analog, so a slow /source fetch could otherwise keep the event loop alive past graceful-shutdown timeout.
api.test.js: replaced the pre-V1 iOS stub test (which asserted "Element-based region selectors are not yet supported on iOS") with a V1 behavioral test that exercises the full iOS element-region pipeline with a real PNG IHDR header fixture (1170×2532 iPhone 14) and asserts V1 warn-skip semantics for an Android-style `resource-id` selector on iOS (not-in-V1 key).
Test suite baseline: 28 pre-existing failures (chromium/doctor download tests unrelated to this change). After B4: 27 failures. The same chromium/doctor failures remain; the iOS stub test now passes with its updated V1 assertions. Zero iOS/wda-hierarchy/maestro-screenshot regressions.
Kill-switch (PERCY_DISABLE_IOS_ELEMENT_REGIONS=1) read from Percy CLI process startup env inside wda-hierarchy.js per plan — host-level only, NOT forwarded from tenant appPercy.env (A0.3 property: pending staging verification).
…retries
Implements the three layered fixes documented in
percy-maestro/docs/solutions/integration-issues/ios-wda-session-id-and-node14-abortcontroller-2026-04-23.md.
Each addresses a distinct iOS-region failure mode that surfaced during
2026-04-23 BrowserStack live validation on host 52:
Fix C — Node 14 AbortController feature-detect (callWda):
BS iOS hosts pin to Node 14.17.3 (Nix). AbortController became a global
in Node 15. Without feature detection, the timeout path threw
ReferenceError caught by generic error handling and surfaced as the
same 'wda-error' tag as legitimate WDA failures, masking the other two
fixes during diagnosis. Now: typeof globalThis.AbortController guard +
Promise.race fallback. Adds diagnostic logging on /wda/screen failures
showing err.name/message/code/status/aborted/body.
Fix B — Stale WDA sessionId retry via error-envelope extraction:
WDA's session-scoped routes (/session/:sid/source) reject any sid that
isn't the currently-active session. Maestro spawns its own WDA session
per xctest run, so realmobile's write-time sid capture goes stale
during the test. Refactored fetchAndParseSource into tryFetchSource +
retry coordinator. On staleSession (`{ value: { error: 'invalid session
id' } }`), extracts the top-level `sessionId` from the error envelope
(authoritative for "currently active") and retries once. Falls back to
/status probe if the error body lacks a usable sid.
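The error-envelope extraction in Fix B might look roughly like this. The helper name and return shape are hypothetical, and reusing the UUID-style pattern from Fix A to decide whether the sid is "usable" is an assumption of this sketch.

```javascript
// Assumption: a "usable" sid is one matching the same pattern Fix A applies to wdaSessionId.
const WDA_SID_RE = /^[A-Fa-f0-9-]{16,64}$/;

// body: parsed JSON from a failed /session/:sid/source call.
function extractActiveSessionId(body) {
  const stale = Boolean(body && body.value && body.value.error === 'invalid session id');
  if (!stale) return { stale: false };
  const sid = body.sessionId; // top-level field, authoritative for the active session
  const usable = typeof sid === 'string' && WDA_SID_RE.test(sid);
  // sessionId === null tells the caller to fall back to the /status probe.
  return { stale: true, sessionId: usable ? sid : null };
}
```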
Fix A (reader side) — wdaSessionId surfacing per contract v1.1.0:
realmobile contract v1.1.0+ probes /status at write_wda_meta time and
surfaces the WDA UUID under wda-meta.json's optional `wdaSessionId`
field. wda-session-resolver now validates the field against
/^[A-Fa-f0-9-]{16,64}$/ (generous bounds for cross-version tolerance)
and surfaces it on the {ok, port, wdaSessionId?} return shape. v1.0.0
writers that omit the field cause callers to fall back to SDK
sessionId (the fast path 404s, then Fix B's retry recovers).
Tests cover all three paths: feature-detected timeout, staleSession
retry from error envelope, /status fallback when error body lacks sid,
v1.1.0 wdaSessionId pass-through, v1.0.0 absence handling, malformed
wdaSessionId rejection.
Note for downstream: this WDA-direct path is gated for deletion by the
2026-04-27 plan (percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md)
once Phase 0.5 empirical probe passes. Until then, this is the
production iOS resolver path.
The Android view-hierarchy resolver is becoming the cross-platform Maestro resolver (per percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md Unit 1). Rename + shim is purely additive, with no behavior change.
- Move src/adb-hierarchy.js → src/maestro-hierarchy.js (git mv preserves history).
- Move test/unit/adb-hierarchy.test.js → test/unit/maestro-hierarchy.test.js.
- Move test/fixtures/adb-hierarchy/ → test/fixtures/maestro-hierarchy/.
- Replace src/adb-hierarchy.js with a 5-line re-export shim. The shim is slated for removal in V1.1 per the plan's deprecation guidance.
- Update api.js import to ./maestro-hierarchy.js.
- Update logger namespace from core:adb-hierarchy → core:maestro-hierarchy.
- Update file header to reflect cross-platform intent (the file body has been maestro-first for some time; the previous file name was always misleading).
- Update test describe block + import + fixture path.
Behavior unchanged. Subsequent units in Phase 1 will add the iOS branch and api.js dispatch logic; this commit is just the rename so the diffs in those units stay focused.
…parity
Phase 1 Unit 2a per percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md.
Lands the platform-dispatch scaffolding and the cross-platform selector
vocabulary alias. Real iOS resolver implementation deferred to Unit 2b
post Phase 0.5 fixture capture (FIXME-PHASE-0.5 in code).
Platform dispatch:
- dump({ platform }) accepts 'android' (default — backwards compatible) or
'ios'. iOS branch reads PERCY_IOS_DEVICE_UDID + PERCY_IOS_DRIVER_HOST_PORT
from env (realmobile-injected per Unit 10a; the wda_port + 2700 formula
is realmobile-owned per maestro_session.rb:831). Warn-skip with
reason='env-missing' if either var is unset. Otherwise calls
runMaestroIosDump which currently returns
{ kind: 'unavailable', reason: 'not-implemented' } as the FIXME-PHASE-0.5
stub.
- iOS path never invokes adb (verified by test).
R1 vocabulary parity (Android `id` alias):
- flattenMaestroNodes (Android branch) now surfaces resource-id under both
`resource-id` AND `id` canonical keys on each node. Customer selectors
`{element: {id: "submit-btn"}}` and `{element: {resource-id: "submit-btn"}}`
resolve the same node. iOS users writing `{id: ...}` and Android users
writing the same yaml hit the same code path. Full unified-key migration
(deprecating `resource-id`) deferred to V1.1.
- SELECTOR_KEYS_UNION = [resource-id, text, content-desc, class, id]
drives firstMatch validation. ANDROID_SELECTOR_KEYS_WHITELIST and
IOS_SELECTOR_KEYS_WHITELIST exported separately for callers that want
per-platform validation.
Tests added:
- Android `id` alias resolves same bbox as `resource-id` (3 tests).
- iOS env-missing path (3 tests covering each env-var combination).
- iOS env-set returns 'not-implemented' (FIXME stub).
- iOS dispatch never invokes adb.
- Default (no platform arg) preserves Android behavior.
Smoke-tested via direct node import; full @percy/core test suite has 27
pre-existing failures in Unit / Install in executable Chromium (unrelated
infrastructure issue), but no regressions in the resolver tests.
Phase 1 Unit 3 per percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md.
Wires the maestro-hierarchy resolver into the /percy/maestro-screenshot
relay's iOS element-region dispatch, gated by an env switch so default
(unset) behavior is unchanged. Phase 0.5 empirical probe gates the
default flip to the new path; Phase 4 deletes the legacy iOS branch.
- New: read PERCY_IOS_RESOLVER from process.env. When equal to
'maestro-hierarchy', iOS element regions flow through the same
lazy maestroDump({ platform: 'ios' }) + per-region firstMatch
pattern Android already uses. When unset (or any other value),
legacy WDA-direct path remains active — no behavior change for
customers in production today.
- Refactor: the up-front PNG-parse + resolveIosRegions block now only
fires when the env switch is OFF. With the switch on, that work is
unnecessary (the resolver is engineered to be lazy + per-region).
- The cross-platform branch in the per-region loop now also covers iOS
when the switch is on. Same shape as Android: cachedDump lazy memo,
warn-skip on hierarchy-unavailable, firstMatch + bbox forward on
success.
Today (env switch unset): only the cross-platform Android path is
exercised. The iOS branch with the switch on is exercised by the
maestro-hierarchy unit tests landed in Unit 2a (which covers the
'env-missing' and 'not-implemented' stub paths). Unit 4 adds the
parity test that exercises both platforms via the same handler.
A real production rollout flips the default to 'maestro-hierarchy'
in Phase 4 (Unit 9) after Phase 0.5 PASSes; until then, keep the
default off.
Phase 1 Unit 4 per percy-maestro/docs/plans/2026-04-27-001-feat-ios-element-regions-maestro-hierarchy-plan.md.
New test file: test/unit/maestro-hierarchy.parity.test.js. Locks in the
contract that both platform branches return the same { kind, ... } envelope,
that the public API surface (SELECTOR_KEYS_WHITELIST + per-platform
whitelists) is consistent, and that platform dispatch isolates the env-var
reads (Android never reads PERCY_IOS_*; iOS never reads ANDROID_SERIAL).
Bug fix discovered during smoke test:
- flattenNodes (the XML/uiautomator code path) was missing the R1 `id`
alias surface that flattenMaestroNodes (the maestro CLI JSON path)
already had. So `firstMatch(nodes, { id: 'X' })` worked when nodes came
from the maestro path but returned null when nodes came from the adb
fallback path. Now both code paths surface resource-id under both
`resource-id` and `id` keys consistently.
iOS-side parity assertions in this test are scoped to what Unit 2a's stub
can actually cover — envelope shape, whitelist exports, dispatch isolation.
The Phase 4 follow-up (post Phase 0.5 + Unit 2b) extends this file with
real iOS attribute-mapping assertions backed by a captured iOS hierarchy
fixture.
Smoke-tested via direct node import. The full @percy/core test suite has
27 pre-existing Chromium-installer failures unrelated to this work.
…source
Source-synthesized fixtures for the new HTTP-XCTest path Unit 2 will build,
plus the maestro CLI iOS stdout shape for the fallback path. All shapes
verified against mobile-dev-inc/Maestro at ref=cli-2.0.7 (realmobile production
default per /usr/local/.browserstack/realmobile/config/constants.yml).
Notable findings recorded in capture-notes.md:
- PR #2365 has landed: server detects AUT itself; appIds is wire-vestigial.
Percy CLI can send {"appIds": [], "excludeKeyboardElements": false}.
YAML-based bundleId discovery is no longer required for the realmobile
fast path.
- PR #2402 has landed but with a different wrap from cli-1.39.13: response
is now {axElement: {children: [appHierarchy, statusBarsContainer]}, depth}
rather than [springboard, AUT]. The deepening pass's parser rule
('first elementType == 1 whose identifier != com.apple.springboard')
remains correct because the statusBars wrapper has elementType == 0.
- iOS Maestro's TreeNode does NOT carry a 'class' attribute. iOS selector
vocabulary is 'id' only (maps to attributes['resource-id']). The
originally absorbed Unit 2b XCUI elementType integer-to-name table is
not needed for selector matching.
Wire-bytes confidence boost is deferred to Unit 5/6/7 BS validation rather
than blocking on a Unit-1 BS session capture (see plan Viability Gate 2).
…(Unit 2)
Adds runIosHttpDump as the iOS primary path for /percy/maestro-screenshot
element regions: POST {appIds: [], excludeKeyboardElements: false} to Maestro's
iOS XCTestRunner /viewHierarchy endpoint at http://127.0.0.1:wda+2700. Server
detects AUT itself at cli-2.0.7+ (PR #2365 landed); empty appIds returns the
foreground AUT directly.
Replaces the iOS-WIP runMaestroIosDump stub with a real maestro-CLI shell-out
parser (the connection-class fallback path). Maestro's iOS CLI stdout is its
normalized TreeNode shape; existing flattenMaestroNodes consumes it without
iOS-specific code.
Adds flattenIosAxElement adapter for the HTTP path's raw AXElement shape:
walks to first elementType==1 with identifier!='com.apple.springboard' (skips
SpringBoard sibling on cli-1.39.13 wrap; works for both v1.39.13 [springboard,AUT]
and post-PR-2402 single-AUT-root shapes). Frame keys converted from PascalCase
{X,Y,Width,Height} to bracket-format bounds string.
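A sketch of the adapter's two core steps, assuming each AXElement node carries `elementType`, `identifier`, `frame`, and `children`, and that coordinates are rounded to integers (rounding is not stated above). Function names are illustrative.

```javascript
const SPRINGBOARD = 'com.apple.springboard';

// Depth-first search for the first application node (elementType 1) that is not SpringBoard.
// Accepts either a single root node or an array of sibling roots.
function findAutRoot(node) {
  if (!node || typeof node !== 'object') return null;
  if (!Array.isArray(node) && node.elementType === 1 && node.identifier !== SPRINGBOARD) {
    return node;
  }
  const children = Array.isArray(node) ? node : node.children || [];
  for (const child of children) {
    const found = findAutRoot(child);
    if (found) return found;
  }
  return null;
}

// { X, Y, Width, Height } -> "[x1,y1][x2,y2]", the same string form the Android nodes use.
function frameToBounds(frame) {
  const { X, Y, Width, Height } = frame;
  return `[${Math.round(X)},${Math.round(Y)}][${Math.round(X + Width)},${Math.round(Y + Height)}]`;
}
```

Emitting the bracket-format string lets the existing bounds parsing and selector matching run unchanged on iOS nodes.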
Narrows IOS_SELECTOR_KEYS_WHITELIST from ['id', 'class'] to ['id'].
IOSDriver.mapViewHierarchy at cli-2.0.7 does not populate 'class' on iOS
TreeNode (only 'resource-id' from identifier), so Percy keeps iOS selector
vocabulary aligned with Maestro's actual capability.
Schema-class failures (missing root, missing frame, malformed JSON, 4xx,
non-JSON content-type) return dump-error without falling back. Connection-class
failures (ECONNREFUSED, ETIMEDOUT, ECONNRESET, 5xx) and no-aut-tree responses
(SpringBoard-only) fall back to the maestro-CLI path.
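The split between schema-class and connection-class failures can be sketched as a small classifier. The input shape, the function name, and treating unknown error codes as `dump-error` are assumptions; the SpringBoard-only (no-aut-tree) case is omitted because it requires walking the tree.

```javascript
const CONNECTION_CODES = new Set(['ECONNREFUSED', 'ETIMEDOUT', 'ECONNRESET']);

// Returns 'ok', 'fallback' (try the maestro-CLI path), or 'dump-error' (no fallback).
function classifyIosHttpResult({ errorCode, status, contentType, body }) {
  if (errorCode) return CONNECTION_CODES.has(errorCode) ? 'fallback' : 'dump-error';
  if (status >= 500) return 'fallback';
  if (status >= 400) return 'dump-error';
  if (!/json/i.test(contentType || '')) return 'dump-error';
  let parsed;
  try {
    parsed = JSON.parse(body);
  } catch {
    return 'dump-error';
  }
  if (!parsed || typeof parsed !== 'object' || !parsed.axElement) return 'dump-error';
  return 'ok';
}
```

Keeping schema-class failures out of the fallback path means a wire-format change surfaces as an error instead of being silently papered over by the slower CLI route.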
Two-tier deadline mirrors PR #2210's pattern: 1500ms healthy + 5000ms circuit-
breaker. Out-of-range PERCY_IOS_DRIVER_HOST_PORT (outside 11100-11110) skips
the HTTP path entirely. Loopback-only URL guard.
Drift-bit handling deferred to plan Unit 4 (cross-PR coordination with #2210);
schema-class failures currently log-only.
77 of 77 specs pass: 26 new iOS-path scenarios (HTTP primary, CLI fallback,
env handling, parity), and all existing Android tests unchanged.
Plan: percy-maestro/docs/plans/2026-05-06-004-feat-cross-platform-maestro-resolver-unification-plan.md
Fixtures: 65e54b9 (Unit 1)
…shot (Unit 3a)
Adds a three-tier cascade for iOS element-region resolver selection:
1. Per-snapshot override: `request.body.resolver` (validated against
['wda-direct', 'maestro-hierarchy']; HTTP 400 on unknown values).
2. Process env: `PERCY_IOS_RESOLVER` (same allowlist; unknown values
warn + fall through).
3. Default: 'wda-direct' (Unit 3a is opt-in only — Unit 3b's env-conditional
flip is a separate follow-up PR after the validation window).
The per-snapshot `resolver` body field is the ops escape valve documented
in the plan: lets operators `curl` a single snapshot with a specific
resolver for diagnostics without redeploying the CLI. SDK does not set
this today (R8 unchanged).
When the cascade chooses 'maestro-hierarchy', api.js calls the unified
`maestroDump({platform: 'ios', sessionId})` from Unit 2 (HTTP primary
+ CLI fallback). When 'wda-direct', the legacy `resolveIosRegions`
(WDA source-dump) path runs unchanged.
Threads sessionId from the relay request through to maestroDump for
log-scrubbed correlation tagging (Unit 2's `runIosHttpDump` uses sid
prefix in debug logs).
Tests: 6 new Unit 3a scenarios in api.test.js — body.resolver validation,
default-unchanged behavior, env-only path, per-snapshot override (both
directions), graceful fallback on garbage env values. All 6 pass; existing
tests unchanged.
Plan: percy-maestro/docs/plans/2026-05-06-004-feat-cross-platform-maestro-resolver-unification-plan.md
…heck (Unit 4)
Adds module-level maestroHierarchyDrift state with {android, ios} slots that
record the first schema-class failure per platform, plus the setter/getter
exported for cross-cutting wiring. Both slots are null in steady state.
Wires Unit 2's iOS HTTP schema-class failures (missing axElement root,
missing frame, malformed JSON, 4xx, non-JSON content-type) to call
setMaestroHierarchyDrift({platform: 'ios', ...}). Connection-class failures
and no-aut-tree responses do NOT flip the bit (only schema-class — those
are the genuine 'Maestro upstream wire-format drifted' signals that need
ops attention).
Extends /percy/healthcheck to always emit:
maestroHierarchyDrift: { android: ... | null, ios: ... | null }
The android slot is unwritten in this branch — PR #2210's gRPC drift
surface (recordSchemaDrift) sits on a sibling branch. When #2210 merges
and this PR rebases, #2210's Android schema-class call sites retrofit to
use the setter exported here. Companion artifact:
percy-maestro/docs/plans/2026-05-06-004-pr2210-coordination-comment.md.
Tests: 6 new Unit 4 scenarios in maestro-hierarchy.test.js — initial state,
iOS schema-class flips ios slot only, first-seen-per-platform wins,
connection-class doesn't flip, SpringBoard-only doesn't flip, reset helper.
Plus existing /healthcheck test updated to include the new field. Per-platform
slot independence is the central invariant — locks in the design rationale
that simultaneous-drift signal on both platforms is preserved (the
single-field-with-discriminator design rejected during document-review would
have lost that).
Plan: percy-maestro/docs/plans/2026-05-06-004-feat-cross-platform-maestro-resolver-unification-plan.md
…5/6/7)
Three env-gated harnesses + supporting fixtures, all skipped in CI and run manually during BS validation. Paste-output-into-PR pattern matches the gRPC harness shape from the originally-planned PR #2210.
Unit 7: maestro-hierarchy-ios-http-concurrent.harness.js (V4.2)
Concurrent-access regression. While a real Maestro flow holds the iOS device active via extendedWaitUntil + impossible-selector polling (fixtures/pause-30s-flow-ios.yaml), the harness calls runIosHttpDump N=100 times and records p50/p95/p99 timings. KTD threshold check warns when p95 is within 10% of IOS_HTTP_HEALTHY_DEADLINE_MS (1500ms) so the deadline can be bumped before Unit 3b's flip.
Env: MAESTRO_IOS_TEST_DEVICE, PERCY_IOS_DRIVER_HOST_PORT.
Unit 6: maestro-ios-hierarchy-regression.harness.js (V3)
WDA failure-class regression. Runs ios-aut-crash-regions.yaml twice: once with PERCY_IOS_RESOLVER=wda-direct (legacy WDA path: element regions silently skip when AUT bundleId isn't running, the production failure mode), and once with =maestro-hierarchy (HTTP path: regions resolve via Maestro's runner which walks system UI without bundleId binding). Output is logged for human verification of the two Percy build URLs.
Env: MAESTRO_IOS_TEST_DEVICE, PERCY_SERVER, PERCY_IOS_DRIVER_HOST_PORT.
Unit 5: cross-platform-parity.harness.js (V2)
R6 cross-platform parity check. Runs parity-flow-android.yaml + parity-flow-ios.yaml against their respective devices, both resolving {id: 'submitBtn'} through Percy's relay. V1 is log-only (manual eyeball of the side-by-side Percy snapshots) because DPI normalization between Android pixels and iOS logical points is non-trivial without a documented example-app dimension table. V1.1 can tighten to programmatic ±2px assertion later.
Env: MAESTRO_PARITY_DEVICES (format: <android-serial>:<ios-udid>), PERCY_SERVER, PERCY_IOS_DRIVER_HOST_PORT.
Plan: percy-maestro/docs/plans/2026-05-06-004-feat-cross-platform-maestro-resolver-unification-plan.md
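As an illustration of the Unit 7 threshold check, the timing summary could be computed along these lines. The nearest-rank percentile method and the reading of "within 10%" as p95 ≥ 90% of the deadline are assumptions, as are the function names.

```javascript
const IOS_HTTP_HEALTHY_DEADLINE_MS = 1500;

// Nearest-rank percentile over an ascending-sorted array (one of several common definitions).
function percentile(sortedMs, p) {
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.max(0, rank - 1)];
}

function summarize(timingsMs) {
  const sorted = [...timingsMs].sort((a, b) => a - b);
  const p95 = percentile(sorted, 95);
  return {
    p50: percentile(sorted, 50),
    p95,
    p99: percentile(sorted, 99),
    // Warn when p95 is within 10% of the healthy deadline (or already past it).
    nearDeadline: p95 >= 0.9 * IOS_HTTP_HEALTHY_DEADLINE_MS
  };
}
```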
let entries;
try { entries = await fs.promises.readdir(dir, { withFileTypes: true }); } catch { return; }
for (let entry of entries) {
  let full = path.join(dir, entry.name);
let baseDir = `/tmp/${sessionId}_test_suite/logs`;
let logDirs = await fs.promises.readdir(baseDir);
for (let dir of logDirs) {
  let screenshotPath = path.join(baseDir, dir, 'screenshots', `${name}.png`);
describe('iOS HTTP dump (runIosHttpDump primary path)', () => {
  const iosFixtureDir = path.resolve(url.fileURLToPath(import.meta.url), '../../fixtures/maestro-ios-hierarchy');
  const loadIosFixture = name => fs.readFileSync(path.join(iosFixtureDir, name), 'utf8');
describe('iOS maestro-CLI fallback (runMaestroIosDump replacement)', () => {
  const iosFixtureDir = path.resolve(url.fileURLToPath(import.meta.url), '../../fixtures/maestro-ios-hierarchy');
  const loadIosFixture = name => fs.readFileSync(path.join(iosFixtureDir, name), 'utf8');
…de (Units 3b + 8 consolidated)

Removes the legacy iOS WDA-direct resolver and the now-dead resolver-choice machinery that selected between WDA and the new HTTP/CLI path. The unified maestro-hierarchy resolver becomes the only iOS path; element regions resolve via runIosHttpDump → maestro-CLI shell-out fallback.

Deleted (8 files, ~1700 lines net):
- packages/core/src/wda-hierarchy.js (legacy WDA /source resolver)
- packages/core/src/wda-session-resolver.js (TOCTOU-safe wda-meta.json reader, consumed only by wda-hierarchy)
- packages/core/src/png-dimensions.js (PNG IHDR parser, used only by wda-hierarchy for scale-factor derivation)
- packages/core/test/unit/wda-hierarchy.test.js
- packages/core/test/unit/wda-session-resolver.test.js
- packages/core/test/unit/png-dimensions.test.js
- packages/core/test/integration/maestro-ios-hierarchy-regression.harness.js (Unit 6; was a wda-direct vs maestro-hierarchy comparator, meaningless with wda-direct gone)
- packages/core/test/integration/fixtures/ios-aut-crash-regions.yaml (paired with the regression harness)

Stripped from api.js:
- import resolveIosRegions / resolveWdaSession / parsePngDimensions
- body.resolver field validation (single path → meaningless)
- PERCY_IOS_RESOLVER env handling (single path → meaningless)
- iOS WDA-direct branch + iosResult / iosIndex bookkeeping
- The resolver-choice cascade comment block

Stripped from percy.js: wdaHierarchyShutdown import + shutdown call. The maestro-hierarchy HTTP path uses a stateless http.Agent that closes when the process exits; no explicit shutdown needed.

Stripped from api.test.js: the 6 Unit-3a resolver-cascade tests (validated behavior that no longer exists). The 'iOS element region with Android-style selector key' test was consolidated into a single combined test that exercises the unified iOS path with a mix of element + coord regions.
Plan implications (Plan: percy-maestro/docs/plans/2026-05-06-004-...):
- Unit 3a's per-snapshot resolver override and PERCY_IOS_RESOLVER env: REMOVED.
- Unit 3b's telemetry-gated default flip: CONSOLIDATED. The default IS now maestro-hierarchy because there is no other path; no flip pending.
- Unit 8's wda-hierarchy.js retirement: SHIPPED HERE rather than ≥1 week post-Unit-3b.

Regression risk acknowledged (the P0 from document review): self-hosted iOS Percy customers without realmobile-injected PERCY_IOS_DRIVER_HOST_PORT AND without a working maestro CLI installed lose element-region support on this code path. Their previously-working WDA-direct happy path is gone. Coord regions still work; element regions skip gracefully with a '[percy] Element-region resolver unavailable' warn. Customers in that situation should switch to coord regions or wait for a future Android-style gRPC-direct path. Customer-side rollback: pin to a CLI version before this PR.

Test status: 148 of 148 specs run; 6 pre-existing failures unchanged (Jest .toHaveProperty matcher in Jasmine context; AggregateError vs ECONNREFUSED network-stack flake; all unrelated to this work).
…+ stale comments

Final cleanup pass:
- Delete packages/core/src/adb-hierarchy.js, a deprecated compat shim that re-exports from maestro-hierarchy.js. Comment said 'removed in V1.1'; no internal code imports from it. External consumers should import from maestro-hierarchy.js directly.
- Rename adbDump/adbFirstMatch import aliases to maestroDump/maestroFirstMatch in api.js. The old names predate the cross-platform rename and were misleading (the function dispatches to platform-appropriate transports, not just adb).
- Drop unused 'percyRequest' import in api.js. It was only consumed by the resolveIosRegions call site, which the WDA retirement commit removed.
- Update stale comments in maestro-hierarchy.js that referenced 'Unit 2a/2b' scaffolding terminology and the 'Phase 0.5 stub' that's no longer present.
…26-05-07-002 Unit 1)

Adds @grpc/grpc-js@^1.14.3 and @grpc/proto-loader@^0.8.0 as @percy/core dependencies, plus the vendored maestro_android.proto from upstream mobile-dev-inc/Maestro at SHA bc8bde1b (cli-2.5.1, 2025-05-26). Both deps declare engines.node floors well below @percy/cli's >=14 (grpc-js >=12.10.0, proto-loader >=6), so no min-Node bump is required.

Only MaestroDriver/viewHierarchy(ViewHierarchyRequest) returns (ViewHierarchyResponse) and the `string hierarchy = 1` field on the response are consumed at runtime; the rest of the proto is preserved verbatim so future updates can be a clean upstream re-copy without surgical edits. The proto/ dir is preserved into dist/ via Babel CLI's existing copyFiles: true setting (no build-system change needed).

Refs:
- 2026-05-07-002 plan Unit 1 + D6 + D12
- Replaces in-flight PR #2210 (will be closed once this PR merges)
…omy (Units 2/3/7)
Adds runAndroidGrpcDump as the Android primary path in maestro-hierarchy.js's
dump({platform:'android'}) dispatch, talking the same gRPC transport Maestro
CLI uses but as a stateless RPC that doesn't open a parallel flow context.
Avoids the session-collision failure mode the maestro CLI shell-out hits
during a live Maestro flow (root cause of the 2026-05-07 BS validation
fallback-dump-exit-137 result).
Three-class error taxonomy (D10) — splits PR #2210's two-class scheme:
- schema-class (INVALID_ARGUMENT, FAILED_PRECONDITION, OUT_OF_RANGE,
UNIMPLEMENTED, DATA_LOSS, decoder-failure) → drift bit, no fallback,
cache PRESERVED
- channel-broken (UNAVAILABLE, INTERNAL, CANCELLED, unmapped codes) →
fallback chain runs, cache EVICTED (channel actually broke)
- contention-class (DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED) →
skip CLI fallback (would queue behind same flow), straight to adb,
cache PRESERVED (timeout = backpressure, not channel-breakage; reconnect
would waste TCP+HTTP/2+TLS for nothing)
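
The three-class split above can be sketched as a pure function. This is a minimal illustration and not the code in maestro-hierarchy.js: the numeric values are the standard gRPC status codes, while the name `classifyGrpcFailureSketch` and the `{ failureClass, fallback, evictCache }` return shape are assumptions made for the example.

```javascript
// Standard gRPC status codes, inlined so the sketch has no dependencies.
const GRPC_STATUS = {
  CANCELLED: 1,
  INVALID_ARGUMENT: 3,
  DEADLINE_EXCEEDED: 4,
  RESOURCE_EXHAUSTED: 8,
  FAILED_PRECONDITION: 9,
  ABORTED: 10,
  OUT_OF_RANGE: 11,
  UNIMPLEMENTED: 12,
  INTERNAL: 13,
  UNAVAILABLE: 14,
  DATA_LOSS: 15
};

const SCHEMA_CODES = new Set([
  GRPC_STATUS.INVALID_ARGUMENT, GRPC_STATUS.FAILED_PRECONDITION,
  GRPC_STATUS.OUT_OF_RANGE, GRPC_STATUS.UNIMPLEMENTED, GRPC_STATUS.DATA_LOSS
]);
const CONTENTION_CODES = new Set([
  GRPC_STATUS.DEADLINE_EXCEEDED, GRPC_STATUS.RESOURCE_EXHAUSTED, GRPC_STATUS.ABORTED
]);

// Hypothetical return shape: { failureClass, fallback, evictCache }.
function classifyGrpcFailureSketch(err) {
  if (!err) return null; // falsy errors are not classified
  // A missing numeric code is treated as a decoder failure (schema-class).
  if (typeof err.code !== 'number' || SCHEMA_CODES.has(err.code)) {
    return { failureClass: 'schema', fallback: 'none', evictCache: false };
  }
  if (CONTENTION_CODES.has(err.code)) {
    // Skip the CLI (it would queue behind the same flow) and go straight to adb.
    return { failureClass: 'contention', fallback: 'adb', evictCache: false };
  }
  // UNAVAILABLE, INTERNAL, CANCELLED, and any unmapped code.
  return { failureClass: 'channel-broken', fallback: 'cli-then-adb', evictCache: true };
}
```

Keeping the classifier free of I/O is what lets the unit specs exercise every status code without a live channel.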
Symmetric timeout architecture (D11): GRPC_HEALTHY_DEADLINE_MS=1500 +
GRPC_CIRCUIT_BREAKER_MS=5000 — parity with iOS HTTP path. Outer Promise.race
is defense-in-depth against grpc-node#2620-style channel sticking (closed
in 1.9.11 but cheap insurance).
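
The outer race can be illustrated with a small helper. This is a sketch under assumptions: `withCircuitBreaker` and `healthyDeadline` are hypothetical names, and only the two millisecond values come from the commit message.

```javascript
const GRPC_HEALTHY_DEADLINE_MS = 1500; // per-call deadline (value from the commit message)
const GRPC_CIRCUIT_BREAKER_MS = 5000; // outer guard (value from the commit message)

// A caller could pass this as the per-call `deadline` option of the RPC.
const healthyDeadline = () => new Date(Date.now() + GRPC_HEALTHY_DEADLINE_MS);

// Hypothetical helper: reject if `promise` has not settled within `ms`.
function withCircuitBreaker(promise, ms = GRPC_CIRCUIT_BREAKER_MS) {
  let timer;
  const breaker = new Promise((resolve, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`circuit breaker tripped after ${ms}ms`);
      err.breaker = true;
      reject(err);
    }, ms);
  });
  // Whichever settles first wins; clear the timer so it cannot keep the process alive.
  return Promise.race([promise, breaker]).finally(() => clearTimeout(timer));
}
```

The breaker only matters when the per-call deadline itself fails to fire, which is the stuck-channel case the commit message describes as cheap insurance.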
Dispatch (D3 + D9):
PERCY_MAESTRO_GRPC=0 kill switch → skip gRPC entirely (in-process rollback)
PERCY_ANDROID_GRPC_PORT set+valid → runAndroidGrpcDump → branch by class
env absent / invalid → maestro CLI primary → adb (current behavior)
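
A rough sketch of the dispatch decision follows. The function names and the port-validity rule (a decimal integer in the range 1 to 65535) are assumptions for illustration; the commit message only says the port must be "set+valid".

```javascript
// Hypothetical validity rule: a decimal integer in the TCP port range.
function parsePort(value) {
  if (typeof value !== 'string' || !/^\d{1,5}$/.test(value)) return null;
  const port = Number(value);
  return port >= 1 && port <= 65535 ? port : null;
}

// Returns which transport the Android dump would try first.
function chooseAndroidPrimary(env) {
  if (env.PERCY_MAESTRO_GRPC === '0') return { primary: 'maestro-cli' }; // kill switch
  const port = parsePort(env.PERCY_ANDROID_GRPC_PORT);
  if (port === null) return { primary: 'maestro-cli' }; // env absent or invalid
  return { primary: 'grpc', address: `127.0.0.1:${port}` };
}
```

Checking the kill switch before the port means a single env flip in the CLI process wins over whatever the device host injects.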
R-7 (shutdown race): runAndroidGrpcDump accepts shutdownInProgress flag
sourced from grpcClientCache.shutdownInProgress (set by percy.stop()
before close). CANCELLED-during-shutdown returns {kind:'unavailable',
reason:'shutdown'} — no fallback chain on a tearing-down process.
Drift recording uses this branch's existing two-slot
setMaestroHierarchyDrift({platform:'android', code, reason}) — drops
PR #2210's separate single-slot recordSchemaDrift.
Cleanup: removes the stale "Single-author note about PR #2210" comment
and a stale grpc-node#2620 framing reference (issue is closed).
Refs:
- 2026-05-07-002 plan Units 2, 3, 7 + D1, D3, D9, D10, D11
- Replaces PR #2210's runGrpcDump
Constructs grpcClientCache as a per-Percy-instance Map in the constructor
(matching the established ownership pattern for browser, server, queues,
client, monitoring). Disposes via closeGrpcClientCache(this.grpcClientCache)
in stop()'s teardown block.
Module-global state would leak channels between concurrent Percy instances
in a single process (programmatic-API users, test harnesses) and create
shutdown races where one instance's stop() invalidates another's pending
RPCs. Per-instance ownership matches every other long-lived resource on
Percy.
Asymmetry with maestroHierarchyDrift (deliberate, documented in
maestro-hierarchy.js header): drift envelope stays module-scoped because
drift is observability state — surfaced process-wide on /percy/healthcheck.
Channels are transport state — per-instance lifecycle.
R-7 (shutdown race): sets cache.shutdownInProgress = true BEFORE closing
channels so any in-flight runAndroidGrpcDump() that hits CANCELLED returns
{kind:'unavailable', reason:'shutdown'} instead of triggering the maestro
CLI + adb fallback chain on a tearing-down process.
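
The ordering described above (flag first, then close) might look like the sketch below. `stopSketch`, `closeCacheSketch` and `classifyCancelled` are hypothetical stand-ins, and the cache is assumed to be a Map of address to client with a `shutdownInProgress` property attached to it.

```javascript
// Hypothetical stand-in for closeGrpcClientCache: closes every client, then clears the map.
function closeCacheSketch(cache) {
  if (!cache) return; // undefined / null caches are tolerated
  for (const client of cache.values()) {
    try { client.close(); } catch (e) { /* best effort during teardown */ }
  }
  cache.clear();
}

// Hypothetical slice of stop(): raise the flag BEFORE closing channels.
function stopSketch(percy) {
  percy.grpcClientCache.shutdownInProgress = true;
  closeCacheSketch(percy.grpcClientCache);
}

// How an in-flight call might interpret a CANCELLED status.
function classifyCancelled(cache) {
  return cache.shutdownInProgress
    ? { kind: 'unavailable', reason: 'shutdown' } // no fallback on a tearing-down process
    : { failureClass: 'channel-broken' }; // ordinary CANCELLED: evict and fall back
}
```

If the flag were set after closing, a call cancelled by the close itself would still see `shutdownInProgress` as false and start the fallback chain.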
api.js relay handler now threads percy.grpcClientCache through to
maestroDump() so the Android gRPC primary can reuse channels across
snapshots in the same session.
Mitigates grpc-node#2964 (open) — ChannelzTrace memory leak when Client
is not explicitly .close()-d.
Refs:
- 2026-05-07-002 plan Unit 5 + D9
- R-7 (shutdown race), R-3 (cache scope) resolved by this commit
… dispatch (Units 2/3/4/5)
28 new specs covering the absorbed gRPC primary path:
classifyGrpcFailure (D10 three classes):
- schema-class: missing code → grpc-decode; INVALID_ARGUMENT,
FAILED_PRECONDITION, OUT_OF_RANGE, UNIMPLEMENTED, DATA_LOSS
- contention-class: DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED
- channel-broken: UNAVAILABLE, INTERNAL, CANCELLED, unmapped codes
- returns null for falsy errors
runAndroidGrpcDump (success + failure paths):
- hierarchy parsed from gRPC response.hierarchy XML envelope
- schema-class UNIMPLEMENTED → drift bit set on android slot,
no eviction, ios slot stays null
- contention-class DEADLINE_EXCEEDED → cache PRESERVED (D10)
- channel-broken UNAVAILABLE → cache evicted, client.close() called
- CANCELLED-during-shutdown → unavailable/shutdown (R-7), no fallback
- CANCELLED outside shutdown → channel-broken (cache evicted)
- empty hierarchy field → grpc-no-xml-envelope drift
Cache reuse + per-instance isolation (D9):
- reuses same client across calls to same address
- two independent caches do not share clients
- connection-fail in cache A does not invalidate cache B
closeGrpcClientCache (Unit 5):
- closes every cached client, clears map
- idempotent on empty cache
- handles undefined / null gracefully
dump({platform:'android'}) dispatch (Unit 3):
- env set + gRPC success: gRPC primary, no CLI/adb
- env set + schema-class: returns immediately, no fallback
- env set + contention-class: SKIPS CLI, goes straight to adb
- env set + channel-broken: falls through to maestro CLI
- PERCY_MAESTRO_GRPC=0 kill switch: skips gRPC entirely
- env absent: gRPC NOT attempted, maestro CLI primary
- malformed env: falls through cleanly
Test mocking pattern: factory injection (makeFakeFactory + makeFixedClient)
matches the iOS HTTP path's makeFakeHttpRequest. Inlined GRPC_STATUS enum
isolates classifier coverage from @grpc/grpc-js runtime drift.
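
As a rough idea of what factory injection can look like, the sketch below reuses the helper names from the paragraph above, but their signatures and bodies here are assumptions and not the actual test helpers. `getOrCreateClient` is a hypothetical cache lookup added for the example.

```javascript
// Fake client that always answers viewHierarchy with a fixed (err, response) pair.
function makeFixedClient({ err = null, response = null } = {}) {
  return {
    closed: false,
    viewHierarchy(request, options, callback) { callback(err, response); },
    close() { this.closed = true; }
  };
}

// Fake factory that records the addresses it was asked to connect to.
function makeFakeFactory(client) {
  const factory = address => {
    factory.calls.push(address);
    return client;
  };
  factory.calls = [];
  return factory;
}

// Hypothetical cache lookup: create a client once per address, then reuse it.
function getOrCreateClient(cache, address, factory) {
  if (!cache.has(address)) cache.set(address, factory(address));
  return cache.get(address);
}
```

Because the factory is a parameter, a spec can assert both reuse (one factory call per address) and isolation (a second Map never sees the first Map's clients) without opening a socket.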
Test count: 779 → 807 specs total. All 28 new tests pass green; the 27
pre-existing failures (Install Chromium, runDoctorOnFailure, env-flake) are
unchanged from master.
Refs:
- 2026-05-07-002 plan Units 2, 3, 4, 5
…it 6)
Env-gated integration harness that exercises the gRPC primary path under
realistic Maestro flow contention. Spawns a real Maestro flow that holds
the device's gRPC agent active via extendedWaitUntil, then runs N=100
parallel runAndroidGrpcDump() iterations against the same agent. Asserts
{kind:'hierarchy'} on every iteration and records p50/p95/p99 timings.
Pre-merge gate (D11): p95 < 1200ms AND p99 < 2000ms across 100 iterations
under live tapOn flow load. Failure means D11's 1500ms healthy / 5000ms
breaker budget is wrong OR the device-side agent is contention-fragile —
investigate before relaxing the threshold.
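
The gate arithmetic can be sketched as follows. The nearest-rank percentile method and the function names are assumptions; the harness may compute percentiles differently. Only the 1200ms / 2000ms thresholds come from the text above.

```javascript
// Nearest-rank percentile over millisecond timings; p is an integer from 1 to 100.
function percentile(timings, p) {
  const sorted = [...timings].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Pre-merge gate: p95 < 1200ms AND p99 < 2000ms.
function evaluateGate(timings) {
  const p50 = percentile(timings, 50);
  const p95 = percentile(timings, 95);
  const p99 = percentile(timings, 99);
  return { p50, p95, p99, pass: p95 < 1200 && p99 < 2000 };
}
```

With 100 iterations the nearest-rank p99 is the second-slowest sample, so two slow outliers are enough to fail the gate.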
Required env (skips cleanly when absent):
- MAESTRO_ANDROID_TEST_DEVICE: connected Android serial
- PERCY_ANDROID_GRPC_PORT: realmobile/mobile-injected gRPC port (or
manual `adb forward tcp:<host_port> tcp:7001` for local validation)
- MAESTRO_BIN: optional, defaults to `maestro` on PATH
Fixtures:
- test/integration/fixtures/pause-30s-flow.yaml — Maestro flow that
parks the device in a known-busy state for 30s
- test/fixtures/maestro-hierarchy/grpc-response.xml — captured response
against cli-2.5.1 for fixture-driven unit tests
- test/fixtures/maestro-hierarchy/grpc-capture-notes.md — recapture
procedure when Maestro version drifts
Per-Percy cache equivalent: harness instantiates a fresh Map() shared
across iterations so it exercises real channel reuse + the
contention-vs-channel-broken eviction policy from D10.
Refs:
- 2026-05-07-002 plan Unit 6
Summary
Replaces the per-snapshot ~9s JVM-cold-start `maestro hierarchy` CLI shell-out with direct transport to Maestro's view-hierarchy services, on both platforms. Element-region resolution now runs as a stateless RPC against Maestro's existing channel rather than spawning a second Maestro flow context. This fixes the production gRPC-session-collision failure mode that drops snapshots whenever a Maestro flow is in progress (which is always, during element-region resolution).

- Android: gRPC `MaestroDriver/viewHierarchy` on `127.0.0.1:$PERCY_ANDROID_GRPC_PORT`; fallback: maestro CLI, then adb `uiautomator dump`
- iOS: HTTP `/viewHierarchy` on `127.0.0.1:$PERCY_IOS_DRIVER_HOST_PORT`; fallback: maestro CLI (`--driver-host-port`)

Both transports drop in alongside the same two-slot `maestroHierarchyDrift` envelope on `/percy/healthcheck` (per-platform schema-drift surface) and follow the same three-class error taxonomy: schema-class → no fallback + drift bit; channel-broken → fallback + cache eviction; contention-class → fallback (skipping CLI) + cache PRESERVED.

Self-hosted customers without the env var injection see zero behavior change: the env var presence is the deployment-shape signal; absence routes to the existing maestro CLI primary + adb fallback.
This PR consolidates work originally scoped across three branches (`feat/ios-element-regions-maestro-hierarchy` PR #2202 closed, `feat/grpc-element-region-resolver` PR #2210 to be closed, and the iOS HTTP work on this branch).

Architecture
iOS HTTP path (already shipped on the prior 14 commits of this branch)
`runIosHttpDump` POSTs `{appIds: [], excludeKeyboardElements: false}` to Maestro's iOS XCTestRunner `/viewHierarchy` endpoint. At cli-2.0.7 the runner detects the AUT itself via `RunningApp.getForegroundApp()` (Maestro PR #2365), so there is no bundleId discovery, no SDK change, and no realmobile control-plane change. SpringBoard-only responses (older Maestro) route to maestro-CLI fallback.

Android gRPC path (newly absorbed from PR #2210)
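`runAndroidGrpcDump` calls `MaestroDriver/viewHierarchy` directly via `@grpc/grpc-js` over the same transport Maestro CLI uses internally. Vendored proto at `packages/core/src/proto/maestro_android.proto` (upstream SHA `bc8bde1b`, cli-2.5.1).

Three-class error taxonomy (refined from PR #2210's two-class scheme during deepen-plan):
- schema-class (INVALID_ARGUMENT, FAILED_PRECONDITION, OUT_OF_RANGE, UNIMPLEMENTED, DATA_LOSS, decoder-failure) → drift bit, no fallback, cache preserved
- channel-broken (UNAVAILABLE, INTERNAL, CANCELLED, unmapped codes) → fallback chain runs, cache evicted
- contention-class (DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED) → skip CLI fallback, straight to adb, cache preserved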
The contention-class refinement is the key correctness win: timeout under live Maestro flow load is backpressure evidence, not channel-breakage. Evicting the channel on every timeout would force a TCP+HTTP/2+TLS reconnect (~50-200ms cost) that buys nothing, because the underlying agent is still busy. Keeping the channel cached lets the next call reuse the queue position.
Symmetric timeouts: `GRPC_HEALTHY_DEADLINE_MS = 1500` + `GRPC_CIRCUIT_BREAKER_MS = 5000`, parity with the iOS HTTP path's existing values. Outer `Promise.race` is defense-in-depth against historical `grpc-node#2620` (closed in 1.9.11).

Per-`Percy` cache scope: the `grpcClientCache` Map is constructed on the `Percy` instance and disposed in `stop()`, matching `@percy/core`'s established ownership pattern for every other long-lived resource (server, browser, queues, client). Module-global state would leak channels between concurrent `Percy` instances and create shutdown races.

Shutdown race handling: `percy.stop()` sets `cache.shutdownInProgress = true` before closing channels. Any in-flight `runAndroidGrpcDump` that hits `CANCELLED` returns `{kind:'unavailable', reason:'shutdown'}`, so no fallback chain runs on a tearing-down process.

Two rollback knobs (deliberate)
- `PERCY_ANDROID_GRPC_PORT` (or `PERCY_IOS_DRIVER_HOST_PORT`) injection in mobile/realmobile.
- `PERCY_MAESTRO_GRPC=0` in the BS appPercy env. Skips gRPC on next CLI restart without coordinated mobile-side deploy. The 3am-page response.

What's in this bundle
The iOS HTTP path commits already on this branch plus 5 new commits absorbing PR #2210:
68e67db573459be6135f0ea3bf5d1f55e8583bdf
(Commits `a1bd69da` and earlier are the iOS HTTP path documented in the prior PR description history.)

Testing
`Unit / maestro-hierarchy / Android gRPC primary path`: 28 specs across `classifyGrpcFailure`, `runAndroidGrpcDump`, cache reuse + per-instance isolation, `closeGrpcClientCache`, and `dump({platform:'android'})` dispatch.

BrowserStack App Automate validation (2026-05-07)
End-to-end validation of the iOS HTTP path on real BS realmobile/mobile hosts (full plan + results in local `docs/plans/2026-05-07-001-feat-bs-validation-maestro-ios-http-resolver-plan.md`):
- `5439a79e` (passed): `unavailable / multi-device-no-serial` (resolved by mobile PR #13206 commit `ddac377`)
- `8ed1a6c8` (passed): `dump-error / fallback-dump-exit-137` (the gRPC-contention root cause this PR's gRPC-direct path fixes)
- `41ebf750` (passed): `dump-error / http-non-json-content-type`. The iOS HTTP primary path was exercised in production; the schema-drift envelope correctly classified the response.

The validation surfaced the BS-side env-injection gaps (mobile and realmobile didn't pass `ANDROID_SERIAL` / `PERCY_IOS_DRIVER_HOST_PORT` to the Percy CLI process). Companion commits landed on the BS-side PRs:
- `ddac377`: `ANDROID_SERIAL` injection in `cli_manager.rb`.
- `62a0f7e`: `PERCY_IOS_DRIVER_HOST_PORT` + `PERCY_IOS_DEVICE_UDID` injection in `maestro_session.rb` `start_app_percy`.

Post-Deploy Monitoring & Validation
- `[percy] iOS HTTP schema-drift: ...` / `[percy] gRPC viewHierarchy schema-class failure (...)`: proto drift signals.
- `/percy/healthcheck` `maestroHierarchyDrift` envelope. Alert when the `.android` or `.ios` slot populates.
- `curl http://127.0.0.1:<cli_port>/percy/healthcheck` mid-session → expect `maestroHierarchyDrift: { android: null, ios: null }`.
- … `PERCY_ANDROID_GRPC_PORT` injection lands: `MAESTRO_ANDROID_TEST_DEVICE=<serial> PERCY_ANDROID_GRPC_PORT=<port> node packages/core/test/integration/maestro-hierarchy-concurrent.harness.js`; gate is p95 < 1200ms AND p99 < 2000ms across 100 iterations.
- `[percy] dump took Nms via grpc (M nodes)` (Android, env-set) or `[percy] dump took Nms via maestro-http` (iOS, env-set).
- … `null`.
- … `PERCY_MAESTRO_GRPC=0` for fast rollback.

Pending follow-ups (out of scope for this PR)
- `browserstack/mobile` PR injecting `PERCY_ANDROID_GRPC_PORT` (analog of `realmobile` commit `62a0f7e` for iOS). Without this, the Android gRPC primary stays dormant in production.
- … `http-non-json-content-type` from this realmobile deployment's Maestro version. Needs follow-up to determine missing content-type header vs endpoint shape vs version mismatch.
- … `feat/grpc-element-region-resolver` after this PR merges to master.

🤖 Generated with Claude Opus 4.7 (1M context, ultrathink) via Claude Code