feat(codeceptq): CLI to query HTML with CodeceptJS locators #5550
Merged
Adds `codeceptq`: a standalone CLI that takes an HTML stream (stdin or --file) plus a CodeceptJS locator (CSS / XPath / fuzzy / semantic) and prints matched elements with line numbers and outerHTML snippets.

Designed to give AI agents a fast feedback loop against `aiTrace`'s per-step HTML snapshots: "would this locator match at step N?" without re-running the test or spawning a browser.

- Reuses Locator class for CSS→XPath conversion + semantic builders (--field, --click, --checkable, --select).
- Optional context arg scopes matches: `codeceptq 'Save' '.modal' --click`.
- Stable output flags: --limit, --snippet (default 500), --full, --json.
- Exit codes: 0 match, 1 no match, 2 invalid input/XPath.
- formatHtml now uses `inline: []` so every element gets its own line in trace HTML; line numbers map 1:1 to elements for codeceptq output.
- 45 runner tests against test/data/checkout.html, github.html, gitlab.html, drag_drop.html assert exact line + snippet for every locator strategy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
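For a script or agent that shells out to the tool, the exit-code contract listed above could be consumed roughly as follows. This is a minimal sketch, assuming `codeceptq` is on `PATH`; `describeExit` and `queryHtml` are hypothetical helper names and are not part of this PR.

```javascript
const { spawnSync } = require('child_process');

// Map the documented exit codes to a label:
// 0 match, 1 no match, 2 invalid input/XPath.
function describeExit(status) {
  switch (status) {
    case 0: return 'match';
    case 1: return 'no-match';
    case 2: return 'invalid';
    default: return 'unknown';
  }
}

// Hypothetical wrapper: pipes HTML to codeceptq over stdin and
// returns the interpreted outcome together with the raw stdout.
function queryHtml(html, locator, extraArgs = []) {
  const result = spawnSync('codeceptq', [locator, ...extraArgs], {
    input: html,
    encoding: 'utf8',
  });
  if (result.error) throw result.error; // e.g. the binary is not installed
  return { outcome: describeExit(result.status), output: result.stdout };
}
```

With this wrapper, `queryHtml(html, 'Save', ['.modal', '--click'])` would mirror the `codeceptq 'Save' '.modal' --click` invocation from the commit message.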
run_test, run_step_by_step, and pausedPayload now include aiTraceDir (the per-test output/trace_<title>_<hash>/ folder) so agents can point codeceptq directly at the saved *_page.html snapshots without globbing or recomputing the hash. Per-test entries in reporterJson.tests[] also carry the dir. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
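Once a caller has `aiTraceDir`, collecting the saved snapshots takes only a directory listing. The sketch below assumes the snapshot files sit directly in that folder and end in `_page.html`, as the `*_page.html` pattern suggests; `listPageSnapshots` is a hypothetical helper, and sorting by file name is only a guess at step order.

```javascript
const fs = require('fs');
const path = require('path');

// Return the *_page.html snapshots inside a trace directory,
// sorted by file name, as full paths.
function listPageSnapshots(aiTraceDir) {
  return fs.readdirSync(aiTraceDir)
    .filter(name => name.endsWith('_page.html'))
    .sort()
    .map(name => path.join(aiTraceDir, name));
}
```

Each returned path can then be passed to `codeceptq` through `--file`.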
# Conflicts:
#	bin/mcp-server.js
The 'Sign up' --click case on github.html (2k-line fixture, 12-branch semantic union XPath) takes ~8s locally and exceeds the default 10s mocha timeout on slower CI runners. Suite-level timeout matches what the local runs already use. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locator.clickable.wide and field.labelContains emit predicates of the form `[@aria-labelledby = //*[@id][normalize-space(string(.)) = 'X']/@id]`. xpath@0.0.34 re-runs the inner `//*` scan once per outer element match, which is O(N²) on non-trivial docs. The 2k-line github fixture spent 8.5s in that single branch out of 12.

Pre-resolve the inner subquery once, then splice the resulting id (or a sentinel for no-match) back as a literal so the engine sees a flat attribute compare.

Github 'Sign up' --click: 9026ms → 276ms (~33×). Full runner suite: 14s → 6s.

Reverts the 30s describe-level timeout from the previous commit since the underlying perf issue is now fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
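The pre-resolution idea can be illustrated on a toy model of the document. This is a sketch under stated assumptions, not the code from this commit: the element list, the `SENTINEL` value and both function names are hypothetical, whitespace normalization only approximates XPath `normalize-space`, and quote escaping for XPath literals is omitted.

```javascript
// Hypothetical sentinel that no real id is expected to equal.
const SENTINEL = '__no_match__';

// Resolve "the id of the element whose visible text equals `text`" once,
// instead of letting the XPath engine repeat the scan per outer element.
function resolveLabelId(elementsWithId, text) {
  const hit = elementsWithId.find(
    el => el.text.trim().replace(/\s+/g, ' ') === text
  );
  return hit ? hit.id : SENTINEL;
}

// Build the flat predicate that replaces the nested subquery.
function flatLabelledByPredicate(elementsWithId, text) {
  const id = resolveLabelId(elementsWithId, text);
  return `[@aria-labelledby = '${id}']`;
}
```

Note that this sketch keeps only the first matching id, whereas the original node-set comparison is true if any matching id equals the attribute.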
Replaces the post-hoc regex pre-resolver with strategy-level construction. Each semantic locator (--click/--field/--checkable) is built as a list of XPath branches; doc-wide subqueries (label[@for] resolution, ids by visible text) are evaluated once and inlined as literal predicates instead of sitting nested inside outer per-element predicates that the engine re-executes on every match.

Eval loop runs each branch separately and sorts results by source offset to preserve the document-order contract of XPath unions.

Github 'Sign up' --click: 9000ms → 264ms. The gain is independent of the XPath engine: fontoxpath benched the same as xpath@0.0.34 on the original union. All 45 runner tests pass with identical line/snippet output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
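The merge step described above, running each branch on its own and then restoring document order, might look like this. It is a sketch only: the shape of a match (an object with a numeric `offset` for the element's position in the source) and the name `mergeBranchResults` are assumptions, not taken from `lib/command/query.js`.

```javascript
// Combine per-branch match lists into one list in document order.
// `branchResults` is an array of arrays of { offset, ... } objects.
function mergeBranchResults(branchResults) {
  const byOffset = new Map();
  for (const matches of branchResults) {
    for (const match of matches) {
      // An element matched by several branches is kept once,
      // which is how an XPath union would treat it.
      if (!byOffset.has(match.offset)) byOffset.set(match.offset, match);
    }
  }
  return [...byOffset.values()].sort((a, b) => a.offset - b.offset);
}
```

Keying on the source offset serves both purposes at once: it deduplicates elements hit by more than one branch and gives the sort key for document order.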
…cate

The wide clickable / labelContains field XPath includes:

    .//*[@aria-labelledby = //*[@id][normalize-space(string(.)) = X]/@id]

That predicate forces every element to evaluate the inner //*[@id] subquery, which is O(N²) on any non-trivial document for pure-JS XPath engines (xpath npm: 7641ms on a 2k-line page; fontoxpath: 7057ms on the same branch). Browser engines optimize via join-pushdown.

Adding [@aria-labelledby] as a left-to-right filter predicate first cuts the slow comparison to only elements that actually have the attribute:

    .//*[@aria-labelledby][@aria-labelledby = //*[@id][...]/@id]

7641ms → 52ms (147×). Semantics identical: in XPath, [A][B] and [A and B] produce the same result-set here (neither predicate is positional), but predicates are evaluated left-to-right, so the cheap attr-existence check filters out the bulk first.

This is a single-predicate XPath change: codeceptq goes from 9000ms → 325ms on test/data/github.html with no special-case code.

Reverted the per-strategy reimplementation in lib/command/query.js (back to using Locator.clickable.wide / Locator.field.byText directly). Added two unit tests for the aria-labelledby branch in Locator.clickable.wide (positive + negative).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
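Why the extra predicate helps can be seen in a toy cost model. The sketch assumes, as the commit message describes for pure-JS engines, that the inner `//*[@id]` scan runs once for every element that reaches the comparison; the element shape and `countInnerScans` are hypothetical and no real XPath engine is involved.

```javascript
// Count how many times the expensive inner scan would run.
function countInnerScans(elements, filterFirst) {
  let scans = 0;
  for (const el of elements) {
    // With [@aria-labelledby] placed first, elements lacking the attribute
    // are dropped before the expensive comparison is reached.
    if (filterFirst && el.ariaLabelledby === undefined) continue;
    scans += 1; // one full-document scan for this element
  }
  return scans;
}

// 2000 elements, 5 of which carry aria-labelledby (i = 0, 400, ..., 1600).
const elements = Array.from({ length: 2000 }, (_, i) =>
  (i % 400 === 0 ? { ariaLabelledby: `label-${i}` } : {}));

console.log(countInnerScans(elements, false)); // 2000
console.log(countInnerScans(elements, true));  // 5
```

In this model the number of full-document scans drops from the element count to the number of elements that carry the attribute, which is the effect the commit attributes to the added predicate.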
Summary

Adds `codeceptq`, a standalone CLI that takes an HTML stream (stdin or `--file`) plus a CodeceptJS locator (CSS / XPath / fuzzy / semantic) and prints matched elements with line numbers and outerHTML snippets.

Designed for AI agents iterating on locators against `aiTrace`'s per-step `*_page.html` snapshots: "would this locator match at step N?", answered in milliseconds, no browser, no re-run.

Changes

- `bin/codeceptq.js` (registered as `codeceptq` in `package.json#bin`).
- `lib/command/query.js`: parse5-tracked line numbers + xmldom xpath eval. Reuses `Locator` for CSS→XPath and semantic builders (`Locator.field.byText`, `Locator.clickable.wide`, `Locator.checkable.byText`, `Locator.select.byVisibleText`).
- `lib/html.js#formatHtml` now passes `inline: []` to js-beautify so every element in trace HTML lands on its own line; line numbers from `codeceptq` map 1:1 to elements.
- `xpath@0.0.34` promoted from devDependencies → dependencies (already in tree).
- Default `--snippet` length 500 chars; `--full` for complete outerHTML; `--json` for tooling.
- Exit codes: `0` match, `1` no match, `2` invalid input/XPath.

Tests

`test/runner/codeceptq_test.js`: 45 tests against `test/data/{checkout,github,gitlab,app/drag_drop}.html`. Each assertion shows the expected `{ line, snippet }` inline so the test source is also a behavior spec.

Coverage: XPath, CSS (id/class/attr/forced), `--field`, `--click`/`--clickable`, `--checkable`, `--select`, fuzzy auto-detect, context scoping, stdin vs `--file`, `--limit`, `--snippet`, `--full`, `--json`, exit codes, large fixtures.

Test plan

- `npx mocha test/runner/codeceptq_test.js` → 45 passing
- `npx mocha test/unit/html_test.js test/unit/utils/trace_test.js` → existing tests still pass with new `inline: []`
- `npx eslint bin/codeceptq.js lib/command/query.js test/runner/codeceptq_test.js` → clean
- `examples/output/trace_*/*_page.html`: finds all 17 inputs with line numbers
- `npx codeceptq 'something' --file output/trace_*/<step>_page.html` after a real test run

🤖 Generated with Claude Code