A CI chaos monkey for web apps. Point it at your site and it crawls every page on its own β discovering links and in-app (SPA) navigations as it goes β then on each page finds every button/link/input and mashes them: clicking, double-clicking, typing random keystrokes, selecting, scrolling, resizing, navigating. It even completes create-flows β filling forms with valid data and submitting them β so empty apps populate themselves and deep editors get exercised. When something breaks (an uncaught error, a 500, a crash, a blank screen, a broken imageβ¦) it writes a report and fails your build.
It's deterministic (seeded, so any failure replays), bounded (action/time budgets), and safe by default: it stays on your origin, skips destructive controls, refuses to run against live payment keys, and redacts secrets.
npx buttonmash run https://staging.example.comWarning
Point this at a test/staging environment, never production. A random clicker mutates state. Use Stripe/PayPal test mode and test cards. buttonmash tries hard to avoid damage (see Safety), but those are guardrails, not guarantees β the real safety control is running against a disposable environment with test-mode billing.
Existing in-page monkeys (gremlins.js and friends) inject synthetic events and never actually fail your CI β they just log to the console. buttonmash flips that around: it drives the page from the harness side with Playwright, so it owns the verdict and the exit code. It also enumerates real elements (so it hits buttons below the fold, unlike coordinate-based clickers), dispatches trusted input, and deduplicates findings into an actionable report with a reproducible seed.
npm install --save-dev buttonmash
npx playwright install --with-deps chromium # one-time browser installRequires Node 20+.
# 1. (optional) capture an authenticated session β opens a browser, you log in
npx buttonmash auth https://staging.example.com/login
# β saves cookies/localStorage to playwright/.auth/user.json
# 2. scaffold a config (optional)
npx buttonmash init
# 3. run it
npx buttonmash run https://staging.example.com --auth playwright/.auth/user.json
# 4. reproduce a failure exactly (the seed is printed on every run)
npx buttonmash run https://staging.example.com --seed <seed-from-report>When it finishes you get a buttonmash-report/ folder with report.html
(self-contained), results.json, and junit.xml. Exit code is 1 if anything
broke at or above your fail threshold.
The quickest way is the bundled composite action (installs the browser + runs buttonmash + uploads the report):
name: buttonmash
on: [pull_request]
jobs:
buttonmash:
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- uses: actions/checkout@v5
# start your app under test here (e.g. npm ci && npm run start &) and wait for itβ¦
- uses: cj-vana/buttonmash@v0.1.8
with:
target: http://localhost:3000
args: --seed ci --max-actions 800Or wire it by hand for full control:
name: buttonmash
on: [pull_request]
jobs:
buttonmash:
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- uses: actions/checkout@v5
- uses: actions/setup-node@v5
with: { node-version: 20, cache: npm }
- run: npm ci
- run: npx playwright install --with-deps chromium
# start your app under test here (e.g. npm run start &) and wait for itβ¦
- run: npx buttonmash run http://localhost:3000 --seed ci --fail-on high
env:
# storageState captured locally and stored as a secret (base64 or file)
STORAGE_STATE: ${{ secrets.BUTTONMASH_STORAGE_STATE }}
- uses: actions/upload-artifact@v5
if: ${{ !cancelled() }}
with: { name: buttonmash-report, path: buttonmash-report/ }buttonmash auto-detects GitHub Actions and emits inline annotations plus a job-summary table. The non-zero exit code fails the job.
By default buttonmash auto-crawls: starting from your target, it discovers
every same-origin <a href> link and every client-side route the app
navigates to via buttons/navigate() (it hooks pushState/popstate), queues
them, and works through them breadth-first. When the link frontier runs dry it
returns to the start and keeps clicking β so button-driven SPA shells (where the
nav isn't <a href>) still get fully covered. One run, the whole reachable site:
npx buttonmash run https://staging.example.com # crawls everything it can reachControls:
budget.maxPagesβ cap on distinct pages per run (default 100), so CI stays bounded.routesβ optional hints: pages nothing links to (e.g. a deep editor URL). They seed the frontier; the crawl finds the rest. Also available as--route <url...>.explore.crawl: falseβ disable auto-crawl and only sweeptarget+routes.
Dangerous paths (logout/delete/cancel) and off-origin URLs are never enqueued.
Discovery also reaches inside open shadow DOM (web-component design systems β Salesforce LWC, Ionic, Shoelace/Lit/Material Web) and same-origin iframes (embedded editors, wizards), so component-based apps aren't invisible to it.
It's built to survive messy real apps on long CI sweeps: it recovers from
renderer crashes (recreates the page and continues, skipping the page that
crashed), opens custom ARIA dropdowns and picks an option, declines file
pickers so a file input can't hang the run, and you can scope the crawl
with guardrails.includePaths / excludePaths.
A fresh app is mostly empty lists β so buttonmash creates its own data. When
it finds a fillable form (or opens a "New/Add/Create" flow), it fills every
required field β and a fraction of optional ones β with valid, deterministic
values inferred from each field's type/label/pattern/min-max/options (real
emails, in-range numbers, seeded dates, a chosen <select> option, mirrored
password-confirm), clicks the form's safe submit, repairs on validation
errors, and follows into the created record so deep editors get exercised. No
per-site config β detection is structural, so it works on any app.
It stays safe by reusing the same guardrails: it never submits a form with a
credit-card field, an auth/login/signup form (would mutate your session), or one
whose submit is destructive β and the network fence still blocks live payments.
One free-text field per form carries a reflected-input canary, so created
records still feed the XSS oracle. Bounded by explore.forms.maxRecords;
--dry-run fills but never submits. Turn it off with explore.forms.enabled: false.
Create buttonmash.config.ts (or .js/.json) β buttonmash init writes a
starter. CLI flags override the file.
import { defineConfig } from 'buttonmash';
export default defineConfig({
target: 'https://staging.example.com',
seed: 'ci',
// Auth: a saved sessionβ¦
auth: { storageState: 'playwright/.auth/user.json' },
// β¦or a scriptable login (CI-friendly; re-authenticates if the session drops
// mid-run). Credentials support ${ENV_VAR} so secrets stay out of the file:
// auth: {
// loginScript: {
// url: '/login', usernameSelector: '#email', passwordSelector: '#password',
// submitSelector: 'button[type=submit]', username: '${E2E_USER}', password: '${E2E_PASS}',
// successUrl: '/dashboard',
// },
// },
budget: { maxActions: 500, maxDurationMs: 300_000, maxDepth: 12, maxPages: 100 },
// Point it at any deployment: extra headers (auth proxy / feature flags),
// HTTP basic-auth, and a device viewport. ${ENV_VAR} keeps secrets in env.
// headers: { 'X-Feature-Flag': 'on', Authorization: 'Bearer ${API_TOKEN}' },
// viewport: { width: 390, height: 844 }, // mobile
// auth: { basicAuth: { username: '${BASIC_USER}', password: '${BASIC_PASS}' } },
// Auto-crawl is on by default; `routes` are optional hints for pages nothing
// links to (e.g. a deep editor). The crawl discovers everything else.
// routes: ['/dashboard', '/settings/billing'],
explore: { crawl: true },
guardrails: {
// allowedOrigins: ['https://staging.example.com'], // defaults to target origin
// includePaths: ['^/app/'], // scope the crawl (regex on pathname)
// excludePaths: ['/admin'], // never crawl these
billing: { mode: 'refuse' }, // refuse | warn | off
// dryRun: true, // read-only: explore without submitting
destructive: { enabled: true, extraVerbs: ['archivar'] },
},
detectors: {
a11y: false, // opt-in axe-core scan
ignoreHttpStatuses: [401, 403],
ignorePatterns: ['ResizeObserver loop'], // benign console/network noise (regex)
custom: [{ name: 'error-boundary', pattern: 'Something went wrong', severity: 'high', target: 'console' }],
},
failOn: 'high', // critical | high | medium | low | info
});| Flag | Description |
|---|---|
--seed <s> |
Reproducibility seed (printed every run) |
--route <url...> |
Extra route hints to sweep in the same run (crawl finds the rest) |
--max-actions <n> / --max-duration <sec> |
Budget |
--fail-on <severity> |
Min severity that fails the build (default high) |
--dry-run |
Read-only: explore without submitting or mutating |
--auth <path> |
Playwright storageState JSON |
--billing <refuse|warn|off> |
Live-payment guard |
--browser <chromium|firefox|webkit> |
Engine |
--headed |
Show the browser |
--out <dir> / --formats json,junit,html,sarif |
Reporting |
- Uncaught JS errors and
console.error - HTTP 4xx/5xx responses and failed requests
- Renderer crashes and hangs / unresponsive pages (wall-clock watchdog)
- Framework error overlays (Next.js/Vite/React, "Application error") β caught even when an error boundary swallows the throw
- Blank screens ("white screen of death") and broken images
- Reflected input β a safe canary probe that flags possible XSS sinks (never injects executing payloads)
- Client-exposed secrets (Stripe/AWS/GitHub/Slack/β¦ keys, gitleaks-derived)
- Accessibility violations via axe-core (opt-in)
- Session loss β if an authed run gets redirected to a login page mid-run (expired session), it flags it and re-authenticates when a login script is configured
- Custom signals β your own console/DOM/url regex rules
Findings are deduplicated (the same bug firing 500Γ becomes one finding with
count: 500) and carry a minimal repro trace. To stay usable on real apps,
buttonmash ships a default allowlist of benign console noise (ResizeObserver
loops, React dev warnings, HMRβ¦) and downgrades third-party console.error
(analytics/chat/payment SDKs) so they don't redden your build β first-party
errors stay high (detectors.thirdPartyConsole: true to opt in). State dedup is
structural by default, so live counters/clocks don't explode the state space
on dynamic apps. And if CI cancels or times out mid-run, a partial report is
still written (SIGTERM-safe) so you never lose the findings collected so far.
buttonmash is built to break things without breaking you:
- Stay on origin. Off-origin navigations and
target=_blankpopups are blocked; the session is fenced to your allowed origins. - Skip destructive controls. Buttons/links matching a multilingual verb
list (delete, pay, logout, cancel subscription, β¦), or pointing at dangerous
paths (
/logout,/account/delete,/billing/cancel), are detected and downgraded to a harmless hover. - Refuse live billing. If live Stripe/Braintree keys or live processor hosts
are detected, buttonmash aborts (
billing.mode: 'refuse') and tells you to switch to test mode. Publishable test keys are fine. - Redact secrets. Anything matching a secret pattern is scrubbed before it's written to any report or artifact; auth/cookie headers are never persisted.
- Dismiss, never confirm. Native
confirm()/beforeunloaddialogs are always dismissed, so the monkey can't click "Yes, delete". - Dry-run mode.
--dry-runexplores read-only: hover, scroll, navigate links β no form submits, typing, or mutations.
Every run writes results.json (the source of truth). Optionally junit.xml
(for CI test rendering), a self-contained report.html, and results.sarif
(for GitHub code-scanning). Exit codes follow the pytest/ESLint convention:
| Code | Meaning |
|---|---|
0 |
No findings at/above the fail threshold |
1 |
Findings at/above the threshold β the build-failing signal |
2 |
buttonmash itself errored (bad config/usage/internal) |
Every choice β which element, which action, which input β flows through a single
seeded PRNG, and the in-page Math.random is seeded identically. The seed is
printed at startup and embedded in every report, and replaying it makes the
monkey take the same decisions.
Caveat worth knowing: the page clock is deliberately not frozen (freezing
time breaks many real apps). So replay is reliable for apps whose rendered DOM
is stable given the same inputs; apps with heavy async-loaded content, polling,
or wall-clock/Math.random-driven rendering can still diverge, because a
different DOM at a step changes what the monkey sees and therefore what it picks
next. Pinning the seed in CI plus a stable build gets you most of the way.
import { buttonmash } from 'buttonmash';
const result = await buttonmash({ target: 'http://localhost:3000', failOn: 'high' });
console.log(result.stats, result.findings);
if (result.run.exitCode !== 0) process.exit(1);launch (Playwright) β auth (storageState) β fence (origin/dialogs/popups)
β crawl frontier (target + routes; grows with discovered links + SPA navs)
β per page: discover interactive elements β fingerprint state (coverage)
β choose element (epsilon-greedy) β gate (safety) β perform action
β log trace β capture artifacts on signal
β when exhausted, move to the next page in the frontier
β aggregate + dedupe β report (json/junit/html/sarif) β exit code
- Doesn't pierce shadow DOM or cross-origin iframes (payment iframes are intentionally left alone).
- Pure random/coverage exploration can under-explore deep multi-step flows.
- Heuristic destructive detection is English-leaning + a few languages; extend
destructive.extraVerbsfor your UI. Sandbox + test mode is the real safety net.
npm install
npm run build # tsup β dist/
npm test # vitest (unit + e2e against the bundled buggy app)
npm run typecheck && npm run lintThe examples/buggy-app/ is a deliberately broken page used to dogfood the tool
in CI.
MIT Β© cj-vana