chore: remove Playwright smoke tests from app template #346
Playwright and sharp add significant install weight to the template without providing value in the default developer workflow. Vitest alone covers unit testing needs.

Co-authored-by: Isaac
d798bf3 to 69d025c
Running custom evals built on this branch to check for any regression after the Playwright drop: https://6177827686947384.4.gcp.databricks.com/jobs/212883645927255/runs/313658142661406
Running a targeted dev eval to measure the quality impact of this change.
Will post per-app deltas and a verdict here when the run terminates (~40–60 min).
Initial run finished (…):
| Metric | Baseline (456555456546311) | This branch | Δ |
|---|---|---|---|
| build_success | true | true | none |
| type_safety_pass | true | true | none |
| apps_validate_pass | true | true | none |
| local_runability | 1.0 | 1.0 | none |
| smoke_tests_pass | true | false | expected (Playwright removed) |
| unit_tests_pass | true | false | |
| appeval_100 | 1.000 | 0.667 | −0.333 |
Build / type-check / validate / runtime layers all preserved. ✅
Unexpected coupling on unit_tests_pass
Generated apps have no vitest test files. The eval framework runs `npm test`:
- BASE stdout: `No test files found, exiting with code 0`
- NEW stdout: `No test files found, exiting with code 1`
This PR doesn't directly edit `vitest.config.ts`, so the change in vitest's no-files behavior is indirect. It likely comes from removing the `tests/` directory or the `vitest run && npm run test:smoke` script chain, or from eval-framework drift between the baseline (2026-05-05) and this run (2026-05-06). Worth pinning down before this lands; otherwise `unit_tests_pass` will flip to false for every generated app.
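For context: by default vitest exits with code 1 when no test files match, and its `passWithNoTests` option (also exposed as the `--passWithNoTests` CLI flag) turns that case into exit code 0. Whatever the root cause turns out to be, one way to make the empty-suite behavior explicit in the template, rather than dependent on the script chain or framework version, might look like the sketch below. It assumes the template's `vitest.config.ts` uses `defineConfig` from `vitest/config` and that an app with no tests yet should count as passing; both are assumptions, not something this PR does.

```typescript
// vitest.config.ts (sketch): assumes no other test settings need preserving;
// in practice, merge this into the template's existing config.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Treat "no test files found" as success (exit 0) instead of vitest's
    // default failure (exit 1), so freshly generated apps without tests pass.
    passWithNoTests: true,
  },
});
```

The script-level equivalent would be `vitest run --passWithNoTests`. Pinning it in config keeps `npm test` and direct `vitest` invocations consistent; the tradeoff is that a broken include pattern that silently matches nothing would also pass.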
`host_onboarding_checklist` (succeeded in NEW, failed in the baseline gen) hit `appeval_100=1.0`, which is a positive signal that the build/runtime path on this branch is fine. It is not directly comparable, though, since there is no baseline score for that app.
Next
Kicked off a wider re-run over the full 30-prompt nightly catalog to drown out the per-app gen flakiness: 854667388920187. Will post statistically meaningful deltas (≈10+ clean comparisons) when it terminates (~60–90 min).
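Counting only "clean" comparisons can be sketched as follows: pair apps by name and keep only those whose generation succeeded in both runs, so a gen failure on one side (like `host_onboarding_checklist` above) is excluded rather than read as a quality delta. The `AppResult` shape, its field names, and the `compareClean` helper are hypothetical; the real eval output format is not shown in this thread.

```typescript
// Hypothetical per-app record from one eval run.
interface AppResult {
  app: string;
  genSucceeded: boolean; // did app generation succeed in this run?
  appeval100: number; // appeval_100 score in [0, 1]
}

interface Comparison {
  cleanApps: string[]; // apps that generated successfully in both runs
  meanDelta: number; // mean of (branch - baseline); NaN if no clean pairs
}

function compareClean(baseline: AppResult[], branch: AppResult[]): Comparison {
  // Index baseline results by app name for pairing.
  const baselineByApp = new Map<string, AppResult>();
  for (const r of baseline) baselineByApp.set(r.app, r);

  const cleanApps: string[] = [];
  const deltas: number[] = [];
  for (const r of branch) {
    const b = baselineByApp.get(r.app);
    // Skip unpaired apps and apps whose generation failed on either side.
    if (b === undefined || !b.genSucceeded || !r.genSucceeded) continue;
    cleanApps.push(r.app);
    deltas.push(r.appeval100 - b.appeval100);
  }

  const meanDelta =
    deltas.length === 0
      ? Number.NaN
      : deltas.reduce((sum, d) => sum + d, 0) / deltas.length;
  return { cleanApps, meanDelta };
}
```

Averaging per-app deltas over paired apps, rather than comparing two run-level means, keeps apps that only generated on one side from skewing the result.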
Lean
Until the wider data lands and the `unit_tests_pass` regression is explained, flag-gating Playwright (option 2) looks lower-risk than full removal: it keeps Playwright opt-in for the apps that want it while still letting you ship the install-size and memory wins.
Summary
- Remove `playwright.config.ts` and `tests/smoke.spec.ts` from the app template
- Remove `@playwright/test` and `sharp` from devDependencies
- Remove the `test:e2e`, `test:e2e:ui`, and `test:smoke` scripts
- Change the `test` script from `vitest run && npm run test:smoke` to `vitest run`
- Update the `clean` script to remove Playwright-related artifact directories

Test plan
- `databricks apps init` produces a working template without Playwright references
- `npm test` runs vitest successfully in a generated app

This pull request and its description were written by Isaac.
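The first test-plan item could be partly automated. The sketch below checks a parsed `package.json` for the devDependencies and scripts this PR removes, plus a `test` script that still chains into the smoke suite. The names come from the PR summary; the `findPlaywrightLeftovers` helper and the `PackageJson` subset type are hypothetical, and a fuller check would also look for `playwright.config.ts` and `tests/smoke.spec.ts` on disk.

```typescript
// Minimal subset of package.json fields this check reads.
interface PackageJson {
  scripts?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Names taken from the PR summary; the helper itself is hypothetical.
const REMOVED_DEV_DEPS = ["@playwright/test", "sharp"];
const REMOVED_SCRIPTS = ["test:e2e", "test:e2e:ui", "test:smoke"];

// Returns a list of Playwright leftovers; an empty list means clean.
function findPlaywrightLeftovers(pkg: PackageJson): string[] {
  const problems: string[] = [];
  const devDeps = pkg.devDependencies ?? {};
  const scripts = pkg.scripts ?? {};

  for (const dep of REMOVED_DEV_DEPS) {
    if (dep in devDeps) problems.push(`devDependency still present: ${dep}`);
  }
  for (const name of REMOVED_SCRIPTS) {
    if (name in scripts) problems.push(`script still present: ${name}`);
  }
  // The test script should no longer chain into the smoke suite.
  const testScript = scripts["test"] ?? "";
  if (testScript.includes("test:smoke") || testScript.includes("playwright")) {
    problems.push(`test script still references Playwright: ${testScript}`);
  }
  return problems;
}
```

In a generated app, the input could come from `JSON.parse(readFileSync("package.json", "utf8"))` (with `readFileSync` from `node:fs`), failing the check if the returned list is non-empty.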