feat(playwright): add max-parallel and customizable timeouts, add docker logs on Grafana startup failure #711
Oh wow, now it's failing with only 3 in the matrix.
xnyo
left a comment
Great addition! However, I noticed the latest version of wait-for-grafana (v1.0.3) changed the way timeouts are handled and added a new input startupTimeout that's specific for the Grafana startup, separate from timeout: grafana/plugin-actions#213
Maybe we can take this opportunity to:
- Bump `wait-for-grafana` to `v1.0.3`
- Bind the new `playwright-grafana-startup-timeout` to the new `setupTimeout` input
- Also introduce `playwright-grafana-timeout`, which binds to the existing `timeout` input (defaulting to 60)
WDYT?
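A minimal sketch of what that binding could look like inside `playwright.yml`, assuming the reusable workflow exposes the two inputs proposed above. Several details are assumptions and should be checked against the v1.0.3 action definition: the action ref, the `url` value, the default for the startup timeout, and the spelling of the new input (this comment names it both `startupTimeout` and `setupTimeout`; the sketch uses `startupTimeout`).

```yaml
# Hypothetical excerpt of playwright.yml; job and step names are illustrative.
on:
  workflow_call:
    inputs:
      grafana-startup-timeout:
        description: Time allowed for Grafana to start
        type: number
        default: 60 # assumed default, not confirmed by this thread
      grafana-timeout:
        description: Overall wait-for-grafana timeout
        type: number
        default: 60 # default suggested in the review comment

jobs:
  playwright:
    runs-on: ubuntu-latest
    steps:
      - name: Wait for Grafana
        # Assumed ref; pin to whichever ref corresponds to v1.0.3.
        uses: grafana/plugin-actions/wait-for-grafana@main
        with:
          url: http://localhost:3000/ # assumed local Grafana address
          # Input name assumed; confirm against the action's action.yml.
          startupTimeout: ${{ inputs.grafana-startup-timeout }}
          timeout: ${{ inputs.grafana-timeout }}
```

Keeping the two timeouts as separate inputs would let a caller extend only the startup window without also lengthening the overall wait.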
@L2D2Grafana Hmm yeah, this is unusual 🤔. I don't think it's related to the number of jobs in the matrix, since each job should run in its own VM. Maybe something is preventing Grafana from starting at all. Can you try pinning the workflow to this branch?
Note: If logs-drilldown requires some secrets in Vault, the workflow will fail (non-main branches and non-release tags do not have access to Vault for security reasons). If this is the case, I can help test in one of the testing repos for plugin-ci-workflows, which can access Vault from any branch (including PRs).
Having debug logs for Playwright's Docker container startup allowed me to dig deeper into the issue: Grafana is crashing. Grafana 13.x added the grafana-apiserver advisor check-type bootstrap, which now competes with the provisioning subsystem at exactly the worst time. https://github.com/grafana/logs-drilldown/actions/runs/25501547107/job/74836196412?pr=1883
🤖 So Grafana isn't slow; it's crashing during startup. That's why we see 60 s of 000 (TCP refused): the process is exiting before binding :3000, and wait-for-grafana polls a port that never opens.
What's actually happening
Legacy provisioning (logger=provisioning.datasources) is inserting the 5 provisioned datasources from provisioning/datasources/default.yaml:
Why this only bites on contended runners
This issue might already be fixed in Grafana 13.0.2 by grafana/grafana#123034. A workaround seems to be disabling the Advisor app.
xnyo
left a comment
@L2D2Grafana Great catch on the investigation! I think this is a great addition for the customizable timeouts and failure logs, so I also agree this is worth merging imo 👍 . Before approving, I suggest only renaming the PR (it's used for the changelog) to be a bit more descriptive, something like:
feat(playwright): add max-parallel and customizable timeouts, add docker logs on Grafana startup failure
Updated, ty!
FYI this is affecting other teams: https://raintank-corp.slack.com/archives/C08QSAXQBCZ/p1778865730276499 and #721
Description ✨
Attempt to fix Playwright Docker containers failing to start up, by allowing users to set a max-concurrency limit or a longer grafana-startup-timeout.
Drilldown apps, and Logs Drilldown in particular, are seeing Playwright Docker containers fail to start up in under 60s. Seven matrix tests run concurrently because we are obligated to work with Grafana 11.6, and I believe this concurrency is causing the failure.

Summary 📝
- `playwright-grafana-startup-timeout` in `cd.yml` and `ci.yml`, forwarded as `grafana-startup-timeout` into `playwright.yml`.
- `playwright-max-parallel` in `cd.yml` and `ci.yml`, forwarded as `max-parallel` into `playwright.yml`.
- `playwright.yml` applies `strategy.max-parallel: ${{ inputs.max-parallel }}` with a default of `256`, preserving existing behavior unless explicitly overridden.
- On `wait-for-grafana` failure, print `docker compose ps -a` and recent `docker compose logs`, making startup/concurrency flakes actionable in CI logs.

Test 🧪
Ran `make actionlint` successfully.
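
For readers skimming the description, the changes listed in the Summary might look roughly like the following inside `playwright.yml`. This is a sketch, not the actual diff: the job name, step names, step `id`, matrix values, action ref, and the `--tail 200` line count are all assumptions made for illustration.

```yaml
# Hypothetical excerpt of playwright.yml; not the actual diff from this PR.
on:
  workflow_call:
    inputs:
      max-parallel:
        type: number
        default: 256 # default stated in the Summary
      grafana-startup-timeout:
        type: number
        default: 60 # assumed default

jobs:
  playwright:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      # Caps how many matrix jobs run at the same time.
      max-parallel: ${{ inputs.max-parallel }}
      matrix:
        grafana-version: ["11.6.0", "13.0.0"] # placeholder values
    steps:
      - name: Wait for Grafana
        id: wait-for-grafana
        uses: grafana/plugin-actions/wait-for-grafana@main # assumed ref
        with:
          url: http://localhost:3000/ # assumed local Grafana address
          timeout: ${{ inputs.grafana-startup-timeout }}

      - name: Print Docker state on Grafana startup failure
        # Runs only when the wait step above failed.
        if: ${{ failure() && steps.wait-for-grafana.outcome == 'failure' }}
        run: |
          docker compose ps -a
          docker compose logs --tail 200
```

A consuming repository could then opt in from its own workflow, for example (other inputs omitted, values illustrative):

```yaml
jobs:
  ci:
    uses: grafana/plugin-ci-workflows/.github/workflows/ci.yml@main
    with:
      playwright-max-parallel: 3
      playwright-grafana-startup-timeout: 120
```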