Stop relying on mockserver Docker healthcheck for TeamCity test readiness #23675
AAraKKe wants to merge 1 commit into
Conversation
Validation Report: All 20 validations passed.
🎉 All green! ❄️ No new flaky tests detected. 🔗 Commit SHA: 061915d
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 061915da28
yield instance
compose_file = COMPOSE_FILE.format('mockserver')
conditions = [CheckDockerLogs(compose_file, ['started on port: 8111'], attempts=60, wait=2)]
with docker_run(compose_file, conditions=conditions, wait_for_health=False, sleep=2):
Remove unsupported docker_run keyword
When USE_OPENMETRICS is false, this fixture now calls docker_run(..., wait_for_health=False, ...), but the dev helper's signature is docker_run(..., waith_for_health=False, ...) and does not accept wait_for_health (confirmed via inspect.signature(datadog_checks.dev.docker_run)). As a result the legacy/mockserver TeamCity tests raise TypeError: docker_run() got an unexpected keyword argument 'wait_for_health' before starting Docker, so this path is broken instead of just bypassing the healthcheck.
I think this has an outdated version of the API; we already fixed this.
What does this PR do?

In `teamcity/tests/conftest.py`, the mockserver branch of the `dd_environment` fixture no longer waits on Docker's healthcheck. It now calls `docker_run` with `wait_for_health=False` and a `CheckDockerLogs` condition that waits for the mockserver boot line (`started on port: 8111`). The mockserver and OpenMetrics branches are also split into two independent `docker_run` blocks so each path only configures what it actually needs.
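The readiness pattern described above, polling container logs for a boot line instead of trusting the Docker healthcheck, can be sketched in isolation. `wait_for_log_line` below is a hypothetical stand-in for the `CheckDockerLogs(..., attempts=60, wait=2)` condition in the diff, not the actual `datadog_checks.dev` implementation; only the attempts/wait semantics are taken from the change.

```python
import time


def wait_for_log_line(read_logs, patterns, attempts=60, wait=2):
    """Poll a log source until every pattern appears.

    Hypothetical stand-in for the CheckDockerLogs condition used in this PR;
    in the real fixture, read_logs would wrap `docker compose logs`.
    Returns the number of polls it took, or raises TimeoutError.
    """
    for attempt in range(1, attempts + 1):
        logs = read_logs()
        if all(pattern in logs for pattern in patterns):
            return attempt
        time.sleep(wait)
    raise TimeoutError(f'patterns {patterns!r} not found after {attempts} attempts')
```

With a fake log source that already contains the boot line, the condition succeeds on the first poll; with empty logs it exhausts its attempts and raises.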
Motivation

The TeamCity unit/integration job is intermittently flaky on the legacy (mockserver) path. The mockserver image ships a JVM-based HEALTHCHECK (`java -cp /mockserver-netty-jar-with-dependencies.jar org.mockserver.cli.HealthCheck`) with a 5s per-probe timeout and 3 retries. Under load on GitHub-hosted runners the JVM cold start can exceed 5s, three timeouts in a row mark the container `unhealthy`, and `docker compose up --wait` fails before the test even starts. See for example https://github.com/DataDog/integrations-core/actions/runs/25722380777/job/75526507413, where every TeamCity test errors with `tenacity.RetryError` originating from `container teamcity is unhealthy`, even though mockserver itself logs `started on port: 8111` well before the healthcheck gives up.

Raising the healthcheck timing thresholds would mitigate the symptom but would still depend on an opaque, JVM-driven probe shipped by the upstream image. Driving readiness from the application log line is the same pattern the OpenMetrics branch of this fixture already uses (`CheckDockerLogs('teamcity-server', ['TeamCity initialized'], ...)`), and it removes the Docker healthcheck from the test critical path entirely.
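For reference, the other way to take the upstream probe out of the picture would be suppressing the image's HEALTHCHECK at the compose level rather than ignoring it from the test harness. A minimal sketch, assuming a service named `mockserver` on the hardcoded port; the exact compose file contents in this repo are not shown in the PR:

```yaml
services:
  mockserver:
    image: mockserver/mockserver   # image name is an assumption
    ports:
      - "8111:8111"
    healthcheck:
      disable: true   # drops the JVM-based HEALTHCHECK baked into the image
```

This was not the approach taken here; driving readiness from the log line keeps the compose file untouched and matches the OpenMetrics branch.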
Note on dynamic ports

The host port (`8111`) is still hardcoded. Making it dynamic via `find_ports` was considered and intentionally deferred: it is not what is failing here (the observed flake is the JVM healthcheck timing out, not a port collision), GitHub-hosted runners are fresh VMs per job so 8111 is effectively never taken, and `teamcity/tests/common.py` hardcodes `8111` in `REST_INSTANCE` plus roughly a dozen URL-parsing fixtures whose expected values contain the literal port. Plumbing a dynamic port through those assertions is a non-trivial refactor for a problem we have not seen. If port collisions ever do show up, that can be addressed as its own change.
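If dynamic allocation is ever needed, the usual stdlib approach is to bind to port 0 and let the OS pick a free ephemeral port. This is an illustrative sketch of what a `find_ports`-style helper does, not the actual `datadog_checks.dev` API:

```python
import socket


def find_free_port(host: str = '127.0.0.1') -> int:
    # Bind to port 0 so the OS assigns a free ephemeral port, then release it.
    # Illustrative only: there is a small race between releasing the port here
    # and the container binding it, which is one reason to prefer a fixed port
    # on fresh per-job runner VMs.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

The returned port would then have to be threaded through `REST_INSTANCE` and the URL-parsing fixtures, which is the refactor the PR defers.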
Review checklist (to be filled by reviewers)

- Add the `qa/skip-qa` label if the PR doesn't need to be tested during QA.
- Add the `backport/<branch-name>` label to the PR and it will automatically open a backport PR once this one is merged.