Stop relying on mockserver Docker healthcheck for TeamCity test readiness #23675

Open
AAraKKe wants to merge 1 commit into master from aarakke/teamcity-test-readiness

@AAraKKe AAraKKe commented May 12, 2026

What does this PR do?

In teamcity/tests/conftest.py, the mockserver branch of the dd_environment fixture no longer waits on Docker's healthcheck. It now calls docker_run with wait_for_health=False and a CheckDockerLogs condition that waits for the mockserver boot line (started on port: 8111). The branches for the mockserver and OpenMetrics paths are also split into two independent docker_run blocks so each path only configures what it actually needs.
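As a rough illustration of log-driven readiness, the sketch below polls a log source until every expected pattern appears. It is not the CheckDockerLogs implementation from datadog_checks.dev: `wait_for_log_line`, `read_logs`, and the injected `sleep` are hypothetical names chosen for this example, matching is a plain substring test (a simplification), and the fake log reader stands in for reading container logs. Only the boot line and the `attempts`/`wait` values come from the PR.

```python
import time
from typing import Callable, Iterable


def wait_for_log_line(
    read_logs: Callable[[], str],
    patterns: Iterable[str],
    attempts: int = 60,
    wait: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll read_logs() until every pattern is present; return the attempt count.

    Hypothetical helper for illustration only, not part of datadog_checks.dev.
    """
    patterns = list(patterns)
    for attempt in range(1, attempts + 1):
        logs = read_logs()
        if all(pattern in logs for pattern in patterns):
            return attempt
        if attempt < attempts:
            sleep(wait)  # back off before reading the logs again
    raise TimeoutError(f"patterns {patterns!r} not found after {attempts} attempts")


# Fake log source: the boot line only shows up on the third read.
_snapshots = iter(["", "loading", "loading\nstarted on port: 8111"])


def fake_logs() -> str:
    return next(_snapshots)


# The sleep is replaced with a no-op so the example finishes instantly.
attempts_used = wait_for_log_line(fake_logs, ["started on port: 8111"], sleep=lambda _: None)
print(attempts_used)
```

With 60 attempts and a 2-second wait, a poller shaped like this would tolerate roughly two minutes of startup time before giving up, independent of any Docker healthcheck.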

Motivation

The TeamCity unit/integration job is intermittently flaky on the legacy (mockserver) path. The mockserver image ships a JVM-based HEALTHCHECK (java -cp /mockserver-netty-jar-with-dependencies.jar org.mockserver.cli.HealthCheck) with a 5s per-probe timeout and 3 retries. Under load on GitHub-hosted runners, the JVM cold start can exceed 5s; three timeouts in a row then mark the container unhealthy, and docker compose up --wait fails before the tests even start. See for example https://github.com/DataDog/integrations-core/actions/runs/25722380777/job/75526507413, where every TeamCity test errors with a tenacity.RetryError originating from "container teamcity is unhealthy", even though mockserver itself logs "started on port: 8111" well before the healthcheck gives up.
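To make the failure mode concrete, here is a simplified model of how consecutive probe timeouts flip a container to unhealthy. It uses the 5s timeout and 3 retries quoted above, ignores Docker's interval and start-period settings, and stops at the first unhealthy transition. `health_status` is a hypothetical function, and the probe durations are invented for illustration, not measured from the linked run.

```python
from typing import Iterable


def health_status(probe_durations: Iterable[float], timeout: float = 5.0, retries: int = 3) -> str:
    """Return the status after replaying probe durations (in seconds).

    Simplified model: a probe fails when it runs longer than `timeout`, and
    `retries` consecutive failures mark the container unhealthy. A success
    marks it healthy and resets the failure streak.
    """
    status = "starting"
    consecutive_failures = 0
    for duration in probe_durations:
        if duration > timeout:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                return "unhealthy"
        else:
            consecutive_failures = 0
            status = "healthy"
    return status


# Invented durations: three slow JVM cold starts in a row on a loaded runner.
print(health_status([6.2, 5.8, 7.1]))  # unhealthy
# One fast probe before the third timeout resets the streak.
print(health_status([6.2, 5.8, 1.4]))  # healthy
```

The point of the model is that the outcome depends only on how long the probe process takes, not on whether mockserver is already serving requests.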

Raising the healthcheck timing thresholds would mitigate the symptom, but readiness would still depend on an opaque, JVM-driven probe shipped by the upstream image. Driving readiness from the application log line is the same pattern the OpenMetrics branch in this fixture already uses (CheckDockerLogs('teamcity-server', ['TeamCity initialized'], ...)), and it removes the Docker healthcheck from the test critical path entirely.

Note on dynamic ports

The host port (8111) is still hardcoded. Making it dynamic via find_ports was considered and intentionally deferred: it is not what is failing here (the observed flake is the JVM healthcheck timing out, not a port collision), GitHub-hosted runners are fresh VMs per job so 8111 is effectively never taken, and teamcity/tests/common.py hardcodes 8111 in REST_INSTANCE plus roughly a dozen URL-parsing fixtures whose expected values contain the literal port. Plumbing a dynamic port through those assertions is a non-trivial refactor for a problem we have not seen. If port collisions ever do show up, that can be addressed as its own change.
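For context on what a dynamic host port would involve, the sketch below asks the OS for a free TCP port using only the standard library. It is not the find_ports helper mentioned above, whose behavior is not shown in this PR. `find_free_port` and `server_url` are hypothetical names, and the returned port could in principle be claimed by another process before a container binds it.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Bind to port 0 so the OS picks an unused TCP port, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
# Every URL that currently contains the literal 8111 would have to be built from `port`.
server_url = f"http://localhost:{port}"
print(server_url)
```

The allocation itself is short; the cost described above lies in threading the chosen port through the fixtures and expected values that currently embed 8111.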

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@AAraKKe AAraKKe added the qa/skip-qa label (Automatically skip this PR for the next QA) May 12, 2026

dd-octo-sts Bot commented May 12, 2026

Validation Report

All 20 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and Codecov settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.02%. Comparing base (0b1ed8f) to head (061915d).
⚠️ Report is 1 commit behind head on master.

datadog-datadog-prod-us1-2 Bot commented May 12, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 57.14%
Overall Coverage: 87.21% (-0.01%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 061915d

@AAraKKe AAraKKe marked this pull request as ready for review May 12, 2026 11:00
@AAraKKe AAraKKe requested a review from a team as a code owner May 12, 2026 11:00

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 061915da28

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

yield instance
compose_file = COMPOSE_FILE.format('mockserver')
conditions = [CheckDockerLogs(compose_file, ['started on port: 8111'], attempts=60, wait=2)]
with docker_run(compose_file, conditions=conditions, wait_for_health=False, sleep=2):

P1: Remove unsupported docker_run keyword

When USE_OPENMETRICS is false, this fixture now calls docker_run(..., wait_for_health=False, ...), but the dev helper's signature is docker_run(..., waith_for_health=False, ...) and does not accept wait_for_health (confirmed via inspect.signature(datadog_checks.dev.docker_run)). As a result the legacy/mockserver TeamCity tests raise TypeError: docker_run() got an unexpected keyword argument 'wait_for_health' before starting Docker, so this path is broken instead of just bypassing the healthcheck.

Useful? React with 👍 / 👎.
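
The signature check mentioned in the review can be reproduced with the standard library. The sketch below uses a local stand-in function rather than the real datadog_checks.dev.docker_run, so `fake_docker_run` and `accepts_keyword` are hypothetical names, and the result says nothing about which spelling the current helper accepts.

```python
import inspect
from typing import Callable


def accepts_keyword(func: Callable, name: str) -> bool:
    """Return True if `func` can be called with `name` as a keyword argument."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-in carrying the misspelled keyword quoted in the review comment.
def fake_docker_run(compose_file=None, conditions=None, waith_for_health=False, sleep=None):
    return compose_file


print(accepts_keyword(fake_docker_run, "wait_for_health"))   # False
print(accepts_keyword(fake_docker_run, "waith_for_health"))  # True
```

Running the same check against the helper installed in the test environment would show which keyword that version actually accepts.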

Contributor Author

I think this is based on an outdated version of the API; we already fixed this.

Labels

integration/teamcity, qa/skip-qa, team/agent-integrations
