[ddev] Retry agent check on transient failure to fix SNMP E2E flake #23646

Open
Kyle-Neale wants to merge 1 commit into master from kyle.neale/ddev-agent-readiness-gate

@Kyle-Neale Kyle-Neale commented May 8, 2026

What does this PR do?

Wraps agent check invocations in ddev env agent with a bounded retry (3 attempts, 0.5s backoff).
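
A minimal sketch of what such a bounded retry could look like, using the 3 attempts and 0.5s backoff stated above. The helper name `run_with_retry`, its parameters, and the use of a fixed (non-exponential) delay are assumptions for illustration, not the code in this PR.

```python
import subprocess
import time


def run_with_retry(command, attempts=3, backoff=0.5, runner=subprocess.run, sleep=time.sleep):
    """Run `command` up to `attempts` times, pausing `backoff` seconds between tries.

    Hypothetical helper: `runner` and `sleep` are injectable so the retry
    logic can be exercised without spawning a real process.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    result = None
    for attempt in range(1, attempts + 1):
        result = runner(command, capture_output=True, text=True)
        if result.returncode == 0:
            # Success: stop retrying and hand back the completed process.
            return result
        if attempt < attempts:
            # Transient failure assumed: wait before the next attempt.
            sleep(backoff)

    # Every attempt failed: return the last result so the caller sees the real error.
    return result
```

Returning the last failed result instead of raising keeps the original exit code and output visible to the caller, so a persistent failure still surfaces after the third attempt.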

Motivation

Mitigates the SNMP master.yml E2E flake (no valid check found). The flake comes from a brief race in the existing E2E config-swap path: autodiscovery can deregister the check between the swap and the agent check call that immediately follows it. About 44% of recent master SNMP runs hit this, although more than 99% of tests still pass within the failing runs.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it
    will automatically open a backport PR once this one is merged

@dd-octo-sts dd-octo-sts Bot added the ddev label May 8, 2026

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 52.94118% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.85%. Comparing base (996b3d5) to head (bd4527c).
⚠️ Report is 13 commits behind head on master.

datadog-datadog-prod-us1 Bot commented May 8, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 52.94%
Overall Coverage: 87.36%

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: bd4527c

@Kyle-Neale Kyle-Neale force-pushed the kyle.neale/ddev-agent-readiness-gate branch 2 times, most recently from 0102e1e to ee648c2 May 11, 2026 13:58
@Kyle-Neale Kyle-Neale changed the title from "[ddev] Wait for Agent cmd-server before returning from start()" to "[ddev] Retry agent check on transient failure to fix SNMP E2E flake" May 11, 2026
@Kyle-Neale Kyle-Neale force-pushed the kyle.neale/ddev-agent-readiness-gate branch 3 times, most recently from c3d50af to f277b75 May 11, 2026 19:03
@Kyle-Neale Kyle-Neale force-pushed the kyle.neale/ddev-agent-readiness-gate branch from f277b75 to bd4527c May 11, 2026 20:02

dd-octo-sts Bot commented May 11, 2026

Validation Report

All 20 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and Codecov settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |

@Kyle-Neale Kyle-Neale marked this pull request as ready for review May 12, 2026 13:34
@Kyle-Neale Kyle-Neale requested a review from a team as a code owner May 12, 2026 13:34

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd4527ce4b

@@ -0,0 +1 @@
Retry agent check invocations on transient failures to address SNMP E2E flake from autodiscovery reload races
\ No newline at end of file

[P2] End the changelog entry with a period

The repository instructions in AGENTS.md say changelog entries should be a single line that ends with a period. This new entry currently has no trailing period, so it violates the documented changelog format and should be updated before merging.
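
The rule cited here can be checked mechanically. A minimal sketch, assuming only the two constraints named in this comment (a single line that ends with a period); the function name `is_valid_changelog_entry` is hypothetical and this is not the repository's own validation.

```python
def is_valid_changelog_entry(text: str) -> bool:
    """Return True if `text` is one non-empty line that ends with a period."""
    # Ignore trailing newlines so files with or without a final newline are treated alike.
    stripped = text.rstrip("\n")
    return bool(stripped) and "\n" not in stripped and stripped.endswith(".")
```

Applied to the entry added in this PR, the check would fail until a trailing period is appended.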
