[SDTEST-3211] Use Datadog test suite durations endpoint #49
Conversation
Introduces TestSuiteDurationsClient that calls POST /api/v2/ci/ddtest/test_suite_durations to fetch historical test suite duration percentiles (p50, p90) for optimizing parallel test splitting. Follows the same layered architecture as the existing TestOptimizationClient with interface-based dependency injection for testability. Made-with: Cursor
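For orientation, here is a minimal sketch of the layered, interface-injected shape this comment describes. Only the names TestSuiteDurationsClient, DurationsAPI, NewDurationsClient, and FetchTestSuiteDurations appear in this thread; the SuiteDuration type, its fields, and the exact signatures are illustrative assumptions, not the PR's actual code.

```go
// Sketch only: SuiteDuration and the field layout are assumptions; the real
// types live in the PR. The interface/DI shape mirrors TestOptimizationClient.
package durations

import "context"

// SuiteDuration holds historical duration percentiles for one test suite.
type SuiteDuration struct {
	Suite string  // suite (or spec file) identifier returned by the backend
	P50   float64 // median historical duration, in seconds
	P90   float64 // 90th-percentile historical duration, in seconds
}

// DurationsAPI abstracts the POST /api/v2/ci/ddtest/test_suite_durations call
// so tests can inject a fake implementation.
type DurationsAPI interface {
	FetchTestSuiteDurations(ctx context.Context) ([]SuiteDuration, error)
}

// TestSuiteDurationsClient wraps the API layer behind the interface.
type TestSuiteDurationsClient struct {
	api              DurationsAPI
	durationsBySuite map[string]float64 // filled during optimization setup
}

func NewDurationsClient(api DurationsAPI) *TestSuiteDurationsClient {
	return &TestSuiteDurationsClient{api: api}
}
```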
Fetch backend test suite durations during optimization setup, store them in memory for later use, and keep planning behavior unchanged when the API is empty or errors. Made-with: Cursor
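A minimal sketch of that fetch-then-fall-back behavior, continuing the hypothetical types from the sketch above. PrepareTestOptimization and FetchTestSuiteDurations are named later in this thread; the durationsBySuite field and the exact control flow are assumptions.

```go
// Continuing the hypothetical package above (also uses "log/slog").
// Empty or failed responses must not change planning behavior.
func (c *TestSuiteDurationsClient) PrepareTestOptimization(ctx context.Context) {
	durations, err := c.api.FetchTestSuiteDurations(ctx)
	if err != nil || len(durations) == 0 {
		// Keep no stored durations so the planner falls back to its defaults.
		slog.Debug("no backend suite durations; keeping count-based split", "error", err)
		c.durationsBySuite = map[string]float64{}
		return
	}
	byName := make(map[string]float64, len(durations))
	for _, d := range durations {
		byName[d.Suite] = d.P50 // kept in memory for the later planning step
	}
	c.durationsBySuite = byName
}
```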
Add debug logging for the raw backend response body when fetching test suite durations to match the visibility we already have for settings. Made-with: Cursor
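As a rough illustration, a debug log of the raw response body inside the hypothetical HTTP layer might look like the following; the message text and attribute names are assumptions.

```go
// Inside the hypothetical HTTP call (uses "io", "log/slog", "net/http"):
// read the body once, log it at debug level, then decode it as usual.
body, err := io.ReadAll(resp.Body)
if err != nil {
	return nil, err
}
slog.Debug("test suite durations response",
	"status", resp.StatusCode,
	"body", string(body))
```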
E2E Test Report: SUCCESS ✅
Tested by: Shepherd Agent (autonomous QA for Datadog Test Optimization)
Test Environment
Approach
Results
Methodology
Conclusion
The duration-based weighting flips the bin-packing input from equal weights (1 s default per file) to backend-p50-derived weights, and the resulting split clearly reflects that. Both code paths (no/empty backend data → count fallback; non-empty backend data → p50 weights) were exercised and behave as the PR description specifies.
This E2E test was performed by Shepherd - autonomous QA agent for Datadog Test Optimization
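For readers following along, a sketch of the weighting rule that conclusion describes, with an assumed fileWeight helper; the PR's real planner code may differ.

```go
// Illustrative only: a file keeps the 1 s default unless the backend returned
// a p50 for it, so an empty backend response reproduces the old count-based
// split exactly.
const defaultFileWeightSeconds = 1.0

func fileWeight(file string, p50BySuite map[string]float64) float64 {
	if w, ok := p50BySuite[file]; ok && w > 0 {
		return w
	}
	return defaultFileWeightSeconds
}
```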
E2E Test Report (Round 2): Subdirectory Execution ✅
Tested by: Shepherd Agent
Follow-up to the earlier forem report — this round specifically exercises the
Test Environment
Approach
Results
File counts flip from
Confirming log lines from the run:
Conclusion
The subdirectory execution path called out in the PR description is wired correctly:
Combined with the earlier forem run, both the repo-root and subdirectory execution modes are validated end-to-end.
This E2E test was performed by Shepherd - autonomous QA agent for Datadog Test Optimization
E2E Test Report (Round 3): ITR Skip + Duration Weighting ✅
Tested by: Shepherd Agent
Third round, focused on the interaction between ITR test skipping and the new p50-based weighting in
Test Environment
Setup
Custom mockdog scenario
Results — Predicted vs Actual
All hand-computed predictions hit:
Resulting Split
Sanity-grepped
Conclusion
Both PR-49 invariants for the
Combined with rounds 1 (forem repo-root, no skips) and 2 (spree from
This E2E test was performed by Shepherd - autonomous QA agent for Datadog Test Optimization
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00af6a60c0
```go
apiKey := os.Getenv(constants.APIKeyEnvironmentVariable)
if apiKey == "" {
	slog.Error("An API key is required for agentless mode. Use the DD_API_KEY env variable to set it")
	return nil
```
Handle missing agentless API key without panicking
When DD_CIVISIBILITY_AGENTLESS_ENABLED=true but DD_API_KEY is absent, this branch returns a nil *DatadogDurationsAPI. NewDurationsClient stores that typed nil in the DurationsAPI interface, so PrepareTestOptimization later calls c.api.FetchTestSuiteDurations and panics instead of simply falling back without durations. Please return an error/no-op client or guard this typed-nil case before fetching durations.
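To make the failure mode concrete: in Go, an interface value that wraps a typed nil pointer is itself non-nil, so a plain c.api == nil check will not catch this case. Below is a hedged sketch of one possible fix along the lines the review suggests, returning an error from the constructor instead of a typed-nil value; the apiKey field and error wording are assumptions, not the PR's actual change.

```go
// Sketch only (uses "errors", "os", and the project's constants package):
// fail construction rather than returning a typed-nil *DatadogDurationsAPI,
// so NewDurationsClient never stores a nil concrete pointer in DurationsAPI.
func NewDatadogDurationsAPI() (*DatadogDurationsAPI, error) {
	apiKey := os.Getenv(constants.APIKeyEnvironmentVariable)
	if apiKey == "" {
		return nil, errors.New("agentless mode requires an API key; set DD_API_KEY")
	}
	return &DatadogDurationsAPI{apiKey: apiKey}, nil // apiKey field is assumed
}
```

NewDurationsClient could then propagate the error, or fall back to a no-op client, so PrepareTestOptimization simply skips duration fetching and planning stays on the count-based fallback.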
E2E Test Report (Round 4): Production EU End-to-End ✅
Tested by: Shepherd Agent
Fourth round, this time against the real production EU backend (not mockdog) — confirming the feature works end-to-end in production and quantifying the real-world wall-time savings.
Test Environment
Five-Run Progression
The same
Net improvement run #1 → run #5: 8:46 → 5:27 = 3:19 saved (38% faster) using 3 fewer parallel workers.
Key Observations
1. Backend coverage compounds quickly.
2. Empty / partial responses are correctly non-fatal.
3. The 5 packed runners in run #3 finished within 4 seconds of each other (3:51 → 3:55).
4. The bin-packer correctly unisolates the heavy file when bin capacity grows.
5. Optimal max-parallelism is now an observable property, not a guess.
Per-Run Worker Distribution (Run #5)
Total wall time bound: 5:27. Spread across all 5 runners: 7 seconds.
Production-Side Verification
Conclusion
PR-49 works end-to-end against production EU. Beyond the feature merely functioning, the real-data run produces concretely better outcomes than count-based scheduling: tighter packing across non-outlier runners, lower total wall time, and the ability to right-size
This E2E test was performed by Shepherd - autonomous QA agent for Datadog Test Optimization
Summary
Adds support for the test suite durations backend API and uses that data to improve ddtest planning.
POST /api/v2/ci/ddtest/test_suite_durations, including pagination and in-memory storage.
Testing
Automated validation:
make test
make lint
E2E validation should cover: