[SDTEST-3759] Use cached weights for CI node subsplits #58

Merged
anmarchenko merged 2 commits into main from anmarchenko/weighted-ci-node-subsplit
May 13, 2026

anmarchenko (Member) commented May 12, 2026

What

Reuse the weighted test file distribution for CI-node worker subsplitting. Planning now stores suite duration weighting data in .testoptimization/cache/test_suite_durations.json, and ddtest run restores it before splitting a CI node across local workers.

Why

CI-node subsplitting previously used round-robin assignment, so local workers could be uneven even though the node-level split was weighted. Reusing the same weighted distribution keeps local worker assignments aligned with the existing Test Impact Analysis duration estimates.
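
To illustrate the difference, here is a minimal sketch that compares round-robin assignment with a weighted split. It is not taken from the ddtest source: the greedy "heaviest file first, onto the least-loaded worker" strategy, the file names, the weights, and `DEFAULT_WEIGHT` are all assumptions chosen for illustration.

```python
import heapq

DEFAULT_WEIGHT = 1  # hypothetical fallback for files missing from the cache


def round_robin(files, workers):
    """Assign files to workers in turn, ignoring weights."""
    buckets = [[] for _ in range(workers)]
    for i, f in enumerate(files):
        buckets[i % workers].append(f)
    return buckets


def weighted_split(files, weights, workers):
    """Greedy split: heaviest file first, always onto the least-loaded worker."""
    heap = [(0, w) for w in range(workers)]  # (current load, worker index)
    buckets = [[] for _ in range(workers)]
    ordered = sorted(files, key=lambda f: (-weights.get(f, DEFAULT_WEIGHT), f))
    for f in ordered:
        load, w = heapq.heappop(heap)
        buckets[w].append(f)
        heapq.heappush(heap, (load + weights.get(f, DEFAULT_WEIGHT), w))
    return buckets


def max_load(buckets, weights):
    """Total weight of the busiest worker."""
    return max(sum(weights.get(f, DEFAULT_WEIGHT) for f in b) for b in buckets)


# Hypothetical files and weights; new_spec.rb has no cached weight.
files = ["a_spec.rb", "b_spec.rb", "c_spec.rb", "d_spec.rb", "new_spec.rb"]
weights = {"a_spec.rb": 900, "b_spec.rb": 50, "c_spec.rb": 800, "d_spec.rb": 50}

rr = round_robin(files, 2)
ws = weighted_split(files, weights, 2)
print(max_load(rr, weights), max_load(ws, weights))  # prints "1701 901"
```

In this example round-robin puts both heavy files on the same worker, while the weighted split keeps the busiest worker close to half of the total weight.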

E2E testing

  1. Run ddtest plan in a repo with multiple test files and suite duration data.
  2. Inspect .testoptimization/cache/test_suite_durations.json and confirm it contains testSuiteDurations, suiteAggregates, suitesBySourceFile, and testFileWeights.
  3. Confirm testFileWeights includes the expected runnable test files with positive duration-derived weights, and that existing cache fields are still present.
  4. Inspect .testoptimization/tests-split/runner-* and confirm the initial node split uses the cached test file weights.
  5. Run ddtest run --ci-node=<node> --ci-node-workers=2 for one of the generated CI nodes.
  6. Confirm the run restores the cache, logs the restored object counts, and splits the selected CI node across local workers using cached weights rather than round-robin order. Unknown files should fall back to the default weight.
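
Steps 2 and 3 could be scripted along these lines. This is a sketch only: the four top-level key names and the cache path come from the steps above, but the assumption that testFileWeights is a mapping from test file path to a numeric weight is mine, and `check_cache` is a hypothetical helper, not part of ddtest.

```python
import json
from pathlib import Path

REQUIRED_KEYS = (
    "testSuiteDurations",
    "suiteAggregates",
    "suitesBySourceFile",
    "testFileWeights",
)


def check_cache(path):
    """Return a list of problems found in the cache file (empty means OK)."""
    data = json.loads(Path(path).read_text())
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in data]
    # Assumption: testFileWeights maps a test file path to a numeric weight.
    for file, weight in data.get("testFileWeights", {}).items():
        if not isinstance(weight, (int, float)) or weight <= 0:
            problems.append(f"non-positive weight for {file}: {weight!r}")
    return problems


if __name__ == "__main__":
    cache = Path(".testoptimization/cache/test_suite_durations.json")
    if cache.exists():
        for problem in check_cache(cache):
            print(problem)
```

Run from the repository root after ddtest plan, the script prints nothing when the cache looks as expected.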

anmarchenko changed the title from "[codex] Use cached weights for CI node subsplits" to "[SDTEST-3759] Use cached weights for CI node subsplits" May 12, 2026
anmarchenko marked this pull request as ready for review May 13, 2026 08:15
anmarchenko requested a review from a team as a code owner May 13, 2026 08:15
Member Author

E2E Test Report: SUCCESS

Tested by: Shepherd Agent (autonomous QA for Datadog Test Optimization)

Test Environment

  • Method: Local Shepherd testing with spree playground and mockdog scenario ddtest-suite-durations-spree-core-subdir
  • Revision tested: 1fabd410d5b9f081c318841b82fc8490dbc316e5
  • Branch tested: anmarchenko/weighted-ci-node-subsplit
  • Feature area: ddtest test parallelization / cached test suite duration weights for CI-node worker subsplitting

Results

| Check | Status | Evidence |
| --- | --- | --- |
| ddtest plan fetched suite durations | PASS | Fetched test suite durations ... modulesCount=1 testSuitesCount=6 |
| Plan stored .testoptimization/cache/test_suite_durations.json | PASS | Cache contains testSuiteDurations, suiteAggregates, suitesBySourceFile, and testFileWeights |
| Cache preserved duration-derived file weights | PASS | products_helper_spec.rb=5000, base_helper_spec.rb=3000, four helper specs at 100 |
| Initial split used weighted distribution | PASS | With 4 runners, products_helper_spec.rb and base_helper_spec.rb were isolated while the four 100ms specs were paired |
| ddtest run restored cached weights before CI-node execution | PASS | Restored test suite durations cache objectsCount=24 ... testFileWeightsCount=6 |
| CI-node worker subsplit used cached weights instead of round-robin | PASS | For ciNode=1, ciNodeWorkers=2, worker 0 received only base_helper_spec.rb; worker 1 received the four 100ms files |
| Test execution completed and emitted telemetry to mockdog | PASS | Mockdog report: 82 events, 73 tests, 5 suites, 2 modules, 2 sessions; telemetry valid |

Evidence Details

Plan-only command:

./bin/crook run spree -c ddtest-plan-core \
  --scenario ddtest-suite-durations-spree-core-subdir \
  --dep ddtest=anmarchenko/weighted-ci-node-subsplit \
  --debug \
  -e 'DD_TEST_OPTIMIZATION_RUNNER_TESTS_LOCATION=spec/helpers/*_spec.rb'

Run-side CI-node command:

./bin/crook run spree -c ddtest-core \
  --scenario ddtest-suite-durations-spree-core-subdir \
  --dep ddtest=anmarchenko/weighted-ci-node-subsplit \
  --debug \
  -e 'DD_TEST_OPTIMIZATION_RUNNER_TESTS_LOCATION=spec/helpers/*_spec.rb' \
  -e DD_TEST_OPTIMIZATION_RUNNER_MAX_PARALLELISM=2 \
  -e DD_TEST_OPTIMIZATION_RUNNER_CI_NODE=1 \
  -e DD_TEST_OPTIMIZATION_RUNNER_CI_NODE_WORKERS=2

Key log evidence:

INFO Fetched test suite durations ... modulesCount=1 testSuitesCount=6
DEBUG Test suite durations written to file path=.testoptimization/cache/test_suite_durations.json
INFO Test execution planning completed parallelRunners=2 testFilesCount=6
INFO Restored test suite durations cache objectsCount=24 modulesCount=1 testSuitesCount=6 suiteAggregatesCount=6 suitesBySourceFileCount=6 testFileWeightsCount=6
INFO Running tests for CI node in parallel mode ciNode=1 ciNodeWorkers=2 testFilesCount=5
DEBUG Assigned test files to CI node worker ciNode=1 workerIndex=0 testFiles=[spec/helpers/base_helper_spec.rb]
DEBUG Assigned test files to CI node worker ciNode=1 workerIndex=1 testFiles="[spec/helpers/currency_helper_spec.rb spec/helpers/images_helper_spec.rb spec/helpers/locale_helper_spec.rb spec/helpers/shipment_helper_spec.rb]"
CI Test Cycle: 82 events (73 tests, 5 suites, 2 modules, 2 sessions) gzipped
Telemetry: valid, 2 sessions, 73 tests, 9 payloads
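
The restored-cache counts in the log evidence above can also be checked mechanically. The following is a rough sketch, assuming the log lines use plain key=value pairs as shown; `log_fields` is a hypothetical helper, and the simple pattern does not handle quoted values that contain spaces (such as the testFiles value on the worker 1 line).

```python
import re

# The "Restored" line copied from the log evidence above.
LINE = (
    "INFO Restored test suite durations cache objectsCount=24 modulesCount=1 "
    "testSuitesCount=6 suiteAggregatesCount=6 suitesBySourceFileCount=6 "
    "testFileWeightsCount=6"
)


def log_fields(line):
    """Extract unquoted key=value pairs from a log line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


fields = log_fields(LINE)
print(fields["testFileWeightsCount"])  # prints "6"
```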

Issues Found

None.

Verification

Datadog UI verification was not applicable because this was a local mockdog-targeted e2e run. Backend submission behavior was verified through mockdog reports and ddtest debug logs.


This E2E test was performed by Shepherd - autonomous QA agent for Datadog Test Optimization.

anmarchenko merged commit f9de0b8 into main May 13, 2026
3 checks passed
anmarchenko deleted the anmarchenko/weighted-ci-node-subsplit branch May 13, 2026 09:38