perf(worker): Optimize flake expiration updates in test analytics #871

Open
sentry[bot] wants to merge 1 commit into main from seer/perf/bulk-update-flakes-ctHwcR

@sentry sentry Bot commented Apr 20, 2026

Fixes WORKER-Y94. The issue: individual flake.save() calls within handle_pass for expiring flakes caused an N+1 query pattern.

  • Modified handle_pass to return expired Flake objects instead of saving and deleting them immediately.
  • Updated process_single_upload to collect expired Flake objects.
  • Implemented bulk updating of expired Flake objects at the end of process_flakes_for_commit to reduce individual database writes.
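
As a rough illustration of the deferred-write pattern these bullets describe, here is a minimal sketch in plain Python. It is not the code from this PR: the `Flake` dataclass and `FakeStore` are hypothetical stand-ins for the Django model and the database, the function signatures are simplified, and the 30-pass threshold and the counter increments are assumptions made only so the example runs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

EXPIRY_PASSES = 30  # assumed threshold; the review discussion mentions 30 consecutive passes


@dataclass
class Flake:
    """Hypothetical in-memory stand-in for the Django Flake model."""
    test_id: str
    count: int = 0
    recent_passes_count: int = 0
    end_date: Optional[str] = None


class FakeStore:
    """Counts write round trips instead of talking to a database."""

    def __init__(self) -> None:
        self.write_calls = 0

    def bulk_update(self, flakes: List[Flake], fields: List[str]) -> None:
        self.write_calls += 1


def handle_pass(curr_flakes: Dict[str, Flake], test_id: str, now: str) -> Optional[Flake]:
    """Record a pass; return the flake if it just expired, without saving it."""
    flake = curr_flakes.get(test_id)
    if flake is None:
        return None
    flake.count += 1                # assumed bookkeeping, for illustration only
    flake.recent_passes_count += 1
    if flake.recent_passes_count >= EXPIRY_PASSES:
        flake.end_date = now
        del curr_flakes[test_id]    # no longer an active flake
        return flake
    return None


def process_single_upload(curr_flakes, passing_test_ids, now) -> List[Flake]:
    """Collect the flakes that expired while processing one upload."""
    expired = []
    for test_id in passing_test_ids:
        flake = handle_pass(curr_flakes, test_id, now)
        if flake is not None:
            expired.append(flake)
    return expired


def process_flakes_for_commit(store, curr_flakes, uploads, now) -> List[Flake]:
    """Process every upload, then persist all expirations in one write."""
    all_expired_flakes: List[Flake] = []
    for passing_test_ids in uploads:
        all_expired_flakes.extend(process_single_upload(curr_flakes, passing_test_ids, now))
    if all_expired_flakes:
        store.bulk_update(all_expired_flakes, ["end_date", "count", "recent_passes_count"])
    return all_expired_flakes
```

With two flakes expiring across two uploads, the store sees one write call rather than one per flake, which is the saving this change is after.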

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13571244

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


Note

Medium Risk
Changes flake expiration persistence from per-flake save() to deferred bulk_update, which is lower DB load but could affect correctness if expired flakes aren’t persisted as expected or if concurrent processing assumes immediate saves.

Overview
Reduces N+1 database writes when expiring flaky tests by deferring persistence: handle_pass now returns an expired Flake (instead of saving/deleting immediately), process_single_upload collects these expirations, and process_flakes_for_commit performs a single bulk_update for all expired flakes after the existing bulk_create upsert of active flakes.

Reviewed by Cursor Bugbot for commit 13cdd51. Bugbot is set up for automated code reviews on this repo. Configure here.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Flake.objects.bulk_update(
    all_expired_flakes,
    ["end_date", "count", "recent_passes_count"],
)

Expired flakes bulk_update missing fail_count field

High Severity

The bulk_update for expired flakes uses fields ["end_date", "count", "recent_passes_count"] but omits fail_count, which the bulk_create for non-expired flakes correctly includes. If handle_failure increments fail_count on a flake that later expires via handle_pass (e.g., across multiple uploads in the same commit), that fail_count change is silently lost. The original flake.save() persisted all fields.
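
To make the reported data loss concrete, the following sketch simulates a field-restricted update in plain Python. It does not call Django: `simulated_bulk_update`, the `Flake` dataclass, the dictionary-backed table, and the starting values are all hypothetical, chosen only to mirror the behavior that bulk_update writes just the fields it is given.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Flake:
    """Hypothetical in-memory stand-in for the Django Flake model."""
    pk: int
    fail_count: int = 0
    count: int = 0
    recent_passes_count: int = 0
    end_date: Optional[str] = None


def simulated_bulk_update(table: Dict[int, dict], flakes: List[Flake], fields: List[str]) -> None:
    """Write only the named fields to the stored rows, as a field-restricted UPDATE would."""
    for flake in flakes:
        for name in fields:
            table[flake.pk][name] = getattr(flake, name)


def stored_fail_count_after_expiry(fields: List[str]) -> int:
    """Fail once, expire, persist with `fields`, and report what the store holds."""
    table = {1: {"fail_count": 3, "count": 10, "recent_passes_count": 0, "end_date": None}}
    flake = Flake(pk=1, fail_count=3, count=10)
    flake.fail_count += 1            # a failure earlier in the run, held in memory only
    flake.recent_passes_count = 30   # later passes push the flake over the threshold
    flake.end_date = "2026-04-20"
    simulated_bulk_update(table, [flake], fields)
    return table[1]["fail_count"]
```

Calling `stored_fail_count_after_expiry(["end_date", "count", "recent_passes_count"])` leaves the stored value at the old count, while adding `"fail_count"` to the list persists the increment.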

Flake.objects.bulk_update(
    all_expired_flakes,
    ["end_date", "count", "recent_passes_count"],
)

Newly created flake expiring crashes bulk_update

Medium Severity

If a new Flake is created in handle_failure (pk=None) and later expires via handle_pass within the same commit processing, it ends up in all_expired_flakes without a primary key. Django's bulk_update raises a ValueError ("All bulk_update() objects must have a primary key set") for objects with pk=None, crashing the entire processing run. The original flake.save() handled this correctly by performing an INSERT.
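
One possible mitigation, sketched below in plain Python, is to split the expired flakes by whether they already have a primary key before persisting them. This is an assumption about how the issue could be handled, not the fix adopted in this PR; `split_expired`, the `Flake` dataclass, and `simulated_bulk_update` are hypothetical names that only mimic the primary-key check described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Flake:
    """Hypothetical in-memory stand-in for the Django Flake model."""
    pk: Optional[int]
    end_date: Optional[str] = None


def simulated_bulk_update(flakes: List[Flake]) -> int:
    """Mimic the primary-key check: reject any object whose pk is None."""
    if any(flake.pk is None for flake in flakes):
        raise ValueError("All bulk_update() objects must have a primary key set.")
    return len(flakes)


def split_expired(flakes: List[Flake]) -> Tuple[List[Flake], List[Flake]]:
    """Separate already-saved flakes (update) from never-saved ones (insert)."""
    saved = [flake for flake in flakes if flake.pk is not None]
    unsaved = [flake for flake in flakes if flake.pk is None]
    return saved, unsaved
```

In a Django codebase the `saved` list would go to bulk_update and the `unsaved` list to an insert path such as bulk_create, so a flake created and expired within the same run no longer aborts the whole batch.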

Comment on lines +154 to +157
Flake.objects.bulk_update(
    all_expired_flakes,
    ["end_date", "count", "recent_passes_count"],
)
Contributor Author

Bug: The bulk_update for expired flakes omits the fail_count field, causing failure count increments to be lost if a flake expires in the same processing run.
Severity: HIGH

Suggested Fix

Add the fail_count field to the list of fields being updated in the Flake.objects.bulk_update call for expired flakes. The updated list should be ["end_date", "count", "recent_passes_count", "fail_count"].

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: apps/worker/services/test_analytics/ta_process_flakes.py#L154-L157

Potential issue: When a test flake experiences a failure and then subsequently expires
(due to 30 consecutive passes) within the same `process_flakes_for_commit` execution,
the incremented `fail_count` is not persisted to the database. This is because the
`bulk_update` operation for expired flakes does not include `fail_count` in its list of
fields to update. This results in data loss, leading to inaccurate flake metrics for
flakes that have both failures and then expire in the same processing window.

codecov-notifications Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...orker/services/test_analytics/ta_process_flakes.py | 93.33% | 1 Missing ⚠️ |

sentry Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.25%. Comparing base (0ad8a0c) to head (13cdd51).
⚠️ Report is 32 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...orker/services/test_analytics/ta_process_flakes.py | 93.33% | 1 Missing ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #871   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files        1307     1307           
  Lines       48017    48026    +9     
  Branches     1636     1636           
=======================================
+ Hits        44299    44308    +9     
  Misses       3407     3407           
  Partials      311      311           

| Flag | Coverage Δ | |
|---|---|---|
| workerintegration | 58.51% <6.66%> (-0.04%) | ⬇️ |
| workerunit | 90.39% <93.33%> (+<0.01%) | ⬆️ |

Flags with carried forward coverage won't be shown.