perf(worker): Optimize flake expiration updates in test analytics#871
sentry[bot] wants to merge 1 commit
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 13cdd51. Configure here.
    Flake.objects.bulk_update(
        all_expired_flakes,
        ["end_date", "count", "recent_passes_count"],
    )
Expired flakes bulk_update missing fail_count field
High Severity
The bulk_update for expired flakes uses fields ["end_date", "count", "recent_passes_count"] but omits fail_count, which the bulk_create for non-expired flakes correctly includes. If handle_failure increments fail_count on a flake that later expires via handle_pass (e.g., across multiple uploads in the same commit), that fail_count change is silently lost. The original flake.save() persisted all fields.
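The effect of the omitted field can be reproduced without Django. The sketch below is a plain-Python stand-in, not code from this PR: `FlakeRow`, `FakeTable`, `simulate`, and the sample values are hypothetical, and `FakeTable.bulk_update` mimics only one aspect of Django's `bulk_update`, namely that it writes just the fields it is given.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class FlakeRow:
    """Hypothetical stand-in for the Flake model; only the fields discussed here."""
    pk: int
    count: int = 0
    fail_count: int = 0
    recent_passes_count: int = 0
    end_date: Optional[str] = None


class FakeTable:
    """In-memory stand-in for a table; writes only the fields it is told to."""

    def __init__(self, rows):
        # Store copies so in-memory edits do not leak into the "database".
        self.rows = {row.pk: replace(row) for row in rows}

    def bulk_update(self, objs, fields):
        for obj in objs:
            stored = self.rows[obj.pk]
            for name in fields:
                setattr(stored, name, getattr(obj, name))


def simulate(fields):
    """A failure followed by expiry in the same run, then a field-limited write."""
    table = FakeTable([FlakeRow(pk=1, count=5, fail_count=2)])
    flake = FlakeRow(pk=1, count=5, fail_count=2)  # in-memory copy being processed

    # Assumed behaviour of a failure: both counters are incremented.
    flake.count += 1
    flake.fail_count += 1
    # Assumed behaviour of expiry: pass counter reaches 30 and end_date is set.
    flake.recent_passes_count = 30
    flake.end_date = "2024-01-01"

    table.bulk_update([flake], fields)
    return table.rows[1]


without_fail_count = simulate(["end_date", "count", "recent_passes_count"])
with_fail_count = simulate(["end_date", "count", "recent_passes_count", "fail_count"])
print(without_fail_count.fail_count, with_fail_count.fail_count)
```

With the three-field list the stored `fail_count` keeps its old value; with `fail_count` added to the list the increment is persisted.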
    Flake.objects.bulk_update(
        all_expired_flakes,
        ["end_date", "count", "recent_passes_count"],
    )
Newly created flake expiring crashes bulk_update
Medium Severity
If a new Flake is created in handle_failure (pk=None) and later expires via handle_pass within the same commit processing, it ends up in all_expired_flakes without a primary key. Django's bulk_update raises a ValueError ("All bulk_update() objects must have a primary key set") for objects with pk=None, crashing the entire processing run. The original flake.save() handled this correctly by performing an INSERT.
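One possible guard is to split the expired flakes by whether they already have a primary key before any bulk write. The sketch below shows only that partitioning step in plain Python; `FlakeStub` and `split_by_pk` are hypothetical names, and how the unsaved group should be persisted is a decision for the PR author.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FlakeStub:
    """Hypothetical stand-in for a Flake instance; pk stays None until inserted."""
    test_id: str
    pk: Optional[int] = None


def split_by_pk(flakes: List[FlakeStub]) -> Tuple[List[FlakeStub], List[FlakeStub]]:
    """Return (saved, unsaved): objects with a primary key and objects without one."""
    saved: List[FlakeStub] = []
    unsaved: List[FlakeStub] = []
    for flake in flakes:
        if flake.pk is None:
            unsaved.append(flake)
        else:
            saved.append(flake)
    return saved, unsaved


expired = [FlakeStub("test_a", pk=10), FlakeStub("test_b"), FlakeStub("test_c", pk=11)]
saved, unsaved = split_by_pk(expired)
print([f.test_id for f in saved], [f.test_id for f in unsaved])
```

In the Django code, only the `saved` group would be safe to pass to `bulk_update`; the `unsaved` group would need an insert, for example through the existing `bulk_create` call or a plain `save()`.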
    Flake.objects.bulk_update(
        all_expired_flakes,
        ["end_date", "count", "recent_passes_count"],
    )
Bug: The bulk_update for expired flakes omits the fail_count field, causing failure count increments to be lost if a flake expires in the same processing run.
Severity: HIGH
Suggested Fix
Add the fail_count field to the list of fields being updated in the Flake.objects.bulk_update call for expired flakes. The updated list should be ["end_date", "count", "recent_passes_count", "fail_count"].
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: apps/worker/services/test_analytics/ta_process_flakes.py#L154-L157
Potential issue: When a test flake experiences a failure and then subsequently expires
(due to 30 consecutive passes) within the same `process_flakes_for_commit` execution,
the incremented `fail_count` is not persisted to the database. This is because the
`bulk_update` operation for expired flakes does not include `fail_count` in its list of
fields to update. This results in data loss, leading to inaccurate flake metrics for
flakes that have both failures and then expire in the same processing window.
Codecov Report
❌ Patch coverage is

Additional details and impacted files

    @@           Coverage Diff           @@
    ##             main     #871   +/-   ##
    =======================================
      Coverage   92.25%   92.25%
    =======================================
      Files        1307     1307
      Lines       48017    48026    +9
      Branches     1636     1636
    =======================================
    + Hits        44299    44308    +9
      Misses       3407     3407
      Partials      311      311

Flags with carried forward coverage won't be shown.


Fixes WORKER-Y94. The issue was that: Individual `flake.save()` calls within `handle_pass` for expiring flakes cause an N+1 query pattern.

- `handle_pass` now returns expired `Flake` objects instead of saving and deleting them immediately.
- `process_single_upload` now collects expired `Flake` objects.
- Expired `Flake` objects are bulk-updated at the end of `process_flakes_for_commit` to reduce individual database writes.

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13571244
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
Note
Medium Risk
Changes flake expiration persistence from per-flake `save()` to deferred `bulk_update`, which is lower DB load but could affect correctness if expired flakes aren't persisted as expected or if concurrent processing assumes immediate saves.

Overview
Reduces N+1 database writes when expiring flaky tests by deferring persistence: `handle_pass` now returns an expired `Flake` (instead of saving/deleting immediately), `process_single_upload` collects these expirations, and `process_flakes_for_commit` performs a single `bulk_update` for all expired flakes after the existing `bulk_create` upsert of active flakes.
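The deferred-write flow can be sketched in plain Python. The function names mirror those in the PR, but the bodies are assumptions for illustration: flakes are dicts rather than Django models, `PASS_THRESHOLD` and `write_batches` are hypothetical, and appending to `write_batches` stands in for the single `bulk_update` call.

```python
PASS_THRESHOLD = 30  # the review comments mention expiry after 30 consecutive passes


def handle_pass(flake):
    """Record a pass; return the flake if it just expired, else None. No write here."""
    flake["recent_passes_count"] += 1
    if flake["recent_passes_count"] >= PASS_THRESHOLD:
        flake["end_date"] = "expired"  # placeholder for a real timestamp
        return flake
    return None


def process_single_upload(curr_flakes, passed_test_ids):
    """Apply the passes from one upload and collect flakes that expired during it."""
    expired = []
    for test_id in passed_test_ids:
        flake = curr_flakes.get(test_id)
        if flake is None:
            continue  # test is not currently flaky
        if handle_pass(flake) is not None:
            # Assumed: an expired flake leaves the active set.
            expired.append(curr_flakes.pop(test_id))
    return expired


def process_flakes_for_commit(curr_flakes, uploads, write_batches):
    """Process every upload, then persist all expirations in one batch."""
    all_expired = []
    for passed_test_ids in uploads:
        all_expired.extend(process_single_upload(curr_flakes, passed_test_ids))
    if all_expired:
        write_batches.append(all_expired)  # stands in for one bulk_update call
    return all_expired


curr_flakes = {
    "t1": {"recent_passes_count": 29, "end_date": None},
    "t2": {"recent_passes_count": 29, "end_date": None},
    "t3": {"recent_passes_count": 0, "end_date": None},
}
write_batches = []
expired = process_flakes_for_commit(curr_flakes, [["t1", "t3"], ["t2", "t3"]], write_batches)
print(len(expired), len(write_batches))
```

Two flakes expire across two uploads, yet only one batch is recorded, which is the reduction in writes the PR is after. The sketch does not cover the issues raised in review (fields omitted from the update, flakes without a primary key).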