Speedup lpad detect_lostruns
#554
Merged
detect_lostruns takes 25 min to run on our current LaunchPad and looks highly optimizable. The current implementation goes over all RUNNING fireworks and makes one query per FW to check if launches are FIZZLED/COMPLETED. Example: for 10k RUNNING jobs and 1k lost runs, this requires 11,000+ DB queries.
Mongo has had batch queries and aggregation pipelines since 3.x ($lookup arrived in 3.2) to do this faster: (1) Batch fetch all relevant FireWorks with find({"fw_id": {"$in": fw_ids}}), (2) Collect all launch IDs, then batch fetch their states in one query, and (3) Use a MongoDB aggregation pipeline with $lookup to find inconsistent FireWorks server-side instead of N queries client-side. This reduces the operation from ~11,000+ queries to on the order of 10 batch queries plus 1 aggregation, which should give a big speedup and is backwards-compatible.
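A minimal sketch of the three steps described above, not the code merged in this PR. It assumes documents in a fireworks collection carry `fw_id`, `state`, and a `launches` list of launch IDs, and that documents in a `launches` collection carry `launch_id` and `state`. The helper names (`chunked`, `batch_fetch`, `inconsistent_fw_pipeline`), the batch size, and the `RecordingCollection` stand-in are hypothetical and exist only for illustration. It also assumes the server matches an array `localField` element-wise in `$lookup`.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def chunked(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids` with at most `size` elements each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def batch_fetch(collection: Any, field: str, ids: Iterable[int],
                projection: Optional[Dict[str, int]] = None,
                batch_size: int = 1000) -> List[Dict[str, Any]]:
    """Fetch documents with one `$in` query per chunk instead of one per ID.

    `collection` is anything with a pymongo-style find(filter, projection).
    """
    docs: List[Dict[str, Any]] = []
    for chunk in chunked(list(ids), batch_size):
        docs.extend(collection.find({field: {"$in": chunk}}, projection))
    return docs


def inconsistent_fw_pipeline(launches_coll: str = "launches") -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that finds RUNNING FireWorks whose
    launches are already FIZZLED or COMPLETED (field names are assumptions)."""
    return [
        {"$match": {"state": "RUNNING"}},
        {"$lookup": {
            "from": launches_coll,
            "localField": "launches",      # assumed: list of launch IDs on the FW
            "foreignField": "launch_id",   # assumed: ID field on each launch
            "as": "launch_docs",
        }},
        {"$match": {"launch_docs.state": {"$in": ["FIZZLED", "COMPLETED"]}}},
        {"$project": {"_id": 0, "fw_id": 1}},
    ]


class RecordingCollection:
    """In-memory stand-in for a collection; records each filter it receives."""

    def __init__(self) -> None:
        self.queries: List[Dict[str, Any]] = []

    def find(self, query: Dict[str, Any],
             projection: Optional[Dict[str, int]] = None) -> List[Dict[str, Any]]:
        self.queries.append(query)
        return [{"fw_id": i} for i in query["fw_id"]["$in"]]


fake = RecordingCollection()
docs = batch_fetch(fake, "fw_id", range(2500), {"fw_id": 1}, batch_size=1000)
pipeline = inconsistent_fw_pipeline()
```

With a real database, the same pipeline would be passed to the fireworks collection's `aggregate()` method, and `batch_fetch` would receive the real collection instead of the stand-in. Chunking the `$in` list keeps each query document well below MongoDB's document size limit when there are many IDs.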