⚡ Bolt: Optimize linear lookups in file extension checks#354
⚡ Bolt: Optimize linear lookups in file extension checks#354bashandbone wants to merge 1 commit into
Conversation
Replaced generator expressions inside `next()` with standard `for` loops and early returns for the `is_doc` and `is_data` properties in `src/codeweaver/core/metadata.py`. This eliminates generator frame allocation overhead and speeds up linear file extension checks, adding a measurable performance improvement for frequent property access. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's guide (collapsed on small PRs)Reviewer's GuideOptimizes FileMetadata extension classification by replacing generator-based lookups with explicit for-loop early returns in hot paths, and documents this pattern in the Bolt optimization guide. Flow diagram for optimized FileMetadata extension lookups (is_doc / is_data)flowchart TD
A[Start is_doc / is_data] --> B[Import DOC_FILES_EXTENSIONS or DATA_FILES_EXTENSIONS]
B --> C[for ext in DOC_FILES_EXTENSIONS / DATA_FILES_EXTENSIONS]
C --> D{ext.ext == self.ext}
D -- Yes --> E[return True]
D -- No --> F{More extensions?}
F -- Yes --> C
F -- No --> G[return False]
E --> H[End]
G --> H[End]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
🤖 Hi @bashandbone, I've received your request, and I'm working on it now! You can track my progress in the logs for more details. |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Given that
DOC_FILES_EXTENSIONSandDATA_FILES_EXTENSIONSappear to be static collections, consider using a precomputed set or mapping ofextvalues for O(1) membership checks instead of repeated linear scans, which will likely outperform the micro-optimization from removing the generator. - The
is_docandis_dataimplementations are now almost identical; consider extracting a shared helper (e.g.,_has_ext(self.ext, EXT_COLLECTION)) to avoid duplication and keep future changes to the lookup logic in one place. - If you keep the
# noqa: SIM110suppression, it may be worth tightening the comment to explicitly reference this being a measured hot path (or add a short note about the specific context) so future readers understand why the usual readability preference is intentionally bypassed here.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Given that `DOC_FILES_EXTENSIONS` and `DATA_FILES_EXTENSIONS` appear to be static collections, consider using a precomputed set or mapping of `ext` values for O(1) membership checks instead of repeated linear scans, which will likely outperform the micro-optimization from removing the generator.
- The `is_doc` and `is_data` implementations are now almost identical; consider extracting a shared helper (e.g., `_has_ext(self.ext, EXT_COLLECTION)`) to avoid duplication and keep future changes to the lookup logic in one place.
- If you keep the `# noqa: SIM110` suppression, it may be worth tightening the comment to explicitly reference this being a measured hot path (or add a short note about the specific context) so future readers understand why the usual readability preference is intentionally bypassed here.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
🤖 I'm sorry @bashandbone, but I was unable to process your request. Please see the logs for more details. |
There was a problem hiding this comment.
Pull request overview
Optimizes frequent file-extension category checks in ExtLangPair by replacing next((...)) generator expressions with explicit for loops and early returns, reducing generator allocation overhead during linear lookups.
Changes:
- Rewrote
ExtLangPair.is_docandExtLangPair.is_datato use explicit loops with early returns (and added# noqa: SIM110to preserve this pattern). - Added a Bolt learning entry documenting the “for-loop over generator/next” optimization pattern for hot paths.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/codeweaver/core/metadata.py |
Replaces generator/next()-based linear lookups with early-return loops for is_doc/is_data. |
.jules/bolt.md |
Documents the optimization rationale/pattern for future reference. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
💡 What: Optimized the
is_docandis_dataproperty getters insrc/codeweaver/core/metadata.pyby replacingnext((...))generator expressions with standardforloops that use early returns. Suppressed theruffruleSIM110with# noqato ensure the manual loop is preserved over anany()generator.🎯 Why: In CPython, evaluating generator comprehensions inside functions like
next()orany()involves allocating a generator object and evaluating its frames. When performing a simple linear search to find a matching extension, manually writing out aforloop that returns early avoids this overhead completely. These properties are accessed frequently during file discovery, making this a small but compounding speedup.📊 Impact: Reduces evaluation overhead in file metadata extension lookups by entirely bypassing generator frame allocations, yielding faster linear lookups.
🔬 Measurement: The optimization can be verified by profiling calls to
FileMetadata.is_docandFileMetadata.is_dataover a large list of files compared to their previous implementations, seeing a drop in generator allocation overhead.PR created automatically by Jules for task 7415917703769671111 started by @bashandbone
Summary by Sourcery
Optimize file metadata extension classification checks and document the performance guideline in the optimization playbook.
Enhancements:
Documentation: