⚡ Bolt: [performance improvement] Optimize dynamic SQL generation for D1#243
⚡ Bolt: [performance improvement] Optimize dynamic SQL generation for D1#243bashandbone wants to merge 1 commit into
Conversation
* Replaced intermediate `Vec` allocations and `format!` calls with a single `String::with_capacity` buffer and `write!` macro usage in `build_upsert_stmt` and `build_delete_stmt` in `crates/flow/src/targets/d1.rs`. * Added performance learning to `.jules/bolt.md`. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's GuideOptimizes dynamic SQL generation for Cloudflare D1 by replacing intermediate Vec and format! usage with preallocated String buffers and write! calls, makes some rule-engine APIs take non-lifetime-tied references, and applies a few minor formatting/clarity cleanups across the codebase and tests. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
build_upsert_stmt, the newON CONFLICT DO UPDATEbuilder iterates overvalues.fields(and unconditionally usesvalue_fields_schemafor each index), whereas the insertcolumns/placeholdersare only populated whenvalueisSome; this can cause the update clause to diverge from the set of actually inserted columns—consider keeping the update loop aligned with the sameSome(value)filter used when pushing columns/placeholders. - The new SQL builders rely on multiple
write!(...).unwrap()calls, which will now panic on any formatting error; if you want to preserve the previous behavior offormat!(which didn’t panic on formatting), consider either propagating thesefmt::Results or explicitly ignoring them (e.g.,let _ = write!(...)) to avoid introducing new panic paths.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `build_upsert_stmt`, the new `ON CONFLICT DO UPDATE` builder iterates over `values.fields` (and unconditionally uses `value_fields_schema` for each index), whereas the insert `columns`/`placeholders` are only populated when `value` is `Some`; this can cause the update clause to diverge from the set of actually inserted columns—consider keeping the update loop aligned with the same `Some(value)` filter used when pushing columns/placeholders.
- The new SQL builders rely on multiple `write!(...).unwrap()` calls, which will now panic on any formatting error; if you want to preserve the previous behavior of `format!` (which didn’t panic on formatting), consider either propagating these `fmt::Result`s or explicitly ignoring them (e.g., `let _ = write!(...)`) to avoid introducing new panic paths.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
Replaces intermediate Vec/format!/join patterns in D1 dynamic SQL builders (build_upsert_stmt, build_delete_stmt) with pre-sized String::with_capacity + write! to reduce heap churn. Also includes incidental rustfmt-style reflows in a few unrelated files and a learnings note in .jules/bolt.md.
Changes:
- Rewrote D1 upsert/delete SQL construction to use a single pre-allocated buffer with
std::fmt::Write. - Minor formatting reflows in
referent_rule.rs,rule/mod.rs,check_var.rs,tree_sitter/mod.rs, and removed a trivialassert!(true, ...)ind1_cache_integration.rs. - Added a performance-learning entry to
.jules/bolt.md.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/flow/src/targets/d1.rs | Main perf change: replaces Vec + format!/join with pre-sized String + write! in build_upsert_stmt and build_delete_stmt; adds std::fmt::Write import. |
| crates/flow/tests/d1_cache_integration.rs | Removes a no-op assert!(true, ...) from a construction-only test. |
| crates/rule-engine/src/rule/referent_rule.rs | Cosmetic single-line reflow of read(). |
| crates/rule-engine/src/rule/mod.rs | Cosmetic multi-line reflow of defined_vars chain. |
| crates/rule-engine/src/check_var.rs | Drops unnecessary 'r lifetime annotations on references that don't need to share the lifetime. |
| crates/ast-engine/src/tree_sitter/mod.rs | Cosmetic reflow of from_utf8 fallback and a test assert_eq!. |
| .jules/bolt.md | Adds a performance-learning note documenting the SQL-builder optimization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| **Learning:** During DAG traversals, creating owned variants of identifiers (like `file.to_path_buf()`) *before* checking `visited` HashSets results in heap allocations (O(E)) for every edge instead of every visited node (O(V)). By moving the `&PathBuf` allocation strictly *after* all HashSet `contains` checks using the borrowed reference (`&Path`), we drastically reduce memory churn. | ||
| **Action:** Always check `HashSet::contains` with a borrowed reference *before* creating the owned version required by `HashSet::insert`, especially in performance-critical graph traversal paths. | ||
|
|
||
| ## 2024-05-18 - [Performance: Dynamic SQL Generation for Cloudflare D1] |
| let mut sql = String::with_capacity( | ||
| 14 + self.table_name.len() | ||
| + 7 | ||
| + self | ||
| .key_fields_schema | ||
| .iter() | ||
| .map(|f| f.name.len() + 5) | ||
| .sum::<usize>(), |
💡 What:
Replaced intermediate
Vecallocations andformat!calls inside loops with a pre-allocatedString::with_capacitybuffer and sequentialwrite!(viastd::fmt::Write) macro calls inbuild_upsert_stmtandbuild_delete_stmtfor Cloudflare D1 targets.🎯 Why:
During dynamic SQL generation in
crates/flow/src/targets/d1.rs, creating intermediate Vectors (e.g.,update_clauses,where_clauses) and running.join(", ")alongside multiple nested string interpolations (format!) triggered unnecessary heap allocations and expensive memory copying (churn). This overhead scaled linearly with the number of columns updated or queried.📊 Impact:
Reduces heap allocation frequency during SQL construction in D1 database targets. Improves memory efficiency and slightly speeds up D1 mutation requests by eliminating the
Vecallocation overhead, making performance scale smoothly as queries grow.🔬 Measurement:
Verified functionality by running all related D1 target tests (
cargo test -p thread-flow --test d1_target_tests --test d1_minimal_tests --test d1_cache_integration), which passed successfully. Ran benchmarks viacargo bench -p thread-flow d1_profilingto ensure no regressions occurred.PR created automatically by Jules for task 12039807803892862716 started by @bashandbone
Summary by Sourcery
Optimize dynamic SQL generation for Cloudflare D1 targets and apply minor code cleanups and style adjustments across rule-engine and AST modules.
Enhancements:
Documentation:
Tests: