
Add code size tracking workflow and scripts #8113

Open: joseph-isaacs wants to merge 5 commits into develop from claude/quirky-mccarthy-1MC1W


@joseph-isaacs

Summary

This PR adds automated code size tracking for pull requests. It introduces two Python scripts and a GitHub Actions workflow to measure and report Rust lines of code per crate, with delta comparisons against the base branch.

Changes

  1. scripts/crate-loc.py: New script that computes lines of code per workspace crate by:

    • Reading workspace members from Cargo.toml
    • Running tokei to analyze the repository
    • Attributing each Rust source file to its crate based on directory hierarchy
    • Outputting results as JSON
  2. scripts/compare-loc.py: New script that renders per-crate LOC as a collapsible markdown comment by:

    • Comparing HEAD and base revision LOC data
    • Computing deltas and percentage changes
    • Formatting results as a markdown <details> block with a summary line and expandable table
    • Handling edge cases like newly added crates and unchanged lines
  3. .github/workflows/code-size.yml: New GitHub Actions workflow that:

    • Runs on every pull request
    • Checks out both HEAD and base revisions
    • Installs tokei for code analysis
    • Computes LOC for both revisions using the new scripts
    • Posts a sticky PR comment with the comparison results
    • Uses concurrency controls to update a single comment per PR

The workflow enables developers and reviewers to quickly see code size changes at a glance, with detailed per-crate breakdowns available by expanding the comment.

Testing

The scripts are straightforward Python utilities with no external dependencies beyond tokei (which is installed by the workflow). The workflow will be tested automatically on the first PR that uses it. Manual verification can be done by:

  • Running python3 scripts/crate-loc.py . locally to verify LOC computation
  • Running python3 scripts/compare-loc.py <json-file> to verify markdown rendering

https://claude.ai/code/session_01Df7kNNDHfHnoa9uhTdGH9c

Adds a Code Size CI job that runs tokei over the workspace and posts a
single collapsible PR comment: a one-line total in the summary, with the
full per-crate line-count breakdown (and deltas against the base) on
expand. Nested crates are attributed to their longest path prefix so they
are not double counted.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
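
The longest-path-prefix attribution mentioned in this commit message could look something like the sketch below. It assumes the per-file line counts have already been extracted into a `{path: lines}` mapping and that crate directories are given relative to the repository root; the function name `attribute` and both input shapes are hypothetical.

```python
from pathlib import PurePosixPath


def attribute(files: dict[str, int], crate_dirs: list[str]) -> dict[str, int]:
    """Assign each file's lines to the deepest crate directory containing it."""
    crates = {PurePosixPath(d) for d in crate_dirs}
    totals = {str(c): 0 for c in crates}
    for path, lines in files.items():
        # .parents yields the closest directory first, so the first match
        # is the longest matching prefix and nested crates win over parents.
        for parent in PurePosixPath(path).parents:
            if parent in crates:
                totals[str(parent)] += lines
                break
    return totals
```

With this approach a file under vortex-cuda/nvcomp counts toward vortex-cuda/nvcomp only, not toward vortex-cuda as well, and files outside every crate directory are ignored.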
@github-actions

github-actions Bot commented May 26, 2026

Code size: 286,481 lines of Rust across 60 crates
| Crate | Lines | Δ | % |
| --- | ---: | ---: | ---: |
| vortex-array | 112,157 | | |
| vortex-cuda | 14,524 | | |
| vortex-layout | 11,515 | | |
| vortex-duckdb | 11,320 | | |
| encodings/fastlanes | 9,805 | | |
| benchmarks-website/server | 9,735 | | |
| vortex-bench | 9,032 | | |
| vortex-tensor | 7,476 | | |
| vortex-datafusion | 6,620 | | |
| vortex-file | 5,620 | | |
| vortex-ffi | 5,039 | | |
| encodings/alp | 4,858 | | |
| vortex-python | 4,833 | | |
| vortex-flatbuffers | 4,632 | | |
| vortex-buffer | 4,478 | | |
| encodings/fsst | 3,916 | | |
| vortex-compressor | 3,715 | | |
| vortex-test/compat-gen | 3,676 | | |
| vortex-io | 3,568 | | |
| fuzz | 3,483 | | |
| encodings/runend | 3,382 | | |
| vortex-tui | 3,327 | | |
| encodings/sparse | 3,086 | | |
| benchmarks-website/migrate | 2,920 | | |
| encodings/parquet-variant | 2,662 | | |
| vortex-btrblocks | 2,602 | | |
| vortex-turboquant | 2,574 | | |
| vortex | 2,562 | | |
| vortex-mask | 2,505 | | |
| encodings/zstd | 1,938 | | |
| encodings/datetime-parts | 1,843 | | |
| vortex-jni | 1,763 | | |
| encodings/sequence | 1,530 | | |
| encodings/pco | 1,115 | | |
| benchmarks/vector-search-bench | 966 | | |
| vortex-ipc | 948 | | |
| encodings/decimal-byte-parts | 926 | | |
| benchmarks/datafusion-bench | 772 | | |
| encodings/zigzag | 700 | | |
| vortex-web/crate | 694 | | |
| benchmarks/random-access-bench | 675 | | |
| benchmarks/lance-bench | 609 | | |
| encodings/bytebool | 568 | | |
| vortex-error | 556 | | |
| vortex-sqllogictest | 531 | | |
| vortex-cxx | 515 | | |
| vortex-scan | 479 | | |
| benchmarks/compress-bench | 419 | | |
| benchmarks/duckdb-bench | 418 | | |
| vortex-proto | 398 | | |
| vortex-session | 385 | | |
| vortex-metrics | 363 | | |
| vortex-cuda/nvcomp | 322 | | |
| vortex-cuda/cub | 319 | | |
| vortex-array-macros | 313 | | |
| vortex-utils | 272 | | |
| vortex-cuda/gpu-scan-cli | 189 | | |
| xtask | 140 | | |
| vortex-test/e2e-cuda | 133 | | |
| vortex-cuda/macros | 60 | | |

Total: 286,481 → 286,481 (—)

@codspeed-hq

codspeed-hq Bot commented May 26, 2026

Merging this PR will improve performance by 13.65%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 5 improved benchmarks
❌ 1 regressed benchmark
✅ 1245 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | ---: | ---: | ---: |
| Simulation | chunked_bool_canonical_into[(1000, 10)] | 45 µs | 30.8 µs | +46.26% |
| Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 187.7 µs | 225.3 µs | -16.71% |
| Simulation | chunked_varbinview_opt_into_canonical[(1000, 10)] | 239.5 µs | 202.1 µs | +18.49% |
| Simulation | new_alp_prim_test_between[f32, 16384] | 118.5 µs | 104 µs | +13.94% |
| Simulation | new_alp_prim_test_between[f32, 32768] | 182.2 µs | 153.3 µs | +18.83% |
| Simulation | new_bp_prim_test_between[i16, 32768] | 132.3 µs | 120 µs | +10.26% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/quirky-mccarthy-1MC1W (152f7d3) with develop (ae30d83)


claude added 2 commits May 26, 2026 18:08
Adds a Crate Binary Size CI job that builds the datafusion-bench binary on
stable and runs cargo-bloat to attribute its machine code back to each
first-party Vortex crate. Posts a single collapsible PR comment: a one-line
Vortex total in the summary, with the full per-crate .text breakdown on
expand. Third-party crates (datafusion, arrow, tokio, std) are filtered out
using the workspace member set from cargo metadata.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
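
The first-party filtering described in this commit message might be sketched as below. The input shapes are assumptions for illustration: `bloat_crates` stands for a list of `{"name", "size"}` records parsed from the size report, and `workspace_packages` for the package names taken from cargo metadata. The function name `first_party_sizes` is hypothetical.

```python
def first_party_sizes(
    bloat_crates: list[dict], workspace_packages: list[str]
) -> dict[str, int]:
    """Keep only crates whose name matches a workspace package."""
    # Cargo package names may contain hyphens, while crate names as they
    # appear in compiled symbols use underscores, so normalise before matching.
    members = {name.replace("-", "_") for name in workspace_packages}
    return {c["name"]: c["size"] for c in bloat_crates if c["name"] in members}
```

Matching on the normalised name is what lets a package such as vortex-array line up with the vortex_array row in the report, while std, tokio, arrow and datafusion drop out because they are not workspace members.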
Builds datafusion-bench for both the PR head and develop on the same runner
(reusing the target directory) and reports the per-crate .text delta against
develop in the sticky PR comment. Removes the tokei lines-of-code report and
its workflow, leaving binary size as the sole code-size metric.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
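
A per-crate .text delta against develop, formatted in binary units like the values in the bot comment, could be computed along these lines. This is a sketch under assumptions: the helper names (`human`, `signed`, `crate_deltas`), the `{crate: bytes}` inputs, and the choice of two decimals for MiB and one for KiB are inferred from the rendered comment rather than taken from the PR's code.

```python
def human(n: int) -> str:
    """Format a byte count using binary units (B, KiB, MiB)."""
    size = abs(n)
    if size >= 1024 * 1024:
        return f"{size / (1024 * 1024):.2f} MiB"
    if size >= 1024:
        return f"{size / 1024:.1f} KiB"
    return f"{size} B"


def signed(n: int) -> str:
    """Prefix a formatted size with '+' or a typographic minus (U+2212)."""
    sign = "+" if n > 0 else "\u2212"
    return f"{sign}{human(n)}"


def crate_deltas(head: dict[str, int], base: dict[str, int]) -> dict[str, tuple]:
    """Map each crate to (size, delta, percent); delta is None when unchanged or new."""
    out = {}
    for crate, size in head.items():
        before = base.get(crate)
        if not before or before == size:
            out[crate] = (size, None, None)
        else:
            delta = size - before
            out[crate] = (size, delta, delta / before * 100)
    return out
```

For example, `signed(-253)` gives `−253 B` and `human(149)` gives `149 B`.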
@github-actions

github-actions Bot commented May 26, 2026

code size change −5.8% (datafusion-bench)
| Crate | .text | Δ vs develop | % |
| --- | ---: | ---: | ---: |
| vortex_fastlanes | 1.16 MiB | −863.6 KiB | −42.1% |
| vortex_array | 9.20 MiB | +2.8 KiB | +0.0% |
| vortex_buffer | 207.6 KiB | −253 B | −0.1% |
| vortex_fsst | 100.9 KiB | +233 B | +0.2% |
| vortex_runend | 300.9 KiB | +121 B | +0.0% |
| vortex_bench | 370.7 KiB | | |
| vortex_layout | 367.4 KiB | | |
| vortex_sequence | 297.5 KiB | | |
| vortex_compressor | 228.4 KiB | | |
| vortex_file | 225.7 KiB | | |
| vortex_sparse | 177.3 KiB | | |
| vortex_alp | 174.4 KiB | | |
| vortex_datafusion | 170.0 KiB | | |
| vortex_pco | 139.1 KiB | | |
| datafusion_bench | 114.2 KiB | | |
| vortex_btrblocks | 65.9 KiB | | |
| vortex_datetime_parts | 60.1 KiB | | |
| vortex_zigzag | 56.9 KiB | | |
| vortex_mask | 43.6 KiB | | |
| vortex_io | 38.8 KiB | | |
| vortex_zstd | 37.4 KiB | | |
| vortex_decimal_byte_parts | 36.2 KiB | | |
| vortex_proto | 35.7 KiB | | |
| vortex_session | 25.3 KiB | | |
| vortex_bytebool | 21.9 KiB | | |
| vortex_error | 17.6 KiB | | |
| vortex_scan | 9.4 KiB | | |
| vortex_metrics | 8.3 KiB | | |
| vortex | 7.9 KiB | | |
| vortex_utils | 4.3 KiB | | |
| vortex_flatbuffers | 149 B | | |

Vortex total: 14.47 MiB → 13.63 MiB (−860.7 KiB)

claude added 2 commits May 26, 2026 18:32
When the PR introduces no .text delta against develop, emit just the summary
line instead of the full per-crate table to keep the comment quiet.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Collapsed line now reads "no code size change (datafusion-bench)" when
unchanged, or "code size change +/-XX% (datafusion-bench)" with the per-crate
details table on expand when it changed.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
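
The collapsed summary line described in these two commits could be produced by something like the sketch below. The function name `summary_line` and its parameters are hypothetical, and it assumes the base total is non-zero whenever the two totals differ.

```python
def summary_line(head_total: int, base_total: int, binary: str) -> str:
    """Build the one-line summary shown when the comment is collapsed."""
    if head_total == base_total:
        return f"no code size change ({binary})"
    # Assumes base_total > 0 here; a brand-new binary would need its own wording.
    pct = (head_total - base_total) / base_total * 100
    sign = "+" if pct > 0 else "\u2212"  # U+2212 minus, as in the bot comment
    return f"code size change {sign}{abs(pct):.1f}% ({binary})"
```

With totals of 942 and 1000 bytes this yields `code size change −5.8% (datafusion-bench)`, and equal totals yield `no code size change (datafusion-bench)`.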
@joseph-isaacs joseph-isaacs added the changelog/ci and changelog/skip labels May 26, 2026
@myrrc

myrrc commented May 27, 2026

@joseph-isaacs Code size should ignore auto-generated stuff like vortex-duckdb/{cpp.rs,include/vortex.h}


@myrrc myrrc left a comment


LGTM

