Skip to content

UPSTREAM PR #1318: chore: replace rand and srand at the library level#76

Open
loci-dev wants to merge 1 commit intomainfrom
loci/pr-1318-sd_replace_rand
Open

UPSTREAM PR #1318: chore: replace rand and srand at the library level#76
loci-dev wants to merge 1 commit intomainfrom
loci/pr-1318-sd_replace_rand

Conversation

@loci-dev
Copy link

@loci-dev loci-dev commented Mar 5, 2026

Note

Source pull request: leejet/stable-diffusion.cpp#1318

These functions have global state, so they could interfere with application behavior.

It would arguably be more correct to use std::default_random_device, but that seemed a bit overkill for this.

These functions have global state, so they could interfere with
application behavior.
@loci-dev loci-dev temporarily deployed to stable-diffusion-cpp-prod March 5, 2026 04:16 — with GitHub Actions Inactive
@loci-review
Copy link

loci-review bot commented Mar 5, 2026

Overview

Analysis of stable-diffusion.cpp compared 49,765 functions across two versions, identifying 107 modified functions, 18 new functions, and 0 removed functions. The changes stem from a single commit replacing C-style rand/srand with C++ random number generation for improved thread safety and reproducibility.

Binaries Analyzed:

  • build.bin.sd-cli: +0.103% power consumption (491,105.58 → 491,612.53 nanojoules)
  • build.bin.sd-server: -0.107% power consumption (527,129.70 → 526,563.51 nanojoules)

Overall performance impact is negligible, with power consumption changes under 0.2% indicating effective performance neutrality despite individual function variations.

Function Analysis

std::vector::end() (build.bin.sd-cli): Throughput time increased 306.67% (59.77ns → 243.07ns, +183.30ns). Response time increased 223.91% (81.86ns → 265.16ns, +183.30ns). This STL function regression appears compiler-driven, likely from disabled inlining. While called frequently (411 uses), absolute impact remains modest.

std::vector<sd_lora_t>::end() (build.bin.sd-server): Throughput time improved 75.41% (243.07ns → 59.78ns, -183.29ns). Response time improved 69.44% (263.94ns → 80.65ns, -183.29ns). Compiler optimizations improved this LoRA parameter iteration function.

ggml_threadpool_params_default (build.bin.sd-cli): Throughput time improved 58.40% (217.48ns → 90.47ns, -127.01ns). Response time improved 45.46% (279.79ns → 152.59ns, -127.20ns). GGML submodule optimizations reduced threadpool initialization overhead.

ggml_compute_forward_map_custom3 (build.bin.sd-server): Throughput time improved 35.05% (219.25ns → 142.41ns, -76.84ns). Response time improved 32.91% (233.99ns → 156.98ns, -77.01ns). Custom operation handling benefits from more efficient RNG implementation.

apply_binary_op (build.bin.sd-cli): Throughput time improved 6.15% (1286.26ns → 1207.13ns, -79.13ns). Response time improved 4.26% (2362.80ns → 2262.11ns, -100.69ns). This frequently-called tensor addition operation shows modest but meaningful improvement.

Other analyzed functions showed mixed compiler-driven optimizations in STL operations (string construction, regex handling, vector reallocation) with changes ranging from -50% to +113%, but absolute impacts remained under 100ns per call.

Additional Findings

Core ML inference operations (matrix multiplication, convolution, attention) remain unchanged. Performance variations are predominantly compiler artifacts affecting peripheral functions (initialization, CLI parsing, memory management) rather than inference hot paths. The RNG replacement successfully achieves thread safety and reproducibility goals without compromising computational efficiency, as confirmed by near-zero net power consumption changes.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

@loci-dev loci-dev force-pushed the main branch 3 times, most recently from dd19ab8 to 98460a7 Compare March 10, 2026 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants