
UPSTREAM PR #1322: feat: add spectrum caching method #78

Open
loci-dev wants to merge 2 commits into main from loci/pr-1322-spectrum

Conversation


@loci-dev loci-dev commented Mar 6, 2026

Note

Source pull request: leejet/stable-diffusion.cpp#1322

Yet another training-free acceleration method. This PR implements Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration, currently for UNet models only; for DiT models, we already have enough options in my view.

This could replace and deprecate ucache, which was only ever an experimental method.

Example usage:

/build/bin/sd-cli -m models/model.safetensors -p "a cute cat" --steps 20 -H 1024 -W 1024 --fa -s 42 --cache-mode spectrum --scheduler simple --sampling-method euler
| Steps | Baseline | Spectrum |
|-------|----------|----------|
| 20 | output_20_baseline (×1.0) | output_20_spectrum (×1.82) |
| 30 | output_30_baseline (×1.0) | output_30_spectrum (×2.14) |
| 40 | output_40_baseline (×1.0) | output_40_spectrum (×2.50) |

@loci-dev loci-dev temporarily deployed to stable-diffusion-cpp-prod March 6, 2026 04:14 — with GitHub Actions Inactive

loci-review bot commented Mar 6, 2026

Overview

Analysis of commit e2a3d0c ("add spectrum") comparing base and target versions across build.bin.sd-server and build.bin.sd-cli binaries. Total functions: 49,806 (120 modified, 66 new, 0 removed). Power consumption increased minimally: build.bin.sd-server +0.25% (527,149 nJ → 528,461 nJ), build.bin.sd-cli +0.22% (491,453 nJ → 492,534 nJ).

Function Analysis

sd_cache_params_init (both binaries): Response and throughput time increased +94ns (+58%), adding initialization for seven Spectrum caching parameters (spectrum_w, spectrum_m, spectrum_lam, spectrum_window_size, spectrum_flex_window, spectrum_warmup_steps, spectrum_stop_percent). This one-time initialization cost enables step-skipping optimization with a potential 2-3x inference speedup.

Standard library improvements: std::vector::begin() improved by -181ns (-68%) in both binaries, std::basic_string::_M_set_length() (sd-cli) by -77ns (-54%), and std::vector::_S_max_size() (sd-cli) by -203ns (-63%). These compiler optimizations benefit text processing and Spectrum's history-buffer operations.

Standard library regressions: std::vector<TensorStorage*>::end() (sd-server) +183ns (+227%), __gnu_cxx::__ops::__pred_iter (sd-cli) +169ns (+213%), and std::shared_ptr::_M_destroy() (sd-server) +189ns (+180%). No source code changes—regressions stem from compiler optimization variations in non-critical paths (model loading, backend management, cleanup operations).

Other analyzed functions showed sub-microsecond changes in non-critical paths including HTTP utilities and metadata accessors.

Additional Findings

Spectrum caching targets the diffusion denoising loop—the primary performance bottleneck—by predicting when steps can be skipped. No GPU kernels or GGML operations were negatively impacted. The 94ns initialization overhead is negligible compared to potential millisecond-scale savings from skipping expensive denoising steps. String operation improvements particularly benefit CLIP tokenization and T5 encoding pipelines.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev


loci-review bot commented Mar 8, 2026

Overview

Analysis of 49,812 functions (126 modified, 66 new, 0 removed) across two binaries shows minimal performance impact from Spectrum cache implementation.

Power Consumption:

  • build.bin.sd-server: +0.164% (+863.41 nJ)
  • build.bin.sd-cli: +0.342% (+1,681.77 nJ)

Function Analysis

Intentional Feature Addition:

  • sd_cache_params_init (both binaries): +94ns (+58%) - Initializes 7 new Spectrum cache parameters (spectrum_w, spectrum_m, spectrum_lam, spectrum_window_size, spectrum_flex_window, spectrum_warmup_steps, spectrum_stop_percent). One-time setup cost enabling intelligent denoising step-skipping during inference.

Compiler-Induced Changes (STL functions, no source modifications):

  • std::vector::end() (both binaries): +183ns (+227-307%) - Added indirect jump pattern at entry
  • std::vector::begin() (both binaries): -181ns (-68-74%) - Optimized block consolidation (9→7 blocks)
  • std::shared_ptr::_M_destroy (LCMScheduler): +189ns (+61%) - Extra branching indirection at entry
  • std::basic_string::_M_set_length: -77ns (-41-54%) - Entry block optimization
  • std::vector::_S_max_size: -203ns (-57-63%) - Dead code elimination

Other analyzed functions (arange, T5Runner::get_desc, chrono::operator-, all_of, basic_string::_M_disjunct) showed compiler-generated code layout changes with minimal real-world impact.

Additional Findings

All impacted functions are outside the critical denoising loop. The 94ns initialization overhead enables runtime step-skipping optimization (potential 10-30% inference speedup). Compiler optimizations in some STL functions (-181ns to -203ns) partially offset regressions in others (+183ns to +189ns), resulting in negligible net impact (~519ns total across all functions, representing 0.000002-0.0000052% of typical inference time).

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev
