UPSTREAM PR #1336: feat: add generic DiT support to spectrum cache #82

Open
loci-dev wants to merge 1 commit into main from loci/pr-1336-dit-spectrum

@loci-dev

Note

Source pull request: leejet/stable-diffusion.cpp#1336

Implementing the paper literally would be heavier: it depends on architecture-specific internal feature hooks and final-block forecasting inside each DiT family, which would require invasive model-side changes and separate validation per model. This patch instead keeps Spectrum generic at the sampler level.
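To illustrate what "generic at the sampler level" can mean, here is a minimal sketch of a cache that wraps the denoiser call and never looks inside the model. It is not the code from this patch: every name in it (`ForecastCache`, `DenoiseFn`, `warmup_steps`, `window`) is hypothetical, and plain linear extrapolation from the two most recent real outputs stands in for the forecaster described in the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// All names below are hypothetical; none are taken from stable-diffusion.cpp.
using Tensor = std::vector<float>;
using DenoiseFn = std::function<Tensor(const Tensor& x, int step)>;

// Wraps the denoiser. On selected steps it skips the model and forecasts the
// output from the two most recent real outputs (linear extrapolation, used
// here only as a stand-in forecaster).
class ForecastCache {
public:
    ForecastCache(DenoiseFn model, int warmup_steps, int window)
        : model_(std::move(model)), warmup_(warmup_steps), window_(window) {
        assert(warmup_ >= 2 && window_ >= 1);
    }

    // Steps are assumed to arrive in strictly increasing order.
    Tensor operator()(const Tensor& x, int step) {
        const bool need_history = real_outputs_ < 2 || step < warmup_;
        const bool refresh_due = (step - last_step_) >= window_;
        if (need_history || refresh_due) {
            // Real model evaluation; shift the two-entry history.
            Tensor y = model_(x, step);
            prev_ = std::move(last_);
            prev_step_ = last_step_;
            last_ = y;
            last_step_ = step;
            ++real_outputs_;
            return y;
        }
        // Forecast: extend the line through (prev_step_, prev_) and
        // (last_step_, last_) out to the current step.
        assert(last_step_ > prev_step_);
        assert(prev_.size() == last_.size());
        const float t = static_cast<float>(step - last_step_) /
                        static_cast<float>(last_step_ - prev_step_);
        Tensor y(last_.size());
        for (std::size_t i = 0; i < y.size(); ++i) {
            y[i] = last_[i] + t * (last_[i] - prev_[i]);
        }
        return y;
    }

    // Number of steps on which the wrapped model actually ran.
    int model_calls() const { return real_outputs_; }

private:
    DenoiseFn model_;
    int warmup_;
    int window_;
    int real_outputs_ = 0;
    int prev_step_ = 0;
    int last_step_ = 0;
    Tensor prev_;
    Tensor last_;
};
```

Because the wrapper only sees the denoiser's inputs and outputs, the same object would work for any model family, which is the trade-off the note above describes. With `warmup_steps = 2` and `window = 2`, this toy schedule runs the model on 11 of 20 steps and forecasts the other 9; that count is a property of the sketch, not a measurement of the patch.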

Comparison with EasyCache at 20 steps:

          Baseline   EasyCache         Spectrum
Image     [image]    [image]           [image]
Speedup              x1.82             x1.82
Config               0.2, 0.15, 0.95   default

loci-dev temporarily deployed to stable-diffusion-cpp-prod on March 12, 2026 04:16 with GitHub Actions (Inactive)
@loci-review

loci-review bot commented Mar 12, 2026

Overview

Analysis of 49,806 functions following commit 2cbf294 ("add dit") shows negligible net performance impact with mixed compiler-driven optimizations. Modified: 56 functions (0.11%), New: 0, Removed: 0.

Binaries analyzed:

  • build.bin.sd-cli: -0.016% power consumption (-76.48 nJ)
  • build.bin.sd-server: +0.146% power consumption (+768.09 nJ)

Function Analysis

All performance changes occur in C++ standard library functions (not application code) due to compiler optimization variations between builds:

Improvements:

  • std::vector::begin() (sd-cli): -68% response time (-181ns) — entry block consolidation
  • std::vector::empty() (sd-server): -41% response time (-190ns) — eliminated intermediate jump
  • __gnu_cxx::__normal_iterator::operator+ (sd-server): -40% response time (-66ns) — merged stack canary operations
  • std::__shared_ptr::operator= (sd-server): -8% response time (-80ns) — entry block optimization

Regressions:

  • std::_Sp_counted_ptr_inplace::_M_destroy (sd-server): +61% response time (+189ns) — unnecessary branch insertion
  • std::vector::back() (sd-server): +26% response time (+184ns) — added entry indirection
  • std::vector::erase() (sd-server): +6% response time (+81ns) — entry block split
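Each entry above pairs a relative change with an absolute one, so the absolute timings can be backed out from delta = before × percent / 100. The helper below is a small sketch of that arithmetic; `Timing` and `implied_timing` are hypothetical names, and because the reported percentages are rounded the results are approximate.

```cpp
#include <cassert>

// Hypothetical helper: before/after response times implied by a reported
// percentage change and the matching absolute delta in nanoseconds.
struct Timing {
    double before_ns;
    double after_ns;
};

inline Timing implied_timing(double percent_change, double delta_ns) {
    assert(percent_change != 0.0);
    // delta = before * percent / 100, solved for `before`.
    const double before = delta_ns / (percent_change / 100.0);
    return {before, before + delta_ns};
}
```

For example, `implied_timing(-68.0, -181.0)` gives roughly 266 ns before and 85 ns after for `std::vector::begin()`, and `implied_timing(61.0, 189.0)` gives roughly 310 ns before and 499 ns after for `_M_destroy`. Both functions stay well under a microsecond per call.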

Other analyzed functions showed similar compiler-driven variations with offsetting improvements and regressions.

Assessment: Power consumption changes stay within ±0.15% (-0.016% for sd-cli, +0.146% for sd-server), which indicates that improvements and regressions largely cancel out. No source code changes to these standard library functions were detected; the differences stem from compiler code generation, not intentional optimizations.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev
