
Conversation

@loci-dev

Mirrored from leejet/stable-diffusion.cpp#1195

Since we've gone so far as to support SDXS, now it's only a small step to also support Segmind's Vega model.
This is another gift for users of small devices like the Raspberry Pi.

There are only minor changes to the code, and they don't affect any other models except SDXS. Here I changed the test on the U-Net block (in model.cpp) from
diffusion_model.output_blocks.7.1
to
diffusion_model.output_blocks.3.1.transformer_blocks.1
so that both models can easily be distinguished with a single test.

Thank you very much.

@loci-dev loci-dev temporarily deployed to stable-diffusion-cpp-prod January 16, 2026 10:41 — with GitHub Actions Inactive
@loci-agentic-ai

Explore the complete analysis inside the Version Insights



Performance Review Report

Overview

This review analyzes performance changes between two versions of the stable-diffusion.cpp project following commits 97255f9 and 75c0b7f, which add support for the Segmind-Vega distilled model. The changes modified 5 files, added 3 new files, and deleted 3 files across two binaries: build.bin.sd-server and build.bin.sd-cli.

Power Consumption Impact

  • sd-server: 0.113% increase (498,292.69 → 498,856.43 nJ)
  • sd-cli: 0.013% increase (469,148.81 → 469,209.21 nJ)

Total power consumption increased by approximately 624 nanojoules, representing negligible energy impact across both binaries.

Performance Analysis

Intentional Feature Addition

The primary source code change was adding VERSION_SDXL_VEGA support to the sd_version_is_sdxl() function, which now checks five SDXL variants instead of four. This function appears in multiple compilation units and shows consistent performance impact:

  • Absolute increase: +10.7ns per call
  • Percentage increase: ~18.8%
  • Justification: The additional conditional check is necessary for Segmind-Vega model classification and routing to appropriate SDXL-specific inference paths

This change is functionally required and the performance cost is minimal given the function's sub-70ns execution time and role as a lightweight classifier used 25 times across the codebase.

Compiler-Level Variations

The majority of performance changes stem from compiler optimization differences rather than source code modifications:

Improvements:

  • f8_e4m3_to_f16: -212ns (18% faster) from consolidated entry blocks in FP8-to-FP16 conversion, critical for quantized model inference
  • std::vector::begin() (httplib): -181ns (68% faster) from eliminated intermediate branching in HTTP routing path
  • std::sub_match::_M_str: -186ns (49% faster) from entry block consolidation in regex operations

Regressions:

  • std::vector::end() (thread): +183ns (227% slower) from added control flow indirection
  • std::vector::_S_max_size: +212ns (151% slower) from unnecessary unconditional branch
  • std::vector::_M_swap_data: +73ns (43% slower) from extra basic blocks in Flux modulation vector operations

These STL function changes show no source code modifications and represent compiler code generation differences, likely from optimization flag changes, compiler version updates, or security instrumentation adjustments between builds.

Mixed Optimization

The shared_ptr::operator= assignment used for scheduler polymorphism shows a favorable trade-off: response time rose by 80ns (8.3%), but throughput improved by 103%, indicating better instruction cache locality from the code reorganization.

Conclusion

The performance changes reflect intentional feature enhancement (Segmind-Vega support) with acceptable overhead and compiler-level optimizations that produce mixed results. The net power consumption increase of 0.113% for sd-server and 0.013% for sd-cli is negligible. The absolute timing changes range from -212ns to +212ns, which are insignificant in the context of ML inference workloads that operate in millisecond-to-second timescales. The code changes successfully enable new model variant support while maintaining overall system efficiency.

@loci-dev loci-dev temporarily deployed to stable-diffusion-cpp-prod January 17, 2026 07:35 — with GitHub Actions Inactive
@loci-agentic-ai

Explore the complete analysis inside the Version Insights

@loci-dev loci-dev force-pushed the master branch 7 times, most recently from 243db15 to 436639f Compare January 23, 2026 15:11