
Conversation

@loci-dev

Mirrored from leejet/stable-diffusion.cpp#1222

for cases like leejet/stable-diffusion.cpp#1220

The CLI should no longer save an empty image to disk if generation fails. If no image is successfully saved, it should exit with error code -1 instead of 0. (Maybe I should change it to return -1 if any image fails to save?)
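A minimal, self-contained sketch of the exit-code logic described above; all names here (`image_t`, `save_png`, `results`) are illustrative placeholders, not the actual stable-diffusion.cpp API:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a generation result; names are illustrative.
struct image_t {
    unsigned char* data = nullptr;  // nullptr marks a failed generation
};

static bool save_png(const image_t& img, const char* path) {
    if (img.data == nullptr) {
        return false;  // never write an empty image to disk
    }
    std::printf("saved %s\n", path);  // stand-in for the real PNG encoder
    return true;
}

int main() {
    std::vector<image_t> results(4);  // stand-in for the generation results
    int saved = 0;
    for (size_t i = 0; i < results.size(); i++) {
        char path[64];
        std::snprintf(path, sizeof(path), "output_%zu.png", i);
        if (save_png(results[i], path)) {
            saved++;
        }
    }
    // Exit with -1 when nothing was saved, instead of the previous 0,
    // so shell scripts can detect the failure.
    return saved == 0 ? -1 : 0;
}
```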

@loci-dev temporarily deployed to stable-diffusion-cpp-prod on January 21, 2026 at 17:49 with GitHub Actions
@loci-agentic-ai

Performance Review Report: Stable Diffusion C++ - VAE Error Handling Improvements

Impact Classification: Moderate

Commit: 6aa3b9d "improve handling of VAE decode failures"
Binaries Analyzed: build.bin.sd-cli, build.bin.sd-server
Functions Analyzed: 14 (3 performance-critical, 11 utility/STL)

Executive Summary

The target version introduces error handling improvements for VAE decode operations, resulting in 7-8% response time increases (+4,600 nanoseconds absolute) in three GPU-critical functions. The changes prioritize production reliability over raw performance, with negligible user-facing impact (<0.001% of total generation time).

Performance-Critical Functions Impacted

1. StableDiffusionGGML::preview_image() [both binaries]

  • Base: 56,279-56,385 ns → Target: 60,881-60,977 ns
  • Change: +4,591-4,601 ns (+8.14-8.18%)
  • Called 20-50 times per generation for real-time previews

2. StableDiffusionGGML::decode_first_stage() [sd-cli]

  • Base: 63,713 ns → Target: 68,312 ns
  • Change: +4,599 ns (+7.22%)
  • Critical VAE decoder for final latent-to-image conversion

3. UpscalerGGML::upscale() [sd-cli]

  • Base: 61,032 ns → Target: 65,640 ns
  • Change: +4,608 ns (+7.55%)
  • ESRGAN 4x super-resolution with 345 convolution operations

Code Changes and Justification

All three functions received identical error handling enhancements:

  • Added return statements in tiling lambda callbacks to propagate compute() status
  • Implemented explicit error checking after VAE/TAE compute operations
  • Added free_compute_buffer() calls on error paths to prevent GPU memory leaks
  • Added memset initialization for result buffers

These changes prevent silent GPU failures, enable graceful degradation under memory constraints, and ensure proper resource cleanup—critical for production ML inference systems.
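A condensed sketch of the pattern the bullets describe. compute() and free_compute_buffer() are named in the report itself; everything else here (Tensor, Runner, process_tiles, the single-tile loop) is illustrative scaffolding, not the real stable-diffusion.cpp code:

```cpp
#include <cstring>     // memset
#include <functional>

// Hypothetical minimal types standing in for the real GGML-backed runner.
struct Tensor {
    float* data   = nullptr;
    size_t nbytes = 0;
};

struct Runner {
    bool compute(Tensor* /*in*/, Tensor* /*out*/) { return true; }  // stub
    void free_compute_buffer() {}                                   // stub
};

// Runs the callback per tile and stops on the first failure
// (single-tile stand-in for the real tiling loop).
static bool process_tiles(Tensor* in, Tensor* out,
                          const std::function<bool(Tensor*, Tensor*)>& cb) {
    return cb(in, out);
}

bool decode_first_stage_sketch(Runner* vae, Tensor* latent, Tensor* result) {
    // Zero the result buffer so a failed decode never yields garbage pixels.
    memset(result->data, 0, result->nbytes);

    // The tiling lambda now returns compute()'s status instead of discarding it.
    auto on_tile = [&](Tensor* in, Tensor* out) -> bool {
        return vae->compute(in, out);
    };

    bool ok = process_tiles(latent, result, on_tile);

    // Explicit check after the VAE/TAE compute; the buffer is freed on the
    // error path as well, so a failed decode cannot leak GPU memory.
    vae->free_compute_buffer();
    return ok;
}
```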

Cumulative Impact Analysis

Typical Generation (20 steps):

  • Preview overhead: 20 × 4.6 μs = 92 μs
  • Final decode overhead: 4.6 μs
  • Total: ~97 μs on 10-30 second generation (0.0003-0.001%)
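For concreteness, the arithmetic behind that percentage range, using the figures from the report:

```latex
20 \times 4.6\,\mu\text{s} + 4.6\,\mu\text{s} = 96.6\,\mu\text{s} \approx 97\,\mu\text{s}
\frac{96.6\,\mu\text{s}}{30\,\text{s}} \approx 0.0003\%,
\qquad
\frac{96.6\,\mu\text{s}}{10\,\text{s}} \approx 0.001\%
```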

Power Consumption: Negligible increase (<0.01W), imperceptible in 50-150W GPU inference workloads.

Additional Findings

Eleven STL functions show compiler-driven optimizations (no source changes) with 23-75% improvements in various metrics, providing distributed performance gains that partially offset error handling overhead. Notable: std::vector operations improved 40-75%, std::unordered_map::find() gained 44% throughput.

Conclusion

The 4.6 microsecond overhead per GPU operation is justified by substantial reliability improvements. The changes prevent crashes, memory leaks, and silent failures while adding sub-millisecond latency to multi-second operations. This represents mature engineering prioritizing correctness in production ML systems.
See the complete breakdown in Version Insights
Have questions? Tag @loci-dev to ask about this PR.

@noamshiovits-Aurora

@loci-dev what's the most substantial change in this PR?

@loci-dev force-pushed the master branch 2 times, most recently from 243db15 to 436639f on January 23, 2026 at 15:11