feat: --aggressive-offload for Apple Silicon (MPS)#13367
uxtechie wants to merge 4 commits into `Comfy-Org:master`
Conversation
Eliminate swap pressure on unified memory systems by:
- Force-destroying model parameters via the meta device after use
- Flushing the MPS allocator cache per sampling step
- Preserving small models (<1 GB, e.g. VAE) via a size threshold
- A lifecycle callback system for execution cache invalidation

Benchmarked on an M5 Pro 48 GB with FLUX.2 Dev 32B GGUF:
- Latency: 50 min → 20 min per image (2.5× improvement)
- Stability: 4+ consecutive generations without OOM
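The size-threshold rule described above can be sketched in isolation. This is an illustrative sketch only: `should_destroy`, `model_size_bytes`, and `DESTROY_THRESHOLD_BYTES` are hypothetical names, not the PR's actual API.

```python
# Illustrative sketch of the >1 GB destruction threshold; names are
# hypothetical, not the PR's actual API.
DESTROY_THRESHOLD_BYTES = 1024 * 1024 * 1024  # 1 GB


def model_size_bytes(parameters):
    """Sum numel * element_size over parameter-like objects."""
    return sum(p.numel() * p.element_size() for p in parameters)


def should_destroy(parameters, aggressive_offload):
    # Small models (e.g. a VAE) stay resident so cached decode paths
    # never touch a destroyed (meta-device) tensor.
    if not aggressive_offload:
        return False
    return model_size_bytes(parameters) > DESTROY_THRESHOLD_BYTES
```

The threshold turns destruction into a cheap metadata check, which is why the VAE survives while multi-gigabyte UNET/CLIP weights are reclaimed.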
> Note: Reviews paused. This branch appears to be under active development; to avoid overwhelming the author with review comments during an influx of new commits, CodeRabbit has automatically paused this review. This behavior is configurable.
📝 Walkthrough: Adds a CLI flag.
🚥 Pre-merge checks: 2 passed, 1 failed (1 warning).
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_execution/caching.py`:
- Around line 193-202: NullCache is missing the public invalidation API
clear_all(), violating the cache contract used by CacheSet.init_null_cache and
consumers like PromptExecutor; add a no-op clear_all(self) method to the
NullCache class so callers can unconditionally call cache.clear_all() without
AttributeError (implement as an empty method that does not raise and document it
as a no-op invalidation for null backend).
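A minimal sketch of the requested fix: `NullCache` gains a no-op `clear_all()` so consumers can invoke `cache.clear_all()` unconditionally. The surrounding `get`/`set` methods here are illustrative; the real class lives in `comfy_execution/caching.py`.

```python
# Sketch: NullCache with the no-op clear_all() the review asks for.
# get/set bodies are illustrative stand-ins for the real null backend.
class NullCache:
    """Cache backend that stores nothing."""

    def get(self, key):
        return None

    def set(self, key, value):
        pass

    def clear_all(self):
        """No-op invalidation: a null backend holds nothing to clear."""
        pass
```

With this in place, `CacheSet` consumers such as `PromptExecutor` never need an `isinstance` check before invalidating.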
In `@comfy/model_management.py`:
- Around line 684-688: The current code unconditionally sets EXTRA_RESERVED_VRAM
to 4GB when CPUState.MPS is detected, which changes default MPS memory
heuristics; change this so the 4GB reservation is only applied when the explicit
opt-in flag is set (e.g., AGGRESSIVE_OFFLOAD or a dedicated MPS flag). Update
the CPUState.MPS branch to check the boolean AGGRESSIVE_OFFLOAD (or an
equivalent MPS-specific flag) before assigning EXTRA_RESERVED_VRAM = 4 * 1024 *
1024 * 1024 and before emitting the "MPS detected: reserving 4 GB..." log,
leaving the default path unchanged when the flag is false; ensure you reference
CPUState.MPS, EXTRA_RESERVED_VRAM, and AGGRESSIVE_OFFLOAD in the change so the
behavior is gated behind the opt-in.
- Around line 490-511: The current register_model_destroyed_callback appends
strong references to _on_model_destroyed_callbacks causing previous
PromptExecutor instances (registered via PromptExecutor.__init__) to be
retained; change this to store weak references: use weakref.WeakSet/weakref.ref
for plain callables and weakref.WeakMethod for bound methods, and update the
invocation logic that iterates _on_model_destroyed_callbacks to dereference weak
refs and prune dead entries before calling. Also add an
unregister_model_destroyed_callback(callback) that locates and removes the
matching weakref (or dead entries) so executors can unregister on teardown;
ensure functions that invoked register_model_destroyed_callback are updated to
call unregister during cleanup.
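One way to realize the weak-reference registry the review describes is sketched below; the function and list names mirror the review text, but the implementation details are assumptions, not the PR's code.

```python
import gc
import weakref

# Sketch of a weak-reference callback registry so registered executors
# can be garbage-collected; names follow the review text.
_on_model_destroyed_callbacks = []


def register_model_destroyed_callback(callback):
    # WeakMethod for bound methods (e.g. PromptExecutor methods),
    # plain weakref.ref for free functions.
    ref_type = weakref.WeakMethod if hasattr(callback, "__self__") else weakref.ref
    _on_model_destroyed_callbacks.append(ref_type(callback))


def unregister_model_destroyed_callback(callback):
    # Remove the matching weakref and prune any dead entries found.
    for r in list(_on_model_destroyed_callbacks):
        target = r()
        if target is None or target == callback:
            _on_model_destroyed_callbacks.remove(r)


def notify_model_destroyed(reason):
    # Dereference weak refs, prune dead entries, then invoke survivors.
    live = [r for r in _on_model_destroyed_callbacks if r() is not None]
    _on_model_destroyed_callbacks[:] = live
    for r in live:
        cb = r()
        if cb is not None:
            cb(reason)
```

Because only weak references are held, dropping the last strong reference to an executor is enough to retire its callback even if `unregister` is never called.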
In `@tests-unit/test_aggressive_offload.py`:
- Around line 178-180: The test test_large_model_above_threshold currently
instantiates FakeLinearModel(size_mb=2048) which allocates a huge real tensor;
change the test to avoid allocating backing storage by stubbing or mocking the
reported parameter size instead: either modify FakeLinearModel to accept a flag
(e.g., allocate=False) that only sets reported size metadata without allocating
tensors, or replace the instantiation in test_large_model_above_threshold with a
lightweight fake/mock object that implements the same size-reporting interface
used by the code under test (e.g., .parameters(), .numel(), or a size_mb
property) so the test exercises the destruction-threshold logic without creating
large memory buffers.
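A torch-free sketch of the suggested non-allocating fake: a parameter stub reports `numel()`/`element_size()` without any backing buffer. `FakeParam` and this `FakeLinearModel` variant are illustrative, not the test file's actual code.

```python
# Sketch: size metadata without backing storage, so a "2 GB" model
# costs almost nothing in the test process.
class FakeParam:
    def __init__(self, numel, element_size=4):
        self._numel = numel
        self._element_size = element_size

    def numel(self):
        return self._numel

    def element_size(self):
        return self._element_size


class FakeLinearModel:
    def __init__(self, size_mb=2.0):
        # Each float32 param is 4 bytes, so n params ≈ size_mb * 1024² / 4,
        # but only the count is stored — nothing is allocated.
        n = int(size_mb * 1024 * 1024 / 4)
        self._params = [FakeParam(n)]

    def parameters(self):
        return iter(self._params)
```

The code under test only consumes the size-reporting interface, so the threshold logic is exercised identically.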
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 93e9ffdf-231b-4934-a488-77ed0e235f08
📒 Files selected for processing (6)
- comfy/cli_args.py
- comfy/model_management.py
- comfy/samplers.py
- comfy_execution/caching.py
- execution.py
- tests-unit/test_aggressive_offload.py
Force-pushed: 07c3887 → d9a4089
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests-unit/test_aggressive_offload.py (1)
185-194: ⚠️ Potential issue | 🟠 Major — Avoid allocating a real 2 GB tensor in this unit test.
Line 187 creates `FakeLinearModel(size_mb=2048)`, which materializes a huge backing tensor via line 24. This can OOM/flake CI and turns a logic test into a machine-capacity test.
💡 Suggested change
```diff
 class FakeLinearModel(nn.Module):
@@
-    def __init__(self, size_mb: float = 2.0):
+    def __init__(self, size_mb: float = 2.0, *, allocate: bool = True):
         super().__init__()
         # Each float32 param = 4 bytes, so `n` params ≈ size_mb * 1024² / 4
         n = int(size_mb * 1024 * 1024 / 4)
-        self.weight = nn.Parameter(torch.zeros(n, dtype=torch.float32))
+        if allocate:
+            self.weight = nn.Parameter(torch.zeros(n, dtype=torch.float32))
+        else:
+            self.weight = nn.Parameter(torch.zeros(1, dtype=torch.float32))
+        self.reported_size_bytes = int(size_mb * 1024 * 1024)
@@
     def test_large_model_above_threshold(self):
         """A 2 GB model (UNET/CLIP-sized) must BE above the destruction threshold."""
-        model = FakeLinearModel(size_mb=2048)
-
-        model_size = sum(p.numel() * p.element_size() for p in model.parameters())
+        model = FakeLinearModel(size_mb=2048, allocate=False)
+        model_size = model.reported_size_bytes
         threshold = 1024 * 1024 * 1024  # 1 GB
```

```bash
#!/bin/bash
# Verify whether the large-threshold test still allocates real backing storage.
rg -n -C2 'def __init__\(self, size_mb|torch\.zeros\(n|test_large_model_above_threshold|size_mb=2048' tests-unit/test_aggressive_offload.py
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-unit/test_aggressive_offload.py` around lines 185 - 194, The test test_large_model_above_threshold currently constructs FakeLinearModel(size_mb=2048) which allocates a real huge tensor; change FakeLinearModel (constructor / __init__) so it no longer materializes a backing buffer for large sizes: compute the intended numel from size_mb and return lightweight fake parameter objects for parameters() (or a small torch.nn.Parameter) whose numel() and element_size() methods return the computed values, or add an allocate=False flag and skip creating the large tensor when false; update test_large_model_above_threshold to use the non-allocating behavior so the assertion on model_size still computes correctly without allocating gigabytes.
🧹 Nitpick comments (1)
tests-unit/test_aggressive_offload.py (1)
217-242: These MPS flush tests don't currently validate sampler flush behavior.
Lines 217-242 only assert local boolean/device facts; they can pass even if the sampler's MPS `empty_cache()` call path regresses. Consider exercising the actual wrapper/step path with a mocked `torch.mps.empty_cache` call count.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-unit/test_aggressive_offload.py` around lines 217 - 242, Replace the passive asserts with a test that actually exercises the sampler flush path: patch torch.mps.empty_cache (unittest.mock.patch) and then run the sampler entrypoint that would trigger a flush (the sampler wrapper/step used by your codebase — e.g., the samplers module entry function referenced as "samplers" in the comment) while toggling comfy.model_management.AGGRESSIVE_OFFLOAD and simulating an MPS device; assert that torch.mps.empty_cache was called when mm.AGGRESSIVE_OFFLOAD is True and the sampler/device reports type "mps", and assert it was not called when AGGRESSIVE_OFFLOAD is False or device.type != "mps". Ensure you set mm.AGGRESSIVE_OFFLOAD back to its original value in a finally block and restore any patched device attributes.
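The suggestion above can be sketched torch-free: a flush helper that calls `empty_cache` only when the flag is set and the device is MPS. The helper and its injected `empty_cache` argument are illustrative stand-ins for the sampler's call to `torch.mps.empty_cache`, not the repository's code.

```python
from unittest.mock import MagicMock

# Sketch: conditional flush plus a mock-based call-count check,
# standing in for patching torch.mps.empty_cache in the real test.
def flush_mps_cache(device_type, aggressive_offload, empty_cache):
    if aggressive_offload and device_type == "mps":
        empty_cache()


mock_empty = MagicMock()
flush_mps_cache("mps", True, mock_empty)    # should flush
flush_mps_cache("mps", False, mock_empty)   # flag off: no flush
flush_mps_cache("cuda", True, mock_empty)   # wrong device: no flush
```

Asserting on the mock's `call_count` makes the test fail if the flush path regresses, which the current boolean checks cannot.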
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests-unit/test_aggressive_offload.py`:
- Around line 257-261: Update the test test_flag_defaults_from_cli_args to
assert the actual wiring instead of just attribute presence: import
comfy.model_management and assert that comfy.model_management.AGGRESSIVE_OFFLOAD
== comfy.cli_args.args.aggressive_offload so the test fails if the CLI flag is
not propagated to the model_management constant (keep the existing imports and
names: test_flag_defaults_from_cli_args, comfy.cli_args.args.aggressive_offload,
comfy.model_management.AGGRESSIVE_OFFLOAD).
---
Duplicate comments:
In `@tests-unit/test_aggressive_offload.py`:
- Around line 185-194: The test test_large_model_above_threshold currently
constructs FakeLinearModel(size_mb=2048) which allocates a real huge tensor;
change FakeLinearModel (constructor / __init__) so it no longer materializes a
backing buffer for large sizes: compute the intended numel from size_mb and
return lightweight fake parameter objects for parameters() (or a small
torch.nn.Parameter) whose numel() and element_size() methods return the computed
values, or add an allocate=False flag and skip creating the large tensor when
false; update test_large_model_above_threshold to use the non-allocating
behavior so the assertion on model_size still computes correctly without
allocating gigabytes.
---
Nitpick comments:
In `@tests-unit/test_aggressive_offload.py`:
- Around line 217-242: Replace the passive asserts with a test that actually
exercises the sampler flush path: patch torch.mps.empty_cache
(unittest.mock.patch) and then run the sampler entrypoint that would trigger a
flush (the sampler wrapper/step used by your codebase — e.g., the samplers
module entry function referenced as "samplers" in the comment) while toggling
comfy.model_management.AGGRESSIVE_OFFLOAD and simulating an MPS device; assert
that torch.mps.empty_cache was called when mm.AGGRESSIVE_OFFLOAD is True and the
sampler/device reports type "mps", and assert it was not called when
AGGRESSIVE_OFFLOAD is False or device.type != "mps". Ensure you set
mm.AGGRESSIVE_OFFLOAD back to its original value in a finally block and restore
any patched device attributes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 36f37e85-b5b0-490e-b4a5-0d1b6da94b80
📒 Files selected for processing (2)
- comfy_execution/caching.py
- tests-unit/test_aggressive_offload.py
🚧 Files skipped from review as they are similar to previous changes (1)
- comfy_execution/caching.py
Force-pushed: d9a4089 → c63c5bc
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_execution/caching.py`:
- Around line 193-203: clear_all() currently clears self.cache and
self.subcaches but leaves LRU/pressure metadata (e.g. used_generation, children,
timestamps) on LRUCache/RAMPressureCache instances, causing stale bookkeeping;
modify comfy_execution.caching.clear_all to iterate over existing cache and
subcache objects (instances of LRUCache and RAMPressureCache) and reset their
internal metadata fields (set used_generation to 0 or appropriate baseline,
clear children collections, clear timestamps structures and any other per-entry
bookkeeping) before or after clearing self.cache/self.subcaches so eviction
behavior is fully reset; reference the LRUCache and RAMPressureCache classes and
their attributes used_generation, children, timestamps when making the changes.
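A sketch of a `clear_all()` that also resets eviction bookkeeping follows; the attribute names (`used_generation`, `children`, `timestamps`) come from the review text and are assumptions about the real `LRUCache`/`RAMPressureCache` internals.

```python
# Sketch: full invalidation that resets LRU/pressure metadata alongside
# the cache contents; attribute names follow the review text.
class LRUCacheSketch:
    def __init__(self):
        self.cache = {}
        self.subcaches = {}
        self.used_generation = {}
        self.children = {}
        self.timestamps = {}

    def clear_all(self):
        self.cache.clear()
        self.subcaches.clear()
        # Without these resets, stale generation counters and timestamps
        # would keep influencing eviction after an invalidation.
        self.used_generation.clear()
        self.children.clear()
        self.timestamps.clear()
```

The point is that invalidation must restore the cache to its post-construction baseline, not just empty its value store.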
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ac0b6276-9896-4835-b713-0c5f31568390
📒 Files selected for processing (3)
- comfy/model_management.py
- comfy_execution/caching.py
- tests-unit/test_aggressive_offload.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests-unit/test_aggressive_offload.py
- comfy/model_management.py
Force-pushed: ee9abc9 → ffcc43a
🧹 Nitpick comments (1)
comfy/model_management.py (1)
776-802: Consider wrapping individual callback invocations in try/except.
The two-phase design (deferred meta move + single batch notification) is well-structured. However, if a registered callback raises an exception at line 801, subsequent callbacks won't execute. While the current sole consumer (PromptExecutor) is unlikely to throw, defensive handling would improve resilience as more consumers are added.
🛡️ Optional: Defensive callback invocation

```diff
 if _meta_destroy_queue and _on_model_destroyed_callbacks:
     for cb in _on_model_destroyed_callbacks:
-        cb("batch")
+        try:
+            cb("batch")
+        except Exception as e:
+            logging.warning(f"[aggressive-offload] Callback failed: {e}")
     logging.info(f"[aggressive-offload] Invalidated execution cache after destroying {len(_meta_destroy_queue)} model(s)")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/model_management.py` around lines 776 - 802, The loop invoking registered callbacks (_on_model_destroyed_callbacks) should guard each callback call so one failing callback doesn't stop the rest; change the for cb in _on_model_destroyed_callbacks: cb("batch") to call each cb("batch") inside a try/except that catches Exception, logs the failure (e.g., logging.exception or logging.warning with the exception) and then continues, ensuring all callbacks (including PromptExecutor) still run and the final logging.info still executes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@comfy/model_management.py`:
- Around line 776-802: The loop invoking registered callbacks
(_on_model_destroyed_callbacks) should guard each callback call so one failing
callback doesn't stop the rest; change the for cb in
_on_model_destroyed_callbacks: cb("batch") to call each cb("batch") inside a
try/except that catches Exception, logs the failure (e.g., logging.exception or
logging.warning with the exception) and then continues, ensuring all callbacks
(including PromptExecutor) still run and the final logging.info still executes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 328d2ed0-f541-4917-9626-7a126746a09d
📒 Files selected for processing (3)
- comfy/model_management.py
- comfy_execution/caching.py
- tests-unit/test_aggressive_offload.py
🚧 Files skipped from review as they are similar to previous changes (2)
- comfy_execution/caching.py
- tests-unit/test_aggressive_offload.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy/model_management.py`:
- Around line 750-758: The aggressive-offload block can dereference a dead
weakref; before accessing
current_loaded_models[i].model.model.__class__.__name__ or calling
model_memory(), first retrieve the model reference into a local (e.g., model_ref
= current_loaded_models[i].model), check that model_ref is not None and that
getattr(model_ref, "model", None) is not None (and that currently_used still
False), and only then compute model_size_mb and log; if the weakref is dead,
skip/continue the offload branch to avoid AttributeError. Ensure checks occur
inside the AGGRESSIVE_OFFLOAD and VRAMState.SHARED branch and reference the same
current_loaded_models[i] entry consistently.
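The guard the review asks for can be sketched as below; the `LoadedModel` shape (`loaded.model`, then `.model` on that) follows the review text and is an assumption about the real structures in `comfy/model_management.py`.

```python
# Sketch: guard against a dead weakref before sizing/logging a model
# in the aggressive-offload branch.
def maybe_queue_for_destruction(loaded, aggressive_offload, shared_vram, queue):
    if not (aggressive_offload and shared_vram):
        return
    model_ref = loaded.model  # may already be None if the weakref died
    if model_ref is None or getattr(model_ref, "model", None) is None:
        return  # skip this entry instead of raising AttributeError
    inner = model_ref.model
    size_mb = sum(p.numel() * p.element_size()
                  for p in inner.parameters()) / (1024 * 1024)
    queue.append((inner.__class__.__name__, size_mb))
```

Pulling the reference into a local once also avoids time-of-check/time-of-use races against the same `current_loaded_models[i]` entry.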
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 00b934cf-fe05-4da1-8e14-d54258db5827
📒 Files selected for processing (3)
- comfy/model_management.py
- comfy_execution/caching.py
- tests-unit/test_aggressive_offload.py
Force-pushed: 72977bb → c2213f8
Force-pushed: c2213f8 → 7ec3984
Force-pushed: 162b2ad → 0bd2a35
Aggressive Offload Benchmark — Apple Silicon (MPS)
Summary
Add a `--aggressive-offload` CLI flag for Apple Silicon (MPS) systems with unified memory. It forces model parameter destruction and an MPS allocator flush between generation runs, eliminating swap pressure on disk.
Problem
On Apple Silicon, CPU RAM = GPU VRAM (unified memory). ComfyUI's default
memory management assumes models can be offloaded to CPU — but on MPS,
"offloaded" models still consume the same physical RAM. When running
large pipelines (e.g. FLUX.2 Dev 32B GGUF + Mistral 24B GGUF = ~42 GB), the
MPS allocator fragments the shared pool, forcing macOS to swap model
weights to SSD. This turns GPU compute into disk-bound I/O.
Hardware
Model Configuration
Generation Parameters
Results
Single Generation Latency
[table lost in capture: latency with vs. without `--aggressive-offload`]
Multi-Batch Stability
[table lost in capture: one configuration crashes with `Cannot copy out of meta tensor` in `vae_decode`]
Memory Behaviour
[table lost in capture: memory with vs. without `--aggressive-offload`]
Changes
- comfy/cli_args.py: `--aggressive-offload` flag
- comfy/model_management.py: `EXTRA_RESERVED_VRAM` 4 GB gated by flag
- comfy/samplers.py: `empty_cache()` per step
- comfy_execution/caching.py: `BasicCache.clear_all()` public API
- comfy_execution/caching.py: `NullCache.clear_all()` no-op
- comfy_execution/caching.py: `LRUCache`/`RAMPressureCache` metadata reset
- execution.py
Testing
Unit tests cover `clear_all()` for all cache variants (Basic, LRU, RAMPressure, Null), lifecycle callbacks, the meta-device threshold, MPS flush conditionality, and CLI flag wiring.
Reproduction
Queue 4+ identical txt2img jobs at 896×1152, 20 steps, FLUX.2 Dev Q5_K_M.
Observe: time per image and whether batch completes without crash.
Conclusion
The `--aggressive-offload` flag resolves two critical issues on Apple Silicon:
1. MPS allocator fragmentation, via per-step cache flushes.
2. `Cannot copy out of meta tensor` crashes, by using a size-based threshold (>1 GB) for aggressive model destruction, preserving small models like the VAE that the execution cache depends on.

The patch is opt-in (`--aggressive-offload`), has zero impact on non-MPS platforms, and is gated behind `VRAMState.SHARED` checks throughout.