Skip to content

SeedVR2 native RoPE: rewire NaMMRotaryEmbedding3d.forward to apply_rope1 directly (closes pollockjj/mydevelopment#224)#52

Merged
pollockjj merged 3 commits into
issue_101from
issue_224_seedvr2_native_rope_apply_rope1
May 8, 2026
Merged

SeedVR2 native RoPE: rewire NaMMRotaryEmbedding3d.forward to apply_rope1 directly (closes pollockjj/mydevelopment#224)#52
pollockjj merged 3 commits into
issue_101from
issue_224_seedvr2_native_rope_apply_rope1

Conversation

@pollockjj
Copy link
Copy Markdown
Owner

@pollockjj pollockjj commented May 8, 2026

Summary

Rewires NaMMRotaryEmbedding3d.forward (the multimodal RoPE used by SeedVR2 inference) to call comfy.ldm.flux.math.apply_rope1 directly via a thin _apply_rope1_partial helper, with NaMMRotaryEmbedding3d.get_freqs emitting freqs in flux-canonical shape [..., d/2, 2, 2] (cos/-sin/sin/cos baked in). Matches the pattern used by 7 other native ComfyUI native-DiT models (flux, hidream, kandinsky5, lumina, qwen_image, wan, sam3) and eliminates the wasteful freqs_mat construction + terminal torch.cat in the legacy apply_rotary_emb wrapper that OOM'd on the largest cell of the SeedVR2 native_3b non-tiled corpus (VideoLQ_000 1280×960×100 on RTX 5090 32 GB).

Tracking issue: pollockjj/mydevelopment#224.

Diagnosis

Pre-rewrite OOM site

NaMMRotaryEmbedding3d.forward calls Yusef's apply_rotary_emb wrapper at comfy/ldm/seedvr/model.py:466-504. The wrapper builds a 2×2 rotation matrix via three torch.stacks and ends in torch.cat((t_left, t_middle_out, t_right), dim=-1). For VideoLQ_000 (320×240 LR → 1280×960×100 target):

```
torch.OutOfMemoryError: Allocation on device 0 would exceed allowed memory.
Currently allocated : 26.98 GiB
Requested : 2.38 GiB
Device limit : 31.36 GiB
Free (according to CUDA): 19.69 MiB
File "comfy/ldm/seedvr/model.py", line 503, in apply_rotary_emb
out = torch.cat((t_left, t_middle_out, t_right), dim=-1)
```

Canonical (vendor seedvr_repo/models/dit_v2/rope.py) and numz (ComfyUI-SeedVR2_VideoUpscaler/src/models/dit_3b/rope.py) both clear the same cell with the same fp16 weights. Both call from rotary_embedding_torch import apply_rotary_emb directly — the upstream lucidrains library implementation. Native is the only leg that hand-rolls the wrapper.

Prior signal

Kosinkadink raised this exact concern as item #2 of his upstream PR review (Comfy-Org#11294):

  1. I see there is some rope re-implementation and some rope reuse as well. Is some of the rope stuff for SeedVR2 unique enough that it cannot be refactored in a way that reuses apply_rope1?

Yusef's response confirmed the matrix-form refactor that this PR straightens out:

For 2, it already uses apply_rope1. I just refactored some parts of the rope implementation as a matrix to work with apply_rope1. You can see it in apply_rotary_emb

The wrapper's apply_rope1 call was preserved; the wasteful matrix construction around it is what this PR removes.

Implementation

Single file changed: comfy/ldm/seedvr/model.py (one new helper + one rewrite + 4 forward sites rewired).

_to_flux_freqs_cis(freqs_interleaved) (new helper)

Converts lucidrains-interleaved freqs [..., d] ([θ0, θ0, θ1, θ1, ...] from RotaryEmbedding.forward's repeat(freqs, '... n -> ... (n r)', r=2)) into flux-canonical freqs_cis of shape [..., d/2, 2, 2] with cos/-sin/sin/cos baked. Mirrors comfy/ldm/flux/math.py:rope (line 27).

_apply_rope1_partial(t, freqs_cis) (new helper)

Applies apply_rope1 to the leading rot_d = 2 * freqs_cis.shape[-3] components of t's last dim, passing through the remaining dims untouched. Mirrors the partial-rope contract of the legacy wrapper's t_left/t_middle/t_right split. Critical for SeedVR2-3B: rope_dim=128 integer-divides into 3 axes as 128 // 3 = 42 per-axis, total 42 * 3 = 126; head_dim is 128, so the trailing 2 head dims are unrotated. The fast path triggers when rot_d == t.shape[-1] and avoids the cat entirely.

NaMMRotaryEmbedding3d.get_freqs rewrite

Computes interleaved freqs via the existing parent-class get_axial_freqs calls (unchanged), then post-processes through _to_flux_freqs_cis so the returned tensors have the flux-canonical shape apply_rope1 expects.

NaMMRotaryEmbedding3d.forward rewire

The 4 call sites (vid_q, vid_k, txt_q, txt_k) at the previous lines 536-544 now call _apply_rope1_partial(t, freqs_cis) directly. The legacy apply_rotary_emb wrapper is no longer on the active SeedVR2 inference path.

Out of scope

  • RotaryEmbedding3d.forward (lines 421-438; the non-multimodal video-only path) still uses the wrapper. No OOM observed there. Defer to a follow-up if benchmarking shows benefit.
  • MMRotaryEmbeddingBase.__init__ staticmethod registration (line 323). Stays.
  • The legacy apply_rotary_emb wrapper, slice_at_dim, rotate_half, exists helpers all remain — still used by the non-rewired paths.

Tests

New file: tests-unit/comfy_test/test_seedvr_rope_rewrite.py (5 tests, all GREEN post-rewrite; 3 RED on baseline before the implementation, 1 GREEN trivially as oracle gate, +1 partial-rope test added during implementation):

  1. test_namm_forward_calls_apply_rope1_directly — call-graph spy: apply_rotary_emb.call_count == 0, apply_rope1.call_count == 4.
  2. test_get_freqs_emits_flux_canonical_shape — shape [..., d/2, 2, 2] with cos/-sin/sin/cos pattern.
  3. test_namm_forward_output_tensor_equal_against_legacy_oracle — fp32 CPU oracle from the unchanged wrapper fed with legacy-shape freqs; torch.testing.assert_close(rtol=0, atol=0).
  4. test_namm_forward_partial_rope_passthrough_matches_wrapper_oracle — dim=128, rot_d=126 < head_dim=128; exercises the partial-rope path explicitly with the same torch.equal gate. Catches the partial-rope bug that the dim=192 oracle (test 3) cannot.
  5. test_namm_forward_ast_has_no_apply_rotary_emb_calls — AST walk over forward, asserts zero references to apply_rotary_emb.

Existing test_seedvr_rope_delegation.py (8 parametrized cases of the wrapper itself) remains GREEN — wrapper unchanged.

Net diff

  • comfy/ldm/seedvr/model.py: +27 / -4 (one new helper, one wrapper, 4 forward sites rewired)
  • tests-unit/comfy_test/test_seedvr_rope_rewrite.py: +297 / -0 (new file, 5 tests)

Single file in the production path. No new dependency. Acquires the comfy.quant_ops.ck.apply_rope1 quantized fast path for free (same as the other 7 native models).

Test plan

  • All 5 rope-rewrite tests GREEN.
  • test_seedvr_rope_delegation.py (existing wrapper tests) remain GREEN.
  • Full seedvr2 unit test surface (19 files) — no regressions introduced by this PR.
  • Pre-rewrite VideoLQ_000 OOM at apply_rotary_emb's torch.cat (26.98 GiB allocated, requested 2.38 GiB) reproduced on RTX 5090 32 GB cold start.
  • Post-rewrite VideoLQ_000 no longer hits the RoPE OOM site (~3.6 GiB peak reduction). Cell still OOMs further downstream at concat_win (comfy/ldm/seedvr/model.py:148, called from model.py:852 v=concat_win(vid_v, txt_v)) — that is a separate native-vs-upstream divergence (upstream casts q/k/v to bfloat16 before the cat; native does not) tracked outside this PR.
  • 21 of 22 native_3b non-tiled corpus cells (UDM10×6, YouHQ40 squares×6, LTX×9) pass on pollockjj/mydevelopment:main:github_issues/220/05_outputs/*_native_3b.{mp4,log} post-rewrite, matching the pre-rewrite passing-cell count. The 22nd (VideoLQ_000) hits the separate concat_win wall.

Linked

pollockjj added 2 commits May 7, 2026 23:37
Land 4 regression tests for the upcoming rewrite of
NaMMRotaryEmbedding3d.forward to call apply_rope1 directly (matching the
pattern used by flux/hidream/kandinsky5/lumina/qwen_image/wan/sam3) and
NaMMRotaryEmbedding3d.get_freqs to emit freqs in flux-canonical shape
[..., d/2, 2, 2] with cos/-sin/sin/cos baked in.

Pre-rewrite baseline:
- test_namm_forward_calls_apply_rope1_directly: RED — wrapper.call_count == 4
- test_get_freqs_emits_flux_canonical_shape: RED — current shape is 2D [L, dim]
- test_namm_forward_output_tensor_equal_against_legacy_oracle: GREEN —
  trivially passes since forward IS the wrapper path pre-rewrite; this test
  becomes the post-rewrite equivalence gate.
- test_namm_forward_ast_has_no_apply_rotary_emb_calls: RED — 4 calls at
  lines 536/537/543/544.

Self-contained: no .pt fixture. Oracle is computed inside the test by
reproducing the pre-rewrite get_freqs body and feeding the unchanged
apply_rotary_emb wrapper (Shape B keeps the wrapper for
RotaryEmbedding3d.forward and the lucidrains-RotaryEmbedding staticmethod
registration).
…ope1 directly

NaMMRotaryEmbedding3d.forward now calls comfy.ldm.flux.math.apply_rope1
directly via a thin _apply_rope1_partial helper that handles the partial-
rope passthrough case (rope_dim=128 → per-axis 42 → total 126; head_dim=128;
trailing 2 dims pass through unrotated, mirroring the legacy
apply_rotary_emb wrapper's t_left/t_middle/t_right contract).

NaMMRotaryEmbedding3d.get_freqs now emits freqs in flux-canonical layout
[..., d/2, 2, 2] with cos/-sin/sin/cos baked in, via a new
_to_flux_freqs_cis helper. The lucidrains-interleaved upstream layout
[θ0, θ0, θ1, θ1, ...] is un-interleaved via the standard ::2 stride slice.

Pattern matches the 7 other native ComfyUI native-DiT models (flux,
hidream, kandinsky5, lumina, qwen_image, wan, sam3) that call apply_rope1
directly. Eliminates the wasteful freqs_mat construction + terminal
torch.cat in the legacy apply_rotary_emb wrapper that OOM'd on
VideoLQ_000 (1280x960x100). Wrapper retained for RotaryEmbedding3d.forward
(non-multimodal video-only path) and the lucidrains RotaryEmbedding
staticmethod registration.

Tests added in test_seedvr_rope_rewrite.py:
  - test_namm_forward_calls_apply_rope1_directly
  - test_get_freqs_emits_flux_canonical_shape
  - test_namm_forward_output_tensor_equal_against_legacy_oracle
  - test_namm_forward_partial_rope_passthrough_matches_wrapper_oracle (dim=128)
  - test_namm_forward_ast_has_no_apply_rotary_emb_calls
All five GREEN; pre-rewrite baseline had 3 RED (call-graph, get_freqs
shape, AST) and 1 GREEN (oracle, trivially equal pre-rewrite).

Existing test_seedvr_rope_delegation.py (8 parametrized cases of the
wrapper itself) remains GREEN — wrapper unchanged.

Net diff: comfy/ldm/seedvr/model.py +27/-4. Single file, no new dep.

Tracking: pollockjj/mydevelopment#224
Copilot AI review requested due to automatic review settings May 8, 2026 04:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reworks SeedVR2’s multimodal 3D RoPE path (NaMMRotaryEmbedding3d) to bypass the legacy apply_rotary_emb wrapper and call Flux’s apply_rope1 directly using flux-canonical freqs ([..., d/2, 2, 2]), reducing unnecessary intermediate allocations that contributed to OOMs.

Changes:

  • Add helpers to convert lucidrains-interleaved freqs to flux-canonical freqs_cis, and to apply RoPE only to the leading rotated sub-dimension (preserving the “partial-rope” tail passthrough contract).
  • Rewire NaMMRotaryEmbedding3d.forward to use _apply_rope1_partial(...) (4 call sites) and update get_freqs to emit flux-canonical freqs.
  • Add a new unit test suite to lock in call-graph, shape/layout, numerical equivalence vs the legacy wrapper, partial-rope behavior, and AST-level enforcement.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
comfy/ldm/seedvr/model.py Adds flux-canonical freqs conversion + partial RoPE helper; rewires NaMMRotaryEmbedding3d to use apply_rope1 directly.
tests-unit/comfy_test/test_seedvr_rope_rewrite.py New regression tests for the RoPE rewrite (behavioral, structural, and partial-rope coverage).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

from pathlib import Path
from unittest.mock import patch

import pytest
msg="txt_k partial-rope output diverges from wrapper oracle")


# Test 4 — drives AC-4 statically: AST walk over NaMMRotaryEmbedding3d.forward
Ruff F401 — `pytest` was imported but never referenced (no decorators, no
fixtures, no parametrize). All 5 tests are plain `def test_*` functions
that pytest discovers via filename + function-name conventions; they
don't need the `pytest` symbol at module scope.

Verified locally: `ruff check` clean; all 5 tests still GREEN.
@pollockjj pollockjj merged commit 2bc7f7e into issue_101 May 8, 2026
6 checks passed
@pollockjj pollockjj deleted the issue_224_seedvr2_native_rope_apply_rope1 branch May 8, 2026 04:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants