Description
Describe the bug
When loading zai-org/GLM-Image with device_map="mps" in diffusers, some model parameters are silently corrupted during the GlmImagePipeline.from_pretrained call.
The corruption:
Happens only when tensors are placed directly on MPS during loading
Affects different parameters depending on dtype:
- float32 + MPS: weights corrupted, bias OK
- float16 + MPS: bias corrupted, weights OK
Does not occur when loading on CPU first and then moving to MPS
This results in extreme values (~1e37), LayerNorm overflow, and NaN / zero outputs (all-black images).
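A quick way to confirm this kind of corruption before running inference is to scan a loaded module's parameters for non-finite or implausibly large values. The following is a minimal sketch, not part of diffusers: the helper name `find_suspect_parameters` and the `max_abs` threshold are assumptions chosen for illustration.

```python
import torch
from torch import nn


def find_suspect_parameters(module: nn.Module, max_abs: float = 1e4):
    """Return (name, reason) pairs for parameters that look corrupted.

    `max_abs` is an arbitrary illustrative threshold, not a value taken
    from this report; trained weights are normally far smaller.
    """
    suspects = []
    for name, param in module.named_parameters():
        # Copy to CPU as float32 so the check does not rely on device kernels.
        values = param.detach().to("cpu", torch.float32)
        if values.numel() == 0:
            continue
        if not torch.isfinite(values).all():
            suspects.append((name, "non-finite values"))
            continue
        peak = values.abs().max().item()
        if peak > max_abs:
            suspects.append((name, f"max |value| = {peak:.3e}"))
    return suspects
```

Assuming the pipeline exposes its transformer as `pipe.transformer`, calling `find_suspect_parameters(pipe.transformer)` right after `from_pretrained` should return an empty list for a healthy load and name the affected weights or biases otherwise.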
Reproduction
❌ Corrupted
from diffusers.pipelines.glm_image import GlmImagePipeline
import torch
pipe = GlmImagePipeline.from_pretrained(
"zai-org/GLM-Image",
torch_dtype=torch.float32,
device_map="mps",
)
✅ Correct workaround
from diffusers.pipelines.glm_image import GlmImagePipeline
import torch
pipe = GlmImagePipeline.from_pretrained(
"zai-org/GLM-Image",
torch_dtype=torch.float32,
)
pipe.to("mps")
Logs
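The hook output in these logs marks each module's outputs and inputs as either OK or NaN. One way to triage it is to look for modules whose output contains NaN while every logged input is clean, since those are where bad values first appear. The sketch below assumes the exact line layout seen in this log; the function name `nan_origins` is hypothetical.

```python
import re

# Matches entries such as
#   "OK rope output[0]: shape=..."  or  "*** NaN in proj_out INPUT[0]: shape=..."
ENTRY_RE = re.compile(r"(OK|\*\*\* NaN in) (\S+) (output|INPUT)\[\d+\]:")


def nan_origins(log_text):
    """Return modules with NaN outputs whose logged inputs are all clean."""
    stats = {}  # module name -> flags, kept in first-seen order
    for line in log_text.splitlines():
        match = ENTRY_RE.search(line)
        if not match:
            continue  # progress bars, headings, blank lines
        status, module, kind = match.groups()
        entry = stats.setdefault(
            module, {"nan_out": False, "nan_in": False, "saw_in": False}
        )
        is_nan = status != "OK"
        if kind == "output":
            entry["nan_out"] = entry["nan_out"] or is_nan
        else:
            entry["saw_in"] = True
            entry["nan_in"] = entry["nan_in"] or is_nan
    # Modules with no logged inputs are skipped: their inputs cannot be judged.
    return [
        name
        for name, e in stats.items()
        if e["nan_out"] and e["saw_in"] and not e["nan_in"]
    ]
```

On the portion of the log shown in this report, this rule singles out image_projector and time_condition_embed: both receive clean inputs and emit NaN, while every later module already has NaN in its inputs.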
Device: mps, dtype: torch.float32
Keyword arguments {'trust_remote_code': True} are not expected by GlmImagePipeline and will be ignored.
Loading weights: 100%|##########| 1011/1011 [00:10<00:00, 100.58it/s]
Loading weights: 100%|##########| 111/111 [00:00<00:00, 314.30it/s]
Loading checkpoint shards: 100%|##########| 3/3 [00:06<00:00, 2.17s/it]
Loading pipeline components...: 100%|##########| 7/7 [00:18<00:00, 2.67s/it]
=== Transformer top-level children ===
rope: GlmImageRotaryPosEmbed
image_projector: GlmImageImageProjector
glyph_projector: FeedForward
prior_token_embedding: Embedding
prior_projector: FeedForward
time_condition_embed: GlmImageCombinedTimestepSizeEmbeddings
transformer_blocks: ModuleList
norm_out: GlmImageAdaLayerNormContinuous
proj_out: Linear
Hooking 30 transformer_blocks individually...
=== Block[0] sub-modules ===
block0.norm1: GlmImageAdaLayerNormZero
block0.norm1.norm: LayerNorm
block0.norm1.norm_context: LayerNorm
block0.norm1.linear: Linear
block0.attn1: Attention
block0.attn1.norm_q: LayerNorm
block0.attn1.norm_k: LayerNorm
block0.attn1.to_q: Linear
block0.attn1.to_k: Linear
block0.attn1.to_v: Linear
block0.attn1.to_out: ModuleList
block0.norm2: LayerNorm
block0.norm2_context: LayerNorm
block0.ff: FeedForward
block0.ff.net: ModuleList
=== Running 1-step inference ===
0%| | 0/1 [00:00<?, ?it/s]
OK rope output[0]: shape=[4608, 128] min=-1 max=1
OK rope output[1]: shape=[4608, 128] min=-1 max=1
OK rope INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
*** NaN in image_projector output[0]: shape=[1, 4608, 4096] NaN=41472/18874368 clean_min=-2.985e+30 clean_max=3.325e+30
OK image_projector INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
OK glyph_projector output[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
OK glyph_projector INPUT[0]: shape=[1, 1, 1472] min=-0.3495 max=0.4175
OK prior_token_embedding output[0]: shape=[1, 4608, 4096] min=-0.1943 max=0.1709
OK prior_token_embedding INPUT[0]: shape=[1, 4608] min=149 max=1.632e+04
OK prior_projector output[0]: shape=[1, 4608, 4096] min=-0.2291 max=0.1654
OK prior_projector INPUT[0]: shape=[1, 4608, 4096] min=-0.1943 max=0.1709
*** NaN in time_condition_embed output[0]: shape=[1, 512] NaN=512/512
OK time_condition_embed INPUT[0]: shape=[1] min=999 max=999
OK time_condition_embed INPUT[1]: shape=[1, 2] min=1024 max=1152
OK time_condition_embed INPUT[2]: shape=[1, 2] min=0 max=0
*** NaN in block0.norm1.norm output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.norm1.norm INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
OK block0.norm1.norm_context output[0]: shape=[1, 1, 4096] min=-6.959 max=4.698
OK block0.norm1.norm_context INPUT[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
*** NaN in block0.norm1.linear output[0]: shape=[1, 49152] NaN=49152/49152
*** NaN in block0.norm1.linear INPUT[0]: shape=[1, 512] NaN=512/512
*** NaN in block0.norm1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.norm1 output[1]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[2]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[3]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[4]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[5]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[6]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[7]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[8]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[9]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
OK block0.norm1 INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
*** NaN in block0.norm1 INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in block0.attn1.to_q output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_q INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_k output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_k INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_v output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_v INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.norm_q output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
*** NaN in block0.attn1.norm_q INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
*** NaN in block0.attn1.norm_k output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
*** NaN in block0.attn1.norm_k INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
*** NaN in block0.attn1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.attn1 output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.norm2 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.norm2 INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.norm2_context output[0]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.norm2_context INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.ff output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.ff INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.ff output[0]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.ff INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[0] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[0] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[0] INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
OK transformer_blocks[0] INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
*** NaN in transformer_blocks[0] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[1] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[1] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[1] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[1] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[1] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[2] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[2] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[2] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[2] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[2] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[3] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[3] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[3] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[3] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[3] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[4] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[4] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[4] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[4] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[4] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[5] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[5] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[5] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[5] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[5] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[6] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[6] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[6] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[6] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[6] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[7] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[7] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[7] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[7] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[7] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[8] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[8] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[8] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[8] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[8] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[9] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[9] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[9] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[9] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[9] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[10] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[10] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[10] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[10] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[10] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[11] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[11] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[11] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[11] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[11] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[12] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[12] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[12] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[12] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[12] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[13] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[13] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[13] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[13] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[13] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[14] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[14] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[14] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[14] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[14] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[15] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[15] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[15] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[15] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[15] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[16] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[16] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[16] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[16] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[16] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[17] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[17] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[17] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[17] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[17] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[18] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[18] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[18] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[18] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[18] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[19] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[19] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[19] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[19] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[19] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[20] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[20] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[20] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[20] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[20] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[21] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[21] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[21] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[21] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[21] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[22] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[22] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[22] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[22] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[22] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[23] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[23] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[23] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[23] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[23] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[24] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[24] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[24] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[24] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[24] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[25] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[25] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[25] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[25] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[25] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[26] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[26] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[26] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[26] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[26] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[27] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[27] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[27] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[27] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[27] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[28] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[28] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[28] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[28] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[28] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[29] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[29] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[29] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[29] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[29] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in norm_out output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in norm_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in norm_out INPUT[1]: shape=[1, 512] NaN=512/512
*** NaN in proj_out output[0]: shape=[1, 4608, 64] NaN=294912/294912
*** NaN in proj_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
OK rope output[0]: shape=[4608, 128] min=-1 max=1
OK rope output[1]: shape=[4608, 128] min=-1 max=1
OK rope INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
*** NaN in image_projector output[0]: shape=[1, 4608, 4096] NaN=41472/18874368 clean_min=-2.985e+30 clean_max=3.325e+30
OK image_projector INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
OK glyph_projector output[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
OK glyph_projector INPUT[0]: shape=[1, 1, 1472] min=-0.3495 max=0.4175
OK prior_token_embedding output[0]: shape=[1, 4608, 4096] min=-0.1943 max=0.1709
OK prior_token_embedding INPUT[0]: shape=[1, 4608] min=149 max=1.632e+04
OK prior_projector output[0]: shape=[1, 4608, 4096] min=-0.2224 max=0.1475
OK prior_projector INPUT[0]: shape=[1, 4608, 4096] min=-0 max=0
*** NaN in time_condition_embed output[0]: shape=[1, 512] NaN=512/512
OK time_condition_embed INPUT[0]: shape=[1] min=999 max=999
OK time_condition_embed INPUT[1]: shape=[1, 2] min=1024 max=1152
OK time_condition_embed INPUT[2]: shape=[1, 2] min=0 max=0
*** NaN in block0.norm1.norm output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.norm1.norm INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
OK block0.norm1.norm_context output[0]: shape=[1, 1, 4096] min=-6.959 max=4.698
OK block0.norm1.norm_context INPUT[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
*** NaN in block0.norm1.linear output[0]: shape=[1, 49152] NaN=49152/49152
*** NaN in block0.norm1.linear INPUT[0]: shape=[1, 512] NaN=512/512
*** NaN in block0.norm1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.norm1 output[1]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[2]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[3]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[4]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[5]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[6]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[7]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[8]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 output[9]: shape=[1, 4096] NaN=4096/4096
*** NaN in block0.norm1 INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
OK block0.norm1 INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
*** NaN in block0.norm1 INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in block0.attn1.to_q output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_q INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_k output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_k INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_v output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.to_v INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
*** NaN in block0.attn1.norm_q output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
*** NaN in block0.attn1.norm_q INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
*** NaN in block0.attn1.norm_k output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
*** NaN in block0.attn1.norm_k INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
*** NaN in block0.attn1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.attn1 output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.norm2 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.norm2 INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.norm2_context output[0]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.norm2_context INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.ff output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.ff INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in block0.ff output[0]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in block0.ff INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[0] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[0] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[0] INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
OK transformer_blocks[0] INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
*** NaN in transformer_blocks[0] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[1] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[1] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[1] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[1] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[1] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[2] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[2] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[2] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[2] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[2] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[3] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[3] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[3] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[3] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[3] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[4] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[4] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[4] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[4] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[4] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[5] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[5] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[5] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[5] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[5] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[6] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[6] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[6] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[6] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[6] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[7] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[7] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[7] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[7] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[7] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[8] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[8] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[8] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[8] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[8] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[9] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[9] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[9] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[9] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[9] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[10] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[10] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[10] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[10] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[10] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[11] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[11] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[11] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[11] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[11] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[12] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[12] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[12] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[12] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[12] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[13] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[13] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[13] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[13] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[13] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[14] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[14] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[14] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[14] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[14] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[15] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[15] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[15] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[15] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[15] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[16] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[16] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[16] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[16] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[16] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[17] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[17] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[17] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[17] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[17] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[18] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[18] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[18] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[18] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[18] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[19] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[19] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[19] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[19] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[19] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[20] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[20] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[20] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[20] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[20] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[21] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[21] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[21] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[21] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[21] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[22] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[22] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[22] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[22] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[22] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[23] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[23] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[23] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[23] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[23] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[24] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[24] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[24] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[24] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[24] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[25] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[25] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[25] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[25] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[25] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[26] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[26] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[26] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[26] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[26] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[27] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[27] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[27] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[27] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[27] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[28] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[28] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[28] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[28] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[28] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in transformer_blocks[29] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[29] output[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[29] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in transformer_blocks[29] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
*** NaN in transformer_blocks[29] INPUT[2]: shape=[1, 512] NaN=512/512
*** NaN in norm_out output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in norm_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
*** NaN in norm_out INPUT[1]: shape=[1, 512] NaN=512/512
*** NaN in proj_out output[0]: shape=[1, 4608, 64] NaN=294912/294912
*** NaN in proj_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
100%|##########| 1/1 [01:09<00:00, 69.09s/it]
100%|##########| 1/1 [01:09<00:00, 69.09s/it]
Final latents NaN: True
Layers with NaN: ['image_projector', 'time_condition_embed', 'block0.norm1.norm', 'block0.norm1.linear', 'block0.norm1', 'block0.attn1.to_q', 'block0.attn1.to_k', 'block0.attn1.to_v', 'block0.attn1.norm_q', 'block0.attn1.norm_k', 'block0.attn1', 'block0.norm2', 'block0.norm2_context', 'block0.ff', 'transformer_blocks[0]', 'transformer_blocks[1]', 'transformer_blocks[2]', 'transformer_blocks[3]', 'transformer_blocks[4]', 'transformer_blocks[5]', 'transformer_blocks[6]', 'transformer_blocks[7]', 'transformer_blocks[8]', 'transformer_blocks[9]', 'transformer_blocks[10]', 'transformer_blocks[11]', 'transformer_blocks[12]', 'transformer_blocks[13]', 'transformer_blocks[14]', 'transformer_blocks[15]', 'transformer_blocks[16]', 'transformer_blocks[17]', 'transformer_blocks[18]', 'transformer_blocks[19]', 'transformer_blocks[20]', 'transformer_blocks[21]', 'transformer_blocks[22]', 'transformer_blocks[23]', 'transformer_blocks[24]', 'transformer_blocks[25]', 'transformer_blocks[26]', 'transformer_blocks[27]', 'transformer_blocks[28]', 'transformer_blocks[29]', 'norm_out', 'proj_out']
Done.
System Info
macOS (Apple Silicon), MPS backend
Python 3.12
PyTorch with MPS support
diffusers 0.37.0 (GlmImagePipeline)
torch 2.10.0
Model: zai-org/GLM-Image (bf16 safetensors)
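To narrow down which tensors are affected, something like the following could be run right after `from_pretrained`. It is only a rough sketch, not code from diffusers: `find_suspect_params`, `attach_nan_hooks`, and the `abs_limit` threshold of 1e30 are hypothetical names and values chosen for illustration. The first helper scans parameters for non-finite or extreme values (such as the ~1e37 seen here), the second registers standard PyTorch forward hooks that record which submodules emit NaN, similar in spirit to the log above.

```python
import torch
from torch import nn


def find_suspect_params(module: nn.Module, abs_limit: float = 1e30):
    """Return (name, reason) pairs for parameters that look corrupted.

    abs_limit is an arbitrary illustrative threshold, not a diffusers setting.
    """
    suspects = []
    for name, param in module.named_parameters():
        # Copy to CPU float32 so the check itself does not run on the MPS backend.
        values = param.detach().to("cpu", torch.float32)
        if not torch.isfinite(values).all():
            suspects.append((name, "non-finite"))
        elif values.numel() > 0 and values.abs().max().item() > abs_limit:
            suspects.append((name, "extreme magnitude"))
    return suspects


def attach_nan_hooks(module: nn.Module, hits: list):
    """Register forward hooks that append '<name> output[i]' to hits on NaN."""

    def make_hook(name):
        def hook(_mod, _inputs, output):
            outputs = output if isinstance(output, (tuple, list)) else (output,)
            for i, out in enumerate(outputs):
                if torch.is_tensor(out) and torch.isnan(out).any():
                    hits.append(f"{name} output[{i}]")

        return hook

    # Skip the root module (empty name); return handles so hooks can be removed.
    return [
        sub.register_forward_hook(make_hook(name))
        for name, sub in module.named_modules()
        if name
    ]
```

Assuming the pipeline exposes the denoiser as `pipe.transformer` (an assumption, not verified here), `find_suspect_params(pipe.transformer)` could be compared between the `device_map="mps"` load and the CPU-then-`.to("mps")` load. The handles returned by `attach_nan_hooks` can be removed afterwards with `handle.remove()`.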
Who can help?
No response