[Bug] GlmImagePipeline silently corrupts weights on MPS accelerator #13227

@yingding

Describe the bug

When loading zai-org/GLM-Image with device_map="mps" in diffusers, some model parameters are silently corrupted during the GlmImagePipeline.from_pretrained call.

The corruption:

Happens only when tensors are placed directly on MPS during loading
Differs by dtype:
  • float32 + MPS: weights corrupted, bias OK
  • float16 + MPS: bias corrupted, weights OK

Does not occur when loading on CPU first and then moving to MPS

This results in extreme values (~1e37), LayerNorm overflow, and NaN / zero outputs (all-black images).
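
One way to locate affected tensors right after loading is to scan every parameter for non-finite or implausibly large values. The following is a minimal sketch, not part of diffusers: the helper name `find_suspect_params` and the `max_abs` threshold of 1e4 are illustrative assumptions, and the threshold would need tuning for a given model.

```python
import torch
from torch import nn


def find_suspect_params(module: nn.Module, max_abs: float = 1e4):
    """Return (name, n_nonfinite, largest_finite_abs) for suspicious parameters.

    Hypothetical helper: `max_abs` is an assumed sanity bound, not a diffusers setting.
    """
    suspects = []
    for name, param in module.named_parameters():
        # Inspect a float32 CPU copy so the check runs the same way for any device/dtype.
        values = param.detach().to("cpu", torch.float32)
        finite = torch.isfinite(values)
        n_nonfinite = int((~finite).sum())
        finite_vals = values[finite]
        largest = float(finite_vals.abs().max()) if finite_vals.numel() else 0.0
        if n_nonfinite or largest > max_abs:
            suspects.append((name, n_nonfinite, largest))
    return suspects
```

Assuming the pipeline exposes its denoiser as `pipe.transformer`, calling `find_suspect_params(pipe.transformer)` after each loading path would list the parameters holding NaN, Inf, or very large entries.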

Reproduction

❌ Corrupted

from diffusers.pipelines.glm_image import GlmImagePipeline
import torch

pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image",
    torch_dtype=torch.float32,
    device_map="mps",
)

✅ Correct workaround

from diffusers.pipelines.glm_image import GlmImagePipeline
import torch

pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image",
    torch_dtype=torch.float32,
)
pipe.to("mps")
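
To confirm which tensors differ between the two loading paths, the state dict of a component loaded on CPU can be compared against the same component loaded with device_map="mps". The sketch below is an illustration under assumptions: `diff_state_dicts` is a hypothetical helper, and it only relies on standard PyTorch tensor operations.

```python
import torch


def diff_state_dicts(reference, candidate, atol: float = 0.0):
    """Return {name: max_abs_difference} for tensors that differ.

    Hypothetical helper: `reference` would come from the CPU-loaded model,
    `candidate` from the model loaded directly on MPS.
    """
    mismatches = {}
    for name, ref in reference.items():
        if name not in candidate:
            mismatches[name] = float("inf")
            continue
        # Compare float32 CPU copies so device and dtype do not affect the check.
        a = ref.detach().to("cpu", torch.float32)
        b = candidate[name].detach().to("cpu", torch.float32)
        if a.shape != b.shape:
            mismatches[name] = float("inf")
            continue
        if not torch.allclose(a, b, rtol=0.0, atol=atol):
            # NaN differences are reported as inf so they are never hidden.
            delta = torch.nan_to_num((a - b).abs(), nan=float("inf"), posinf=float("inf"))
            mismatches[name] = float(delta.max())
    return mismatches
```

An empty result means both loading paths produced identical tensors; any listed name points at a parameter that changed during direct-to-MPS loading.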

Logs

Device: mps, dtype: torch.float32
Keyword arguments {'trust_remote_code': True} are not expected by GlmImagePipeline and will be ignored.

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

Loading weights: 100%|##########| 1011/1011 [00:10<00:00, 100.58it/s]

Loading pipeline components...:  14%|#4        | 1/7 [00:10<01:01, 10.25s/it]
Loading pipeline components...:  29%|##8       | 2/7 [00:11<00:23,  4.78s/it]

Loading weights: 100%|##########| 111/111 [00:00<00:00, 314.30it/s]

Loading pipeline components...:  43%|####2     | 3/7 [00:11<00:11,  2.77s/it]
Loading pipeline components...:  57%|#####7    | 4/7 [00:11<00:05,  1.72s/it]

Loading checkpoint shards: 100%|##########| 3/3 [00:06<00:00,  2.17s/it]

Loading pipeline components...:  86%|########5 | 6/7 [00:18<00:02,  2.54s/it]
Loading pipeline components...: 100%|##########| 7/7 [00:18<00:00,  1.98s/it]
Loading pipeline components...: 100%|##########| 7/7 [00:18<00:00,  2.67s/it]

=== Transformer top-level children ===
  rope: GlmImageRotaryPosEmbed
  image_projector: GlmImageImageProjector
  glyph_projector: FeedForward
  prior_token_embedding: Embedding
  prior_projector: FeedForward
  time_condition_embed: GlmImageCombinedTimestepSizeEmbeddings
  transformer_blocks: ModuleList
  norm_out: GlmImageAdaLayerNormContinuous
  proj_out: Linear
  Hooking 30 transformer_blocks individually...

=== Block[0] sub-modules ===
  block0.norm1: GlmImageAdaLayerNormZero
    block0.norm1.norm: LayerNorm
    block0.norm1.norm_context: LayerNorm
    block0.norm1.linear: Linear
  block0.attn1: Attention
    block0.attn1.norm_q: LayerNorm
    block0.attn1.norm_k: LayerNorm
    block0.attn1.to_q: Linear
    block0.attn1.to_k: Linear
    block0.attn1.to_v: Linear
    block0.attn1.to_out: ModuleList
  block0.norm2: LayerNorm
  block0.norm2_context: LayerNorm
  block0.ff: FeedForward
    block0.ff.net: ModuleList

=== Running 1-step inference ===

  0%|          | 0/1 [00:00<?, ?it/s]
  OK  rope output[0]: shape=[4608, 128] min=-1 max=1
  OK  rope output[1]: shape=[4608, 128] min=-1 max=1
  OK  rope INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
  *** NaN in image_projector output[0]: shape=[1, 4608, 4096] NaN=41472/18874368 clean_min=-2.985e+30 clean_max=3.325e+30
  OK  image_projector INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
  OK  glyph_projector output[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  OK  glyph_projector INPUT[0]: shape=[1, 1, 1472] min=-0.3495 max=0.4175
  OK  prior_token_embedding output[0]: shape=[1, 4608, 4096] min=-0.1943 max=0.1709
  OK  prior_token_embedding INPUT[0]: shape=[1, 4608] min=149 max=1.632e+04
  OK  prior_projector output[0]: shape=[1, 4608, 4096] min=-0.2291 max=0.1654
  OK  prior_projector INPUT[0]: shape=[1, 4608, 4096] min=-0.1943 max=0.1709
  *** NaN in time_condition_embed output[0]: shape=[1, 512] NaN=512/512
  OK  time_condition_embed INPUT[0]: shape=[1] min=999 max=999
  OK  time_condition_embed INPUT[1]: shape=[1, 2] min=1024 max=1152
  OK  time_condition_embed INPUT[2]: shape=[1, 2] min=0 max=0
  *** NaN in block0.norm1.norm output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm1.norm INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  block0.norm1.norm_context output[0]: shape=[1, 1, 4096] min=-6.959 max=4.698
  OK  block0.norm1.norm_context INPUT[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in block0.norm1.linear output[0]: shape=[1, 49152] NaN=49152/49152
  *** NaN in block0.norm1.linear INPUT[0]: shape=[1, 512] NaN=512/512
  *** NaN in block0.norm1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm1 output[1]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[2]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[3]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[4]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[5]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[6]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[7]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[8]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[9]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  block0.norm1 INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in block0.norm1 INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in block0.attn1.to_q output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_q INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_k output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_k INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_v output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_v INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_q output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_q INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_k output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_k INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.attn1 output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm2 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm2 INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm2_context output[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm2_context INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.ff output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.ff INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.ff output[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.ff INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[0] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[0] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[0] INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  transformer_blocks[0] INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in transformer_blocks[0] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[1] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[1] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[1] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[1] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[1] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[2] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[2] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[2] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[2] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[2] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[3] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[3] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[3] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[3] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[3] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[4] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[4] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[4] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[4] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[4] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[5] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[5] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[5] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[5] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[5] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[6] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[6] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[6] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[6] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[6] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[7] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[7] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[7] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[7] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[7] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[8] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[8] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[8] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[8] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[8] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[9] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[9] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[9] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[9] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[9] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[10] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[10] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[10] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[10] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[10] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[11] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[11] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[11] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[11] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[11] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[12] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[12] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[12] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[12] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[12] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[13] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[13] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[13] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[13] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[13] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[14] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[14] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[14] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[14] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[14] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[15] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[15] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[15] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[15] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[15] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[16] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[16] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[16] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[16] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[16] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[17] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[17] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[17] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[17] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[17] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[18] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[18] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[18] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[18] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[18] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[19] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[19] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[19] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[19] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[19] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[20] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[20] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[20] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[20] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[20] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[21] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[21] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[21] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[21] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[21] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[22] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[22] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[22] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[22] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[22] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[23] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[23] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[23] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[23] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[23] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[24] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[24] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[24] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[24] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[24] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[25] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[25] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[25] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[25] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[25] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[26] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[26] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[26] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[26] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[26] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[27] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[27] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[27] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[27] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[27] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[28] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[28] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[28] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[28] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[28] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[29] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[29] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[29] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[29] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[29] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in norm_out output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in norm_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in norm_out INPUT[1]: shape=[1, 512] NaN=512/512
  *** NaN in proj_out output[0]: shape=[1, 4608, 64] NaN=294912/294912
  *** NaN in proj_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  OK  rope output[0]: shape=[4608, 128] min=-1 max=1
  OK  rope output[1]: shape=[4608, 128] min=-1 max=1
  OK  rope INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
  *** NaN in image_projector output[0]: shape=[1, 4608, 4096] NaN=41472/18874368 clean_min=-2.985e+30 clean_max=3.325e+30
  OK  image_projector INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
  OK  glyph_projector output[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  OK  glyph_projector INPUT[0]: shape=[1, 1, 1472] min=-0.3495 max=0.4175
  OK  prior_token_embedding output[0]: shape=[1, 4608, 4096] min=-0.1943 max=0.1709
  OK  prior_token_embedding INPUT[0]: shape=[1, 4608] min=149 max=1.632e+04
  OK  prior_projector output[0]: shape=[1, 4608, 4096] min=-0.2224 max=0.1475
  OK  prior_projector INPUT[0]: shape=[1, 4608, 4096] min=-0 max=0
  *** NaN in time_condition_embed output[0]: shape=[1, 512] NaN=512/512
  OK  time_condition_embed INPUT[0]: shape=[1] min=999 max=999
  OK  time_condition_embed INPUT[1]: shape=[1, 2] min=1024 max=1152
  OK  time_condition_embed INPUT[2]: shape=[1, 2] min=0 max=0
  *** NaN in block0.norm1.norm output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm1.norm INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  block0.norm1.norm_context output[0]: shape=[1, 1, 4096] min=-6.959 max=4.698
  OK  block0.norm1.norm_context INPUT[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in block0.norm1.linear output[0]: shape=[1, 49152] NaN=49152/49152
  *** NaN in block0.norm1.linear INPUT[0]: shape=[1, 512] NaN=512/512
  *** NaN in block0.norm1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm1 output[1]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[2]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[3]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[4]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[5]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[6]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[7]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[8]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[9]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  block0.norm1 INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in block0.norm1 INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in block0.attn1.to_q output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_q INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_k output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_k INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_v output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_v INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_q output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_q INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_k output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_k INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.attn1 output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm2 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm2 INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm2_context output[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm2_context INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.ff output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.ff INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.ff output[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.ff INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[0] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[0] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[0] INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  transformer_blocks[0] INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in transformer_blocks[0] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[1] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[1] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[1] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[1] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[1] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[2] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[2] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[2] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[2] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[2] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[3] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[3] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[3] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[3] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[3] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[4] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[4] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[4] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[4] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[4] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[5] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[5] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[5] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[5] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[5] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[6] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[6] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[6] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[6] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[6] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[7] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[7] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[7] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[7] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[7] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[8] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[8] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[8] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[8] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[8] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[9] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[9] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[9] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[9] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[9] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[10] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[10] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[10] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[10] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[10] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[11] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[11] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[11] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[11] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[11] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[12] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[12] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[12] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[12] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[12] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[13] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[13] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[13] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[13] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[13] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[14] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[14] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[14] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[14] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[14] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[15] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[15] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[15] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[15] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[15] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[16] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[16] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[16] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[16] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[16] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[17] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[17] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[17] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[17] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[17] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[18] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[18] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[18] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[18] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[18] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[19] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[19] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[19] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[19] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[19] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[20] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[20] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[20] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[20] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[20] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[21] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[21] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[21] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[21] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[21] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[22] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[22] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[22] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[22] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[22] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[23] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[23] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[23] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[23] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[23] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[24] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[24] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[24] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[24] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[24] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[25] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[25] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[25] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[25] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[25] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[26] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[26] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[26] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[26] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[26] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[27] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[27] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[27] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[27] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[27] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[28] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[28] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[28] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[28] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[28] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[29] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[29] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[29] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[29] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[29] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in norm_out output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in norm_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in norm_out INPUT[1]: shape=[1, 512] NaN=512/512
  *** NaN in proj_out output[0]: shape=[1, 4608, 64] NaN=294912/294912
  *** NaN in proj_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368

100%|##########| 1/1 [01:09<00:00, 69.09s/it]
100%|##########| 1/1 [01:09<00:00, 69.09s/it]

Final latents NaN: True
Layers with NaN: ['image_projector', 'time_condition_embed', 'block0.norm1.norm', 'block0.norm1.linear', 'block0.norm1', 'block0.attn1.to_q', 'block0.attn1.to_k', 'block0.attn1.to_v', 'block0.attn1.norm_q', 'block0.attn1.norm_k', 'block0.attn1', 'block0.norm2', 'block0.norm2_context', 'block0.ff', 'transformer_blocks[0]', 'transformer_blocks[1]', 'transformer_blocks[2]', 'transformer_blocks[3]', 'transformer_blocks[4]', 'transformer_blocks[5]', 'transformer_blocks[6]', 'transformer_blocks[7]', 'transformer_blocks[8]', 'transformer_blocks[9]', 'transformer_blocks[10]', 'transformer_blocks[11]', 'transformer_blocks[12]', 'transformer_blocks[13]', 'transformer_blocks[14]', 'transformer_blocks[15]', 'transformer_blocks[16]', 'transformer_blocks[17]', 'transformer_blocks[18]', 'transformer_blocks[19]', 'transformer_blocks[20]', 'transformer_blocks[21]', 'transformer_blocks[22]', 'transformer_blocks[23]', 'transformer_blocks[24]', 'transformer_blocks[25]', 'transformer_blocks[26]', 'transformer_blocks[27]', 'transformer_blocks[28]', 'transformer_blocks[29]', 'norm_out', 'proj_out']
Done.
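
The trace above has the shape of output from PyTorch forward hooks that count NaN values in each module's inputs and outputs. The script that produced it is not shown here, so the following is only a minimal sketch under that assumption, not the reporter's actual code. `attach_nan_hooks` and the two-layer toy model are hypothetical names introduced for illustration; a single weight is set to NaN by hand to stand in for a corrupted parameter, and everything runs on CPU.

```python
import torch
from torch import nn


def attach_nan_hooks(model: nn.Module, hits: list) -> list:
    """Register a forward hook on every named submodule.

    Each hook prints one line per tensor that contains NaN and records the
    module name in `hits` (first occurrence only). Returns the hook handles
    so the caller can remove them afterwards.
    """

    def make_hook(name):
        def hook(module, args, output):
            outputs = output if isinstance(output, (tuple, list)) else (output,)
            for kind, tensors in (("output", outputs), ("INPUT", args)):
                for i, t in enumerate(tensors):
                    # Skip non-tensor and integer arguments.
                    if not torch.is_tensor(t) or not t.is_floating_point():
                        continue
                    n_nan = int(torch.isnan(t).sum())
                    if n_nan:
                        if name not in hits:
                            hits.append(name)
                        print(
                            f"  *** NaN in {name} {kind}[{i}]: "
                            f"shape={list(t.shape)} NaN={n_nan}/{t.numel()}"
                        )

        return hook

    # The root module has an empty name and is skipped.
    return [
        m.register_forward_hook(make_hook(n))
        for n, m in model.named_modules()
        if n
    ]


# Toy stand-in for a transformer block: Linear followed by LayerNorm.
model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
with torch.no_grad():
    # Simulate a single corrupted weight (assumption for illustration).
    model[0].weight[0, 0] = float("nan")

nan_layers = []
handles = attach_nan_hooks(model, nan_layers)
with torch.no_grad():
    out = model(torch.ones(1, 4))
for h in handles:
    h.remove()

print("Final output NaN:", bool(torch.isnan(out).any()))
print("Layers with NaN:", nan_layers)
```

In this toy run one bad weight makes one output element of the linear layer NaN, and the following LayerNorm spreads it to every element, which is consistent with the all-NaN counts reported for each block in the log. Because a hook like this sees a NaN input only after an earlier module has already emitted it, the first module listed is the one worth inspecting for corrupted parameters.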

System Info

macOS (Apple Silicon), MPS backend
Python 3.12
PyTorch with MPS support
diffusers 0.37.0 (GlmImagePipeline)
torch 2.10.0
Model: zai-org/GLM-Image (bf16 safetensors)

Who can help?

No response
