The output is different when not using deepspeed zero and when using zero3.

For the same input (the first sample), the output of [self.model()](https://github.com/microsoft/GUI-Actor/blob/main/src/gui_actor/modeling_qwen25vl.py#L232) differs between the two configurations: ZeRO-3 and no DeepSpeed. In addition, after [self.attn](https://github.com/microsoft/GUI-Actor/blob/main/src/gui_actor/modeling_qwen25vl.py#L73) followed by the MLP, the result is NaN, which causes the loss to be 0. The hardware is an A100 GPU.

The output is as follows:

![Image](https://github.com/user-attachments/assets/bd4919b0-6cb2-44bd-b38a-864b3b6c86ec)

![Image](https://github.com/user-attachments/assets/6d27538d-24f4-400f-92cb-94aac7001a6b)
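To narrow down where the NaN first appears, a check along the following lines could be run under both configurations. This is only a minimal sketch using standard PyTorch forward hooks: `find_first_nonfinite` and the `Toy` module are hypothetical names introduced for illustration and are not part of GUI-Actor or DeepSpeed, and the toy model merely stands in for the real attention and MLP blocks.

```python
import torch
import torch.nn as nn


def find_first_nonfinite(model: nn.Module, *args, **kwargs):
    """Run one forward pass and return the name of the first submodule whose
    output contains NaN or Inf, or None if all outputs are finite.
    (Hypothetical helper, not part of GUI-Actor or DeepSpeed.)"""
    bad_names = []  # filled by the hooks in the order modules finish
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # A module may return a single tensor or a tuple/list of tensors.
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for t in outs:
                if (torch.is_tensor(t) and t.is_floating_point()
                        and not torch.isfinite(t).all()):
                    bad_names.append(name)
                    break
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*args, **kwargs)
    finally:
        for h in handles:
            h.remove()  # always detach the hooks again
    return bad_names[0] if bad_names else None


class Toy(nn.Module):
    """Toy stand-in for an attention block followed by an MLP."""

    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(4, 4)
        self.mlp = nn.Linear(4, 4)

    def forward(self, x):
        return self.mlp(self.attn(x))


toy = Toy()
with torch.no_grad():
    toy.mlp.weight.fill_(float("nan"))  # deliberately corrupt the MLP weights
bad = find_first_nonfinite(toy, torch.ones(2, 4))
print(bad)  # mlp
```

Because forward hooks fire when a submodule finishes, the first name recorded is the innermost module that produced a non-finite output. Running the same check on the real model with the same first sample, once with ZeRO-3 and once without DeepSpeed, would show whether both runs diverge at the same layer.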

The output is different when not using deepspeed zero and when using zero3. #33
