@JunnYu JunnYu commented Dec 9, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

PR changes

Description

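This PR modifies the PaddleNLP Qwen2.5 implementation to align it with the one in PaddleFormers.

  • Align the precision of the rotary position embedding (RoPE): compute it in FP32 for higher precision.
  • Align the fuse configuration logic: enable fuse_swiglu and fuse_rms_norm.
  • Align the fused RMS norm implementation: both sides now use the framework implementation (with fast-math compilation disabled).
  • Align the weight shape of the LM head.
    • PaddleFormers implementation: logits = paddle.matmul(x, weight, transpose_y=True), where weight has shape [vocab_size, hidden_size].
    • PaddleNLP implementation:
      • logits = paddle.matmul(x, weight, transpose_y=False), where weight has shape [hidden_size, vocab_size].
      • [Option 1] Keep weight with shape [hidden_size, vocab_size] and compute logits = paddle.matmul(x, weight.t(), transpose_y=True). The forward results then match, but in the backward pass the gradients diverge during training when main_grad is enabled.
      • [Option 2] Therefore the PaddleNLP network definition has to be changed so that its weight has exactly the same shape as in PaddleFormers.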
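
The LM head layouts listed above can be sketched as follows. This is only an illustration of the shape bookkeeping, not the code in this PR: the `matmul`, `transpose`, and `shape` helpers are hypothetical plain-Python stand-ins that mimic the 2-D semantics of `paddle.matmul(x, weight, transpose_y=...)`, and the toy sizes and values are made up. It shows that all three variants give the same forward logits. It does not reproduce the main_grad gradient mismatch, which depends on Paddle's training setup.

```python
# Illustrative sketch only: plain-Python stand-ins (hypothetical helpers),
# so the shapes can be checked without installing Paddle.

def transpose(m):
    """Return the transpose of a 2-D list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, w, transpose_y=False):
    """Mimic matmul(x, w, transpose_y=...) for 2-D inputs."""
    if transpose_y:
        w = transpose(w)
    if len(x[0]) != len(w):
        raise ValueError("inner dimensions do not match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def shape(m):
    """Return [rows, cols] of a 2-D list-of-lists matrix."""
    return [len(m), len(m[0])]


# Assumed toy sizes: 2 tokens, hidden_size = 3, vocab_size = 4.
x = [[1.0, 2.0, 3.0],
     [0.5, -1.0, 2.0]]

# PaddleFormers-style layout: [vocab_size, hidden_size]
w_vocab_major = [[1.0, 0.0, 2.0],
                 [0.0, 1.0, -1.0],
                 [3.0, 1.0, 0.0],
                 [-2.0, 0.5, 1.0]]

# Original PaddleNLP-style layout: [hidden_size, vocab_size]
w_hidden_major = transpose(w_vocab_major)

# PaddleFormers: weight [vocab_size, hidden_size], transpose_y=True
logits_formers = matmul(x, w_vocab_major, transpose_y=True)

# PaddleNLP before the change: weight [hidden_size, vocab_size], transpose_y=False
logits_nlp = matmul(x, w_hidden_major, transpose_y=False)

# Option 1: keep [hidden_size, vocab_size], transpose explicitly, transpose_y=True
logits_option1 = matmul(x, transpose(w_hidden_major), transpose_y=True)

print(shape(w_vocab_major), shape(w_hidden_major))
print(logits_formers == logits_nlp == logits_option1)
```

Because the forward values already agree in every variant, a forward-only comparison cannot detect the problem described for Option 1. That is why the description settles on storing the parameter itself as [vocab_size, hidden_size], so the parameter and its gradient have the same layout in both repositories.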

paddle-bot bot commented Dec 9, 2025

Thanks for your contribution!
