Binary file added _static/images/volcano.png
25 changes: 22 additions & 3 deletions index.rst
@@ -40,6 +40,7 @@
sources/torchchat/index.rst
sources/torchtitan/index.rst
sources/sglang/index.rst
sources/verl/index.rst


Select your preferences, then follow the installation instructions in :doc:`Quick Install for the Ascend Environment <sources/ascend/quick_install>`.
@@ -382,7 +383,7 @@
<div class="img w-16 h-16 rounded-md mr-4" style="background-image: url('_static/images/pytorch.png')"></div>
<div>
<h2 class="text-lg font-semibold">TorchTitan</h2>
<p class="text-gray-600 desc">A PyTorch-native library for large language model training</p>
</div>
</div>
<div class="flex-grow"></div>
@@ -400,7 +401,7 @@
<div class="img w-16 h-16 rounded-md mr-4" style="background-image: url('_static/images/sglang.png')"></div>
<div>
<h2 class="text-lg font-semibold">SGLang</h2>
<p class="text-gray-600 desc">A fast serving framework for LLMs and VLMs</p>
</div>
</div>
<div class="flex-grow"></div>
@@ -411,6 +412,24 @@
<span class="split">|</span>
<a href="sources/sglang/quick_start.html">Quick Start</a>
</div>
</div>
</div>
<!-- Card 21 -->
<div class="box rounded-lg p-4 flex flex-col items-center">
<div class="flex items-center mb-4">
<div class="img w-16 h-16 rounded-md mr-4" style="background-image: url('_static/images/volcano.png')"></div>
<div>
<h2 class="text-lg font-semibold">verl</h2>
<p class="text-gray-600 desc">A reinforcement learning training library for LLMs</p>

Review comment (Contributor):

Does this not include VLM?

Reply (Contributor Author):

Of course it supports VLMs, but this is how they describe it on GitHub:
‘verl is a flexible, efficient, and production-ready RL training library for large language models (LLMs).’
And in practice, VLM support isn’t really one of its standout features — there are many other aspects that matter more.

</div>
</div>
<div class="flex-grow"></div>
<div class="flex space-x-4 text-blue-600">
<a href="https://github.com/volcengine/verl">Official Link</a>
<span class="split">|</span>
<a href="sources/verl/install.html">Installation Guide</a>
<span class="split">|</span>
<a href="sources/verl/quick_start.html">Quick Start</a>
</div>
</div>
</div>
</div>
8 changes: 8 additions & 0 deletions sources/verl/index.rst
@@ -0,0 +1,8 @@
verl
============

.. toctree::
    :maxdepth: 2

    install.rst
    quick_start.rst
118 changes: 118 additions & 0 deletions sources/verl/install.rst
@@ -0,0 +1,118 @@
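Installation Guide
==================

This tutorial is for developers using verl on Ascend and walks through installing verl in an Ascend environment.

Ascend Environment Installation
-------------------------------

Install the Ascend environment by following the :doc:`Ascend quick installation guide <../ascend/quick_install>`, according to your Ascend product model and CPU architecture.

.. warning::
    The minimum supported CANN version is 8.3.RC1. When installing CANN, also install the Kernel operator package and the NNAL acceleration library package.

Creating the Python Environment
-------------------------------

.. code-block:: shell
    :linenos:

    # Create a Python 3.11 virtual environment named verl
    conda create -y -n verl python==3.11
    # Activate the virtual environment
    conda activate verl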

Installing Torch
----------------

.. code-block:: shell
    :linenos:

    # Install the CPU builds of torch 2.7.1, torchvision 0.22.1, and torchaudio 2.7.1
    pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cpu

    # Install torch-npu 2.7.1
    pip install torch-npu==2.7.1
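
As a quick sanity check after the install steps above, the following sketch reports which of the expected modules cannot be found in the active environment. It uses only the standard library. The import names ``torch``, ``torch_npu``, ``torchvision``, and ``torchaudio`` are assumed to correspond to the packages installed above, and the helper name ``find_missing_modules`` is hypothetical.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the packages installed in this section.
    expected = ["torch", "torch_npu", "torchvision", "torchaudio"]
    missing = find_missing_modules(expected)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

This only confirms that the modules can be located; it does not verify that an NPU device is visible or usable.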

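Installing vllm & vllm-ascend
-----------------------------


Method 1: Build and install vllm and vllm-ascend from source with the following commands. Note that the install command differs by machine type.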

.. code-block:: shell
    :linenos:

    # vllm
    git clone -b v0.11.0 --depth 1 https://github.com/vllm-project/vllm.git
    cd vllm
    pip install -r requirements/build.txt

    # for Atlas 200T A2 Box16
    VLLM_TARGET_DEVICE=empty pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu/

    # for Atlas 900 A2 PODc or Atlas 800T A3
    VLLM_TARGET_DEVICE=empty pip install -e .

.. code-block:: shell
    :linenos:

    # vllm-ascend
    git clone -b v0.11.0rc1 --depth 1 https://github.com/vllm-project/vllm-ascend.git
    cd vllm-ascend
    pip install -e .


Method 2: Install the prebuilt vllm and vllm-ascend packages directly with the following commands.

.. code-block:: shell
    :linenos:

    # Install vllm-project/vllm. The newest supported version is v0.11.0.
    pip install vllm==0.11.0

    # Install vllm-project/vllm-ascend from PyPI.
    pip install vllm-ascend==0.11.0rc1

Installing verl
---------------

Install verl and its dependencies with the following commands:

.. code-block:: shell
    :linenos:

    git clone https://github.com/volcengine/verl.git
    cd verl

    # Install verl NPU dependencies
    pip install -r requirements-npu.txt
    pip install -e .


Notes on Other Third-Party Libraries
------------------------------------

+----------------------+---------------------------+
| Software             | Version / Support         |
+======================+===========================+
| transformers         | >=v4.57.1                 |
+----------------------+---------------------------+
| flash_attn           | not supported             |
+----------------------+---------------------------+
| liger-kernel         | not supported             |
+----------------------+---------------------------+


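1. ``--flash_attention_2`` can be enabled through transformers, which must be version 4.57.1 or later.

2. Enabling flash attention acceleration through flash_attn is not supported.

3. Enabling liger-kernel is not supported.

4. On x86 servers, install the CPU build of torchvision.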

.. code-block:: shell
    :linenos:

    pip install torchvision==0.22.1+cpu --index-url https://download.pytorch.org/whl/cpu
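
The minimum version in the table above can be checked against the active environment with a short script. The following is a minimal sketch using only the standard library. The helper names ``parse_version``, ``meets_minimum``, and ``check_package`` are hypothetical, and the version parsing is deliberately simplified: it compares only the leading numeric components and ignores suffixes such as ``rc1`` or ``+cpu``.

```python
from importlib import metadata


def parse_version(text):
    """Turn a version like '4.57.1' or 'v4.57.1' into a tuple of ints.

    Only the leading numeric dotted part is used; suffixes such as
    'rc1' or '+cpu' are ignored, which is a simplification.
    """
    parts = []
    for piece in text.lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(dist_name, minimum):
    """Compare an installed distribution with a minimum version.

    Returns None when the distribution is not installed.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    # The 4.57.1 minimum comes from the table above.
    print("transformers >= 4.57.1:", check_package("transformers", "4.57.1"))
```

For stricter comparisons that account for pre-release tags, a dedicated version-parsing library would be a better fit than this simplified parser.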
124 changes: 124 additions & 0 deletions sources/verl/quick_start.rst
@@ -0,0 +1,124 @@
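Quick Start
===========

.. note::

    Before reading this page, make sure you have prepared the Ascend environment and the environment required by verl by following the :doc:`installation guide <./install>`.

This tutorial shows how to run a quick training job with verl to help you get started.

It is intended to help Ascend developers get started with LLM reinforcement learning training using verl on Ascend. See the `official verl installation documentation <https://verl.readthedocs.io/en/latest/start/install.html#>`_ for more information.

You can also refer to the official `Ascend quick start documentation <https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html>`_.

Before production use, we recommend running a trial PPO training of Qwen2.5-0.5B to verify that the environment is set up and installed correctly, and to become familiar with the basic workflow.

The following sections describe how to run PPO training with verl on a single NPU card: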

PPO Training of Qwen2.5-0.5B on the GSM8K Dataset
--------------------------------------------------

Post-train the Qwen2.5-0.5B model using the GSM8K dataset.

Dataset Overview
^^^^^^^^^^^^^^^^

GSM8K is a dataset of grade-school math problems used to train or evaluate the mathematical reasoning ability of LLMs. Below is an example prompt/solution pair:

Prompt

    James writes a 3-page letter to 2 different friends twice a week.
    How many pages does he write a year?

Solution

    He writes each friend 3*2=<<3*2=6>>6 pages a week So he writes
    6*2=<<6*2=12>>12 pages every week That means he writes
    12*52=<<12*52=624>>624 pages a year #### 624
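
The solution text carries two kinds of machine-readable structure: calculator annotations of the form ``<<a*b=c>>`` and a final answer after the ``####`` marker. The sketch below checks the arithmetic in the example above and pulls out the final answer. It is an illustration only: the regular expression handles just the simple products that appear in this example, and the helper names ``check_products`` and ``final_answer`` are hypothetical.

```python
import re

SOLUTION = (
    "He writes each friend 3*2=<<3*2=6>>6 pages a week So he writes "
    "6*2=<<6*2=12>>12 pages every week That means he writes "
    "12*52=<<12*52=624>>624 pages a year #### 624"
)

# Matches calculator annotations such as <<3*2=6>> (simple a*b products only).
ANNOTATION = re.compile(r"<<(\d+)\*(\d+)=(\d+)>>")


def check_products(solution):
    """Return True if every <<a*b=c>> annotation satisfies a * b == c."""
    return all(
        int(a) * int(b) == int(c) for a, b, c in ANNOTATION.findall(solution)
    )


def final_answer(solution):
    """Return the text after the last '####' marker, or None if absent."""
    _, marker, tail = solution.rpartition("####")
    return tail.strip() if marker else None


if __name__ == "__main__":
    print(check_products(SOLUTION))
    print(final_answer(SOLUTION))
```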

Preparing the Dataset
^^^^^^^^^^^^^^^^^^^^^

Adjust the ``--local_save_dir`` argument as needed to specify where the dataset is saved.

.. code-block:: bash

    python3 examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k

Preparing the Model
^^^^^^^^^^^^^^^^^^^

This example uses Qwen2.5-0.5B-Instruct as the base model for PPO training.

Set ``VERL_USE_MODELSCOPE=True`` to download the model from `modelscope <https://www.modelscope.cn>`_ instead.

.. code-block:: bash

    python3 -c "import transformers; transformers.pipeline('text-generation', model='Qwen/Qwen2.5-0.5B-Instruct')"

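Launching PPO Training
^^^^^^^^^^^^^^^^^^^^^^

**Reward Model/Function**

This example uses a simple reward function to evaluate the correctness of generated answers. The number that follows the "####" marker in the model output is taken as the model's answer.
If that answer matches the ground truth, the reward is 1; otherwise it is 0.

For further details, see `verl/utils/reward_score/gsm8k.py <https://github.com/volcengine/verl/blob/v0.4.1/verl/utils/reward_score/gsm8k.py>`_.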
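
The rule described above can be sketched in a few lines. This is a simplified illustration, not verl's actual implementation. The names ``extract_answer`` and ``compute_reward``, the regular expression, and the choice to compare answers as strings are all assumptions made for this example.

```python
import re

# Assumed answer format: '####' followed by an optional sign and a number.
ANSWER_PATTERN = re.compile(r"####\s*(-?[0-9.,]+)")


def extract_answer(response):
    """Return the number after the last '####' marker, or None if missing."""
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None
    # Drop thousands separators and a trailing period so that
    # '1,000.' compares equal to '1000'.
    return matches[-1].replace(",", "").rstrip(".")


def compute_reward(response, ground_truth):
    """Return 1.0 when the extracted answer equals ground_truth, else 0.0."""
    answer = extract_answer(response)
    return 1.0 if answer is not None and answer == ground_truth else 0.0


if __name__ == "__main__":
    print(compute_reward("12*52=624 pages a year #### 624", "624"))
    print(compute_reward("I am not sure.", "624"))
```

Because the reward is binary and depends entirely on the ``####`` marker, a response with correct reasoning but a missing marker scores 0 under this rule.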

**Training Script**

Adjust ``data.train_files``, ``data.val_files``, ``actor_rollout_ref.model.path``, ``critic.model.path``, and related parameters to match the actual locations of your dataset and model.

.. code-block:: bash

    PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
        data.train_files=$HOME/data/gsm8k/train.parquet \
        data.val_files=$HOME/data/gsm8k/test.parquet \
        data.train_batch_size=256 \
        data.max_prompt_length=512 \
        data.max_response_length=512 \
        actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
        actor_rollout_ref.actor.optim.lr=1e-6 \
        actor_rollout_ref.actor.ppo_mini_batch_size=64 \
        actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
        actor_rollout_ref.rollout.name=vllm \
        actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
        actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
        actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
        actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
        critic.optim.lr=1e-5 \
        critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
        critic.ppo_micro_batch_size_per_gpu=4 \
        algorithm.kl_ctrl.kl_coef=0.001 \
        trainer.logger=console \
        trainer.val_before_train=False \
        trainer.n_gpus_per_node=1 \
        trainer.nnodes=1 \
        trainer.save_freq=10 \
        trainer.test_freq=10 \
        trainer.total_epochs=15 \
        trainer.device=npu 2>&1 | tee verl_demo.log
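
When the dataset and model locations vary between machines, it can be convenient to assemble the ``key=value`` overrides from a dictionary instead of editing the command by hand. The sketch below builds the argument list for the command above and prints it without running anything. The helper name ``build_ppo_command`` is hypothetical, and only a subset of the overrides from the script is shown.

```python
import os


def build_ppo_command(overrides):
    """Build the argument list for verl.trainer.main_ppo from a dict.

    Each entry becomes one 'key=value' override, in insertion order.
    """
    command = ["python3", "-m", "verl.trainer.main_ppo"]
    command.extend(f"{key}={value}" for key, value in overrides.items())
    return command


if __name__ == "__main__":
    data_dir = os.path.join(os.path.expanduser("~"), "data", "gsm8k")
    model_path = "Qwen/Qwen2.5-0.5B-Instruct"
    overrides = {
        "data.train_files": os.path.join(data_dir, "train.parquet"),
        "data.val_files": os.path.join(data_dir, "test.parquet"),
        "actor_rollout_ref.model.path": model_path,
        "critic.model.path": model_path,
        "trainer.device": "npu",
    }
    # Print the command instead of running it.
    print(" ".join(build_ppo_command(overrides)))
```

The resulting list could be passed to ``subprocess.run`` once the remaining overrides from the script are added.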

If the environment is configured correctly and the run succeeds, you will see output similar to the following:

.. code-block:: bash

    step:0 - timing/gen:21.470 - timing/ref:4.360 - timing/values:5.800 - actor/reward_kl_penalty:0.000 - actor/reward_kl_penalty_coeff:0.001 - timing/adv:0.109 - timing/update_critic:15.664
    - critic/vf_loss:14.947 - critic/vf_clipfrac:0.000 - critic/vpred_mean:-2.056 - critic/grad_norm:1023.278 - critic/lr(1e-4):0.100 - timing/update_actor:20.314 - actor/entropy_loss:0.433
    - actor/pg_loss:-0.005 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/grad_norm:1.992 - actor/lr(1e-4):0.010 - critic/score/mean:0.004 - critic/score/max:1.000
    - critic/score/min:0.000 - critic/rewards/mean:0.004 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.000 - critic/advantages/max:2.360
    - critic/advantages/min:-2.280 - critic/returns/mean:0.003 - critic/returns/max:0.000 - critic/returns/min:0.000 - critic/values/mean:-2.045 - critic/values/max:9.500
    - critic/values/min:-14.000 - response_length/mean:239.133 - response_length/max:256.000 - response_length/min:77.000 - prompt_length/mean:104.883 - prompt_length/max:175.000
    - prompt_length/min:68.000
    step:1 - timing/gen:23.020 - timing/ref:4.322 - timing/values:5.953 - actor/reward_kl_penalty:0.000 - actor/reward_kl_penalty:0.001 - timing/adv:0.118 - timing/update_critic:15.646
    - critic/vf_loss:18.472 - critic/vf_clipfrac:0.384 - critic/vpred_mean:1.038 - critic/grad_norm:942.924 - critic/lr(1e-4):0.100 - timing/update_actor:20.526 - actor/entropy_loss:0.440
    - actor/pg_loss:0.000 - actor/pg_clipfrac:0.002 - actor/ppo_kl:0.000 - actor/grad_norm:2.060 - actor/lr(1e-4):0.010 - critic/score/mean:0.000 - critic/score/max:0.000
    - critic/score/min:0.000 - critic/rewards/mean:0.000 - critic/rewards/max:0.000 - critic/rewards/min:0.000 - critic/advantages/mean:0.000 - critic/advantages/max:2.702
    - critic/advantages/min:-2.616 - critic/returns/mean:0.000 - critic/returns/max:0.000 - critic/returns/min:0.000 - critic/values/mean:-2.280 - critic/values/max:11.000
    - critic/values/min:-16.000 - response_length/mean:232.242 - response_length/max:256.000 - response_length/min:91.000 - prompt_length/mean:102.398 - prompt_length/max:185.000
    - prompt_length/min:70.000
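
Since the console output is a flat list of ``name:value`` pairs separated by `` - ``, it is straightforward to turn a saved log such as ``verl_demo.log`` into structured data for plotting or comparison. The following sketch parses one such line into a dictionary. The function name ``parse_metrics`` is hypothetical, and the format is assumed from the sample output above rather than from any documented logging contract.

```python
def parse_metrics(line):
    """Parse 'name:value - name:value' console output into a dict of floats."""
    metrics = {}
    for field in line.split(" - "):
        # Wrapped continuation lines start with '- ', so strip that prefix.
        field = field.strip().lstrip("- ")
        # Split on the last colon; the value is everything after it.
        name, sep, value = field.rpartition(":")
        if not sep:
            continue
        try:
            metrics[name] = float(value)
        except ValueError:
            # Skip fields whose value is not numeric.
            continue
    return metrics


if __name__ == "__main__":
    sample = (
        "step:0 - timing/gen:21.470 - critic/vpred_mean:-2.056 "
        "- critic/lr(1e-4):0.100"
    )
    for name, value in parse_metrics(sample).items():
        print(name, value)
```

If the same metric name appears twice in a line, the later value overwrites the earlier one in this sketch.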
