ADD verl doc #92
Merged
@@ -0,0 +1,8 @@
verl
============

.. toctree::
   :maxdepth: 2

   install.rst
   quick_start.rst

@@ -0,0 +1,118 @@
Installation Guide
==================

This guide is for developers using verl with Ascend and walks through installing verl in an Ascend environment.

Ascend Environment Setup
------------------------

Follow the :doc:`Ascend quick installation guide <../ascend/quick_install>` to set up the Ascend environment for your Ascend product model and CPU architecture.

.. warning::
   The minimum supported CANN version is 8.3.RC1. When installing CANN, also install the Kernel operator package and the nnal acceleration library package.

Creating the Python Environment
-------------------------------

.. code-block:: shell
   :linenos:

   # Create a Python 3.11 virtual environment named verl
   conda create -y -n verl python==3.11
   # Activate the virtual environment
   conda activate verl

Installing Torch
----------------

.. code-block:: shell
   :linenos:

   # Install the CPU builds of torch 2.7.1, torchvision 0.22.1, and torchaudio 2.7.1
   pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cpu

   # Install torch-npu 2.7.1
   pip install torch-npu==2.7.1

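A quick way to confirm this step worked is to import the packages and check that the NPU backend is visible. The snippet below is only a minimal sanity check and assumes that importing ``torch_npu`` registers the ``torch.npu`` namespace:

.. code-block:: python

   # Minimal sanity check for the torch / torch-npu install (assumption:
   # importing torch_npu registers the Ascend backend under torch.npu).
   import torch
   import torch_npu  # noqa: F401

   print(torch.__version__)         # expected: 2.7.1
   print(torch.npu.is_available())  # True if an Ascend NPU is visible
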
| vllm & vllm-ascend 安装 | ||
| ---------------------- | ||
|
|
||
|
|
||
| 方法一:使用以下命令编译安装 vllm 和 vllm-ascend。请注意根据机器类型区分安装方式。 | ||
|
|
||
| .. code-block:: shell | ||
| :linenos: | ||
|
|
||
| # vllm | ||
| git clone -b v0.11.0 --depth 1 https://github.com/vllm-project/vllm.git | ||
| cd vllm | ||
| pip install -r requirements-build.txt | ||
|
|
||
| # for Atlas 200T A2 Box16 | ||
| VLLM_TARGET_DEVICE=empty pip install -e . --extra-index https://download.pytorch.org/whl/cpu/ | ||
|
|
||
| # for Atlas 900 A2 PODc or Atlas 800T A3 | ||
| VLLM_TARGET_DEVICE=empty pip install -e . | ||
|
|
||
| .. code-block:: shell | ||
| :linenos: | ||
|
|
||
| # vllm-ascend | ||
| git clone -b v0.11.0rc1 --depth 1 https://github.com/vllm-project/vllm-ascend.git | ||
| cd vllm-ascend | ||
| pip install -e . | ||
|
|
||
|
|
||
Option 2: install the prebuilt vllm and vllm-ascend packages directly with the commands below.

.. code-block:: shell
   :linenos:

   # Install vllm-project/vllm. The newest supported version is v0.11.0.
   pip install vllm==0.11.0

   # Install vllm-project/vllm-ascend from pypi.
   pip install vllm-ascend==0.11.0rc1

Installing verl
---------------

Install verl and its dependencies with the following commands:

.. code-block:: shell
   :linenos:

   git clone https://github.com/volcengine/verl.git
   cd verl

   # Install verl NPU dependencies
   pip install -r requirements-npu.txt
   pip install -e .

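As a rough post-install check (a sketch, assuming the editable installs above registered normal package metadata), you can confirm that the key packages resolve from the active environment:

.. code-block:: python

   # Print the installed versions of the packages set up in this guide.
   from importlib.metadata import version

   for pkg in ("torch", "torch-npu", "vllm", "vllm-ascend", "verl"):
       print(pkg, version(pkg))
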
Notes on Other Third-Party Libraries
------------------------------------

+----------------------+---------------------------+
| Software             | Description               |
+======================+===========================+
| transformers         | >=v4.57.1                 |
+----------------------+---------------------------+
| flash_attn           | not supported             |
+----------------------+---------------------------+
| liger-kernel         | not supported             |
+----------------------+---------------------------+

1. Enabling flash_attention_2 through transformers is supported; transformers must be version 4.57.1 or later (a sketch is shown at the end of this section).

2. Enabling flash attention acceleration through the flash_attn package is not supported.

3. liger-kernel is not supported.

4. On x86 servers, the CPU build of torchvision must be installed:

.. code-block:: shell
   :linenos:

   pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu

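As an illustration of note 1 above, the following is only a sketch of requesting flash_attention_2 through transformers when loading a model; the model name is just an example, and how the attention kernel is actually dispatched on NPU is handled inside transformers / torch-npu:

.. code-block:: python

   # Sketch only: ask transformers (>= 4.57.1) for the flash_attention_2
   # attention implementation. The model name here is only an example.
   from transformers import AutoModelForCausalLM

   model = AutoModelForCausalLM.from_pretrained(
       "Qwen/Qwen2.5-0.5B-Instruct",
       attn_implementation="flash_attention_2",
   )
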
@@ -0,0 +1,124 @@
Quick Start
===========

.. note::

   Before reading this guide, make sure you have prepared the Ascend environment and everything verl needs by following the :doc:`installation guide <./install>`.

This tutorial shows how to run a quick training job with verl and helps you get started.

It is intended to help Ascend developers quickly run LLM reinforcement learning training with verl on Ascend. See the `official documentation <https://verl.readthedocs.io/en/latest/start/install.html#>`_ for more information.

You can also refer to the official `Ascend quick start documentation <https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html>`_.

Before using verl in earnest, we recommend a trial PPO training run on Qwen2.5-0.5B to verify that the environment preparation and installation are correct and to get familiar with the basic workflow.

The following sections show how to run PPO training with verl on a single NPU card:

PPO Training of Qwen2.5-0.5B on the GSM8K Dataset
-------------------------------------------------

We post-train the Qwen2.5-0.5B model on the GSM8K dataset.

About the Dataset
^^^^^^^^^^^^^^^^^

GSM8K is a dataset of grade-school math problems used to train or evaluate the mathematical reasoning ability of LLMs. Below is an example prompt/solution pair:

Prompt

   James writes a 3-page letter to 2 different friends twice a week.
   How many pages does he write a year?

Solution

   He writes each friend 3*2=<<3*2=6>>6 pages a week So he writes
   6*2=<<6*2=12>>12 pages every week That means he writes
   12*52=<<12*52=624>>624 pages a year #### 624

Preparing the Dataset
^^^^^^^^^^^^^^^^^^^^^

You can change the ``--local_save_dir`` argument as needed to choose where the preprocessed dataset is saved.

.. code-block:: bash

   python3 examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k

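To double-check that preprocessing produced the expected files, one option (a sketch, assuming ``pandas`` and a parquet engine such as ``pyarrow`` are installed) is to load the parquet and inspect it:

.. code-block:: python

   # Peek at the preprocessed GSM8K parquet files.
   from pathlib import Path
   import pandas as pd

   train = pd.read_parquet(Path("~/data/gsm8k/train.parquet").expanduser())
   print(train.shape)
   print(train.columns.tolist())
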
Preparing the Model
^^^^^^^^^^^^^^^^^^^

In this example, Qwen2.5-0.5B-Instruct is used as the base model for PPO training.

You can set ``VERL_USE_MODELSCOPE=True`` to have the model downloaded from `modelscope <https://www.modelscope.cn>`_.

.. code-block:: bash

   python3 -c "import transformers; transformers.pipeline('text-generation', model='Qwen/Qwen2.5-0.5B-Instruct')"

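If ``VERL_USE_MODELSCOPE=True`` is set, the weights come from ModelScope instead of Hugging Face; a roughly equivalent manual pre-download (a sketch, assuming the ``modelscope`` package is installed) looks like this:

.. code-block:: python

   # Sketch: pre-download the model from ModelScope (pip install modelscope).
   from modelscope import snapshot_download

   local_dir = snapshot_download("Qwen/Qwen2.5-0.5B-Instruct")
   print(local_dir)
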
Launching PPO Training
^^^^^^^^^^^^^^^^^^^^^^

**Reward Model/Function**

In this example, we use a simple reward function to score the correctness of generated answers. The numeric value the model produces after the "####" marker is taken as its answer; if it matches the ground-truth answer, the reward is 1, otherwise 0.

For the full details, see `verl/utils/reward_score/gsm8k.py <https://github.com/volcengine/verl/blob/v0.4.1/verl/utils/reward_score/gsm8k.py>`_.

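The linked file contains the actual implementation; the snippet below is only a minimal sketch of the rule just described, and the function name and signature are hypothetical:

.. code-block:: python

   # Sketch (not verl's code): take the number after "####" in the response
   # and compare it with the reference answer; a match scores 1.0, else 0.0.
   import re

   def gsm8k_reward(response: str, ground_truth: str) -> float:
       match = re.search(r"####\s*([-0-9.,]+)", response)
       if match is None:
           return 0.0
       answer = match.group(1).replace(",", "").strip(".")
       return 1.0 if answer == ground_truth.strip() else 0.0

   print(gsm8k_reward("... 12*52=624 pages a year #### 624", "624"))  # 1.0
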
**Training Script**

Adjust ``data.train_files``, ``data.val_files``, ``actor_rollout_ref.model.path``, ``critic.model.path``, and related parameters to match the actual locations of your dataset and model.

.. code-block:: bash

   PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
       data.train_files=$HOME/data/gsm8k/train.parquet \
       data.val_files=$HOME/data/gsm8k/test.parquet \
       data.train_batch_size=256 \
       data.max_prompt_length=512 \
       data.max_response_length=512 \
       actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
       actor_rollout_ref.actor.optim.lr=1e-6 \
       actor_rollout_ref.actor.ppo_mini_batch_size=64 \
       actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
       actor_rollout_ref.rollout.name=vllm \
       actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
       actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
       actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
       actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
       critic.optim.lr=1e-5 \
       critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
       critic.ppo_micro_batch_size_per_gpu=4 \
       algorithm.kl_ctrl.kl_coef=0.001 \
       trainer.logger=console \
       trainer.val_before_train=False \
       trainer.n_gpus_per_node=1 \
       trainer.nnodes=1 \
       trainer.save_freq=10 \
       trainer.test_freq=10 \
       trainer.total_epochs=15 \
       trainer.device=npu 2>&1 | tee verl_demo.log

If the environment is set up correctly and the job runs, you will see output similar to the following:

.. code-block:: bash

   step:0 - timing/gen:21.470 - timing/ref:4.360 - timing/values:5.800 - actor/reward_kl_penalty:0.000 - actor/reward_kl_penalty_coeff:0.001 - timing/adv:0.109 - timing/update_critic:15.664
   - critic/vf_loss:14.947 - critic/vf_clipfrac:0.000 - critic/vpred_mean:-2.056 - critic/grad_norm:1023.278 - critic/lr(1e-4):0.100 - timing/update_actor:20.314 - actor/entropy_loss:0.433
   - actor/pg_loss:-0.005 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/grad_norm:1.992 - actor/lr(1e-4):0.010 - critic/score/mean:0.004 - critic/score/max:1.000
   - critic/score/min:0.000 - critic/rewards/mean:0.004 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.000 - critic/advantages/max:2.360
   - critic/advantages/min:-2.280 - critic/returns/mean:0.003 - critic/returns/max:0.000 - critic/returns/min:0.000 - critic/values/mean:-2.045 - critic/values/max:9.500
   - critic/values/min:-14.000 - response_length/mean:239.133 - response_length/max:256.000 - response_length/min:77.000 - prompt_length/mean:104.883 - prompt_length/max:175.000
   - prompt_length/min:68.000
   step:1 - timing/gen:23.020 - timing/ref:4.322 - timing/values:5.953 - actor/reward_kl_penalty:0.000 - actor/reward_kl_penalty:0.001 - timing/adv:0.118 - timing/update_critic:15.646
   - critic/vf_loss:18.472 - critic/vf_clipfrac:0.384 - critic/vpred_mean:1.038 - critic/grad_norm:942.924 - critic/lr(1e-4):0.100 - timing/update_actor:20.526 - actor/entropy_loss:0.440
   - actor/pg_loss:0.000 - actor/pg_clipfrac:0.002 - actor/ppo_kl:0.000 - actor/grad_norm:2.060 - actor/lr(1e-4):0.010 - critic/score/mean:0.000 - critic/score/max:0.000
   - critic/score/min:0.000 - critic/rewards/mean:0.000 - critic/rewards/max:0.000 - critic/rewards/min:0.000 - critic/advantages/mean:0.000 - critic/advantages/max:2.702
   - critic/advantages/min:-2.616 - critic/returns/mean:0.000 - critic/returns/max:0.000 - critic/returns/min:0.000 - critic/values/mean:-2.280 - critic/values/max:11.000
   - critic/values/min:-16.000 - response_length/mean:232.242 - response_length/max:256.000 - response_length/min:91.000 - prompt_length/mean:102.398 - prompt_length/max:185.000
   - prompt_length/min:70.000

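Because all metrics go through ``tee`` into ``verl_demo.log``, a single metric can be tracked across steps with a small script; the snippet below is a sketch based on the ``name:value`` format shown in the sample output above.

.. code-block:: python

   # Sketch: collect one metric per training step from verl_demo.log.
   import re

   metric = "critic/score/mean"
   pattern = re.compile(re.escape(metric) + r":([-0-9.]+)")

   with open("verl_demo.log") as f:
       values = [float(m.group(1)) for m in pattern.finditer(f.read())]

   print(values)
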
References
----------

.. [1] https://verl.readthedocs.io/en/latest/start/install.html
Does this not include VLM?
Of course it supports VLMs, but this is how they describe it on GitHub:
‘verl is a flexible, efficient, and production-ready RL training library for large language models (LLMs).’
And in practice, VLM support isn't really one of its standout features; there are many other aspects that matter more.