[train][OpenAI] Make engine_init_kwargs.chat_template config apply to vllm /chat/completions #890
Conversation
/gemini review
Code Review
This pull request extends the chat_template configuration for vLLM's /chat/completions endpoint, enabling customization of model behavior such as preserving thinking tokens. However, a critical security vulnerability has been identified: the get_custom_chat_template function loads templates from an unvalidated file path, leading to a Path Traversal vulnerability that can escalate to Remote Code Execution via Server-Side Template Injection due to the use of Jinja2. Detailed remediation is provided in a line comment. Furthermore, there's a minor typo in the configuration, a suggestion to add a default value when popping chat_template from kwargs for robustness, and a potential TypeError in the synchronous VLLMInferenceEngine if chat_template is not properly handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable feature by allowing custom chat templates for vLLM's /chat/completions endpoint, which is particularly useful for models like Qwen3. However, the implementation for loading templates from a file is vulnerable to a high-severity path traversal attack, potentially allowing an attacker to read arbitrary files on the filesystem. This critical security flaw must be addressed by validating file paths to ensure they are within expected directories. Additionally, a critical bug causing a crash with the synchronous inference engine was identified, along with a minor typo in a configuration file and an opportunity to improve code robustness in a test file. Addressing these issues will significantly enhance the security, stability, and reliability of the implementation.
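For reference, a minimal sketch of the kind of path validation the review is asking for. `get_custom_chat_template` is the function named in the review; `ALLOWED_TEMPLATE_DIR`, the restriction to the repository's `skyrl_train/utils/template` folder, and the overall structure are illustrative assumptions, not the repository's actual fix:

```python
from pathlib import Path

# Assumed policy: only templates under the repo's template folder are loadable.
ALLOWED_TEMPLATE_DIR = Path("skyrl_train/utils/template").resolve()


def get_custom_chat_template(template_path: str) -> str:
    """Load a Jinja2 chat template, rejecting paths outside the allowed directory."""
    resolved = Path(template_path).resolve()
    # Block traversal such as "../../etc/passwd" (is_relative_to needs Python 3.9+).
    if not resolved.is_relative_to(ALLOWED_TEMPLATE_DIR):
        raise ValueError(f"chat_template must live under {ALLOWED_TEMPLATE_DIR}, got {resolved}")
    return resolved.read_text()
```

Whether to pin templates to a fixed directory like this, or to treat the configured path as trusted operator input, is a policy choice for the maintainers.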
Line comment on skyrl-train/tests/gpu/gpu_ci/test_inference_engine_client_http_endpoint.py (resolved)
Force-pushed … vllm /chat/completions from 73f31c8 to 5e8d1f0
/gemini review
Code Review
This pull request extends the chat_template configuration to vLLM's /chat/completions endpoint, enhancing support for models like Qwen3. However, it introduces a critical path traversal vulnerability in the file path validation, allowing arbitrary file reads from the local filesystem. Addressing this vulnerability is crucial, and a minor refactoring in the test code could also improve clarity and robustness.
Line comment on skyrl-train/tests/gpu/gpu_ci/test_inference_engine_client_http_endpoint.py (resolved)
/gemini review
Code Review
This pull request introduces a valuable feature for customizing chat templates in vLLM's /chat/completions endpoint, which is particularly useful for models like Qwen3. The implementation is clean, and the inclusion of a new template, documentation, and a dedicated unit test is excellent. My review focuses on enhancing the security of the new file path validation logic and improving code professionalism.
Passed GPU CI: https://github.com/NovaSky-AI/SkyRL/actions/runs/21121604761
The goal of this PR is to allow `/chat/completions` requests to vLLM to be templated by a custom chat template. One motivation is to avoid stripping thinking tokens when training Qwen3 models.

Users can now do this with `+generator.engine_init_kwargs.chat_template=/path/to/templates/qwen3_acc_thinking.jinja2`.

We enable this by parsing the `engine_init_kwargs.chat_template` configuration and passing it to vLLM's `OpenAIServingChat` in our `vllm_engine.py`. The effect can be seen in the added unit test.
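A minimal sketch of how that wiring might look, assuming the template is pulled out of `engine_init_kwargs` before the remaining kwargs reach the engine. The helper name `split_chat_template` is hypothetical; only `engine_init_kwargs`, the `chat_template` key, and the pop-with-default suggestion come from the PR discussion, and the `OpenAIServingChat` constructor call itself is omitted because its signature varies across vLLM versions:

```python
from pathlib import Path
from typing import Any, Optional


def split_chat_template(engine_init_kwargs: dict[str, Any]) -> Optional[str]:
    """Remove chat_template from the engine kwargs and return the template text, if any.

    Popping with a None default keeps this safe when no template is configured,
    which is the robustness point raised in the review comments above.
    """
    template_path = engine_init_kwargs.pop("chat_template", None)
    if template_path is None:
        return None
    return Path(template_path).read_text()


# The remaining engine_init_kwargs are forwarded to the vLLM engine as before;
# the returned template string is what would be handed to OpenAIServingChat.
```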
This is only used when:
- `run_engines_locally: True`
- `backend: "vllm"`

We add a folder `skyrl_train/utils/template` containing the Qwen3 template that does not strip thinking tokens (a detailed description is in the folder's README).

The `generator.chat_template` setting still only applies to `SkyRLGymGenerator`.
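As a quick offline illustration of the "does not strip thinking tokens" behaviour, the custom template can be rendered directly with Hugging Face transformers' `apply_chat_template`. This is a sketch, not the PR's unit test: the model name `Qwen/Qwen3-0.6B`, the sample messages, and the assertion are assumptions; only the template path mirrors the example flag above:

```python
from pathlib import Path
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")  # assumed model
custom_template = Path("/path/to/templates/qwen3_acc_thinking.jinja2").read_text()

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>2 + 2 = 4</think>The answer is 4."},
    {"role": "user", "content": "And 3 + 3?"},
]

# The PR's motivation is that the default template drops earlier <think> blocks;
# with the accumulate-thinking template they should survive in the rendered prompt.
prompt = tokenizer.apply_chat_template(
    messages, chat_template=custom_template, tokenize=False, add_generation_prompt=True
)
assert "<think>" in prompt
```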