feat: add OpenAI /v1/completions adapter for vLLM gpt-oss-120b accuracy#308
Conversation
Adds APIType.OPENAI_COMPLETIONS routing to /v1/completions, which accepts pre-tokenized token ID arrays and bypasses vLLM's chat template. This is required for gpt-oss-120b, where the Harmony format must be applied client-side.

- Add APIType.OPENAI_COMPLETIONS with default_route "/v1/completions"
- Add TextCompletionRequest/Response/SSE msgspec types
- Add OpenAITextCompletionsAdapter (mirrors the SGLang adapter, reuses OpenAISSEAccumulator)
- Register the adapter and accumulator in endpoint_client/config.py
- Rename gptoss → gptoss_sglang presets; add gptoss_vllm across aime25/gpqa/livecodebench
- Update sglang_gptoss_120b_example.yaml to use gptoss_sglang presets
- Update vllm_gptoss_120b_example.yaml to use openai_completions + gptoss_vllm presets
- Add 18 unit tests covering the adapter, SSE handling, preset existence, and APIType integration

fix: move lazy test imports to module level; fix decode_sse_message return type

- Move all inline imports in test_completions_adapter.py to file level
- Add a test for the empty-text SSE choice path
- Fix the HttpRequestAdapter.decode_sse_message abstract annotation from str to Any (the SGLang and completions adapters both return SSEDelta structs, not str)

examples/04_GPTOSS120B_Example/Readme.md:

- Replace the stale chat-completions note with an accurate openai_completions description
- Update the performance-only vLLM api_type reference from "openai" to "openai_completions"
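The pre-tokenized flow described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual adapter code: the field names follow the OpenAI completions schema (which vLLM's /v1/completions endpoint extends to accept token ID arrays as `prompt`), and the model name and token IDs are placeholders.

```python
def build_completions_payload(token_ids: list[int], max_tokens: int = 256) -> dict:
    """Build a /v1/completions request body with a pre-tokenized prompt.

    Passing token IDs instead of a string bypasses any server-side chat
    template, so the Harmony format can be applied client-side.
    """
    return {
        "model": "gpt-oss-120b",  # illustrative model name
        "prompt": token_ids,      # token ID array, not a string
        "max_tokens": max_tokens,
        "stream": True,           # responses arrive as SSE chunks
    }

payload = build_completions_payload([200006, 17360, 200008])
```

With a string `prompt`, the server would be free to apply its own template; the token ID array keeps full control of the prompt encoding on the client.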
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request introduces a new openai_completions API type and adapter to support the OpenAI /v1/completions endpoint, enabling the use of pre-tokenized input with vLLM. This change allows users to bypass server-side chat templates, ensuring parity with SGLang results for specific models like gpt-oss-120b. The implementation includes the OpenAITextCompletionsAdapter, updated configuration templates, documentation, and new unit tests. I have no feedback to provide.
    OPENAI = "openai"
    OPENAI_COMPLETIONS = "openai_completions"
We might want to be explicit and call it v1_completions (vs. v1_chat_completions). If OpenAI comes up with some new endpoint template in the future, we would have to refactor here as well (also above).
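A minimal sketch of the more explicit naming this comment suggests; the enum names and values here are illustrative, not what the PR actually defines.

```python
from enum import Enum

class APIType(str, Enum):
    """Hypothetical route-explicit naming for the API types."""
    OPENAI_V1_CHAT_COMPLETIONS = "openai_v1_chat_completions"  # /v1/chat/completions
    OPENAI_V1_COMPLETIONS = "openai_v1_completions"            # /v1/completions
```

Encoding the route into the name means a future OpenAI endpoint gets a new member instead of forcing a rename of the existing ones.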
def gptoss_sglang() -> list[Transform]:
    return [UserPromptFormatter(user_prompt_format=_FORMAT)]
def gptoss() -> list[Transform]:
    return [
        UserPromptFormatter(
            user_prompt_format=(
                "You are a python coding expert that solves problems step-by-step.\n"
                "You must provide the reasoning to arriving at your solution and the code to solve the problem.\n"
                "Do not try simulating the code execution. The code must be enclosed within ```python delimiters.\n\n\n"
                "{question}\n"
                "### Format: You will use the following starter code to write the solution to the problem and enclose your code within delimiters.\n"
                "```python\n"
                "{starter_code}\n"
                "```\n"
            ),
        ),
    ]

def gptoss_vllm() -> list[Transform]:
    return [UserPromptFormatter(user_prompt_format=_FORMAT)]
This seems like a duplicate: a different function calling the same implementation.
def gptoss() -> list[Transform]:
    return [
        # Step 1: Format the prompt from question and choices

def gptoss_sglang() -> list[Transform]:
    return [UserPromptFormatter(user_prompt_format=_FORMAT)]

def gptoss_vllm() -> list[Transform]:
    return [UserPromptFormatter(user_prompt_format=_FORMAT)]
It seems like the transformation is the same between SGLang and vLLM, and the distinct behavior here is whether we pre-tokenize or not, so using gptoss_ prefixes will be misleading.
And the part that determines the pre-tokenization is the api_type: "openai_completions", not the transformation here, right? Seems like we should deduplicate these two.
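The deduplication suggested in this comment could look like the sketch below: one shared preset, with the backend-specific names kept as aliases so existing YAML configs keep resolving. _FORMAT and UserPromptFormatter are minimal stand-ins for the repo's actual symbols, not its real definitions.

```python
_FORMAT = "{question}"  # placeholder for the shared prompt template

class UserPromptFormatter:
    """Minimal stand-in for the repo's prompt-formatting transform."""
    def __init__(self, user_prompt_format: str) -> None:
        self.user_prompt_format = user_prompt_format

def gptoss() -> list:
    """Single shared preset; the backend choice lives in api_type, not here."""
    return [UserPromptFormatter(user_prompt_format=_FORMAT)]

# Backward-compatible aliases so existing preset names keep resolving.
gptoss_sglang = gptoss
gptoss_vllm = gptoss
```

Since both presets build the identical transform list, aliasing avoids two functions drifting apart while the api_type flag alone selects pre-tokenization.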
    - "http://localhost:8000"
  api_key: null
  api_type: "openai_completions"  # was: "openai"
Am I understanding correctly this flag is where we control pre-tokenization?
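As this question reads the PR, yes: the api_type flag selects the endpoint and, with it, who tokenizes. A hypothetical dispatch sketch, with function names, route strings, and the tokenize callback all illustrative rather than the repo's actual API:

```python
def build_request(api_type: str, messages: list[dict], tokenize) -> tuple[str, dict]:
    """Pick the route and request body based on api_type.

    "openai_completions": client applies the chat template (e.g. Harmony)
    and sends token IDs; anything else sends chat messages for the server
    to template itself.
    """
    if api_type == "openai_completions":
        token_ids = tokenize(messages)  # client-side chat template
        return "/v1/completions", {"prompt": token_ids}
    return "/v1/chat/completions", {"messages": messages}
```

So the preset transforms only shape the prompt text; pre-tokenization is decided entirely by this routing choice.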
│ ├── types.py                  # OpenAI response types (chat + text completion)
│ ├── openai_adapter.py         # Chat completions adapter (/v1/chat/completions)
│ ├── openai_msgspec_adapter.py # msgspec-based chat completions adapter (fast path)
│ ├── completions_adapter.py    # Text completions adapter (/v1/completions, pre-tokenized input)
│ ├── accumulator.py            # Streaming response accumulator
Probably want to rename this to v1_chat_completions_adapter as well (not a must-have for this PR).
# See the License for the specific language governing permissions and
# limitations under the License.

"""OpenAI /v1/completions adapter for vLLM with pre-tokenized prompts."""
Probably shouldn't say it's for vLLM?
What does this PR do?
Type of change
Related issues
Testing
Checklist