
feat: add MiniMax as generation backend#365

Open
octo-patch wants to merge 1 commit into OpenBMB:main from octo-patch:feature/add-minimax-provider

Conversation


@octo-patch octo-patch commented Mar 21, 2026

Summary

Add MiniMax as a first-class LLM generation backend alongside the existing vllm, openai, and hf backends.

MiniMax provides OpenAI-compatible cloud APIs with models offering context windows of up to 1M tokens, making it well suited for RAG workloads that require processing large retrieved contexts.
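Because the API is OpenAI-compatible, requests follow the standard chat-completions shape. A minimal stdlib-only sketch of what such a request looks like (the base URL here is a placeholder, not the real endpoint; the payload shape is the generic OpenAI chat format, not code from this PR):

```python
import json
import urllib.request

# Placeholder endpoint; consult MiniMax's documentation for the real base URL.
BASE_URL = "https://api.minimax.example/v1"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": "MiniMax-M2.7",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

In practice the backend can simply point an OpenAI SDK client at the MiniMax base URL; the sketch above only illustrates the wire format that compatibility implies.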

Changes

  • servers/generation/src/generation.py — Added minimax backend with:
    • Auto-detection of MINIMAX_API_KEY environment variable
    • Temperature clamping to MiniMax's accepted (0, 1] range
    • Automatic <think>...</think> tag stripping (configurable via strip_think_tags)
    • Default model: MiniMax-M2.7 (1M context)
    • Concurrent request support with exponential backoff retry
    • Two static helper methods: _clamp_temperature() and _strip_think_tags()
  • servers/generation/parameter.yaml — Added MiniMax config section with all available options
  • examples/minimax_rag.yaml — Example RAG pipeline using MiniMax backend
  • examples/parameter/minimax_generation_parameter.yaml — Full parameter reference
  • README.md / docs/README_zh.md — Added "Supported Cloud LLM Backends" table documenting all four backends with MiniMax usage instructions
  • tests/test_minimax_generation.py — 38 unit tests covering temperature clamping, think-tag stripping, initialization, and generation
  • tests/test_minimax_integration.py — 3 integration tests (auto-skipped when MINIMAX_API_KEY is not set)

Supported Models

| Model | Context | Notes |
| --- | --- | --- |
| MiniMax-M2.7 | 1M tokens | Latest, default |
| MiniMax-M2.7-highspeed | 1M tokens | Fast variant |
| MiniMax-M2.5 | 256K tokens | Previous generation |
| MiniMax-M2.5-highspeed | 204K tokens | Fast, long context |

Usage

```shell
export MINIMAX_API_KEY="your-api-key"
ultrarag run examples/minimax_rag.yaml
```

Or set `backend: minimax` in your generation parameter file.
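A parameter-file fragment might look like the following (field names other than `backend`, `model`, and `strip_think_tags` are guesses at this PR's schema, not copied from it):

```yaml
# Hypothetical fragment of a generation parameter file.
generation:
  backend: minimax
  model: MiniMax-M2.7
  temperature: 0.7        # clamped into (0, 1] by the backend
  strip_think_tags: true  # drop <think>...</think> blocks from output
```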

Test Plan

  • 38 unit tests pass (temperature clamping, think-tag stripping, init validation, mock generation)
  • 3 integration tests pass against live MiniMax API
  • Verify no regression on existing vllm/openai/hf backends
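The auto-skip behavior of the integration tests can be expressed with pytest's standard `skipif` marker; a sketch of that pattern (the test name and body are illustrative, not from the PR):

```python
import os

import pytest

# Skip live-API tests entirely when no credentials are available.
requires_minimax = pytest.mark.skipif(
    not os.environ.get("MINIMAX_API_KEY"),
    reason="MINIMAX_API_KEY is not set; skipping live MiniMax API tests",
)


@requires_minimax
def test_minimax_generate_smoke():
    # Hypothetical body: construct the minimax backend and issue
    # one short completion against the live API.
    pass
```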

9 files changed, 857 additions(+), 3 deletions(-)

Add MiniMax as a first-class LLM provider in the generation server,
alongside vllm, openai, and hf backends. MiniMax provides
OpenAI-compatible cloud APIs with M2.7 and M2.5 model series.

Features:
- Dedicated minimax backend with auto-detection of MINIMAX_API_KEY
- Temperature clamping to MiniMax's (0, 1] range
- Automatic <think>...</think> tag stripping (configurable)
- Default model: MiniMax-M2.7 (1M context window)
- Concurrent request support with retry logic
- Example YAML pipeline and parameter configuration
- 38 unit tests + 3 integration tests
- Documentation in both English and Chinese READMEs

Supported models: MiniMax-M2.7, MiniMax-M2.7-highspeed,
MiniMax-M2.5, MiniMax-M2.5-highspeed
