
feat: add MiniMax as generation backend#365

Open
octo-patch wants to merge 1 commit into OpenBMB:main from octo-patch:feature/add-minimax-provider

Conversation


@octo-patch octo-patch commented Mar 21, 2026

Summary

Add MiniMax as a first-class LLM generation backend alongside the existing vllm, openai, and hf backends.

MiniMax provides OpenAI-compatible cloud APIs with models offering context windows of up to 1M tokens, making it well suited for RAG workloads that require processing large retrieved contexts.
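Because the API is OpenAI-compatible, requests follow the standard chat-completions shape. A minimal stdlib-only sketch of what such a request looks like (the base URL here is a placeholder, not the real endpoint; the payload shape is the generic OpenAI chat format, not code from this PR):

```python
import json
import urllib.request

# Placeholder endpoint; consult MiniMax's documentation for the real base URL.
BASE_URL = "https://api.minimax.example/v1"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": "MiniMax-M2.7",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

In practice the backend can simply point an OpenAI SDK client at the MiniMax base URL; the sketch above only illustrates the wire format that compatibility implies.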

Changes

  • servers/generation/src/generation.py — Added minimax backend with:
    • Auto-detection of MINIMAX_API_KEY environment variable
    • Temperature clamping to MiniMax's accepted (0, 1] range
    • Automatic <think>...</think> tag stripping (configurable via strip_think_tags)
    • Default model: MiniMax-M2.7 (1M context)
    • Concurrent request support with exponential backoff retry
    • Two static helper methods: _clamp_temperature() and _strip_think_tags()
  • servers/generation/parameter.yaml — Added MiniMax config section with all available options
  • examples/minimax_rag.yaml — Example RAG pipeline using MiniMax backend
  • examples/parameter/minimax_generation_parameter.yaml — Full parameter reference
  • README.md / docs/README_zh.md — Added "Supported Cloud LLM Backends" table documenting all four backends with MiniMax usage instructions
  • tests/test_minimax_generation.py — 38 unit tests covering temperature clamping, think-tag stripping, initialization, and generation
  • tests/test_minimax_integration.py — 3 integration tests (auto-skipped when MINIMAX_API_KEY is not set)

Supported Models

| Model | Context | Notes |
| --- | --- | --- |
| MiniMax-M2.7 | 1M tokens | Latest, default |
| MiniMax-M2.7-highspeed | 1M tokens | Fast variant |
| MiniMax-M2.5 | 256K tokens | Previous generation |
| MiniMax-M2.5-highspeed | 204K tokens | Fast, long context |

Usage

```shell
export MINIMAX_API_KEY="your-api-key"
ultrarag run examples/minimax_rag.yaml
```

Or set `backend: minimax` in your generation parameter file.
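A parameter-file fragment might look like the following (field names other than `backend`, `model`, and `strip_think_tags` are guesses at this PR's schema, not copied from it):

```yaml
# Hypothetical fragment of a generation parameter file.
generation:
  backend: minimax
  model: MiniMax-M2.7
  temperature: 0.7        # clamped into (0, 1] by the backend
  strip_think_tags: true  # drop <think>...</think> blocks from output
```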

Test Plan

  • 38 unit tests pass (temperature clamping, think-tag stripping, init validation, mock generation)
  • 3 integration tests pass against live MiniMax API
  • Verify no regression on existing vllm/openai/hf backends
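The auto-skip behavior of the integration tests can be expressed with pytest's standard `skipif` marker; a sketch of that pattern (the test name and body are illustrative, not from the PR):

```python
import os

import pytest

# Skip live-API tests entirely when no credentials are available.
requires_minimax = pytest.mark.skipif(
    not os.environ.get("MINIMAX_API_KEY"),
    reason="MINIMAX_API_KEY is not set; skipping live MiniMax API tests",
)


@requires_minimax
def test_minimax_generate_smoke():
    # Hypothetical body: construct the minimax backend and issue
    # one short completion against the live API.
    pass
```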

9 files changed, 857 additions(+), 3 deletions(-)

Add MiniMax as a first-class LLM provider in the generation server,
alongside vllm, openai, and hf backends. MiniMax provides
OpenAI-compatible cloud APIs with M2.7 and M2.5 model series.

Features:
- Dedicated minimax backend with auto-detection of MINIMAX_API_KEY
- Temperature clamping to MiniMax's (0, 1] range
- Automatic <think>...</think> tag stripping (configurable)
- Default model: MiniMax-M2.7 (1M context window)
- Concurrent request support with retry logic
- Example YAML pipeline and parameter configuration
- 38 unit tests + 3 integration tests
- Documentation in both English and Chinese READMEs

Supported models: MiniMax-M2.7, MiniMax-M2.7-highspeed,
MiniMax-M2.5, MiniMax-M2.5-highspeed
