
auto-derive max_req_total_len from model config #1297

Open

Owleye4 wants to merge 9 commits into ModelTC:main from Owleye4:main

Conversation

Owleye4 commented on May 8, 2026

Auto-derive max_req_total_len from model_dir/config.json at API start time.
If derivation fails, fall back to the previous default value to keep existing behavior.
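
A minimal sketch of that derivation, assuming a hypothetical helper name and a placeholder fallback value (the real code falls back to the existing CLI default for --max_req_total_len):

```python
import json
import os

# Placeholder for the pre-existing default; the actual fallback is whatever
# --max_req_total_len defaulted to before this change.
FALLBACK_MAX_REQ_TOTAL_LEN = 16384


def derive_max_req_total_len(model_dir: str) -> int:
    """Hypothetical helper: read model_dir/config.json and derive a length limit."""
    try:
        with open(os.path.join(model_dir, "config.json")) as f:
            cfg = json.load(f)
        # Field precedence follows the review summary: max_sequence_length first,
        # then max_position_embeddings.
        limit = cfg.get("max_sequence_length") or cfg.get("max_position_embeddings")
        if not limit:
            raise ValueError("no sequence-length field in config.json")
        return int(limit)
    except Exception:
        # Derivation failed: keep the previous behavior.
        return FALLBACK_MAX_REQ_TOTAL_LEN
```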

gemini-code-assist (bot) left a comment


Code Review

This pull request implements automatic derivation of max_req_total_len from model configurations, replacing the previous hardcoded default. It introduces logic to handle various RoPE scaling types and ensures consistency across processes by publishing the effective, KV-cache-clamped value via shared memory. Documentation has been updated to reflect these changes. Feedback suggests using canonical paths for cached configuration lookups, defining the safety margin for token clamping as a named constant, and considering a more dynamic cap for CUDA graph capture lengths.

Comment threads (outdated): lightllm/utils/config_utils.py, lightllm/server/router/manager.py, lightllm/server/api_start.py
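
As a rough illustration of the RoPE-scaling handling the review above mentions (the scaling types and rules shown here are assumptions, not the PR's exact logic):

```python
def apply_rope_scaling(cfg: dict, base_len: int) -> int:
    """Adjust a base context length for a rope_scaling entry in config.json.

    Illustrative only: real configs vary by model family, and the PR may
    handle more (or different) scaling types.
    """
    rope = cfg.get("rope_scaling") or {}
    rope_type = rope.get("rope_type") or rope.get("type")
    factor = float(rope.get("factor", 1.0))

    if rope_type in ("linear", "dynamic"):
        # Linear/dynamic NTK scaling stretches the usable position range.
        return int(base_len * factor)
    if rope_type == "yarn":
        # YaRN configs usually record the pre-scaling length explicitly.
        original = rope.get("original_max_position_embeddings", base_len)
        return int(original * factor)
    # No scaling (or an unrecognized type): leave the base length unchanged.
    return base_len
```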

gemini-code-assist (bot) left a comment


Code Review

This pull request implements automatic derivation of the --max_req_total_len parameter from model configurations, such as max_sequence_length or max_position_embeddings adjusted by RoPE scaling factors. It introduces logic to soft-clamp this value against the actual KV-cache pool capacity and uses shared memory to synchronize the effective limit across different server processes. Additionally, the changes include early S3 model preparation and updates to documentation. Feedback was provided suggesting that a hardcoded margin of 8 used during KV pool clamping should be replaced with a named constant to improve code maintainability.

Comment thread (outdated): lightllm/server/router/manager.py
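
The margin-of-8 clamp the bot flags could look roughly like this once the margin is a named constant (names and placement here are assumptions; per the review, the actual clamping happens in the router manager and the effective value is then published via shared memory):

```python
# Slots deliberately left free in the KV cache pool; the value 8 is the margin
# the review asks to promote from a magic number to a named constant.
KV_POOL_SAFETY_MARGIN = 8


def clamp_to_kv_pool(max_req_total_len: int, kv_pool_capacity: int) -> int:
    """Soft-clamp the derived request length to what the KV cache can hold."""
    usable = max(kv_pool_capacity - KV_POOL_SAFETY_MARGIN, 1)
    return min(max_req_total_len, usable)
```
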
Owleye4 force-pushed the main branch 2 times, most recently from 127de7b to 698e0b7 on May 8, 2026 at 08:42.