feat(miles): per-engine process-resident GPU residual gate + forward MILES_MAX_RESIDUAL_GPU_MEM_GB #17
Open
howard989 wants to merge 2 commits into
What
Replace the hardcoded free-memory gate in `MilesPipeline._wait_for_overlap_engines_offloaded()` with a configurable residual gate delegated to MILES `shrink_engines`.
The threshold is controlled by the `MILES_MAX_RESIDUAL_GPU_MEM_GB` environment variable.
Default: 3.0 GiB
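A minimal sketch of how such an override might be parsed. The helper name `parse_env_positive_float` appears in this PR, but the signature, the optional `env` mapping, and the error handling below are assumptions made for illustration, not the code in `rlix/utils/env.py`.

```python
import math
import os
from typing import Mapping, Optional


def parse_env_positive_float(
    name: str,
    default: float,
    env: Optional[Mapping[str, str]] = None,
) -> float:
    """Return a finite, strictly positive float read from `name`.

    Falls back to `default` when the variable is unset or blank.
    The `env` argument is a hypothetical hook that makes the helper testable.
    """
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a number, got {raw!r}") from None
    # float() accepts "nan" and "inf", so reject them explicitly.
    if not math.isfinite(value) or value <= 0.0:
        raise ValueError(f"{name} must be a finite positive number, got {raw!r}")
    return value


# Unset variable: the 3.0 default applies.
threshold_gb = parse_env_positive_float("MILES_MAX_RESIDUAL_GPU_MEM_GB", 3.0, env={})
```

Rejecting zero, negative, and non-finite values up front means a mistyped override fails at startup rather than silently disabling the gate.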
Sender side: rlops/miles branch `howard/m11-forward-residual-gpu-env-v2`.
Why
Per @taoluo review (R02-01): "free memory is gpu-model dependent e.g. 24gb vs 80gb gpu. it would be more robust to check the residual memory allocation."
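To make the reviewer's point concrete, here is a small sketch comparing the two gate styles. The 24 GB and 80 GB capacities come from the quote, and 20.0 and 3.0 are the old and new thresholds from this PR. The neighbor usage and residual values are invented for illustration, and the function names and comparison operators are assumptions.

```python
def free_memory_gate(total_gb, other_tenants_gb, residual_gb, target_free_gb=20.0):
    """Old-style gate: pass when whole-GPU free memory reaches a fixed target."""
    free_gb = total_gb - other_tenants_gb - residual_gb
    return free_gb >= target_free_gb


def residual_gate(residual_gb, max_residual_gb=3.0):
    """New-style gate: pass when the previous tenant's leftover allocation is small."""
    return residual_gb <= max_residual_gb


# Case 1 (hypothetical): clean offload leaving 1.8 of baseline, neighbor holding 6.0.
clean_on_24gb = free_memory_gate(24.0, 6.0, 1.8)   # blocked although the offload worked
clean_on_80gb = free_memory_gate(80.0, 6.0, 1.8)   # passes
clean_residual = residual_gate(1.8)                # passes on either GPU

# Case 2 (hypothetical): offload failed and 40.0 is still held on an 80 GB GPU.
stuck_on_80gb = free_memory_gate(80.0, 0.0, 40.0)  # passes although memory was not released
stuck_residual = residual_gate(40.0)               # blocked
```

The free-memory gate gives a different answer for the same offload depending on GPU capacity, while the residual gate depends only on what the previous tenant left behind.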
The base used `target_free_gb = 20.0` against `nvidia-smi --query-gpu=memory.free`, which is GPU-capacity dependent and not portable. The condition we need before `wake_up` is not "at least N GB free"; it is "the previous tenant released its GPU memory", i.e. residual allocation.
We iterated through a few signals:
- `memory.free`: GPU-capacity dependent.
- `memory.used`: includes train actor / neighbor pipeline / unrelated CUDA context.
- `/server_info` weight+kvcache+graph: accounting/static-pool size, not resident memory after `torch_memory_saver` pause.
On Vast, a slept SGLang engine reported ~9.32 GiB in `/server_info` accounting but only ~1.81 GiB real process-resident memory in `nvidia-smi`. Therefore the paired MILES PR now gates on each engine's per-process resident GPU memory inside `shrink_engines`.
What This PR Does
`rlix/utils/env.py`
- `parse_env_positive_float`
`rlix/pipeline/miles_coordinator.py`
- Forward `MILES_MAX_RESIDUAL_GPU_MEM_GB` into per-pipeline runtime env
- Default: `3.0`
- Pass the threshold to `shrink_engines(post_sleep_vram_threshold_gb=...)`
`rlix/pipeline/miles_pipeline.py`
- Remove the `target_free_gb = 20.0` free-memory hard gate
- Use `state == offloaded` polling as the liveness gate
- Use `nvidia-smi --query-gpu=memory.used` as diagnostic logging only
Default 3.0 Rationale
Vast smoke with Qwen2.5-0.5B on RTX 5090 / CUDA 12.9 measured 1.81-1.83 GiB per-engine resident memory after offload. This is mostly non-offloadable CUDA/runtime baseline, not model memory.
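A per-process figure like this could be read from `nvidia-smi`'s compute-apps query. The sketch below is an assumption about how such a measurement might be taken, not the MILES implementation. The function names are hypothetical, and summing rows per PID (a process can appear once per GPU) is a design choice made here.

```python
import subprocess
from typing import Dict

MIB_PER_GIB = 1024.0


def parse_compute_apps(csv_text: str) -> Dict[int, float]:
    """Parse `pid, used_memory` rows (MiB, no header, no units) into {pid: GiB}."""
    usage: Dict[int, float] = {}
    for line in csv_text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2:
            continue
        try:
            pid = int(parts[0])
            used_mib = float(parts[1])
        except ValueError:
            continue  # skip rows such as "[N/A]" where memory is not reported
        usage[pid] = usage.get(pid, 0.0) + used_mib / MIB_PER_GIB
    return usage


def query_resident_gib() -> Dict[int, float]:
    """Run nvidia-smi and return resident GPU memory per compute process."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,used_memory",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_compute_apps(result.stdout)


def residual_within_threshold(
    usage: Dict[int, float], pid: int, threshold_gb: float = 3.0
) -> bool:
    """A PID missing from the listing holds no CUDA context, so it counts as 0."""
    return usage.get(pid, 0.0) <= threshold_gb
```

For example, `residual_within_threshold(query_resident_gib(), engine_pid)` would check one engine process, where `engine_pid` is whatever PID the caller tracks for that engine.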
- `2.0` leaves only ~0.17 GiB margin, which is too tight across GPU/driver/SGLang versions.
- `3.0` leaves ~1.2 GiB margin while still catching large residuals such as an unoffloaded KV pool.
This is a smoke-measured heuristic, not a model-derived value. It remains overridable via `MILES_MAX_RESIDUAL_GPU_MEM_GB`.
Diff Baseline Note
This is a clean branch off latest `zhenyu/miles-mvp-e2e`. The closed #11 used `10.0` as an intermediate whole-GPU residual threshold. This PR's effective change is 20.0 free memory -> 3.0 per-engine process-resident residual.
Tests
Result:
E2E Verification
Vast dual smoke with paired MILES branch:
Known shutdown `RolloutManager` 500 / `RemoteProtocolError` teardown noise appears while residual `/generate` requests are cancelled. Training completed and both pipelines reached `shutdown_hard`; `EXIT_CODE=0`.
Scope
Gate signal + configurability only. No model-size-derived threshold. Option Beta / hooks are already upstream and untouched.
Refs: `plans/m11-review.review-report/R02.md` (R02-01, MEDIUM).