System Info
Spikes in TPS/User appear in the IBP graphs, especially for vLLM, as seen in the image below. They are caused by a CPU bottleneck in the sampler and can be fixed by increasing the number of API workers. See this: https://nvidia.slack.com/archives/C09H79C4MB8/p1772128612444079?thread_ts=1772088513.942399&cid=C09H79C4MB8
This should be fixed in all our supported frameworks.
Who can help?
No response
Reproduction
na
Expected behavior
na
Actual behavior
na
Additional notes
na