Draft
Load test results for agent app with 1000ms simulated latency across medium (2/4/6/8 workers) and large (6/8/10/12 workers) configurations, including dashboard, Locust reports, and analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Across all 5 runs (20 config-runs each):
┌─────────┬──────────────┬─────────┐
│ Compute │ Avg Peak QPS │ Avg QPS │
├─────────┼──────────────┼─────────┤
│ Medium  │ 123.5        │ 45.5    │
├─────────┼──────────────┼─────────┤
│ Large   │ 278.0        │ 100.1   │
└─────────┴──────────────┴─────────┘
Large is ~2.2x medium on both peak and average QPS.
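A quick arithmetic check of that ratio, using the averages from the table above:

```python
# Averages taken from the table above (avg of 5 runs).
medium_peak, large_peak = 123.5, 278.0
medium_avg, large_avg = 45.5, 100.1

peak_ratio = large_peak / medium_peak
avg_ratio = large_avg / medium_avg
print(f"peak: {peak_ratio:.2f}x, avg: {avg_ratio:.2f}x")  # peak: 2.25x, avg: 2.20x
```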
Recommendations:
- Medium compute: use no more than 4 workers. More workers
  on medium compute likely causes contention on the limited
  CPU/memory, so the overhead of managing extra processes
  outweighs the parallelism benefit.
- Large compute: the tested worker counts perform similarly
  (within ~7%), so any of these work. 10w edges out slightly
  but it's within noise. I'd recommend 8 workers as a safe
  default since it's in the middle of the plateau and avoids
  the slight dip at 12w.
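The contention argument can be sketched with a toy throughput model. The core count, per-core rate, and overhead numbers below are illustrative assumptions, not values from these runs; the point is only the shape of the curve once workers exceed available cores:

```python
# Toy model (illustrative numbers, not measured data): throughput
# grows until workers saturate the cores, then each extra worker
# only adds coordination/context-switching overhead.
CORES = 2            # hypothetical medium-compute core count
PER_CORE_QPS = 30.0  # hypothetical per-core throughput
OVERHEAD_QPS = 2.0   # hypothetical cost per worker beyond the core count

def modeled_qps(workers: int) -> float:
    parallel = min(workers, CORES) * PER_CORE_QPS   # useful parallelism caps at CORES
    contention = max(0, workers - CORES) * OVERHEAD_QPS  # excess workers just add cost
    return parallel - contention

for w in (2, 4, 6, 8):
    print(f"{w} workers -> {modeled_qps(w):.0f} qps")
```

Under these assumptions the modeled QPS peaks at the core count and declines monotonically afterwards, which matches the "fewer workers on small compute" pattern in the runs.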
The takeaway: on smaller compute, fewer workers is better
(less overhead). On larger compute, there's a sweet spot in
the middle — adding workers beyond that introduces
diminishing returns and eventually contention.
So it looks like, on average, large compute gives roughly 2.2x medium on both peak QPS (278.0 vs 123.5) and average QPS (100.1 vs 45.5), but more workers is not necessarily always better.
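For reference, the target described above (an app with 1000ms simulated latency) can be stood in for with a minimal stdlib server. This is a hypothetical stand-in, not the actual agent app; the handler, response body, and latency constant are assumptions:

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-in for the app under test: every request sleeps
# 1000 ms before responding, mimicking the simulated latency in these runs.
SIMULATED_LATENCY_S = 1.0

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(SIMULATED_LATENCY_S)  # simulated downstream latency
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # silence per-request logging
        pass

def start_server() -> ThreadingHTTPServer:
    """Start the slow server on an ephemeral localhost port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

srv = start_server()
port = srv.server_address[1]
t0 = time.monotonic()
with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
    resp.read()
print(f"one request took {time.monotonic() - t0:.2f}s")
srv.shutdown()
```

Since each request pins one connection for a full second, measured QPS is driven almost entirely by how much concurrency the worker configuration can sustain, which is why worker count matters so much in these results.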