Draft
Load test results for agent app with 1000ms simulated latency across medium (2/4/6/8 workers) and large (6/8/10/12 workers) configurations, including dashboard, Locust reports, and analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Across all 5 runs (20 config-runs each):
┌─────────┬──────────────┬─────────┐
│ Compute │ Avg Peak QPS │ Avg QPS │
├─────────┼──────────────┼─────────┤
│ Medium  │ 123.5        │ 45.5    │
├─────────┼──────────────┼─────────┤
│ Large   │ 278.0        │ 100.1   │
└─────────┴──────────────┴─────────┘
Large is ~2.2x medium on both peak and average QPS.
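A quick arithmetic check of that ratio, using the averages from the table above:

```python
# Averages taken from the table above (avg of 5 runs).
medium_peak, large_peak = 123.5, 278.0
medium_avg, large_avg = 45.5, 100.1

peak_ratio = large_peak / medium_peak
avg_ratio = large_avg / medium_avg
print(f"peak: {peak_ratio:.2f}x, avg: {avg_ratio:.2f}x")  # peak: 2.25x, avg: 2.20x
```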
Recommendations:
- Medium compute: use no more than 4 workers. More workers
  on medium compute likely causes contention on the limited
  CPU/memory, so the overhead of managing extra processes
  outweighs the parallelism benefit.
- Large compute: the tested worker counts perform similarly
  (within ~7%), so any of these work. 10w edges out slightly
  but it's within noise. I'd recommend 8 workers as a safe
  default since it's in the middle of the plateau and avoids
  the slight dip at 12w.
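The contention argument can be sketched with a toy throughput model. The core count, per-core rate, and overhead numbers below are illustrative assumptions, not values from these runs; the point is only the shape of the curve once workers exceed available cores:

```python
# Toy model (illustrative numbers, not measured data): throughput
# grows until workers saturate the cores, then each extra worker
# only adds coordination/context-switching overhead.
CORES = 2            # hypothetical medium-compute core count
PER_CORE_QPS = 30.0  # hypothetical per-core throughput
OVERHEAD_QPS = 2.0   # hypothetical cost per worker beyond the core count

def modeled_qps(workers: int) -> float:
    parallel = min(workers, CORES) * PER_CORE_QPS   # useful parallelism caps at CORES
    contention = max(0, workers - CORES) * OVERHEAD_QPS  # excess workers just add cost
    return parallel - contention

for w in (2, 4, 6, 8):
    print(f"{w} workers -> {modeled_qps(w):.0f} qps")
```

Under these assumptions the modeled QPS peaks at the core count and declines monotonically afterwards, which matches the "fewer workers on small compute" pattern in the runs.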
The takeaway: on smaller compute, fewer workers is better
(less overhead). On larger compute, there's a sweet spot in
the middle — adding workers beyond that introduces
diminishing returns and eventually contention.
So it looks like, on average, large compute gives roughly 2.2x medium on both peak QPS (278.0 vs 123.5) and average QPS (100.1 vs 45.5), but more workers is not necessarily always better.
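For reference, the target described above (an app with 1000ms simulated latency) can be stood in for with a minimal stdlib server. This is a hypothetical stand-in, not the actual agent app; the handler, response body, and latency constant are assumptions:

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-in for the app under test: every request sleeps
# 1000 ms before responding, mimicking the simulated latency in these runs.
SIMULATED_LATENCY_S = 1.0

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(SIMULATED_LATENCY_S)  # simulated downstream latency
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # silence per-request logging
        pass

def start_server() -> ThreadingHTTPServer:
    """Start the slow server on an ephemeral localhost port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

srv = start_server()
port = srv.server_address[1]
t0 = time.monotonic()
with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
    resp.read()
print(f"one request took {time.monotonic() - t0:.2f}s")
srv.shutdown()
```

Since each request pins one connection for a full second, measured QPS is driven almost entirely by how much concurrency the worker configuration can sustain, which is why worker count matters so much in these results.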