Add load-testing skill to all agent templates: Load Testing Template + Skill + Locust Load Testing Script + Dashboard Generation#184

Open
jennsun wants to merge 3 commits into databricks:main from jennsun:load-testing-skill

@jennsun jennsun commented Apr 10, 2026

Adds a shared load-testing Claude Code skill that helps users benchmark their Databricks Apps QPS using Locust. The skill guides users through mocking LLM calls, setting up load test scripts, running ramp-to-saturation tests, and viewing interactive dashboards.

For an example of what the resulting app template looks like, see:
#180

Users can invoke the skill, or run the load-testing script directly with a command like:

cd /Users/jenny.sun/app-templates/agent-load-testing/load-test-scripts && \
export DATABRICKS_HOST="https://eng-ml-inference-team-eu-central-1.cloud.databricks.com/" && \
export DATABRICKS_CLIENT_ID="redacted" && \
export DATABRICKS_CLIENT_SECRET="redacted" && \
python3 run_load_test.py \
  --app-url https://agent-load-test-medium-w2-5763476629593792.aws.databricksapps.com/ \
  --app-url https://agent-load-test-medium-w4-5763476629593792.aws.databricksapps.com/ \
  --app-url https://agent-load-test-medium-w6-5763476629593792.aws.databricksapps.com/ \
  --app-url https://agent-load-test-medium-w8-5763476629593792.aws.databricksapps.com/ \
  --app-url https://agent-load-test-large-w6-5763476629593792.aws.databricksapps.com/ \
  --app-url https://agent-load-test-large-w8-5763476629593792.aws.databricksapps.com/ \
  --app-url https://agent-load-test-large-w10-5763476629593792.aws.databricksapps.com/ \
  --app-url https://agent-load-test-large-w12-5763476629593792.aws.databricksapps.com/ \
  --label medium_2w --label medium_4w --label medium_6w --label medium_8w \
  --label large_6w --label large_8w --label large_10w --label large_12w \
  --compute-size medium --compute-size medium --compute-size medium --compute-size medium \
  --compute-size large --compute-size large --compute-size large --compute-size large \
  --max-users 1000 --step-size 20 --step-duration 45 \
  --dashboard --run-name agent_app_1000_load_test

This lets them watch progress while the load tests run.
[3 screenshots of load-test progress]

The script generates an HTML dashboard at the end:
https://agent-app-load-test-results-5763476629593792.aws.databricksapps.com/
[image: load-testing-dashboard]
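
The command above repeats `--app-url`, `--label`, and `--compute-size` once per app, which suggests the flags pair up by position. The PR does not show how `run_load_test.py` parses them, so the following is only a minimal sketch of one way to do it with `argparse`; the function name `parse_targets` and the example URLs are hypothetical.

```python
import argparse


def parse_targets(argv):
    """Pair repeated flags by position (hypothetical helper, not from the PR)."""
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of a flag into a list.
    parser.add_argument("--app-url", action="append", default=[], dest="app_urls")
    parser.add_argument("--label", action="append", default=[], dest="labels")
    parser.add_argument("--compute-size", action="append", default=[], dest="compute_sizes")
    args = parser.parse_args(argv)

    # Positional pairing only makes sense when the three lists line up.
    if not (len(args.app_urls) == len(args.labels) == len(args.compute_sizes)):
        parser.error("pass one --label and one --compute-size per --app-url")

    return [
        {"url": url, "label": label, "compute_size": size}
        for url, label, size in zip(args.app_urls, args.labels, args.compute_sizes)
    ]


# Example URLs are placeholders, not real apps.
targets = parse_targets([
    "--app-url", "https://app-w2.example.com/",
    "--app-url", "https://app-w4.example.com/",
    "--label", "medium_2w", "--label", "medium_4w",
    "--compute-size", "medium", "--compute-size", "medium",
])
```

Validating the list lengths up front means a missing `--label` fails immediately instead of silently mislabeling a run.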

@jennsun jennsun requested review from bbqiu and dhruv0811 April 10, 2026 02:45
@jennsun jennsun changed the title from "Add load-testing skill to all agent templates" to "Add load-testing skill to all agent templates: Load Testing Template + Skill + Locust Load Testing Script + Dashboard Generation" Apr 10, 2026
Adds a shared /load-testing Claude Code skill that guides users through
load testing their Databricks Apps to find maximum QPS. Includes mock LLM
setup, Locust test scripts, deployment matrix, and dashboard generation.

- Add source skill at .claude/skills/load-testing/SKILL.md
- Add "load-testing" to sync-skills.py shared skills list
- Sync skill to all 5 registered agent templates
- Whitelist skill in .gitignore for all 7 agent templates (including advanced)
- Use `uv add` instead of `pip install` for dependency management
- Pass --workers directly to start-server (no wrapper script needed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jennsun jennsun force-pushed the load-testing-skill branch from 589c6dc to fce48b0 Compare April 14, 2026 19:55

@bbqiu bbqiu left a comment

lgtm with a few comments!

… specific results

- Make Step 2 (mocking) explicitly optional — users can test real agents E2E
- Lower defaults to keep tests under 1 hour (max_users 300, step_duration 30s)
- Change default workers from 4 to 2
- Switch all commands from `python` to `uv run`
- Remove internal-specific reference results (too specific to mocked LLM case)
- Re-sync skill to all templates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
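
To see how the lowered defaults affect run time, the ramp can be estimated with simple arithmetic. This is a sketch under stated assumptions: each step adds `step_size` users and holds for `step_duration` seconds, the step size of 20 is borrowed from the example command in the PR description, and the function names are hypothetical. The actual script may ramp differently.

```python
import math


def ramp_steps(max_users, step_size):
    """Number of steps needed to reach max_users."""
    return math.ceil(max_users / step_size)


def ramp_duration_seconds(max_users, step_size, step_duration):
    """Total ramp time for one app, assuming every step is held in full."""
    return ramp_steps(max_users, step_size) * step_duration


def users_at(elapsed, max_users, step_size, step_duration):
    """Target user count `elapsed` seconds into the ramp, or None once it is over."""
    step = int(elapsed // step_duration) + 1
    if step > ramp_steps(max_users, step_size):
        return None
    return min(step * step_size, max_users)


# Settings from the example command: 50 steps of 45 s each.
original = ramp_duration_seconds(max_users=1000, step_size=20, step_duration=45)
# Lowered defaults (step size assumed unchanged): 15 steps of 30 s each.
lowered = ramp_duration_seconds(max_users=300, step_size=20, step_duration=30)
print(original, lowered)  # 2250 450
```

Under these assumptions a single app takes 37.5 minutes with the original settings and 7.5 minutes with the lowered defaults. In Locust, a function like `users_at` could back a custom `LoadTestShape.tick()`, which returns a `(user_count, spawn_rate)` tuple or `None` to stop the test.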

@dhruv0811 dhruv0811 left a comment

Looks great!! It will be super useful for customers to be able to load test their agents easily now.

Left a few comments, also will need to rebase to use the new templates!

Comment thread .scripts/sync-skills.py
copy_skill(SOURCE / "quickstart", dest / "quickstart", quickstart_subs)

# Shared skills (no substitution needed)
for skill in ["run-locally", "discover-tools", "migrate-from-model-serving"]:

Can we exclude agent-non-conversational from getting this skill maybe? I think the input format we are using in this skill doesn't work with the non-conv template's expectations
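
One way to express that exclusion is a per-template skip list consulted before syncing. This is only a sketch: `SHARED_SKILLS`, `EXCLUDED_SKILLS`, and `shared_skills_for` are hypothetical names, and the real `sync-skills.py` may be structured differently.

```python
# Shared skills from the loop above, plus the new skill added by this PR.
SHARED_SKILLS = [
    "run-locally",
    "discover-tools",
    "migrate-from-model-serving",
    "load-testing",
]

# Hypothetical mapping: template name -> shared skills it should not receive.
EXCLUDED_SKILLS = {
    "agent-non-conversational": {"load-testing"},
}


def shared_skills_for(template_name):
    """Return the shared skills to sync into one template, in order."""
    excluded = EXCLUDED_SKILLS.get(template_name, set())
    return [skill for skill in SHARED_SKILLS if skill not in excluded]


non_conv = shared_skills_for("agent-non-conversational")
# "some-other-template" is a placeholder name for any template without exclusions.
other = shared_skills_for("some-other-template")
```

Keeping the exclusions in a mapping avoids special-casing a template name inside the sync loop and leaves room for future per-template opt-outs.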


### Install Dependencies

```bash
```

nit: can we clarify to either cd load-test-scripts/ or use a separate pyproject.toml, otherwise we end up adding locust and requests to the template's own pyproject.toml which feels like we are polluting the agent's prod setup.

Do we want to add load-test-runs/<run-name>/ to the gitignore out of the box as well?

Comment thread .claude/skills/load-testing/SKILL.md Outdated
### Install Dependencies

```bash
uv add locust requests
```

Also seems that latest locust >=2.43 throws a RecursionError? The workaround was pinning locust>=2.32,<2.40 and urllib3<2.3. Maybe we can pin these in the instructions here too?
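
The suggested pins can also be checked at run time before a test starts. The sketch below uses only plain tuple comparison on numeric versions; the helper names are hypothetical, and in practice a library such as `packaging` handles pre-release and other non-numeric version strings that this sketch does not.

```python
def parse_version(text):
    """Turn a plain numeric version like '2.39.1' into a 3-part tuple."""
    parts = [int(part) for part in text.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)  # treat '2.32' as '2.32.0'
    return tuple(parts)


def locust_ok(version):
    """Pin from the review comment: locust>=2.32,<2.40."""
    return (2, 32, 0) <= parse_version(version) < (2, 40, 0)


def urllib3_ok(version):
    """Pin from the review comment: urllib3<2.3."""
    return parse_version(version) < (2, 3, 0)


checks = {
    "locust 2.39.1": locust_ok("2.39.1"),
    "locust 2.43.0": locust_ok("2.43.0"),
    "urllib3 2.2.3": urllib3_ok("2.2.3"),
    "urllib3 2.3.0": urllib3_ok("2.3.0"),
}
```

A check like this could fail fast with a clear message instead of surfacing the `RecursionError` mid-run.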

…gitignore runs

- Exclude agent-non-conversational from load-testing skill (incompatible input format)
- Use separate pyproject.toml in load-test-scripts/ to avoid polluting agent deps
- Pin locust>=2.32,<2.40 and urllib3<2.3 to avoid RecursionError in locust>=2.43
- Add load-test-runs/ to .gitignore in all templates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>