[HWORKS-2829] Auto-size the WebSocket executor pool from worker CPU and memory #594

Draft
o-alex wants to merge 2 commits into logicalclocks:main from o-alex:HWORKS-2829

o-alex commented May 31, 2026

Summary

Admin-guide documentation for the auto-size path of the WebSocket proxy pool, on top of the fixed-thread documentation in HWORKS-2682.

Blocked on logicalclocks.github.io#593 (HWORKS-2682) merging first. This branch is built on top of HWORKS-2682, so the diff against main currently includes the HWORKS-2682 docs commits. Once HWORKS-2682 merges, this PR's diff narrows to the autoSize section in setup_installation/admin/monitoring/websocket-pool.md.

Extends the values.yaml Tuning example with the auto-size knobs and adds a new Auto-size section that walks through the CPU formula, the no-limit fallback, the memory cap, and three sample pod shapes.

JIRA: https://hopsworks.atlassian.net/browse/HWORKS-2829

Test plan

  • touch docs/javadoc && uv run mkdocs build -s; rm docs/javadoc succeeds with no warnings beyond the pre-existing databricks/integrations nav-misses.
  • Page renders in mkdocs serve with the new Auto-size heading and the three-row sizing table.

🤖 Generated with Claude Code

o-alex force-pushed the HWORKS-2829 branch 2 times, most recently from 09effa2 to cc39528 May 31, 2026 16:34
o-alex and others added 2 commits May 31, 2026 19:49
… handling and observability

https://hopsworks.atlassian.net/browse/HWORKS-2682

The WebSocket proxy in the Hopsworks Payara backend used to run its
forward stream direction inline on the calling HTTP thread. Under
load the HTTP request pool filled with pinned pumps and all REST
traffic stalled. The fix in hopsworks-ee moves both pump directions
to a dedicated managed executor, gates new sessions when that pool
saturates, and surfaces capacity state in the UI; this site
documents the new admin observability surface and explains the
user-facing capacity badges.

A new admin page under
setup_installation/admin/monitoring/websocket-pool.md documents the
pool model (two threads per WebSocket connection, the single-owner
pool, the zero-length task queue), the Grafana panels (sessions,
duration percentiles, rejection rate, pool CPU, pool allocation
rate), the MP-Metrics gauges and the rejection counter, the
relevant Helm values (corePoolSize, maximumPoolSize, taskQueueCapacity,
threadPriority), and the Grizzly idle timeouts.
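
The pool model above implies a simple capacity bound, sketched here in Python for illustration only. The two-threads-per-connection figure and the zero-length task queue come from the commit message; the function names and the admission check are hypothetical and are not the hopsworks-ee implementation.

```python
# Illustrative capacity arithmetic for the WebSocket proxy pool.
# THREADS_PER_CONNECTION is stated in the commit message; everything else
# (function names, the admission rule) is an assumption for this sketch.
THREADS_PER_CONNECTION = 2


def max_concurrent_sessions(maximum_pool_size: int) -> int:
    """Upper bound on simultaneous WebSocket sessions for a given pool size."""
    return maximum_pool_size // THREADS_PER_CONNECTION


def can_admit(active_sessions: int, maximum_pool_size: int) -> bool:
    """With a zero-length task queue nothing waits for a thread, so a new
    session fits only if both of its pump threads are free right now."""
    busy_threads = active_sessions * THREADS_PER_CONNECTION
    return maximum_pool_size - busy_threads >= THREADS_PER_CONNECTION


if __name__ == "__main__":
    print(max_concurrent_sessions(100))
    print(can_admit(49, 100), can_admit(50, 100))
```

Under these assumptions a pool with an odd maximumPoolSize leaves one thread that can never serve a full session, which is one reason to keep the value even.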

A new user guide under
user_guides/projects/jupyter/session_capacity_warnings.md describes
the instance and cluster badge matrix (orange WARNING, red CRITICAL,
no badge OK), explains where each badge appears (Jupyter server
card, terminal panel, apps list), and lists the recovery steps when
a badge turns red. mkdocs.yml gets nav entries under Setup ->
Administration -> Monitoring and Projects -> Jupyter.

Auto-sizing of the pool from worker CPU and memory budget is
tracked separately in HWORKS-2829.

Reviewed-by: Copilot
Signed-off-by: Alex Ormenisan <alex@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nd memory

https://hopsworks.atlassian.net/browse/HWORKS-2829

Add the admin-guide documentation for the auto-size path of the
WebSocket proxy pool, on top of the fixed-thread documentation that
HWORKS-2682 lands. Off by default; this content explains the opt-in.

Extend the values.yaml Tuning example with autoSize, threadsPerCore,
corePoolSize, and maximumPoolSize annotations so admins reading the
fixed-thread block can see the auto-size knobs in context. Add the
Auto-size section: when autoSize=true the pool is sized from the
worker container's CPU request and limit through the threadsPerCore
multipliers, the no-limit branch falls back to requests.cpu *
noLimitBurstFactor, and a memory cap derived from the worker's JVM
buffer clamps the burst ceiling. Show three representative pod
shapes at chart-default settings.
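
A minimal Python sketch of the sizing rules described above, for illustration only. The three rules (threadsPerCore multipliers applied to the CPU request and limit, the requests.cpu * noLimitBurstFactor fallback, and a memory cap on the burst ceiling) come from this commit message. The function and parameter names, the rounding with ceil, and expressing the memory cap directly as a thread count are assumptions; the chart may compute these differently.

```python
import math
from typing import Optional, Tuple


def auto_size_pool(
    cpu_request: float,
    cpu_limit: Optional[float],
    core_threads_per_core: float,   # assumed name for the core multiplier
    max_threads_per_core: float,    # assumed name for the burst multiplier
    no_limit_burst_factor: float,
    memory_cap_threads: int,        # assumed: cap already converted to threads (>= 1)
) -> Tuple[int, int]:
    """Return (core_pool_size, maximum_pool_size) for one worker container."""
    # No-limit branch: without a CPU limit, fall back to
    # requests.cpu * noLimitBurstFactor as the effective limit.
    if cpu_limit is None:
        effective_limit = cpu_request * no_limit_burst_factor
    else:
        effective_limit = cpu_limit

    # Rounding up is an assumption of this sketch.
    core = max(1, math.ceil(cpu_request * core_threads_per_core))
    burst = max(core, math.ceil(effective_limit * max_threads_per_core))

    # The memory cap clamps the burst ceiling; keep core <= maximum.
    maximum = min(burst, memory_cap_threads)
    core = min(core, maximum)
    return core, maximum


if __name__ == "__main__":
    # Hypothetical multipliers, not chart defaults.
    print(auto_size_pool(2.0, 4.0, 4, 16, 2.0, 1000))
    print(auto_size_pool(2.0, None, 4, 16, 2.0, 40))
```

The multipliers in the usage lines are made-up values chosen to make the arithmetic easy to follow; the three pod shapes in the documentation page use the actual chart defaults.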

Signed-off-by: Alex Ormenisan <alex@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>