[HWORKS-2829] Auto-size the WebSocket executor pool from worker CPU and memory #594
Draft
o-alex wants to merge 2 commits into
09effa2 to cc39528
… handling and observability

https://hopsworks.atlassian.net/browse/HWORKS-2682

The WebSocket proxy in the Hopsworks Payara backend used to run its forward stream direction inline on the calling HTTP thread. Under load the HTTP request pool filled with pinned pumps and all REST traffic stalled. The fix in hopsworks-ee moves both pump directions to a dedicated managed executor, gates new sessions when that pool saturates, and surfaces capacity state in the UI; this site documents the new admin observability surface and explains the user-facing capacity badges.

A new admin page under setup_installation/admin/monitoring/websocket-pool.md documents the pool model (two threads per WebSocket connection, the single-owner pool, the zero-length task queue), the Grafana panels (sessions, duration percentiles, rejection rate, pool CPU, pool allocation rate), the MP-Metrics gauges and the rejection counter, the relevant Helm values (corePoolSize, maximumPoolSize, taskQueueCapacity, threadPriority), and the Grizzly idle timeouts.

A new user guide under user_guides/projects/jupyter/session_capacity_warnings.md describes the instance and cluster badge matrix (orange WARNING, red CRITICAL, no badge OK), explains where each badge appears (Jupyter server card, terminal panel, apps list), and lists the recovery steps when a badge turns red.

mkdocs.yml gets nav entries under Setup -> Administration -> Monitoring and Projects -> Jupyter.

Auto-sizing of the pool from worker CPU and memory budget is tracked separately in HWORKS-2829.

Reviewed-by: Copilot
Signed-off-by: Alex Ormenisan <alex@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
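The pool model in the commit message above (two threads per WebSocket connection, no task queue, new sessions gated when the pool saturates) can be illustrated with a small admission gate. This is only a sketch under stated assumptions, not the hopsworks-ee implementation: the class name `PumpPoolGate`, its methods, and the rejection counter are hypothetical names chosen for illustration.

```python
import threading


class PumpPoolGate:
    """Toy admission gate (hypothetical): admit a session only if two
    pump threads are free, otherwise reject immediately (no queueing)."""

    # From the pool model: one thread per pump direction, two per session.
    THREADS_PER_SESSION = 2

    def __init__(self, maximum_pool_size: int) -> None:
        self._max = maximum_pool_size
        self._busy = 0          # threads currently pinned by open sessions
        self._rejected = 0      # mirrors the idea of a rejection counter
        self._lock = threading.Lock()

    def try_open_session(self) -> bool:
        """Reserve two threads, or count a rejection and return False."""
        with self._lock:
            if self._busy + self.THREADS_PER_SESSION > self._max:
                # A zero-length task queue means there is nowhere to wait.
                self._rejected += 1
                return False
            self._busy += self.THREADS_PER_SESSION
            return True

    def close_session(self) -> None:
        """Release the two threads held by one session."""
        with self._lock:
            self._busy = max(0, self._busy - self.THREADS_PER_SESSION)

    @property
    def rejected(self) -> int:
        with self._lock:
            return self._rejected

    @property
    def max_sessions(self) -> int:
        return self._max // self.THREADS_PER_SESSION
```

With a hypothetical `maximum_pool_size` of 5, the gate admits two sessions (four threads) and rejects the third, since the single remaining thread cannot serve both pump directions.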
…nd memory

https://hopsworks.atlassian.net/browse/HWORKS-2829

Add the admin-guide documentation for the auto-size path of the WebSocket proxy pool, on top of the fixed-thread documentation that HWORKS-2682 lands. Off by default; this content explains the opt-in.

Extend the values.yaml Tuning example with autoSize, threadsPerCore, corePoolSize, and maximumPoolSize annotations so admins reading the fixed-thread block can see the auto-size knobs in context.

Add the Auto-size section: when autoSize=true the pool is sized from the worker container's CPU request and limit through the threadsPerCore multipliers, the no-limit branch falls back to requests.cpu * noLimitBurstFactor, and a memory cap derived from the worker's JVM buffer clamps the burst ceiling. Show three representative pod shapes at chart-default settings.

Signed-off-by: Alex Ormenisan <alex@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
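The sizing rule described in the commit message above could look roughly like the following. This is a hedged sketch, not the chart's actual formula: the function name, the split into separate core and max multipliers, the rounding with `ceil`, the per-thread memory cost, and the choice to lower the core size when the memory cap falls below it are all assumptions made for illustration. Only the overall shape (CPU request and limit scaled by threadsPerCore, the `requests.cpu * noLimitBurstFactor` fallback, and a memory cap on the burst ceiling) comes from the description.

```python
import math
from typing import Optional, Tuple


def auto_size_pool(
    requests_cpu: float,
    limits_cpu: Optional[float],
    core_threads_per_core: float,   # assumed multiplier for corePoolSize
    max_threads_per_core: float,    # assumed multiplier for maximumPoolSize
    no_limit_burst_factor: float,
    buffer_bytes: int,              # assumed memory budget for pump threads
    bytes_per_thread: int,          # assumed per-thread memory cost
) -> Tuple[int, int]:
    """Return (core_pool_size, maximum_pool_size) under the assumptions above."""
    core = max(1, math.ceil(requests_cpu * core_threads_per_core))

    # No-limit branch: fall back to requests.cpu * noLimitBurstFactor.
    burst_cpu = (
        limits_cpu
        if limits_cpu is not None
        else requests_cpu * no_limit_burst_factor
    )
    cpu_ceiling = math.ceil(burst_cpu * max_threads_per_core)

    # The memory cap clamps the burst ceiling.
    memory_cap = buffer_bytes // bytes_per_thread
    maximum = max(1, min(cpu_ceiling, memory_cap))

    # Assumption: keep core <= maximum, as a thread pool requires.
    core = min(core, maximum)
    return core, maximum


MIB = 1024 * 1024

# Illustrative inputs only; these are NOT the chart defaults.
with_limit = auto_size_pool(2.0, 4.0, 4, 8, 2.0, 64 * MIB, MIB)
no_limit = auto_size_pool(2.0, None, 4, 8, 2.0, 64 * MIB, MIB)
memory_bound = auto_size_pool(2.0, 4.0, 4, 8, 2.0, 16 * MIB, MIB)
```

With these made-up inputs, the first two shapes are CPU-bound and the third is clamped by the memory cap, which is the distinction the three-row sizing table in the docs is meant to show.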
Summary
Admin-guide documentation for the auto-size path of the WebSocket proxy pool, on top of the fixed-thread documentation in HWORKS-2682.
Blocked on logicalclocks.github.io#593 (HWORKS-2682) merging first. This branch is built on top of HWORKS-2682, so the diff against main currently includes the HWORKS-2682 docs commits. Once HWORKS-2682 merges, this PR's diff narrows to the autoSize section in setup_installation/admin/monitoring/websocket-pool.md.

Extends the values.yaml Tuning example with the auto-size knobs and adds a new Auto-size section that walks the CPU formula, the no-limit fallback, the memory cap, and three sample pod shapes.

JIRA: https://hopsworks.atlassian.net/browse/HWORKS-2829
Test plan
touch docs/javadoc && uv run mkdocs build -s; rm docs/javadoc succeeds with no warnings beyond the pre-existing databricks/integrations nav-misses.
mkdocs serve with the new Auto-size heading and the three-row sizing table.

🤖 Generated with Claude Code