[HWORKS-2682] Drop Grizzly timeout overrides, rely on Payara defaults #595
Closed
o-alex wants to merge 2 commits into
… handling and observability

https://hopsworks.atlassian.net/browse/HWORKS-2682

The WebSocket proxy in the Hopsworks Payara backend used to run its forward stream direction inline on the calling HTTP thread. Under load the HTTP request pool filled with pinned pumps and all REST traffic stalled.

The fix in hopsworks-ee moves both pump directions to a dedicated managed executor, gates new sessions when that pool saturates, and surfaces capacity state in the UI; this site documents the new admin observability surface and explains the user-facing capacity badges.

A new admin page under setup_installation/admin/monitoring/websocket-pool.md documents the pool model (two threads per WebSocket connection, the single-owner pool, the zero-length task queue), the Grafana panels (sessions, duration percentiles, rejection rate, pool CPU, pool allocation rate), the MP-Metrics gauges and the rejection counter, the relevant Helm values (corePoolSize, maximumPoolSize, taskQueueCapacity, threadPriority), and the Grizzly idle timeouts.

A new user guide under user_guides/projects/jupyter/session_capacity_warnings.md describes the instance and cluster badge matrix (orange WARNING, red CRITICAL, no badge OK), explains where each badge appears (Jupyter server card, terminal panel, apps list), and lists the recovery steps when a badge turns red.

mkdocs.yml gets nav entries under Setup -> Administration -> Monitoring and Projects -> Jupyter.

Auto-sizing of the pool from worker CPU and memory budget is tracked separately in HWORKS-2829.

Reviewed-by: Copilot
Signed-off-by: Alex Ormenisan <alex@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
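The pool model described in the commit message (two pump threads per WebSocket connection, a zero-length task queue, a rejection counter) can be illustrated with a small sketch. This is not the hopsworks-ee implementation: the commit message says the real code uses a managed executor, while the sketch below swaps in a plain `java.util.concurrent.ThreadPoolExecutor`. The class name `PumpPoolSketch` and its methods are hypothetical and exist only for illustration.

```java
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; names are illustrative and not taken from hopsworks-ee. */
final class PumpPoolSketch {
    private final ThreadPoolExecutor pool;
    private final AtomicLong rejectedSessions = new AtomicLong();

    PumpPoolSketch(int maxPumpThreads) {
        // SynchronousQueue holds no tasks: a pump is either handed to a free
        // thread right away, or the submission is rejected by AbortPolicy.
        this.pool = new ThreadPoolExecutor(
                maxPumpThreads, maxPumpThreads,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /**
     * Starts both pump directions of one session, or starts neither.
     * Assumption: pumps exit when their thread is interrupted.
     */
    synchronized boolean tryOpenSession(Runnable forward, Runnable backward) {
        Future<?> forwardPump = null;
        try {
            forwardPump = pool.submit(forward);
            pool.submit(backward);
            return true;
        } catch (RejectedExecutionException saturated) {
            if (forwardPump != null) {
                // Only one direction fit: cancel it so no half-open session remains.
                forwardPump.cancel(true);
            }
            rejectedSessions.incrementAndGet();
            return false;
        }
    }

    /** Number of sessions refused because the pool was saturated. */
    long rejectedSessions() {
        return rejectedSessions.get();
    }

    /** Interrupts running pumps and stops the pool. */
    void shutdownNow() {
        pool.shutdownNow();
    }
}
```

With `maxPumpThreads = 4`, two sessions fit. While their pumps are still running, a third `tryOpenSession` call returns `false` and increments the counter instead of queueing, which is the behaviour a rejection-rate panel would surface.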
https://hopsworks.atlassian.net/browse/HWORKS-2682

Follow-up on HWORKS-2682 (logicalclocks#593). Trim the admin WebSocket Pool guide to match: the chart no longer overrides Grizzly's request-timeout-seconds or websockets-timeout-seconds, and the page reflects Payara defaults.

Empirical evidence that request-timeout-seconds applies to established WebSocket sessions after the 101 handshake is weak; the framing layer's separate websockets-timeout-seconds is what governs the established WebSocket's idle window. The guide now says so and points at the plural attribute as the override knob if a longer idle window is needed.

Companion changes: hopsworks-helm#<TBD> drops the chart values + asadmin boot lines; hopsworks-ee#2875 drops the matching baked-in asadmin set commands in docker/payara-server/Dockerfile.

Signed-off-by: Alex Ormenisan <alex@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Superseded: commit moved onto the HWORKS-2682 PR (#593) as an additional commit, per preference for separate commits over standalone follow-up PRs.
Summary
Follow-up on logicalclocks.github.io#593. Trim the admin WebSocket Pool guide to match the chart revert: the chart no longer overrides Grizzly's `request-timeout-seconds` or `websockets-timeout-seconds`, and the page reflects Payara defaults.

Empirical evidence that `request-timeout-seconds` applies to established WebSocket sessions after the 101 handshake is weak; the framing layer's separate `websockets-timeout-seconds` is what governs the established-WebSocket idle window. The guide now says so and points at the plural attribute as the override knob if a longer idle window is needed.

Companion PRs
… `asadmin set` commands in `docker/payara-server/Dockerfile`.

Test plan
- `touch docs/javadoc && uv run mkdocs build -s` clean (the three pre-existing databricks/integrations nav warnings are unrelated).
- `mkdocs serve`; cross-link to the user guide still resolves.

🤖 Generated with Claude Code