
[HWORKS-2682] Drop Grizzly timeout overrides, rely on Payara defaults #595

Closed
o-alex wants to merge 2 commits into logicalclocks:main from o-alex:HWORKS-2682-drop-grizzly-timeouts

o-alex commented Jun 1, 2026

Summary

Follow-up on logicalclocks.github.io#593. Trim the admin WebSocket Pool guide to match the chart revert: the chart no longer overrides Grizzly's request-timeout-seconds or websockets-timeout-seconds, and the page reflects Payara defaults.

Empirical evidence that request-timeout-seconds applies to established WebSocket sessions after the 101 handshake is weak; the framing layer's separate websockets-timeout-seconds is what governs the idle window of an established WebSocket. The guide now says so and points at websockets-timeout-seconds (the plural attribute) as the knob to override if a longer idle window is needed.

Companion PRs

  • hopsworks-helm#1959 — chart values + boot script revert.
  • hopsworks-ee#2875 — already amended to drop the matching baked-in asadmin set commands in docker/payara-server/Dockerfile.

Test plan

  • touch docs/javadoc && uv run mkdocs build -s completes cleanly (the three pre-existing databricks/integrations nav warnings are unrelated).
  • Page renders correctly via mkdocs serve; cross-link to the user guide still resolves.

🤖 Generated with Claude Code

o-alex and others added 2 commits May 31, 2026 19:49
… handling and observability

https://hopsworks.atlassian.net/browse/HWORKS-2682

The WebSocket proxy in the Hopsworks Payara backend used to run its
forward stream direction inline on the calling HTTP thread. Under
load the HTTP request pool filled with pinned pumps and all REST
traffic stalled. The fix in hopsworks-ee moves both pump directions
to a dedicated managed executor, gates new sessions when that pool
saturates, and surfaces capacity state in the UI; this site
documents the new admin observability surface and explains the
user-facing capacity badges.

A new admin page under
setup_installation/admin/monitoring/websocket-pool.md documents the
pool model (two threads per WebSocket connection, the single-owner
pool, the zero-length task queue), the Grafana panels (sessions,
duration percentiles, rejection rate, pool CPU, pool allocation
rate), the MP-Metrics gauges and the rejection counter, the
relevant Helm values (corePoolSize, maximumPoolSize, taskQueueCapacity,
threadPriority), and the Grizzly idle timeouts.
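
The pool model described above (two threads per WebSocket connection, a zero-length task queue, rejection once the pool saturates) can be sketched with a plain java.util.concurrent executor. This is a minimal illustration under stated assumptions, not the hopsworks-ee implementation: the class PumpPoolSketch and its methods are hypothetical names, a standard ThreadPoolExecutor stands in for the managed executor, and corePoolSize is set equal to maximumPoolSize for simplicity.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a pump pool; all names here are illustrative. */
final class PumpPoolSketch {
    private final ThreadPoolExecutor pool;
    private final AtomicLong rejections = new AtomicLong();

    PumpPoolSketch(int maximumPoolSize) {
        // SynchronousQueue has zero capacity: a task is either handed to a
        // thread right away or rejected, so nothing ever waits in a queue.
        // Assumption: core size equals max size to keep the sketch simple.
        this.pool = new ThreadPoolExecutor(
                maximumPoolSize, maximumPoolSize,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Each session pins two threads, one per pump direction. */
    int maxSessions() {
        return pool.getMaximumPoolSize() / 2;
    }

    /** Starts both pumps, or rejects the session if the pool is saturated. */
    boolean tryOpenSession(Runnable forwardPump, Runnable backwardPump) {
        Future<?> forward = null;
        try {
            forward = pool.submit(forwardPump);
            pool.submit(backwardPump);
            return true;
        } catch (RejectedExecutionException saturated) {
            // Do not leave a half-open session holding a thread.
            if (forward != null) {
                forward.cancel(true);
            }
            rejections.incrementAndGet();
            return false;
        }
    }

    long rejectionCount() {
        return rejections.get();
    }

    void shutdown() {
        pool.shutdownNow();
    }

    /** Stand-in for a pump: occupies its thread until the session closes. */
    static Runnable pumpUntil(CountDownLatch closed) {
        return () -> {
            try {
                closed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        PumpPoolSketch sketch = new PumpPoolSketch(2);
        CountDownLatch closed = new CountDownLatch(1);
        // With two threads, the first session fits and the second is rejected.
        System.out.println(sketch.tryOpenSession(pumpUntil(closed), pumpUntil(closed)));
        System.out.println(sketch.tryOpenSession(pumpUntil(closed), pumpUntil(closed)));
        closed.countDown();
        sketch.shutdown();
    }
}
```

In this sketch the rejection counter plays the role of the rejection metric mentioned above, and maxSessions() shows why the session ceiling is half of maximumPoolSize when every connection pins two threads.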

A new user guide under
user_guides/projects/jupyter/session_capacity_warnings.md describes
the instance and cluster badge matrix (orange WARNING, red CRITICAL,
no badge OK), explains where each badge appears (Jupyter server
card, terminal panel, apps list), and lists the recovery steps when
a badge turns red. mkdocs.yml gets nav entries under Setup ->
Administration -> Monitoring and Projects -> Jupyter.

Auto-sizing of the pool from worker CPU and memory budget is
tracked separately in HWORKS-2829.

Reviewed-by: Copilot
Signed-off-by: Alex Ormenisan <alex@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
https://hopsworks.atlassian.net/browse/HWORKS-2682

Follow-up on HWORKS-2682 (logicalclocks#593). Trim the admin WebSocket Pool guide
to match: the chart no longer overrides Grizzly's request-timeout-seconds
or websockets-timeout-seconds, and the page reflects Payara defaults.

Empirical evidence that request-timeout-seconds applies to established
WebSocket sessions after the 101 handshake is weak; the framing layer's
separate websockets-timeout-seconds is what governs the established
WebSocket's idle window. The guide now says so and points at the
plural attribute as the override knob if a longer idle window is
needed.

Companion changes: hopsworks-helm#<TBD> drops the chart values + asadmin
boot lines; hopsworks-ee#2875 drops the matching baked-in asadmin set
commands in docker/payara-server/Dockerfile.

Signed-off-by: Alex Ormenisan <alex@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

o-alex commented Jun 1, 2026

Superseded — commit moved onto the HWORKS-2682 PR (#593) as an additional commit, per preference for separate commits over standalone follow-up PRs.

o-alex closed this Jun 1, 2026
o-alex deleted the HWORKS-2682-drop-grizzly-timeouts branch June 1, 2026 11:52