Skip to content

FELIX-6828 Fix whiteboard startup race that produces permanent 404s#503

Open
royteeuwen wants to merge 1 commit into
apache:masterfrom
royteeuwen:whiteboard-startup-race-fix
Open

FELIX-6828 Fix whiteboard startup race that produces permanent 404s#503
royteeuwen wants to merge 1 commit into
apache:masterfrom
royteeuwen:whiteboard-startup-race-fix

Conversation

@royteeuwen
Copy link
Copy Markdown

Reorder WhiteboardManager.stop() so trackers close before webContext is nulled, fold the remaining field clears under the contextMap lock, add a webContext null guard at the top of WhiteboardContextHandler.activate(), make every handler.getRegistry() chain in WhiteboardManager TOCTOU-safe by reading the registry into a local once and null-checking the local, and guard against an uninstalled registering bundle in unregisterWhiteboardService.

Reorder WhiteboardManager.stop() so trackers close before webContext
is nulled, fold the remaining field clears under the contextMap lock,
add a webContext null guard at the top of WhiteboardContextHandler.activate(),
make every handler.getRegistry() chain in WhiteboardManager TOCTOU-safe
by reading the registry into a local once and null-checking the local,
and guard against an uninstalled registering bundle in
unregisterWhiteboardService.
@paulrutter
Copy link
Copy Markdown
Contributor

Review: FELIX-6828 whiteboard startup race fix

Single commit, 2 files changed: WhiteboardContextHandler.java and WhiteboardManager.java. Builds on the prior 00e2640c37 shutdown-NPE fix already on master.

What the patch does (correctly)

1. stop() reordering + lock coverage (WhiteboardManager.java:198-227)

  • Closes trackers before nulling webContext, so any in-flight removedServiceremoveContextHelper callback still sees a valid webContext and webContext != null activate-next branch.
  • Wraps webContext = null and all map clears in synchronized(contextMap), so no other operation can observe partially-cleared state. The comment is accurate and useful.

2. addContextHelper null guard (WhiteboardManager.java:383-388)

  • Reads webContext into a local under the lock and bails with false if null. Prevents constructing a handler with a null @NotNull context — the actual fix for the startup/shutdown race that previously NPE'd deep inside activate() and left things in an unrecoverable state.

3. TOCTOU-safe getRegistry() reads

All 7 sites that chain handler.getRegistry().getEventListenerRegistry()... or handler.getRegistry().registerXxx(...) now read the volatile registry once into a local and null-check it. Grep confirms no remaining unguarded chains. The registry field is set to null in both WhiteboardContextHandler.deactivate() and on a failed activate(), so this is a real race even within the contextMap-synchronized paths (the volatile read isn't pinned by that lock — activate() itself transiently leaves registry non-null only after a fresh assignment).

4. Uninstalled-bundle guard in unregisterWhiteboardService (WhiteboardManager.java:881-927)

  • info.getServiceReference().getBundle() returns null when the registering bundle has been uninstalled (common at framework shutdown). The old code passed that directly to ungetServletContext(@NotNull Bundle) → NPE. Fix reads once, null-checks before each ungetServletContext call. Correct.
  • Also early-returns with a FAILURE_REASON_SERVLET_CONTEXT_FAILURE failure when reg == null in registerWhiteboardService — paired with removeWhiteboardService's failureStateHandler.remove(info, ctxId) check, so the un-incremented perBundleContextMap counter is not later decremented. Symmetry checks out.

Minor observations

  • Dead-code defensive check in WhiteboardContextHandler.activate():90-93: webContext is final and the constructor declares @NotNull, plus the only caller path now null-checks before construction. The new guard is unreachable in practice. It's harmless belt-and-suspenders, but if you want to keep it, a one-liner comment ("@NotNull is not runtime-enforced") would be worth its presence.

  • Not addressed (likely out of scope): addWhiteboardService for PreprocessorInfo at WhiteboardManager.java:581-583 still reads this.webContext unsynchronized and unguarded, then constructs a PreprocessorHandler with it. Same race shape as the SCH path, fewer consequences (no permanent 404), but inconsistent with the rest of the fix. Worth a follow-up.

  • Style: matches existing project conventions (spaces inside parens, @Nullable return on getRegistry(), imports grouped properly). New Bundle and PerContextHandlerRegistry imports are needed and clean.

  • Tests: no test coverage for WhiteboardManager exists in the repo (http/base/src/test has none for whiteboard). Race conditions like this are notoriously hard to unit-test, but it's worth noting the safety net is absent.

Verdict

Solid, well-scoped defensive patch. The reordering, lock extension, TOCTOU pattern, and Bundle null guard are all correct and address real failure modes. I'd recommend either deleting the unreachable webContext == null check in WhiteboardContextHandler.activate() or annotating it, and filing a follow-up for the preprocessor path. Otherwise: ready.

@royteeuwen can you look at the minor observations? Especially the dead-code part?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants