Skip to content

Add TLS OCSP stapling support with certificate store integration#6

Draft
andypost wants to merge 1 commit into
masterfrom
claude/tls-modernization-unit-NQehK
Draft

Add TLS OCSP stapling support with certificate store integration#6
andypost wants to merge 1 commit into
masterfrom
claude/tls-modernization-unit-NQehK

Conversation

@andypost
Copy link
Copy Markdown
Owner

@andypost andypost commented May 7, 2026

Proposed changes

This PR adds TLS OCSP (Online Certificate Status Protocol) stapling support to FreeUnit. OCSP stapling allows servers to proactively provide certificate status information during the TLS handshake, improving client privacy and reducing latency compared to client-side OCSP checks.

Key features:

  • New ocsp_staple boolean option on listener TLS configuration
  • Automatic loading of .ocsp DER-encoded response files from the certificate store (sibling to certificate files)
  • OpenSSL integration via SSL_CTX_set_tlsext_status_cb() for status extension handling
  • Graceful fallback when OCSP response file is missing (no error, handshake proceeds without staple)
  • Per-handshake buffer copying to comply with OpenSSL's memory ownership semantics

Implementation details:

  • nxt_openssl_ocsp_load(): Loads and validates .ocsp DER files (max 64KB) from disk during server initialization
  • nxt_openssl_ocsp_status_cb(): OpenSSL callback that provides the cached OCSP response during TLS handshake
  • nxt_cert_store_get_ocsp() / nxt_cert_store_get_ocsp_handler(): New RPC handlers for router to fetch OCSP responses from the main process certificate store
  • New port message type NXT_PORT_MSG_CERT_OCSP_GET for inter-process communication
  • Configuration validation for the ocsp_staple boolean in listener TLS settings

Test coverage:

  • test/test_tls_ocsp.py: Comprehensive test suite covering:
    • OCSP response generation and signing with OpenSSL tooling
    • Staple served when enabled and response file present
    • Staple omitted when disabled
    • Graceful handling of missing response files
    • Verification via openssl s_client -status

Checklist

  • I have read CONTRIBUTING.md
  • I have added tests (test/test_tls_ocsp.py with 3 test cases)
  • I have updated documentation (CHANGES file, config schema validation)

https://claude.ai/code/session_011sJSp3rLUtFdM1A1phEGLJ

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces TLS OCSP stapling support, enabling Unit to serve DER-encoded OCSP responses from files located in the certificate store. The implementation includes build-time feature detection, configuration schema updates, and asynchronous logic in the router to fetch and load OCSP data into the OpenSSL context. Additionally, a comprehensive implementation plan for a future task scheduler and /run endpoint is added to the roadmap. Feedback identifies several critical issues: a missing task initialization in file operations that could cause crashes during error logging, and multiple scenarios where file descriptors for certificates or OCSP responses could be leaked during configuration failures.

Comment thread src/nxt_openssl.c
Comment thread src/nxt_router.c
Comment thread src/nxt_router.c
Comment thread src/nxt_router.c
@andypost andypost force-pushed the claude/tls-modernization-unit-NQehK branch 2 times, most recently from 974d59d to 78c9545 Compare May 7, 2026 21:29
@andypost
Copy link
Copy Markdown
Owner Author

andypost commented May 7, 2026

@gemini-code-assist any more reviews?

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces TLS OCSP stapling support, enabling the server to serve OCSP responses from the certificate store during handshakes. Key updates include configuration validation, OpenSSL status callbacks, and response validation logic to ensure OCSP data is valid and current. Review feedback correctly identified potential file descriptor leaks in the router's RPC handler; specifically, if memory allocation or string duplication fails, the incoming file descriptor is not closed, which could lead to resource exhaustion.

Comment thread src/nxt_router.c
Comment thread src/nxt_router.c
@andypost andypost changed the base branch from roadmap to master May 7, 2026 21:54
@andypost andypost force-pushed the claude/tls-modernization-unit-NQehK branch 3 times, most recently from 5d35b44 to 1cede4b Compare May 7, 2026 22:01
@andypost
Copy link
Copy Markdown
Owner Author

andypost commented May 7, 2026

@gemini-code-assist any other suggestions?

@gemini-code-assist
Copy link
Copy Markdown

The implementation of TLS OCSP stapling is thorough and well-integrated. The test coverage in test/test_tls_ocsp.py effectively validates the core functionality, including error handling for missing files and invalid DER content. One minor suggestion for consideration: in nxt_openssl_ocsp_validate, the OCSP_check_validity function uses a hardcoded 300-second skew. While this is a reasonable default, ensure that this aligns with your expected deployment environments, especially if servers are expected to operate with significant clock drift or in environments where stricter validation is required.

@andypost
Copy link
Copy Markdown
Owner Author

andypost commented May 7, 2026

Pre-existing leaks in the cert/script store IPC pattern

This PR copies an existing pattern that has two latent leaks under malloc failure / RPC-stream exhaustion. They are not introduced by this PR, but the OCSP twins inherit them. Documenting here for a follow-up.

Leak A — sender side: unbalanced nxt_mp_retain

Affected call sites

File Line Function
src/nxt_cert.c 1117 nxt_cert_store_get
src/nxt_cert.c 1247 nxt_cert_store_get_ocsp (this PR)
src/nxt_script.c 483 nxt_script_store_get
src/nxt_router_access_log.c 579 access-log reopen

Pattern

b = nxt_buf_mem_alloc(mp, ...);
nxt_mp_retain(mp);                                    // refcount++
b->completion_handler = nxt_xxx_buf_completion;       // calls nxt_mp_release on send-done

stream = nxt_port_rpc_register_handler(...);
if (stream == 0)                  goto fail;          // ← LEAK: completion never runs
ret = nxt_port_socket_write(...);
if (ret != NXT_OK) {
    nxt_port_rpc_cancel(...);     goto fail;          // ← LEAK: completion never runs
}
return;

fail:
    handler(task, NULL, ctx);                         // doesn't touch buf

Why it leaks

Tracing through nxt_port_socket_write2nxt_port_msg_chk_insertnxt_port_write_handler:

  • Send succeeds: nxt_port_buf_completion (nxt_port_socket.c:457) schedules b->completion_handler for every buf in the chain. ✓
  • Async socket error after enqueue: nxt_port_error_handler (nxt_port_socket.c:1359-1372) iterates port->messages, runs b->completion_handler for each. ✓
  • nxt_port_msg_alloc malloc fails inside chk_insert → returns NXT_ERROR, msg never enters port->messages, error_handler never sees it. Buf completion never runs.
  • nxt_port_rpc_register_handler returns 0socket_write isn't reached. Buf completion never runs.

In both leak cases the buf completion (nxt_xxx_buf_completionnxt_mp_release) never executes; mp->retain stays incremented. The destroy gate in nxt_mp_release is if (mp->retain == 0) (nxt_mp.c:302), so the entire pool is held forever.

Concrete impact

mp is the router_temp_conf mem pool — it owns the entire pending config (sockets, bundles, routes, app refs). One unbalanced retain prevents nxt_mp_destroy and the full pool stays resident until process exit. Triggered by malloc failure of a 96-byte nxt_port_send_msg_t copy, or by RPC stream-id pool exhaustion.

Leak B — receiver side: fd not closed when send fails

Affected call sites

File Line Function
src/nxt_cert.c ~1227 nxt_cert_store_get_handler
src/nxt_cert.c ~1340 nxt_cert_store_get_ocsp_handler (this PR)
src/nxt_script.c 592 nxt_script_store_get_handler
src/nxt_main_process.c 1156 modules-fetch handler
src/nxt_main_process.c 1725-1732 conf-store handler

Pattern

ret = nxt_file_open(...);                                          // file.fd = real fd on success
if (ret == NXT_OK) {
    type = NXT_PORT_MSG_RPC_READY_LAST | NXT_PORT_MSG_CLOSE_FD;
}
reply:
    (void) nxt_port_socket_write(task, port, type, file.fd, ...);  // return discarded

Why it leaks

CLOSE_FD is honored in three places — nxt_port_msg_close_fd is called at nxt_port_socket.c:455 (send success), :519 (inline send error), and from nxt_port_error_handler:1361 (async error after enqueue). The hole:

  • nxt_port_msg_alloc malloc fails inside chk_insert → msg copy never enters port->messages, error_handler never sees it. fd leaks in main process.

The (void) cast on the call means even if socket_write returned NXT_ERROR the handler couldn't react.

Concrete impact

A leaked file descriptor in the privileged main process, one per failure. The certificate .ocsp (or PEM, or script) file stays open — visible in /proc/$PID/fd, accumulates over reload churn, eventually hits RLIMIT_NOFILE.


Fix plan

Single follow-up PR titled port: invoke completion / close fd on send-side failure paths.

Step 1 — sender side (Leak A)

Add a helper next to nxt_port_socket_write that runs the buf's completion handler on the work queue when the caller can't send the buffer. Replace each goto fail in the affected functions with a call to that helper, then remove the bare goto fail for the post-retain failure paths.

Minimal alternative: in each fail: label, manually invoke b->completion_handler(task, b, b->parent) if b != NULL && retain happened. This mirrors what nxt_port_error_handler does inline.

Apply uniformly to:

  • nxt_cert_store_get (src/nxt_cert.c:1102)
  • nxt_cert_store_get_ocsp (src/nxt_cert.c:1232)
  • nxt_script_store_get (src/nxt_script.c)
  • nxt_router_access_log reopen path (src/nxt_router_access_log.c:579)

Step 2 — receiver side (Leak B)

Capture the nxt_port_socket_write return value; on != NXT_OK close file.fd explicitly when CLOSE_FD was set. Two-line change per site. A shared helper would be cleaner:

static void
nxt_port_reply_with_fd(nxt_task_t *task, nxt_port_t *port,
    nxt_port_msg_type_t ok_type, nxt_fd_t fd, uint32_t stream)
{
    nxt_port_msg_type_t type;

    type = (fd != -1)
           ? (ok_type | NXT_PORT_MSG_CLOSE_FD)
           : NXT_PORT_MSG_RPC_ERROR;

    if (nxt_port_socket_write(task, port, type, fd, stream, 0, NULL) != NXT_OK
        && fd != -1)
    {
        nxt_fd_close(fd);
    }
}

Apply to the five receiver call sites listed above.

Step 3 — tests

Both leaks are reachable only under malloc failure / RPC pool exhaustion, so deterministic tests would need malloc-injection. Two viable options:

  1. AddressSanitizer + LD_PRELOAD malloc shim — a unit test that wraps malloc and forces failure on the Nth call after a marker. Heavy.
  2. Direct unit test of the helper — call nxt_port_socket_write against a port whose write buffer is artificially full; assert that the completion ran (sender) and fd is closed (receiver). Lighter.

Option 2 keeps the PR scoped. Option 1 can come later as a hardening exercise across the IPC layer.

Estimated effort

~1 day. Six call sites, all using identical patterns; the helper extracts the duplication. No protocol or config-surface changes.

Out of scope for this PR

Both leaks are pre-existing in upstream Unit's cert store and adjacent IPC machinery. Fixing only the OCSP twins would leave the cert/script/conf-store/access-log twins broken in the same way, so the fix is cross-cutting and belongs in its own PR rather than expanding the scope of static OCSP stapling.


Generated by Claude Code

@andypost
Copy link
Copy Markdown
Owner Author

andypost commented May 7, 2026

Upstream routing for the cert-store IPC leak fix

Correction to the previous comment's "out of scope" footnote: the upstream for this codebase is freeunitorg/freeunit (community LTS continuation; nginx/unit was archived October 2025 and no longer accepts contributions).

When the follow-up "port: invoke completion / close fd on send-side failure paths" PR is ready, it should target freeunitorg/freeunit:master so the fix lands once for everyone — both leaks (Leak A, Leak B) live in pre-existing upstream code (nxt_cert.c, nxt_script.c, nxt_main_process.c, nxt_router_access_log.c) that this PR only mirrors for the OCSP twin functions.

No code change needed in this PR; this comment is purely a routing note for the follow-up.


Generated by Claude Code

Adds opt-in OCSP stapling for TLS listeners. Setting "ocsp_staple": true
on a listener's "tls" object makes the router fetch a DER-encoded OCSP
response from <name>.ocsp alongside the certificate in the cert store
and serve it via the TLS status_request extension. Operator refreshes
the response file out-of-band (e.g. cron + openssl ocsp).

Implementation:
- Build probe NXT_HAVE_OPENSSL_OCSP (auto/ssltls).
- nxt_tls_bundle_conf_t carries the cached DER + open fd; nxt_tls_init_t
  carries the per-listener boolean.
- New cert_ocsp_get IPC handler opens <cert_dir>/<name>.ocsp read-only
  and returns the fd to the router; missing files are not errors
  (handshake just runs without stapling).
- nxt_openssl.c parses the DER (d2i_OCSP_RESPONSE), checks
  responseStatus == successful, and validates thisUpdate/nextUpdate via
  OCSP_check_validity so an expired or future-dated response fails the
  config apply rather than silently breaking handshakes.  Cached bytes
  are served per handshake via SSL_CTX_set_tlsext_status_cb (with the
  OPENSSL_malloc copy the API requires); response size capped at 64 KiB.
- Router flow chains cert fetch -> ocsp fetch -> server_init.

Phase 1 of the D4 TLS modernization roadmap entry; auto-fetch from AIA
responder URL is deliberately deferred.
@andypost andypost force-pushed the claude/tls-modernization-unit-NQehK branch from 1cede4b to 52c9b54 Compare May 7, 2026 22:58
@andypost
Copy link
Copy Markdown
Owner Author

andypost commented May 7, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements TLS OCSP stapling by adding configuration options, main process handlers for loading .ocsp files, and OpenSSL logic to serve responses during handshakes. Feedback suggests improving path construction safety for OCSP files, ensuring file descriptors are correctly initialized to -1 after zero-allocation to prevent cleanup issues, and addressing a potential memory leak of the bundle structure if context allocation fails.

Comment thread src/nxt_cert.c
Comment thread src/nxt_router.c
Comment thread src/nxt_router.c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants