server: restrict container egress CONNECT listener to the managed container#6205
server: restrict container egress CONNECT listener to the managed container#6205
Conversation
|
The generated output of Full Type Diff |
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This PR restricts the egress CONNECT listener so that only the managed container (via its sidecar, which shares the container's network namespace) can establish CONNECT tunnels, preventing other Docker bridge peers from using it as an open proxy.
Issues (highest severity first)
1. [High] response.accept(403, ...) is incorrect for rejecting a CONNECT request
KJ's ConnectResponse has a separate reject() method for sending error responses on CONNECT. Using accept() with a 403 status transitions the connection into tunnel mode regardless of the status code, which means the server thinks the tunnel is established. After "accepting," you then call shutdownWrite() on the raw connection — but this is the underlying transport, not a stream returned by reject(). The correct approach is to use response.reject(403, "Forbidden", headers) which returns an output stream you can optionally write an error body to, without establishing a tunnel.
2. [High] TOCTOU race between inspectContainer() and the actual peer
The authorization calls inspectContainer() (a Docker API HTTP round-trip) to get the container IP, then compares it against the peer. Between the inspect response and the getpeername() call, the container could restart with a different IP, or the IP could be reassigned to another container. For a security-critical check, consider caching the container IP at start time and/or performing the check without an async round-trip per CONNECT.
3. [Medium] getpeername() can throw — no error handling
If the socket has been disconnected or is in an unexpected state, getpeername() will throw a KJ exception. The exception will propagate uncaught, potentially crashing the CONNECT handler or leaving the connection in a bad state. A failed getpeername() should be treated as unauthorized.
4. [Medium] KJ style: maybeIpAddress == kj::none instead of KJ_IF_SOME
The idiomatic KJ pattern for checking kj::Maybe is KJ_IF_SOME. The code checks maybeIpAddress == kj::none on line 640, then uses KJ_IF_SOME on line 649. The first check is redundant if the KJ_IF_SOME path handles it — and using == on kj::Maybe is non-idiomatic in this codebase.
5. [Low] Temporary kj::str() allocations in the comparison hot path
Both the IPv4 and IPv6 branches call containerIp == kj::str(addrBuf), which allocates a kj::String on the heap just to compare with a kj::StringPtr. Since containerIp is a kj::String, you can compare directly with a kj::StringPtr constructed from the char array without allocation.
| auto isAuthorized = co_await containerClient.isEgressPeerAuthorized(connection); | ||
| if (!isAuthorized) { | ||
| kj::HttpHeaders responseHeaders(headerTable); | ||
| response.accept(403, "Forbidden", responseHeaders); | ||
| connection.shutdownWrite(); | ||
| co_return; | ||
| } |
There was a problem hiding this comment.
[High] accept() transitions the connection into tunnel mode. For rejecting a CONNECT, use reject() instead, which sends a proper error response without establishing a tunnel.
| auto isAuthorized = co_await containerClient.isEgressPeerAuthorized(connection); | |
| if (!isAuthorized) { | |
| kj::HttpHeaders responseHeaders(headerTable); | |
| response.accept(403, "Forbidden", responseHeaders); | |
| connection.shutdownWrite(); | |
| co_return; | |
| } | |
| auto isAuthorized = co_await containerClient.isEgressPeerAuthorized(connection); | |
| if (!isAuthorized) { | |
| kj::HttpHeaders responseHeaders(headerTable); | |
| response.reject(403, "Forbidden", responseHeaders); | |
| co_return; | |
| } |
| kj::Promise<bool> ContainerClient::isEgressPeerAuthorized(kj::AsyncIoStream& connection) { | ||
| auto [isRunning, _ports, maybeIpAddress] = co_await inspectContainer(); | ||
| if (!isRunning || maybeIpAddress == kj::none) { | ||
| co_return false; | ||
| } | ||
|
|
||
| sockaddr_storage peerAddrStorage; | ||
| auto peerAddr = reinterpret_cast<sockaddr*>(&peerAddrStorage); | ||
| kj::uint peerAddrLen = sizeof(peerAddrStorage); | ||
| connection.getpeername(peerAddr, &peerAddrLen); | ||
|
|
||
| KJ_IF_SOME(containerIp, maybeIpAddress) { | ||
| switch (peerAddr->sa_family) { | ||
| case AF_INET: { | ||
| char addrBuf[INET_ADDRSTRLEN] = {0}; | ||
| auto* sin = reinterpret_cast<sockaddr_in*>(peerAddr); | ||
| if (inet_ntop(AF_INET, &sin->sin_addr, addrBuf, INET_ADDRSTRLEN) != nullptr && | ||
| containerIp == kj::str(addrBuf)) { | ||
| co_return true; | ||
| } | ||
| break; | ||
| } | ||
| case AF_INET6: { | ||
| char addrBuf[INET6_ADDRSTRLEN] = {0}; | ||
| auto* sin6 = reinterpret_cast<sockaddr_in6*>(peerAddr); | ||
| if (inet_ntop(AF_INET6, &sin6->sin6_addr, addrBuf, INET6_ADDRSTRLEN) != nullptr && | ||
| containerIp == kj::str(addrBuf)) { | ||
| co_return true; | ||
| } | ||
| break; | ||
| } | ||
| default: | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| co_return false; |
There was a problem hiding this comment.
[Medium/High] Multiple issues in isEgressPeerAuthorized():
getpeername()can throw if the socket is disconnected. An exception here should be caught and treated as unauthorized.maybeIpAddress == kj::noneis non-idiomatic — the subsequentKJ_IF_SOMEalready handles thenonecase, making line 640's check redundant.kj::str(addrBuf)allocates unnecessarily —containerIp(akj::String) can be compared directly against akj::StringPtr.
| kj::Promise<bool> ContainerClient::isEgressPeerAuthorized(kj::AsyncIoStream& connection) { | |
| auto [isRunning, _ports, maybeIpAddress] = co_await inspectContainer(); | |
| if (!isRunning || maybeIpAddress == kj::none) { | |
| co_return false; | |
| } | |
| sockaddr_storage peerAddrStorage; | |
| auto peerAddr = reinterpret_cast<sockaddr*>(&peerAddrStorage); | |
| kj::uint peerAddrLen = sizeof(peerAddrStorage); | |
| connection.getpeername(peerAddr, &peerAddrLen); | |
| KJ_IF_SOME(containerIp, maybeIpAddress) { | |
| switch (peerAddr->sa_family) { | |
| case AF_INET: { | |
| char addrBuf[INET_ADDRSTRLEN] = {0}; | |
| auto* sin = reinterpret_cast<sockaddr_in*>(peerAddr); | |
| if (inet_ntop(AF_INET, &sin->sin_addr, addrBuf, INET_ADDRSTRLEN) != nullptr && | |
| containerIp == kj::str(addrBuf)) { | |
| co_return true; | |
| } | |
| break; | |
| } | |
| case AF_INET6: { | |
| char addrBuf[INET6_ADDRSTRLEN] = {0}; | |
| auto* sin6 = reinterpret_cast<sockaddr_in6*>(peerAddr); | |
| if (inet_ntop(AF_INET6, &sin6->sin6_addr, addrBuf, INET6_ADDRSTRLEN) != nullptr && | |
| containerIp == kj::str(addrBuf)) { | |
| co_return true; | |
| } | |
| break; | |
| } | |
| default: | |
| break; | |
| } | |
| } | |
| co_return false; | |
| kj::Promise<bool> ContainerClient::isEgressPeerAuthorized(kj::AsyncIoStream& connection) { | |
| auto [isRunning, _ports, maybeIpAddress] = co_await inspectContainer(); | |
| if (!isRunning) { | |
| co_return false; | |
| } | |
| KJ_IF_SOME(containerIp, maybeIpAddress) { | |
| sockaddr_storage peerAddrStorage; | |
| auto peerAddr = reinterpret_cast<sockaddr*>(&peerAddrStorage); | |
| kj::uint peerAddrLen = sizeof(peerAddrStorage); | |
| KJ_IF_SOME(exception, kj::runCatchingExceptions([&]() { | |
| connection.getpeername(peerAddr, &peerAddrLen); | |
| })) { | |
| // Socket disconnected or in bad state — treat as unauthorized. | |
| co_return false; | |
| } | |
| switch (peerAddr->sa_family) { | |
| case AF_INET: { | |
| char addrBuf[INET_ADDRSTRLEN] = {0}; | |
| auto* sin = reinterpret_cast<sockaddr_in*>(peerAddr); | |
| if (inet_ntop(AF_INET, &sin->sin_addr, addrBuf, INET_ADDRSTRLEN) != nullptr && | |
| containerIp == kj::StringPtr(addrBuf)) { | |
| co_return true; | |
| } | |
| break; | |
| } | |
| case AF_INET6: { | |
| char addrBuf[INET6_ADDRSTRLEN] = {0}; | |
| auto* sin6 = reinterpret_cast<sockaddr_in6*>(peerAddr); | |
| if (inet_ntop(AF_INET6, &sin6->sin6_addr, addrBuf, INET6_ADDRSTRLEN) != nullptr && | |
| containerIp == kj::StringPtr(addrBuf)) { | |
| co_return true; | |
| } | |
| break; | |
| } | |
| default: | |
| break; | |
| } | |
| } | |
| co_return false; | |
| } |
|
|
||
|
|
||
| kj::Promise<bool> ContainerClient::isEgressPeerAuthorized(kj::AsyncIoStream& connection) { | ||
| auto [isRunning, _ports, maybeIpAddress] = co_await inspectContainer(); |
There was a problem hiding this comment.
[Design note] This calls inspectContainer() (a full Docker API HTTP round-trip) on every CONNECT request. This creates a TOCTOU window: the container IP could change between the inspect response and the getpeername() call. Consider capturing the container IP once at container start time and caching it, rather than re-inspecting on every connection. This would also improve latency on the CONNECT path.
|
Review posted successfully. Here's a summary of the 5 issues I flagged on PR #6205:
All issues include concrete suggestion comments with fix code. |
Motivation
Description
ipAddresstoInspectResponsereturned byContainerClient::inspectContainer()so the container's IP is available for authorization (src/workerd/server/container-client.h,src/workerd/server/container-client.c++).ContainerClient::isEgressPeerAuthorized(kj::AsyncIoStream&)which callsinspectContainer(), obtains the container IP, retrieves the peer socket address viagetpeername(), and compares addresses for IPv4/IPv6; it returnstrueonly when the peer matches the managed container IP (src/workerd/server/container-client.c++).EgressHttpService::connect()to callisEgressPeerAuthorized()and reject unauthorized peers with403 Forbiddenand immediately close the connection before accepting/forwarding (src/workerd/server/container-client.c++).inspectContainer()results to match the extended result tuple (src/workerd/server/container-client.c++).Testing
bazel test //src/workerd/server/tests/container-client:container-client@, which failed in this environment due to inability to download the Bazel binary (HTTP 403), so automated tests could not complete.InspectResponseshape is handled at modified callsites and that the CONNECT path now performs authorization; no runtime test harness could be executed in this environment.Codex Task