proxymode tunnel stability: immediate close, interface cleanup, keepalive tolerance #6195
Open
cvl wants to merge 3 commits into
- Close(): close device immediately instead of 2-minute deferred goroutine. Orphaned devices accumulated, sending rogue keepalives.
- ConfigureDevice(): close old device/proxy before creating new ones. ReConfigureDevice leaked old netstack, device, and proxy server.
- Proxy(): synchronous net.Listen + server.Serve so port conflicts fail ConfigureDevice immediately instead of silently in background.
- PeerStats(): nil/lock guard on Device to prevent panic after Close.
- Suppress http.ErrServerClosed log noise on normal shutdown.
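The Close() and ConfigureDevice() changes above follow a common pattern: take the old resources out of the struct while holding the lock, nil the fields, and close the resources after the lock is released. The sketch below illustrates that pattern only. The `client`, `fakeDevice`, `detach`, `Configure`, `Stats`, and `install` names are hypothetical stand-ins, not the types or methods in `proxyclient/client.go`, and ignoring close errors during reconfigure is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
)

// fakeDevice stands in for the WireGuard device; it only counts Close calls.
type fakeDevice struct{ closed int }

func (d *fakeDevice) Close() error { d.closed++; return nil }

// client is a hypothetical, reduced version of a proxy client.
type client struct {
	mu         sync.Mutex
	device     io.Closer
	proxyClose func() error
}

// detach takes ownership of the current resources and nils the fields,
// so a second Close finds nothing left to close.
func (c *client) detach() (io.Closer, func() error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	dev, pc := c.device, c.proxyClose
	c.device, c.proxyClose = nil, nil
	return dev, pc
}

// Close releases the resources immediately (no deferred goroutine) and is
// safe to call more than once. The actual closing happens outside the lock.
func (c *client) Close() error {
	dev, pc := c.detach()
	var errs []error
	if pc != nil {
		errs = append(errs, pc())
	}
	if dev != nil {
		errs = append(errs, dev.Close())
	}
	return errors.Join(errs...)
}

// Configure closes the previous resources before creating new ones, so an
// old proxy cannot keep holding the port the new one needs.
func (c *client) Configure(create func() (io.Closer, func() error, error)) error {
	_ = c.Close() // close errors are ignored in this sketch
	dev, pc, err := create()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.device, c.proxyClose = dev, pc
	c.mu.Unlock()
	return nil
}

// Stats returns an error instead of dereferencing a nil device.
func (c *client) Stats() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.device == nil {
		return "", errors.New("device is closed")
	}
	return "ok", nil
}

// install is a small helper that configures the client with a given fake device.
func install(c *client, d *fakeDevice) error {
	return c.Configure(func() (io.Closer, func() error, error) {
		return d, func() error { return nil }, nil
	})
}

func main() {
	first, second := &fakeDevice{}, &fakeDevice{}
	c := &client{}
	_ = install(c, first)
	_ = install(c, second) // closes first before installing second
	_ = c.Close()
	_ = c.Close() // idempotent: nothing left to close
	_, err := c.Stats()
	fmt.Println(first.closed, second.closed, err)
}
```

Closing outside the lock keeps a slow device shutdown from blocking callers such as a stats reader, at the cost of a short window in which the client holds no device at all.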
- Stop(): skip ReleaseInterface when ProxyPort > 0. Proxymode names interfaces myst<port> without AllocateInterface, so release always errored with "allocated interface not found".
- StartConsumerMode(): same skip on configure failure path.
- cleanAbandonedInterfaces(): skip in proxymode (same reason as dVPN: multiple concurrent connections should not destroy each other).
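The release skip above comes down to one guard: an interface that was never registered with the allocator must not be released through it. A minimal sketch, assuming a hypothetical `allocator` type and a free function `stop`; the real allocator and `Stop()` in `endpoint.go` are not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// allocator is a hypothetical stand-in for the interface allocator.
type allocator struct{ allocated map[string]bool }

func newAllocator() *allocator { return &allocator{allocated: map[string]bool{}} }

// Allocate records the interface name as owned by this allocator.
func (a *allocator) Allocate(name string) { a.allocated[name] = true }

// Release fails for names that were never allocated.
func (a *allocator) Release(name string) error {
	if !a.allocated[name] {
		return errors.New("allocated interface not found")
	}
	delete(a.allocated, name)
	return nil
}

// proxyIfaceName derives the name from the port, bypassing the allocator.
func proxyIfaceName(port int) string { return fmt.Sprintf("myst%d", port) }

// stop releases the interface only when it went through the allocator,
// which in this sketch means proxyPort == 0.
func stop(a *allocator, iface string, proxyPort int) error {
	if proxyPort > 0 {
		return nil // name was derived from the port, nothing to release
	}
	return a.Release(iface)
}

func main() {
	a := newAllocator()

	// Proxymode: the name was never allocated, so a direct release would error.
	name := proxyIfaceName(40001)
	fmt.Println("direct release:", a.Release(name))
	fmt.Println("guarded stop:  ", stop(a, name, 40001))

	// Regular mode: allocate, then release through stop.
	a.Allocate("myst0")
	fmt.Println("regular stop:  ", stop(a, "myst0", 0))
}
```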
In proxymode the gateway manages tunnel health via its own sentinel and probe mechanisms. The P2P keepalive failure (3 × 5s) was killing perfectly healthy WireGuard tunnels because the P2P channel (NATS/UDP signaling) is less stable than the tunnel itself in Docker.
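The keepalive tolerance can be pictured as a small decision function: once the failure threshold is reached, proxymode resets the counter and keeps the tunnel, while other modes disconnect. This is a sketch under stated assumptions. `onPingResult`, `action`, and `errPingTimeout` are hypothetical names, the threshold of 3 is taken from the "3 × 5s" figure above, and the warn-level log is left as a comment.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPingFailures mirrors the "3 consecutive failures" threshold from the text.
const maxPingFailures = 3

// errPingTimeout is a placeholder error for a failed P2P ping.
var errPingTimeout = errors.New("ping timeout")

type action int

const (
	keepGoing action = iota
	disconnect
)

// onPingResult returns the updated failure counter and what the keepalive
// loop should do next.
func onPingResult(failures int, pingErr error, proxyMode bool) (int, action) {
	if pingErr == nil {
		return 0, keepGoing // any success clears the counter
	}
	failures++
	if failures < maxPingFailures {
		return failures, keepGoing
	}
	if proxyMode {
		// Threshold reached: a warning would be logged here. The counter is
		// reset and the tunnel stays up; an external health check decides.
		return 0, keepGoing
	}
	return failures, disconnect
}

func main() {
	for _, proxyMode := range []bool{false, true} {
		failures, next := 0, keepGoing
		for i := 0; i < maxPingFailures; i++ {
			failures, next = onPingResult(failures, errPingTimeout, proxyMode)
		}
		fmt.Printf("proxyMode=%v failures=%d disconnect=%v\n",
			proxyMode, failures, next == disconnect)
	}
}
```

Keeping the decision in a pure function makes the proxymode branch easy to unit test without a live P2P channel.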
Codecov Report
Coverage diff, master vs. #6195:

             master    #6195      +/-
Coverage     25.60%   25.76%   +0.16%
Files           540      540
Lines         31160    31180      +20
Hits           7977     8033      +56
Misses        22392    22347      -45
Partials        791      800       +9
Problem
WireGuard tunnels in `--proxymode` die within 60-251 seconds. Every tunnel, every country. The proxy server and connection object stay alive, so nothing reports unhealthy until external probes discover the death seconds later.

Root causes:
- `proxyclient.Close()` defers `Device.Close()` by 2 minutes in a goroutine. Orphaned devices accumulate, each sending rogue UDP keepalives to old providers. With pool cycling, dozens of phantom devices pile up.
- `ConfigureDevice()` overwrites `c.Device` and `c.proxyClose` without closing the old ones. On reconfigure (e.g. auto-reconnect), the old netstack, WireGuard device, and HTTP proxy server leak forever. The old proxy holds the port, so the new one silently fails to bind in a background goroutine.
- `ReleaseInterface()` errors on every disconnect in proxymode ("allocated interface not found") because proxymode sets the interface name to `myst<port>` without calling `AllocateInterface()`. `cleanAbandonedInterfaces()` is not skipped for proxymode either (only dVPN was exempted).
- The P2P keepalive loop auto-disconnects after 3 consecutive ping failures (15-30s). In proxymode, the gateway manages tunnel health via its own sentinel and probe mechanisms. The P2P channel (NATS/UDP signaling) is less stable than the WireGuard tunnel itself in Docker, so the keepalive kills perfectly healthy tunnels.
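The silent bind failure described above (an old proxy still holding the port) goes away when the listener is opened in the caller and only the accept loop runs in the background. A minimal standard-library sketch of that idea follows. `startProxy` is a hypothetical helper, not the `Proxy()` method from this PR, and the loopback address is chosen only for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
	"net/http"
)

// startProxy binds addr synchronously, so a port conflict is returned to the
// caller. It returns the bound address and a function that stops the server.
func startProxy(addr string, h http.Handler) (net.Addr, func() error, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, nil, fmt.Errorf("proxy listen on %s: %w", addr, err)
	}
	srv := &http.Server{Handler: h}
	go func() {
		// Serve returns http.ErrServerClosed after Close; that is a normal
		// shutdown and is not worth logging.
		if err := srv.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Printf("proxy serve: %v", err)
		}
	}()
	closeFn := func() error {
		err := srv.Close()
		_ = ln.Close() // covers the window before Serve has registered the listener
		return err
	}
	return ln.Addr(), closeFn, nil
}

func main() {
	// Port 0 lets the OS pick a free port for the first proxy.
	addr, closeFirst, err := startProxy("127.0.0.1:0", http.NotFoundHandler())
	if err != nil {
		log.Fatal(err)
	}
	defer closeFirst()

	// A second proxy on the same port fails immediately, in the caller.
	_, _, err = startProxy(addr.String(), http.NotFoundHandler())
	fmt.Println("second bind error:", err)
}
```

With this shape, a reconfigure that forgets to close the old proxy surfaces as an error from the configure call itself instead of a log line from a goroutine.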
Changes
`services/wireguard/endpoint/proxyclient/client.go`
- `Close()`: close device immediately (was 2-minute deferred goroutine). Nil both `Device` and `proxyClose` to prevent double-close.
- `ConfigureDevice()`: close old device and proxy before creating new ones. Old resources extracted under lock, closed outside lock.
- `Proxy()`: synchronous `net.Listen` + `server.Serve(ln)` so port conflicts fail `ConfigureDevice` immediately instead of silently in a background goroutine. Suppress `http.ErrServerClosed` on normal shutdown.
- `PeerStats()`: nil/lock guard on `Device`; returns error instead of panic when device is closed.

`services/wireguard/endpoint/endpoint.go`
- `Stop()`: skip `ReleaseInterface` when `ProxyPort > 0` (proxymode never called `AllocateInterface`).
- `StartConsumerMode()`: same skip on configure failure path.
- `cleanAbandonedInterfaces()`: skip in proxymode (same reason as dVPN: multiple concurrent connections should not destroy each other).

`core/connection/manager.go`
- `keepAliveLoop()`: in proxymode (`FlagProxyMode`), reset error counter and continue instead of disconnecting/going OnHold. Log at warn level. The gateway's sentinel detects real tunnel death independently.

Tests
- `Test_Close_ReleasesDeviceImmediately`: Device nil after Close
- `Test_Close_ProxyCloseCalledAndNilled`: proxyClose invoked and cleared
- `Test_Close_Idempotent`: double Close safe
- `Test_ConfigureDevice_CleansOldResources`: old device/proxy cleaned on reconfigure
- `Test_ConfigureDevice_DoubleConfigureSamePort`: two configures produce distinct devices
- `Test_PeerStats_NilDeviceReturnsError`: nil guard returns error, no panic
- `Test_PeerStats_NilAfterClose`: PeerStats safe after Close
- `Test_Proxy_SyncBindFailsOnPortConflict`: port conflict detected synchronously

Results