Gwolfgit/nats-tsnet

nats-tsnet

A single-binary NATS JetStream server whose only public network face is a Tailscale tsnet listener. Peers cluster over WireGuard. Peer discovery uses the local Tailscale daemon — no Tailscale API token required.

Each node:

  1. Starts tsnet and joins the tailnet using a pre-auth key that applies tag:nats-js.
  2. Queries LocalClient.Status() for every tailnet peer carrying the same tag.
  3. Builds the NATS cluster route list from those peers' tailscale IPs.
  4. Starts an embedded nats-server with JetStream enabled, with both the client and cluster listeners bound to 127.0.0.1. tsnet listeners on :4222 and :6222 are the public faces — every byte in/out crosses WireGuard.

Why the loopback hop?

The embedded nats-server/v2 API has no public hook to swap its net.Dialer or inject a custom net.Listener. tsnet runs entirely in userspace, so a kernel-side net.Listen on the tailscale IP wouldn't work. The workaround:

  • Inbound: tsnet accepts on tailnet :4222 / :6222 and pipes each connection to 127.0.0.1:<same port> where nats-server is bound.
  • Outbound (cluster): one 127.0.0.1:<auto-port> listener per peer. When nats-server dials a cluster route, the forwarder calls ts.Dial(peerIP:6222). From NATS's perspective every route is loopback; the bytes actually traverse Tailscale.
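The pipe in both directions is the same pattern: accept on one side, dial the other, copy bytes both ways. The sketch below shows that pattern with the standard library only. The names forward, dialFunc, and roundTrip are hypothetical, and a plain net.Dialer stands in for the tsnet dialer so the example runs without a tailnet; the repository's actual forwarder may differ.

```go
package main

import (
	"context"
	"io"
	"log"
	"net"
)

// dialFunc is the shape of a context-aware dialer. In the real binary this
// role would be played by the tsnet dialer (an assumption of this sketch).
type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// forward accepts connections on ln and pipes each one to target via dial.
// It returns when ln is closed.
func forward(ln net.Listener, dial dialFunc, target string) {
	for {
		in, err := ln.Accept()
		if err != nil {
			return
		}
		go func() {
			defer in.Close()
			out, err := dial(context.Background(), "tcp", target)
			if err != nil {
				log.Printf("dial %s: %v", target, err)
				return
			}
			defer out.Close()
			done := make(chan struct{}, 2)
			go func() { io.Copy(out, in); done <- struct{}{} }()
			go func() { io.Copy(in, out); done <- struct{}{} }()
			<-done // once either direction ends, the deferred Closes tear down both
		}()
	}
}

// roundTrip puts an echo server behind a forwarder and sends msg through it.
func roundTrip(msg string) (string, error) {
	echo, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer echo.Close()
	go func() {
		for {
			c, err := echo.Accept()
			if err != nil {
				return
			}
			go func() { defer c.Close(); io.Copy(c, c) }()
		}
	}()

	front, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer front.Close()
	go forward(front, (&net.Dialer{}).DialContext, echo.Addr().String())

	conn, err := net.Dial("tcp", front.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := io.WriteString(conn, msg); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := roundTrip("ping")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("echoed through forwarder: %q", got)
}
```

For the inbound path the listener would be the tsnet one and the dialer a plain loopback dial; for the outbound path the roles are swapped.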

The server sets Cluster.NoAdvertise = true, so it does not advertise its cluster URL in INFO messages and peers do not gossip additional dial targets to each other. (All other inter-server traffic, including subject interest, JetStream metadata and Raft, and message routing, flows normally over the established cluster connections.) Without this setting, any gossiped route URL would point at a Tailscale IP that nats-server cannot dial on its own, because its dialer cannot be replaced. Route discovery is therefore owned entirely by the tsnet side.

Loopback caveat: any process on the host can connect to 127.0.0.1:4222 without going through Tailscale. If that's a problem for you, run each node in its own container/namespace or set iptables rules to drop non-tsnet loopback ingress.

Build

Requires Go 1.23+.

go build -o nats-tsnet .

Build variants

Two variants are selected at build time via Go build tags. They share all logic except the helpers in bindhost.go / bindhost_dockerbridge.go, which decide where the embedded nats-server binds.

| Variant | Build flag | nats-server binds to | Public faces |
| --- | --- | --- | --- |
| default (tsnet-only) | (no tag) | always 127.0.0.1:4222 and 127.0.0.1:6222 | tsnet listeners on :4222 + :6222 |
| dockerbridge | -tags dockerbridge | NATS_TSNET_CLIENT_BIND_HOST / NATS_TSNET_CLUSTER_BIND_HOST env vars; both default to 127.0.0.1 | tsnet listeners plus any Docker network the container joins when the operator sets NATS_TSNET_CLIENT_BIND_HOST=0.0.0.0 |

# Default — production default, tsnet is the only way in.
go build -o nats-tsnet .

# dockerbridge — explicit opt-in to also accept sibling-container
# traffic on a shared Docker network. Use when replacing a pre-tsnet
# NATS in-place without rewriting downstream consumers to be
# tsnet-native. Set NATS_TSNET_CLIENT_BIND_HOST=0.0.0.0 at runtime to
# widen the bind; leaving it unset keeps the binary safe-by-default.
go build -tags dockerbridge -o nats-tsnet-dockerbridge .

The two binaries are intentionally distinct artifacts so the default tsnet-only trust model is provable at build time — there is no runtime knob in the default binary that can weaken it.

Tailscale setup (one-time)

In your tailnet ACL (https://login.tailscale.com/admin/acls):

{
  "tagOwners": {
    "tag:nats-js": ["autogroup:admin"]
  },
  "acls": [
    // Allow nats-js nodes to talk to each other on the cluster + client ports
    {
      "action": "accept",
      "src":    ["tag:nats-js"],
      "dst":    ["tag:nats-js:4222", "tag:nats-js:6222"]
    },
    // Allow whoever needs to publish/subscribe
    {
      "action": "accept",
      "src":    ["autogroup:member"],
      "dst":    ["tag:nats-js:4222"]
    }
  ]
}

Mint a pre-auth key with the tag attached (one key per host is fine; reusable keys also work):

# https://login.tailscale.com/admin/settings/keys
# Tag: tag:nats-js  (mandatory)
# Pre-authorized: yes
# Reusable / Ephemeral: your call

Run

export TS_AUTHKEY=tskey-auth-...     # the tag:nats-js pre-auth key
cp config.example.yaml config.yaml
$EDITOR config.yaml                  # set hostname per node, server_name
./nats-tsnet -config config.yaml

On first boot the node registers in the tailnet, the tag is applied, and the node starts discovering peers. Bring up a second node the same way — it will discover the first via LocalClient.Status() and form a cluster automatically.

Verify

From any tailnet member:

# Connect to a node (MagicDNS hostname matches `hostname:` in config.yaml)
nats -s nats://nats-1:4222 server check connection
nats -s nats://nats-1:4222 server list

# JetStream sanity check
nats -s nats://nats-1:4222 stream add demo --subjects 'demo.>' --defaults
nats -s nats://nats-1:4222 stream info demo

For JetStream cluster quorum you need at least 3 tagged nodes.
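The three-node figure follows from Raft-style majority voting: a group of n nodes needs n/2 + 1 (integer division) votes, so it can lose n minus that many nodes and keep working. The helper names below are invented for this sketch.

```go
package main

import "fmt"

// quorum is the majority size for a group of n voting nodes.
func quorum(n int) int { return n/2 + 1 }

// tolerated is how many nodes can fail while a majority remains.
func tolerated(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("nodes=%d quorum=%d tolerated failures=%d\n", n, quorum(n), tolerated(n))
	}
}
```

With two nodes the majority is still two, so a single failure stalls the group; three is the smallest size that survives losing one node.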

Config

See config.example.yaml. Every field has a default; on most hosts only hostname needs setting (and TS_AUTHKEY in the env).

| Env var | Purpose |
| --- | --- |
| TS_AUTHKEY | Pre-auth key with tag:nats-js. Required on first boot. |
| NATS_TSNET_HOSTNAME | Override hostname from the config file. |

Known limitations (MVP)

  • Live peer-set changes are not applied. New tagged peers are logged but not added to the running cluster's routes (NATS has no clean reload for programmatic route mutation). Restart the binary on every node when the cluster size changes.
  • One tailnet IP per peer. If a peer has both IPv4 and IPv6 tailscale IPs, only the first is used as a cluster route.
  • No mTLS between the client and the tsnet acceptor. The hop between tsnet and the loopback listener carries plaintext bytes; traffic between the client and the tailnet is still WireGuard-encrypted.
