|
| 1 | +# K3s Infrastructure Documentation |
| 2 | + |
| 3 | +> Last updated: 2025-11-27 |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +Self-hosted k3s cluster on DigitalOcean for internal tools (Appsmith, Outline, etc.). |
| 8 | + |
| 9 | +| Component | Value | |
| 10 | +|-----------|-------| |
| 11 | +| Region | NYC3 | |
| 12 | +| Nodes | 3 (HA control plane) | |
| 13 | +| K3s Version | v1.33.6+k3s1 | |
| 14 | +| Container Runtime | containerd 2.1.5 | |
| 15 | +| OS | Ubuntu 24.04.3 LTS | |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## DigitalOcean Resources |
| 20 | + |
| 21 | +### VPC |
| 22 | + |
| 23 | +| Property | Value | |
| 24 | +|----------|-------| |
| 25 | +| Name | `ops-vpc-tools-k3s-nyc3` | |
| 26 | +| ID | Get via: `doctl vpcs list \| grep k3s` | |
| 27 | +| IP Range | `10.108.0.0/20` | |
| 28 | +| Region | nyc3 | |
| 29 | + |
| 30 | +### Droplets |
| 31 | + |
| 32 | +| Name | Private IP | Specs | |
| 33 | +|------|-----------:|-------| |
| 34 | +| ops-vm-tools-k3s-nyc3-01 | 10.108.0.4 | 4 vCPU, 8GB RAM, 160GB | |
| 35 | +| ops-vm-tools-k3s-nyc3-02 | 10.108.0.5 | 4 vCPU, 8GB RAM, 160GB | |
| 36 | +| ops-vm-tools-k3s-nyc3-03 | 10.108.0.6 | 4 vCPU, 8GB RAM, 160GB | |
| 37 | + |
| 38 | +All tagged: `tools-k3s` |
| 39 | + |
| 40 | +### Load Balancer |
| 41 | + |
| 42 | +| Property | Value | |
| 43 | +|----------|-------| |
| 44 | +| Name | `ops-lb-tools-k3s-nyc3-01` | |
| 45 | +| IP | Get via: `doctl compute load-balancer list \| grep k3s` | |
| 46 | +| VPC | `ops-vpc-tools-k3s-nyc3` | |
| 47 | +| Target Droplets | All 3 k3s nodes | |
| 48 | + |
| 49 | +**Forwarding Rules:** |
| 50 | + |
| 51 | +| Entry Protocol | Entry Port | Target Protocol | Target Port | TLS | |
| 52 | +|----------------|------------|-----------------|-------------|-----| |
| 53 | +| HTTP | 80 | HTTP | 30080 | - | |
| 54 | +| HTTPS | 443 | HTTPS | 30443 | Passthrough | |
| 55 | + |
| 56 | +**Health Check:** |
| 57 | +- Protocol: TCP |
| 58 | +- Port: 30443 |
| 59 | +- Interval: 10s |
| 60 | +- Timeout: 5s |
| 61 | +- Healthy threshold: 5 |
| 62 | +- Unhealthy threshold: 3 |
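As a rough guide to how quickly the Load Balancer reacts, the thresholds above can be multiplied by the check interval. This is a minimal sketch with a hypothetical helper name, and it assumes a state change needs that many consecutive checks; it ignores the 5s timeout and any scheduling jitter, so treat the numbers as approximate.

```shell
# Approximate seconds until the LB changes a target's state:
# consecutive checks required multiplied by the check interval.
state_change_seconds() {
  # $1 = threshold (consecutive checks), $2 = interval in seconds
  echo $(( $1 * $2 ))
}

echo "marked unhealthy after ~$(state_change_seconds 3 10)s"
echo "marked healthy after ~$(state_change_seconds 5 10)s"
```

With the values listed, a failing node would be pulled after roughly 30s and a recovered node re-added after roughly 50s.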
| 63 | + |
| 64 | +### Firewall |
| 65 | + |
| 66 | +| Property | Value | |
| 67 | +|----------|-------| |
| 68 | +| Name | `tools-fw-nyc3` | |
| 69 | +| ID | Get via: `doctl compute firewall list \| grep tools` | |
| 70 | + |
| 71 | +**Inbound Rules:** |
| 72 | + |
| 73 | +| Protocol | Ports | Source | |
| 74 | +|----------|-------|--------| |
| 75 | +| ICMP | - | VPC (10.108.0.0/20) | |
| 76 | +| TCP | All | VPC (10.108.0.0/20) | |
| 77 | +| UDP | All | VPC (10.108.0.0/20) | |
| 78 | +| TCP | 22 | 0.0.0.0/0 (SSH) | |
| 79 | +| TCP | 30080 | Load Balancer only | |
| 80 | +| TCP | 30443 | Load Balancer only | |
| 81 | + |
| 82 | +**Outbound Rules:** All traffic allowed (TCP/UDP/ICMP to 0.0.0.0/0) |
| 83 | + |
| 84 | +--- |
| 85 | + |
| 86 | +## Kubernetes Cluster |
| 87 | + |
| 88 | +### Control Plane |
| 89 | + |
| 90 | +All 3 nodes are control-plane/etcd/master (HA configuration): |
| 91 | + |
| 92 | +``` |
| 93 | +┌─────────────────────────────────────────────────────────────┐ |
| 94 | +│ K3s HA Cluster │ |
| 95 | +├─────────────────┬─────────────────┬─────────────────────────┤ |
| 96 | +│ Node 01 │ Node 02 │ Node 03 │ |
| 97 | +│ 10.108.0.4 │ 10.108.0.5 │ 10.108.0.6 │ |
| 98 | +├─────────────────┼─────────────────┼─────────────────────────┤ |
| 99 | +│ control-plane │ control-plane │ control-plane │ |
| 100 | +│ etcd │ etcd │ etcd │ |
| 101 | +│ master │ master │ master │ |
| 102 | +├─────────────────┼─────────────────┼─────────────────────────┤ |
| 103 | +│ coredns │ traefik │ appsmith │ |
| 104 | +│ metrics-server │ longhorn │ longhorn │ |
| 105 | +│ longhorn │ │ │ |
| 106 | +├─────────────────┴─────────────────┴─────────────────────────┤ |
| 107 | +│ Longhorn Replicated Storage (2 replicas) │ |
| 108 | +└─────────────────────────────────────────────────────────────┘ |
| 109 | +``` |
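The reason three control-plane nodes give HA comes down to etcd quorum arithmetic: a cluster of n members needs a majority (n/2 + 1, rounded down) to stay writable. The sketch below uses hypothetical helper names to show why 3 members tolerate exactly one failure and why a 4th member would not add tolerance.

```shell
# Majority needed for an etcd cluster of $1 members.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while quorum is still reachable.
etcd_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

for members in 1 2 3 4 5; do
  echo "members=$members quorum=$(etcd_quorum "$members") tolerates=$(etcd_fault_tolerance "$members")"
done
```

For this cluster (3 members) the quorum is 2, so one node can be lost or drained for maintenance at a time.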
| 110 | + |
| 111 | +### API Server Access |
| 112 | + |
| 113 | +```yaml |
| 114 | +server: https://ops-vm-tools-k3s-nyc3-01:6443 |
| 115 | +``` |
| 116 | + |
| 117 | +Kubeconfig uses hostname resolution (likely via `/etc/hosts` or Tailscale). |
| 118 | + |
| 119 | +### Resource Usage (as of inspection) |
| 120 | + |
| 121 | +| Node | CPU | Memory | |
| 122 | +|------|-----|--------| |
| 123 | +| node-01 | 92m (2%) | 1497Mi (18%) | |
| 124 | +| node-02 | 78m (1%) | 817Mi (10%) | |
| 125 | +| node-03 | 58m (1%) | 1978Mi (24%) | |
| 126 | + |
| 127 | +**Total Capacity per Node:** 4 CPU, 8GB RAM |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +## Networking |
| 132 | + |
| 133 | +### Traffic Flow |
| 134 | + |
| 135 | +``` |
| 136 | +Internet |
| 137 | + │ |
| 138 | + ▼ |
| 139 | +┌──────────────────────────────────┐ |
| 140 | +│ DO Load Balancer │ |
| 141 | +│ - HTTP:80 → NodePort:30080 │ |
| 142 | +│ - HTTPS:443 → NodePort:30443 │ |
| 143 | +└──────────────────────────────────┘ |
| 144 | + │ |
| 145 | + ▼ (VPC: 10.108.0.0/20) |
| 146 | +┌──────────────────────────────────┐ |
| 147 | +│ Firewall (tools-fw-nyc3) │ |
| 148 | +│ - Only LB can reach 30080/30443 │ |
| 149 | +│ - SSH open (consider limiting) │ |
| 150 | +└──────────────────────────────────┘ |
| 151 | + │ |
| 152 | + ▼ |
| 153 | +┌──────────────────────────────────┐ |
| 154 | +│ Traefik (NodePort Service) │ |
| 155 | +│ - 30080 → web (HTTP) │ |
| 156 | +│ - 30443 → websecure (HTTPS) │ |
| 157 | +└──────────────────────────────────┘ |
| 158 | + │ |
| 159 | + ▼ |
| 160 | +┌──────────────────────────────────┐ |
| 161 | +│ Gateway API │ |
| 162 | +│ - GatewayClass: traefik │ |
| 163 | +│ - Gateway per namespace │ |
| 164 | +│ - HTTPRoutes for routing │ |
| 165 | +└──────────────────────────────────┘ |
| 166 | + │ |
| 167 | + ▼ |
| 168 | +┌──────────────────────────────────┐ |
| 169 | +│ Application Services (ClusterIP)│ |
| 170 | +│ - appsmith:80 │ |
| 171 | +└──────────────────────────────────┘ |
| 172 | +``` |
| 173 | + |
| 174 | +### Traefik Configuration |
| 175 | + |
| 176 | +Located at: `cluster/charts/traefik/values.yaml` |
| 177 | + |
| 178 | +Key settings: |
| 179 | +- **Service Type:** NodePort (for DO LB compatibility) |
| 180 | +- **NodePorts:** 30080 (HTTP), 30443 (HTTPS) |
| 181 | +- **Gateway API:** Enabled |
| 182 | +- **TLS Passthrough:** Yes (the Load Balancer passes TLS through; it terminates at the app Gateway) |
| 183 | +- **Access Logs:** Enabled |
| 184 | + |
| 185 | +### Pod Network |
| 186 | + |
| 187 | +- CIDR: `10.42.0.0/16` (default k3s) |
| 188 | +- Service CIDR: `10.43.0.0/16` |
| 189 | +- DNS: CoreDNS at `10.43.0.10` |
| 190 | + |
| 191 | +--- |
| 192 | + |
| 193 | +## Storage |
| 194 | + |
| 195 | +### Longhorn (Primary) |
| 196 | + |
| 197 | +Distributed block storage with cross-node replication. |
| 198 | + |
| 199 | +| Property | Value | |
| 200 | +|----------|-------| |
| 201 | +| Version | v1.10.1 | |
| 202 | +| Provisioner | `driver.longhorn.io` | |
| 203 | +| Default Replicas | 2 (survives 1 node failure) | |
| 204 | +| Data Path | `/var/lib/longhorn/` | |
| 205 | +| Config | `cluster/charts/longhorn/values.yaml` | |
| 206 | + |
| 207 | +**Failover Tested:** When a node fails, the pod reschedules to a healthy node, mounts the surviving replica, and continues working. |
| 208 | + |
| 209 | +```bash |
| 210 | +# Longhorn UI (port-forward) |
| 211 | +kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80 |
| 212 | + |
| 213 | +# Check volumes |
| 214 | +kubectl get volumes.longhorn.io -n longhorn-system |
| 215 | +kubectl get replicas.longhorn.io -n longhorn-system -o wide |
| 216 | +``` |
| 217 | + |
| 218 | +### Local Path (Legacy) |
| 219 | + |
| 220 | +Still available for non-critical workloads. Single-node, no replication. |
| 221 | + |
| 222 | +| Property | Value | |
| 223 | +|----------|-------| |
| 224 | +| Provisioner | `rancher.io/local-path` | |
| 225 | +| Storage Path | `/var/lib/rancher/k3s/storage/` | |
| 226 | + |
| 227 | +### Storage Classes |
| 228 | + |
| 229 | +| Name | Provisioner | Replicas | Use For | |
| 230 | +|------|-------------|----------|---------| |
| 231 | +| `longhorn` | driver.longhorn.io | 2 | Databases, stateful apps | |
| 232 | +| `local-path` (default) | rancher.io/local-path | 1 | Ephemeral, non-critical | |
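Because `local-path` is the default class, a claim that should be replicated has to name `longhorn` explicitly. The following is a minimal sketch of a helper that prints such a claim; `render_pvc`, the claim name `example-data`, and the namespace `example` are hypothetical placeholders, and the `ReadWriteOnce` access mode is an assumption rather than something taken from this cluster's manifests.

```shell
# Print a PersistentVolumeClaim manifest that requests the longhorn class.
render_pvc() {
  # $1 = PVC name, $2 = namespace, $3 = size (e.g. 10Gi)
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
  namespace: $2
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: $3
EOF
}

# Example: render a 10Gi claim to stdout for review.
render_pvc example-data example 10Gi
```

After review, the output could be piped to `kubectl apply -f -`. Omitting `storageClassName` would fall back to `local-path`, which keeps data on a single node.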
| 233 | + |
| 234 | +--- |
| 235 | + |
| 236 | +## Installed Components |
| 237 | + |
| 238 | +### System (kube-system) |
| 239 | + |
| 240 | +| Component | Purpose | |
| 241 | +|-----------|---------| |
| 242 | +| CoreDNS | Cluster DNS | |
| 243 | +| Traefik | Ingress/Gateway controller | |
| 244 | +| Local Path Provisioner | Legacy storage | |
| 245 | +| Metrics Server | Resource metrics | |
| 246 | + |
| 247 | +### Longhorn (longhorn-system) |
| 248 | + |
| 249 | +| Component | Replicas | |
| 250 | +|-----------|----------| |
| 251 | +| longhorn-manager | 3 (DaemonSet) | |
| 252 | +| longhorn-driver-deployer | 1 | |
| 253 | +| longhorn-csi-plugin | 3 (DaemonSet) | |
| 254 | +| longhorn-ui | 1 | |
| 255 | +| csi-attacher/provisioner/resizer/snapshotter | 2 each | |
| 256 | + |
| 257 | +### Gateway API CRDs |
| 258 | + |
| 259 | +- `gatewayclasses.gateway.networking.k8s.io` |
| 260 | +- `gateways.gateway.networking.k8s.io` |
| 261 | +- `httproutes.gateway.networking.k8s.io` |
| 262 | +- `grpcroutes.gateway.networking.k8s.io` |
| 263 | +- `referencegrants.gateway.networking.k8s.io` |
| 264 | + |
| 265 | +### Traefik CRDs |
| 266 | + |
| 267 | +- `middlewares.traefik.io` |
| 268 | +- `ingressroutes.traefik.io` |
| 269 | +- `serverstransports.traefik.io` |
| 270 | +- `tlsoptions.traefik.io` |
| 271 | + |
| 272 | +--- |
| 273 | + |
| 274 | +## Applications |
| 275 | + |
| 276 | +### Appsmith |
| 277 | + |
| 278 | +| Property | Value | |
| 279 | +|----------|-------| |
| 280 | +| Namespace | `appsmith` | |
| 281 | +| Domain | `appsmith.freecodecamp.net` | |
| 282 | +| Gateway | `appsmith-gateway` | |
| 283 | +| HTTPRoutes | `appsmith-route`, `http-redirect` | |
| 284 | +| Storage | 10Gi PVC (longhorn, 2 replicas) | |
| 285 | + |
| 286 | +### Outline |
| 287 | + |
| 288 | +| Property | Value | |
| 289 | +|----------|-------| |
| 290 | +| Namespace | `outline` | |
| 291 | +| Domain | `outline.freecodecamp.net` | |
| 292 | +| Gateway | `outline-gateway` | |
| 293 | +| HTTPRoutes | `outline-route`, `http-redirect` | |
| 294 | +| Storage | 10Gi PostgreSQL + 10Gi data (longhorn) | |
| 295 | +| Auth | Google OAuth | |
| 296 | +| Components | Outline + PostgreSQL + Redis (single pod) | |
| 297 | + |
| 298 | +--- |
| 299 | + |
| 300 | +## Security Considerations |
| 301 | + |
| 302 | +1. **SSH Access:** Currently open to 0.0.0.0/0 - consider restricting to known IPs or Tailscale |
| 303 | +2. **Firewall:** NodePorts only accessible via Load Balancer (good) |
| 304 | +3. **TLS:** Passthrough to application Gateways (Cloudflare origin certs) |
| 305 | +4. **API Server:** Accessible via hostname (requires VPN/hosts entry) |
| 306 | + |
| 307 | +--- |
| 308 | + |
| 309 | +## DNS Configuration |
| 310 | + |
| 311 | +| Domain | Type | Target | |
| 312 | +|--------|------|--------| |
| 313 | +| appsmith.freecodecamp.net | A | `<LB_IP>` | |
| 314 | +| outline.freecodecamp.net | A | `<LB_IP>` | |
| 315 | + |
| 316 | +DNS is managed in Cloudflare (proxied or DNS-only, depending on requirements). |
| 317 | + |
| 318 | +--- |
| 319 | + |
| 320 | +## Maintenance Commands |
| 321 | + |
| 322 | +```bash |
| 323 | +# Set kubeconfig (or use direnv) |
| 324 | +cd k3s |
| 325 | +export KUBECONFIG=./.kubeconfig.yaml |
| 326 | + |
| 327 | +# Check cluster health |
| 328 | +kubectl get nodes -o wide |
| 329 | +kubectl top nodes |
| 330 | + |
| 331 | +# Check all pods |
| 332 | +kubectl get pods -A |
| 333 | + |
| 334 | +# Check storage |
| 335 | +kubectl get pv,pvc -A |
| 336 | +kubectl get storageclass |
| 337 | + |
| 338 | +# Check gateways |
| 339 | +kubectl get gateway,httproute -A |
| 340 | + |
| 341 | +# DO resources |
| 342 | +doctl compute droplet list | grep k3s |
| 343 | +doctl compute load-balancer list | grep k3s |
| 344 | +doctl compute firewall list | grep tools |
| 345 | +``` |
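For scripted health checks, the node listing can be reduced to a single number. This sketch assumes the usual column layout of `kubectl get nodes --no-headers` (name, then status); `count_not_ready` is a hypothetical helper, and the sample rows below are made-up input for illustration, not captured cluster output.

```shell
# Count nodes whose STATUS column does not start with "Ready".
# Reads `kubectl get nodes --no-headers` style output on stdin.
count_not_ready() {
  awk '$2 !~ /^Ready/ { n++ } END { print n + 0 }'
}

# Real usage (requires cluster access):
#   kubectl get nodes --no-headers | count_not_ready

# Illustrative input with one node down.
printf '%s\n' \
  'ops-vm-tools-k3s-nyc3-01 Ready control-plane,etcd,master 30d v1.33.6+k3s1' \
  'ops-vm-tools-k3s-nyc3-02 NotReady control-plane,etcd,master 30d v1.33.6+k3s1' \
  'ops-vm-tools-k3s-nyc3-03 Ready control-plane,etcd,master 30d v1.33.6+k3s1' \
  | count_not_ready
```

A cordoned node reports `Ready,SchedulingDisabled` and is counted as ready here; tighten the pattern if cordoned nodes should be flagged too.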
| 346 | + |
| 347 | +--- |
| 348 | + |
| 349 | +## Architecture Diagram |
| 350 | + |
| 351 | +``` |
| 352 | + ┌─────────────────────────────────┐ |
| 353 | + │ Cloudflare │ |
| 354 | + │ appsmith.freecodecamp.net │ |
| 355 | + └───────────────┬─────────────────┘ |
| 356 | + │ |
| 357 | + ▼ |
| 358 | +┌───────────────────────────────────────────────────────────────────────────────┐ |
| 359 | +│ DigitalOcean NYC3 │ |
| 360 | +│ ┌─────────────────────────────────────────────────────────────────────────┐ │ |
| 361 | +│ │ Load Balancer │ │ |
| 362 | +│ │ HTTP:80 → 30080, HTTPS:443 → 30443 │ │ |
| 363 | +│ └─────────────────────────────────────────────────────────────────────────┘ │ |
| 364 | +│ │ │ |
| 365 | +│ ┌─────────────────────────────────────────────────────────────────────────┐ │ |
| 366 | +│ │ Firewall (tools-fw-nyc3) │ │ |
| 367 | +│ └─────────────────────────────────────────────────────────────────────────┘ │ |
| 368 | +│ │ │ |
| 369 | +│ ┌─────────────────────────────────────────────────────────────────────────┐ │ |
| 370 | +│ │ VPC (10.108.0.0/20) │ │ |
| 371 | +│ │ ┌─────────────────┬─────────────────┬─────────────────┐ │ │ |
| 372 | +│ │ │ Node 01 │ Node 02 │ Node 03 │ │ │ |
| 373 | +│ │ │ 10.108.0.4 │ 10.108.0.5 │ 10.108.0.6 │ │ │ |
| 374 | +│ │ │ │ │ │ │ │ |
| 375 | +│ │ │ ┌───────────┐ │ ┌───────────┐ │ ┌───────────┐ │ │ │ |
| 376 | +│ │ │ │ coredns │ │ │ traefik │ │ │ appsmith │ │ │ │ |
| 377 | +│ │ │ │ metrics │ │ │ longhorn │ │ │ longhorn │ │ │ │ |
| 378 | +│ │ │ │ longhorn │ │ │ │ │ │ │ │ │ │ |
| 379 | +│ │ │ └───────────┘ │ └───────────┘ │ └───────────┘ │ │ │ |
| 380 | +│ │ │ │ │ │ │ │ |
| 381 | +│ │ │ ════════════ Longhorn Replicated Storage ═══════════ │ │ |
| 382 | +│ │ │ │ │ │ │ │ |
| 383 | +│ │ │ [etcd] │ [etcd] │ [etcd] │ │ │ |
| 384 | +│ │ │ [api-server] │ [api-server] │ [api-server] │ │ │ |
| 385 | +│ │ └─────────────────┴─────────────────┴─────────────────┘ │ │ |
| 386 | +│ └─────────────────────────────────────────────────────────────────────────┘ │ |
| 387 | +└───────────────────────────────────────────────────────────────────────────────┘ |
| 388 | +``` |