feat(fleetnode): device pairing + agent reporting (server) [PR 1/2] #332
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 55468dd665
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
🔐 Codex Security Review
Review Summary

Overall Risk: HIGH

Findings
- [HIGH] Pair/unpair RPCs bypass miner pairing permissions
- [MEDIUM] Discovery reports ignore the required server-issued command ID

Notes

No cryptostealing/pool-hijack behavior, raw SQL injection, command injection, or protobuf wire-format break was evident in the reviewed diff. The added

Generated by Codex Security Review
PR 2 of a stack. Layers operator-initiated discovery on top of the pairing + agent-reporting surface in PR 1 (#332). Builds on the existing fleetnodepairing.UpsertDiscoveredDevices ingestion path; an in-memory registry correlates server-issued ControlCommand requests with the agent's eventual ReportDiscoveredDevices batches.

What's in this PR:
- fleetnodecontrol.Registry: single-instance in-memory map of fleet_node_id -> active ControlStream + per-command_id event channel (CommandEvent { Batch | Ack }). Newest-wins eviction, dropped-event counter (64-slot buffer), atomic accounting for tests + observability.
- FleetNodeGateway.ControlStream: bidi handler. After Hello, registers the stream and pumps outgoing ControlCommand requests + incoming ControlAck responses through a side goroutine (2-buffer to avoid linger on exit).
- ReportDiscoveredDevices hook: when the agent reports devices with a command_id, the batch is also published to the registry so the operator's waiting stream wakes up.
- FleetNodeAdmin.DiscoverOnFleetNode: operator-facing streaming RPC. Validates target is CONFIRMED, normalizes IPRange to IPList (capped at 4096 expanded addresses), rejects MDNS, forwards IPList/Nmap. Uses id.GenerateID() for command_id and proto.Marshal for the payload.
- pairing.proto: buf.validate count caps on DiscoverRequest modes (4096 IPs, 256 ports per mode).
- ipscanner: exports GenerateIPsFromCIDR for cross-package reuse.
- middleware/rpc_permissions: DiscoverOnFleetNode -> fleetnode:manage.
- Tests: registry register/send/ack/eviction, ControlStream hello + dispatch + duplicate-stream rejection, DiscoverOnFleetNode happy/no-stream/MDNS-reject/IPRange-expand/Nmap-passthrough/viewer-gate, ReportDiscoveredDevices command_id correlation, expandIPv4Range overflow + boundary cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
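The IPRange-to-IPList normalization with a 4096-address cap could look roughly like the sketch below. The function body, `maxExpanded`, and the error values are assumptions written for illustration; only the name expandIPv4Range and the cap come from the commit message, and the real implementation may differ.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net/netip"
)

// maxExpanded mirrors the 4096-address cap named in the commit message.
const maxExpanded = 4096

var errRangeTooLarge = errors.New("range expands past the address cap")

// expandIPv4Range is an illustrative stand-in, not the PR's code.
// It expands an inclusive start-end IPv4 range into individual addresses.
func expandIPv4Range(start, end string) ([]string, error) {
	s, err := netip.ParseAddr(start)
	if err != nil {
		return nil, err
	}
	e, err := netip.ParseAddr(end)
	if err != nil {
		return nil, err
	}
	if !s.Is4() || !e.Is4() {
		return nil, errors.New("only IPv4 ranges are supported")
	}
	sb, eb := s.As4(), e.As4()
	lo := binary.BigEndian.Uint32(sb[:])
	hi := binary.BigEndian.Uint32(eb[:])
	if hi < lo {
		return nil, errors.New("end precedes start")
	}
	// Compare hi-lo (safe because hi >= lo) instead of hi-lo+1,
	// which would wrap to 0 for the full 0.0.0.0-255.255.255.255 range.
	if hi-lo >= maxExpanded {
		return nil, errRangeTooLarge
	}
	out := make([]string, 0, hi-lo+1)
	// Break on v == hi rather than looping while v <= hi, so the loop
	// terminates even when hi is the maximum uint32 value.
	for v := lo; ; v++ {
		var b [4]byte
		binary.BigEndian.PutUint32(b[:], v)
		out = append(out, netip.AddrFrom4(b).String())
		if v == hi {
			break
		}
	}
	return out, nil
}

func main() {
	ips, err := expandIPv4Range("10.0.0.1", "10.0.0.3")
	fmt.Println(ips, err)
	_, err = expandIPv4Range("10.0.0.0", "10.0.16.0")
	fmt.Println(err)
}
```

Checking the size before allocating keeps an oversized request from reserving memory, and the overflow-safe loop covers the boundary cases the test list mentions.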
Pull request overview
Adds server-side fleet-node device pairing and discovery reporting surfaces, wiring a new pairing domain into the fleet-node admin and gateway handlers plus revocation cleanup.
Changes:
- Adds `fleetnodepairing` service/store models, SQL queries, and integration coverage.
- Implements FleetNodeAdmin pair/unpair/list endpoints and FleetNodeGateway discovery report ingestion.
- Updates revocation to delete fleet-node pairings and wires services into `fleetd`.
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| server/sqlc/queries/fleetnodepairing.sql | Adds pairing, unpairing, listing, cleanup, and fleet-node discovery upsert queries. |
| server/internal/domain/fleetnodepairing/service.go | Adds pairing business logic and discovery report validation/upsert flow. |
| server/internal/domain/fleetnodepairing/models.go | Defines pairing and discovery report domain models. |
| server/internal/domain/stores/sqlstores/fleetnodepairing.go | Implements SQL store adapter for the new pairing domain. |
| server/internal/domain/stores/sqlstores/fleetnodeenrollment.go | Adds revocation cleanup store method. |
| server/internal/domain/fleetnodeenrollment/service.go | Extends revoke flow to delete fleet-node device pairings. |
| server/internal/handlers/fleetnodeadmin/handler.go | Adds admin RPC handlers for pair, unpair, and list fleet-node devices. |
| server/internal/handlers/fleetnodeadmin/handler_pairing_test.go | Adds handler tests for admin pairing endpoints and permissions. |
| server/internal/handlers/fleetnodegateway/handler.go | Adds gateway RPC handler for discovered device reports. |
| server/internal/handlers/fleetnodegateway/handler_heartbeat_test.go | Updates handler test setup for the new pairing dependency. |
| server/internal/handlers/fleetnodegateway/handler_discovery_test.go | Adds gateway discovery report handler tests. |
| server/internal/handlers/middleware/rpc_permissions.go | Moves implemented fleet-node admin RPCs into permission mapping. |
| server/cmd/fleetd/main.go | Wires pairing service/store into production handlers. |
| server/generated/sqlc/fleetnodepairing.sql.go, server/generated/sqlc/db.go | Regenerated sqlc bindings for new queries. |
Address security review on PR #332: the validator allows empty scheme but rejects "virtual", the scheme the virtual plugin emits (plugin/virtual/internal/driver/driver.go:160,186). Every legitimate virtual-plugin discovery report currently fails validation.

- Add "virtual" to the allowlist so virtual-plugin reports round-trip cleanly.
- Drop "" from the allowlist: empty was an undocumented placeholder, not a graceful default. The agent's plugin driver always knows the scheme at probe time.
- Tests: TestUpsertDiscoveredDevices_AcceptsVirtualScheme (new positive case), TestUpsertDiscoveredDevices_RejectsEmptyScheme (new negative case), existing RejectsDisallowedScheme (ftp) untouched.

The other two findings from the same review (command_id binding, attribution-based cloud-pairing quarantine) live in PR #235.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
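A minimal sketch of the allowlist after this change, assuming a map-based check. The names `allowedSchemes`, `validateScheme`, and `errDisallowedScheme` are hypothetical and do not come from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedSchemes is a hypothetical allowlist reflecting the commit message:
// "virtual" is accepted and the empty string is no longer allowed.
var allowedSchemes = map[string]struct{}{
	"http":    {},
	"https":   {},
	"virtual": {},
}

var errDisallowedScheme = errors.New("url_scheme not allowed")

// validateScheme rejects any scheme that is not explicitly listed.
func validateScheme(s string) error {
	if _, ok := allowedSchemes[s]; !ok {
		return fmt.Errorf("%w: %q", errDisallowedScheme, s)
	}
	return nil
}

func main() {
	for _, s := range []string{"http", "virtual", "", "ftp"} {
		fmt.Printf("%q -> %v\n", s, validateScheme(s))
	}
}
```

With an explicit allowlist, the empty string needs no special case: it is rejected because it is not a listed key.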
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5a1df0053
Four findings from PR #332 inline review:

1. URLScheme proto/validator mismatch: proto's url_scheme had buf.validate `in: ["http", "https"]` which would reject "virtual" before the handler ever saw it, making the recent validator allowlist change dead surface. Added "virtual" to the proto `in: [...]` list and regenerated. Now virtual-plugin reports pass both layers.
2. UpsertDiscoveredDevices batch wasn't atomic: each per-report upsert auto-committed, so a mid-batch validation failure left a committed prefix. Now: validate every report up-front (validateReport is pure / O(n)), then wrap the writes in s.transactor.RunInTx so either the whole batch commits or none does. Ownership-rejected rows (0 rows affected) still count toward rejectedOwnership without aborting the tx; that's the store's normal "we refused to overwrite a hijacked row" signal, not an error.
3. Added RejectedCount field to ReportDiscoveredDevicesResponse and slog.Warn when rejectedOwnership > 0. Field is additive (proto3 backward-compatible).
4. UpsertDiscoveredDeviceFromFleetNode NOT EXISTS guard was over-blocking: the predicate `WHERE fnd.fleet_node_id IS NULL` blocked any device row not paired to the reporting node, including unpaired devices. After UnpairDevice the originating node could never refresh that discovered_device (is_active / last_seen / ip would freeze). Rewrote the predicate as `NOT EXISTS (... JOIN fnd ... AND fnd.fleet_node_id <> $10 ...)`, which blocks only when the device is paired to a *different* fleet_node, the actual hijack case the comment described.

Updated the test that asserted the old over-blocking behavior (renamed and inverted assertions); added TestUpsertDiscoveredDevices_BatchValidationErrorRollsBack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
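The validate-first, then write-in-one-transaction flow from finding 2 can be sketched with a toy in-memory store. Everything here (`report`, `memStore`, `runInTx`, `upsertBatch`) is a hypothetical stand-in for illustration; the real code uses s.transactor.RunInTx against Postgres.

```go
package main

import "fmt"

// report is a simplified stand-in for an agent-reported device.
type report struct {
	IP   string
	Port int
}

// validateReport stands in for the pure validator: no I/O, no side effects.
func validateReport(r report) error {
	if r.Port < 1 || r.Port > 65535 {
		return fmt.Errorf("port %d out of range", r.Port)
	}
	return nil
}

// memStore is a toy store. runInTx applies staged writes only if fn succeeds,
// which imitates commit-or-rollback semantics.
type memStore struct{ rows []report }

func (m *memStore) runInTx(fn func(write func(report)) error) error {
	var staged []report
	if err := fn(func(r report) { staged = append(staged, r) }); err != nil {
		return err // staged rows are discarded, like a rollback
	}
	m.rows = append(m.rows, staged...)
	return nil
}

// upsertBatch validates every report before opening the transaction,
// so a bad report can never leave a committed prefix behind.
func upsertBatch(m *memStore, batch []report) error {
	for i, r := range batch {
		if err := validateReport(r); err != nil {
			return fmt.Errorf("report %d: %w", i, err)
		}
	}
	return m.runInTx(func(write func(report)) error {
		for _, r := range batch {
			write(r)
		}
		return nil
	})
}

func main() {
	m := &memStore{}
	err := upsertBatch(m, []report{{"10.0.0.1", 80}, {"10.0.0.2", 0}})
	fmt.Println(err, len(m.rows))
}
```

Because validation is pure, running it over the whole batch first costs one extra pass and keeps the transaction itself free of avoidable failures.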
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 35e7a64a8f
Adds the server-side surfaces operators need to manage fleet-node device pairings and the gateway endpoint agents call to report devices they discovered on their LAN. This is PR 1 of a stack; PR 2 (open as #235) layers server-initiated discovery via ControlStream on top.

What's in this PR:
- fleetnodepairing domain (Service + Store) with PairDevice, UnpairDevice, ListPairs, ListDevicesForFleetNode, UpsertDiscoveredDevices, plus IP/port/scheme validation on agent-reported devices.
- fleet_node_device pairing table queries and UpsertDiscoveredDeviceFromFleetNode with a NOT EXISTS guard that prevents fleet node B from overwriting a device already paired with fleet node A.
- FleetNodeGateway.ReportDiscoveredDevices RPC: agents authenticated via fleetnodeauth submit batches of devices; ip_address must be RFC1918/RFC4193, port 1-65535, url_scheme http or https.
- FleetNodeAdmin.PairDeviceToFleetNode, UnpairDevice, ListFleetNodeDevices RPCs, gated by fleetnode:manage / fleetnode:read via middleware.RequirePermission.
- RevocationCleanupStore extracted from fleetnodeenrollment.Store so RevokeFleetNode now deletes the fleet node's pairings as part of the same TX.
- Integration tests for the pairing CRUD round trip, double-pair rejection, soft-deleted/pending node rejection, cross-org isolation, agent-report validation (invalid IP, port, scheme, non-private ranges, RFC4193 IPv6), the NOT EXISTS pairing guard, and revoke-clears-pairings.

What's deferred to PR 2:
- fleetnodecontrol.Registry (in-memory ControlStream + per-command_id event dispatch).
- FleetNodeGateway.ControlStream bidi handler.
- FleetNodeAdmin.DiscoverOnFleetNode operator-initiated discovery, plus the proto-level max_items caps on DiscoverRequest modes.
- The command_id correlation hook in ReportDiscoveredDevices that fans batches to the operator's waiting stream.

Build, vet, lint, and tests for middleware, fleetnodeadmin, fleetnodegateway, fleetnodepairing, and fleetnodeenrollment are green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
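The ip_address and port rules above (RFC1918/RFC4193, port 1-65535) can be approximated with the standard library's net/netip package, whose Addr.IsPrivate covers exactly those two RFCs. The function name `validateEndpoint` is an assumption for illustration; the PR's validator may be structured differently.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateEndpoint is a hypothetical helper: it accepts only private
// addresses (RFC 1918 for IPv4, RFC 4193 for IPv6) and ports 1-65535.
func validateEndpoint(ip string, port int) error {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return fmt.Errorf("invalid ip_address %q: %w", ip, err)
	}
	if !addr.IsPrivate() {
		return fmt.Errorf("ip_address %s is not in a private range", addr)
	}
	if port < 1 || port > 65535 {
		return fmt.Errorf("port %d out of range 1-65535", port)
	}
	return nil
}

func main() {
	fmt.Println(validateEndpoint("10.1.2.3", 4028))
	fmt.Println(validateEndpoint("fd00::1", 80))
	fmt.Println(validateEndpoint("8.8.8.8", 80))
	fmt.Println(validateEndpoint("192.168.1.10", 0))
}
```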
35e7a64 to ad35e6c
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad35e6cdf9
PR 2 of a stack. Layers operator-initiated discovery on top of the pairing + agent-reporting surface in PR 1 (#332). Builds on the existing fleetnodepairing.UpsertDiscoveredDevices ingestion path; an in-memory registry correlates server-issued ControlCommand requests with the agent's eventual ReportDiscoveredDevices batches.

What's in this PR:
- fleetnodecontrol.Registry: single-instance in-memory map of fleet_node_id -> active ControlStream + per-command_id event channel (CommandEvent { Batch | Ack }). Newest-wins eviction signaled via a done channel (so outgoing channel is never closed under a publisher); Send selects on done to bail cleanly. Publishers hold the mutex through the bounded non-blocking send to avoid panicking on a closed channel when cleanup races. Dropped-event counter on a 64-slot buffer, exposed via DroppedEvents().
- FleetNodeGateway.ControlStream: bidi handler. Hello receive is wrapped in a 5s timeout (HelloTimeout var) so an authenticated-but-idle agent cannot hold a server goroutine + HTTP/2 stream indefinitely. After Hello, registers the stream and pumps outgoing ControlCommand requests + incoming ControlAck responses through a side goroutine (2-buffer to avoid linger on exit).
- ReportDiscoveredDevices: rejects reports without a command_id or whose command_id is not in flight for this fleet_node (binds to server-issued ControlCommand). UpsertDiscoveredDevices now returns acceptedIdx []int instead of an opaque count; only the rows the store actually accepted are forwarded to the operator's command stream so ownership-rejected rows can't leak.
- FleetNodeAdmin.DiscoverOnFleetNode: operator-facing streaming RPC. Validates target is CONFIRMED, normalizes IPRange to IPList (capped at 4096 expanded addresses), rejects MDNS, forwards IPList/Nmap. Wraps the operator ctx with DiscoverCommandTimeout (5m default, var for test override) so a buggy/silent agent cannot pin operator streams + registry entries forever. Returns CodeDeadlineExceeded on timeout. Uses id.GenerateID() for command_id and proto.Marshal for the payload.
- discovered_by_fleet_node_id is immutable origin tracking. Set on first agent report; never cleared by PairDevice / UnpairDevice / RevokeFleetNode. Cloud-side pairing.PairDevices refuses to dial any discovered_device with DiscoveredByFleetNodeID != nil so an agent-reported private IP cannot redirect cloud credentialing later. Migration 000064 adds the column + FK + partial index.
- UpsertDiscoveredDeviceFromFleetNode reconciles auto:* identifiers per (fleet_node, ip, port) endpoint so re-keyed scans collapse onto one row; mac:/serial: identifiers pass through unchanged.
- pairing.proto: buf.validate count caps on DiscoverRequest modes (4096 IPs, 256 ports per mode).
- middleware: DiscoverOnFleetNode gated on fleetnode:manage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
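The eviction and drop-counting pattern described for the registry might be sketched as follows. This is a simplified illustration under assumed names (`registry`, `stream`, `register`, `send`); it folds the publisher path into a single method and is not the PR's actual fleetnodecontrol.Registry.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// stream pairs a buffered outgoing channel with a done signal.
// outgoing is never closed; consumers stop when done is closed.
type stream struct {
	outgoing chan string
	done     chan struct{}
}

type registry struct {
	mu      sync.Mutex
	streams map[string]*stream
	dropped atomic.Int64
}

var (
	errNoStream   = errors.New("no active stream for fleet node")
	errBufferFull = errors.New("buffer full, event dropped")
)

func newRegistry() *registry {
	return &registry{streams: make(map[string]*stream)}
}

// register installs a new stream; an existing one is evicted by closing
// its done channel (newest wins).
func (r *registry) register(nodeID string) *stream {
	s := &stream{outgoing: make(chan string, 64), done: make(chan struct{})}
	r.mu.Lock()
	defer r.mu.Unlock()
	if old, ok := r.streams[nodeID]; ok {
		close(old.done) // closed once: the old entry is replaced right after
	}
	r.streams[nodeID] = s
	return s
}

// send performs a bounded non-blocking send while holding the mutex,
// so it cannot race with eviction; a full buffer increments the counter.
func (r *registry) send(nodeID, cmd string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.streams[nodeID]
	if !ok {
		return errNoStream
	}
	select {
	case s.outgoing <- cmd:
		return nil
	default:
		r.dropped.Add(1)
		return errBufferFull
	}
}

func main() {
	r := newRegistry()
	first := r.register("node-a")
	r.register("node-a") // evicts the first stream
	select {
	case <-first.done:
		fmt.Println("first stream evicted")
	default:
		fmt.Println("first stream still live")
	}
	fmt.Println(r.send("node-a", "discover"), r.dropped.Load())
}
```

Signaling eviction through a separate done channel means no goroutine ever sends on a closed channel, which is the panic the commit message says it is avoiding.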
The committed driver_pb2_grpc.py carried GRPC_GENERATED_VERSION 1.76.0, generated against a stale proto-python-gen venv. CI regenerates with the pinned grpcio-tools 1.80.0, so the generated-code-check and python-gen-staleness gates failed. Regenerate to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 14f02c5193
PR 2 of a stack. Layers operator-initiated discovery on top of the pairing + agent-reporting surface in PR 1 (#332). Builds on the existing fleetnodepairing.UpsertDiscoveredDevices ingestion path; an in-memory registry correlates server-issued ControlCommand requests with the agent's eventual ReportDiscoveredDevices batches.

What's in this PR:
- fleetnodecontrol.Registry: single-instance in-memory map of fleet_node_id -> active ControlStream + per-command_id event channel (CommandEvent { Batch | Ack }). Newest-wins eviction signaled via a done channel (so outgoing channel is never closed under a publisher); Send selects on done to bail cleanly. Publishers hold the mutex through the bounded non-blocking send to avoid panicking on a closed channel when cleanup races. Dropped-event counter on a 64-slot buffer, exposed via DroppedEvents().
- FleetNodeGateway.ControlStream: bidi handler. Hello receive is wrapped in a 5s timeout (HelloTimeout var) so an authenticated-but-idle agent cannot hold a server goroutine + HTTP/2 stream indefinitely. After Hello, registers the stream and pumps outgoing ControlCommand requests + incoming ControlAck responses through a side goroutine (2-buffer to avoid linger on exit).
- ReportDiscoveredDevices: rejects reports without a command_id or whose command_id is not in flight for this fleet_node (binds to server-issued ControlCommand). UpsertDiscoveredDevices now returns acceptedIdx []int instead of an opaque count; only the rows the store actually accepted are forwarded to the operator's command stream so ownership-rejected rows can't leak.
- FleetNodeAdmin.DiscoverOnFleetNode: operator-facing streaming RPC. Validates target is CONFIRMED, normalizes IPRange to IPList (capped at 4096 expanded addresses), rejects MDNS, forwards IPList/Nmap. Wraps the operator ctx with DiscoverCommandTimeout (5m default, var for test override) so a buggy/silent agent cannot pin operator streams + registry entries forever. Returns CodeDeadlineExceeded on timeout. Uses id.GenerateID() for command_id and proto.Marshal for the payload.
- discovered_by_fleet_node_id is immutable origin tracking. Set on first agent report; never cleared by PairDevice / UnpairDevice / RevokeFleetNode. Cloud-side pairing.PairDevices refuses to dial any discovered_device with DiscoveredByFleetNodeID != nil so an agent-reported private IP cannot redirect cloud credentialing later. Migration 000064 adds the column + FK + partial index.
- UpsertDiscoveredDeviceFromFleetNode reconciles auto:* identifiers per (fleet_node, ip, port) endpoint so re-keyed scans collapse onto one row; mac:/serial: identifiers pass through unchanged.
- pairing.proto: buf.validate count caps on DiscoverRequest modes (4096 IPs, 256 ports per mode).
- middleware: DiscoverOnFleetNode gated on fleetnode:manage.

Review fixes folded in:
- Migration 000065 widens discovered_device.url_scheme from VARCHAR(10) to VARCHAR(32) to match the gateway proto's advertised max_len. Schemes of 11-32 chars (e.g. "stratum+tcp") passed validation but overflowed the column, failing the whole batch as an internal error.
- UpsertDiscoveredDevices tallies accepted/rejected into per-attempt locals reset on closure entry, so a RunInTx retry after a retryable Postgres/commit failure can no longer double-count a batch. Adds a unit test for the retry path and a DB-backed test for the 32-char scheme.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
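The double-counting fix in the last bullet comes down to where the tallies are declared. Below is a sketch with a toy retrying transactor; `runInTx`, `upsertBatch`, `errRetryable`, and the `failFirst` switch are all hypothetical and exist only to simulate one retryable failure.

```go
package main

import (
	"errors"
	"fmt"
)

var errRetryable = errors.New("retryable commit failure")

// runInTx is a toy transactor that re-invokes fn while it reports a
// retryable error, up to three attempts.
func runInTx(fn func() error) error {
	for attempt := 0; attempt < 3; attempt++ {
		err := fn()
		if !errors.Is(err, errRetryable) {
			return err
		}
	}
	return errRetryable
}

// upsertBatch takes the rows-affected result per report (0 means the store
// refused the row) and returns accepted indexes plus a rejected count.
func upsertBatch(rowsAffected []int64, failFirst bool) ([]int, int, error) {
	var accepted []int
	var rejected int
	attempts := 0
	err := runInTx(func() error {
		// Per-attempt locals: declared inside the closure so each retry
		// starts from zero instead of adding to a previous attempt.
		acc := make([]int, 0, len(rowsAffected))
		rej := 0
		for i, n := range rowsAffected {
			if n == 0 {
				rej++
				continue
			}
			acc = append(acc, i)
		}
		attempts++
		if failFirst && attempts == 1 {
			return errRetryable // simulate a commit failure on attempt one
		}
		// Publish the tallies only from the attempt that succeeds.
		accepted, rejected = acc, rej
		return nil
	})
	return accepted, rejected, err
}

func main() {
	acc, rej, err := upsertBatch([]int64{1, 0, 1}, true)
	fmt.Println(acc, rej, err)
}
```

Had the counters lived outside the closure and been incremented in place, the retried attempt would have reported every row twice.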
PR 2 of a stack. Layers operator-initiated discovery on top of the pairing + agent-reporting surface in PR 1 (#332). Builds on the existing fleetnodepairing.UpsertDiscoveredDevices ingestion path; an in-memory registry correlates server-issued ControlCommand requests with the agent's eventual ReportDiscoveredDevices batches. What's in this PR: - fleetnodecontrol.Registry: single-instance in-memory map of fleet_node_id -> active ControlStream + per-command_id event channel (CommandEvent { Batch | Ack }). Newest-wins eviction signaled via a done channel (so outgoing channel is never closed under a publisher); Send selects on done to bail cleanly. Publishers hold the mutex through the bounded non-blocking send to avoid panicking on a closed channel when cleanup races. Dropped-event counter on a 64-slot buffer, exposed via DroppedEvents(). - FleetNodeGateway.ControlStream: bidi handler. Hello receive is wrapped in a 5s timeout (HelloTimeout var) so an authenticated-but-idle agent cannot hold a server goroutine + HTTP/2 stream indefinitely. After Hello, registers the stream and pumps outgoing ControlCommand requests + incoming ControlAck responses through a side goroutine (2-buffer to avoid linger on exit). - ReportDiscoveredDevices: rejects reports without a command_id or whose command_id is not in flight for this fleet_node (binds to server-issued ControlCommand). UpsertDiscoveredDevices now returns acceptedIdx []int instead of an opaque count; only the rows the store actually accepted are forwarded to the operator's command stream so ownership-rejected rows can't leak. - FleetNodeAdmin.DiscoverOnFleetNode: operator-facing streaming RPC. Validates target is CONFIRMED, normalizes IPRange to IPList (capped at 4096 expanded addresses), rejects MDNS, forwards IPList/Nmap. Wraps the operator ctx with DiscoverCommandTimeout (5m default, var for test override) so a buggy/silent agent cannot pin operator streams + registry entries forever. 
Returns CodeDeadlineExceeded on timeout. Uses id.GenerateID() for command_id and proto.Marshal for the payload. - discovered_by_fleet_node_id is immutable origin tracking. Set on first agent report; never cleared by PairDevice / UnpairDevice / RevokeFleetNode. Cloud-side pairing.PairDevices refuses to dial any discovered_device with DiscoveredByFleetNodeID != nil so an agent-reported private IP cannot redirect cloud credentialing later. Migration 000064 adds the column + FK + partial index. - UpsertDiscoveredDeviceFromFleetNode reconciles auto:* identifiers per (fleet_node, ip, port) endpoint so re-keyed scans collapse onto one row; mac:/serial: identifiers pass through unchanged. - pairing.proto: buf.validate count caps on DiscoverRequest modes (4096 IPs, 256 ports per mode). - middleware: DiscoverOnFleetNode gated on fleetnode:manage. Review fixes folded in: - Migration 000065 widens discovered_device.url_scheme from VARCHAR(10) to VARCHAR(32) to match the gateway proto's advertised max_len. Schemes of 11-32 chars (e.g. "stratum+tcp") passed validation but overflowed the column, failing the whole batch as an internal error. - UpsertDiscoveredDevices tallies accepted/rejected into per-attempt locals reset on closure entry, so a RunInTx retry after a retryable Postgres/commit failure can no longer double-count a batch. Adds a unit test for the retry path and a DB-backed test for the 32-char scheme. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR 2 of a stack. Layers operator-initiated discovery on top of the pairing + agent-reporting surface in PR 1 (#332). Builds on the existing fleetnodepairing.UpsertDiscoveredDevices ingestion path; an in-memory registry correlates server-issued ControlCommand requests with the agent's eventual ReportDiscoveredDevices batches.

What's in this PR:
- fleetnodecontrol.Registry: single-instance in-memory map of fleet_node_id -> active ControlStream + per-command_id event channel (CommandEvent { Batch | Ack }). Newest-wins eviction signaled via a done channel (so outgoing channel is never closed under a publisher); Send selects on done to bail cleanly. Publishers hold the mutex through the bounded non-blocking send to avoid panicking on a closed channel when cleanup races. Dropped-event counter on a 64-slot buffer, exposed via DroppedEvents().
- FleetNodeGateway.ControlStream: bidi handler. Hello receive is wrapped in a 5s timeout (HelloTimeout var) so an authenticated-but-idle agent cannot hold a server goroutine + HTTP/2 stream indefinitely. After Hello, registers the stream and pumps outgoing ControlCommand requests + incoming ControlAck responses through a side goroutine (2-buffer to avoid linger on exit).
- ReportDiscoveredDevices: rejects reports without a command_id or whose command_id is not in flight for this fleet_node (binds to server-issued ControlCommand). UpsertDiscoveredDevices now returns acceptedIdx []int instead of an opaque count; only the rows the store actually accepted are forwarded to the operator's command stream so ownership-rejected rows can't leak.
- FleetNodeAdmin.DiscoverOnFleetNode: operator-facing streaming RPC. Validates target is CONFIRMED, normalizes IPRange to IPList (capped at 4096 expanded addresses), rejects MDNS, forwards IPList/Nmap. Wraps the operator ctx with DiscoverCommandTimeout (5m default, var for test override) so a buggy/silent agent cannot pin operator streams + registry entries forever. Returns CodeDeadlineExceeded on timeout. Uses id.GenerateID() for command_id and proto.Marshal for the payload.
- discovered_by_fleet_node_id is immutable origin tracking. Set on first agent report; never cleared by PairDevice / UnpairDevice / RevokeFleetNode. Cloud-side pairing.PairDevices refuses to dial any discovered_device with DiscoveredByFleetNodeID != nil so an agent-reported private IP cannot redirect cloud credentialing later. Migration 000064 adds the column + FK + partial index.
- UpsertDiscoveredDeviceFromFleetNode reconciles auto:* identifiers per (fleet_node, ip, port) endpoint so re-keyed scans collapse onto one row; mac:/serial: identifiers pass through unchanged.
- pairing.proto: buf.validate count caps on DiscoverRequest modes (4096 IPs, 256 ports per mode).
- middleware: DiscoverOnFleetNode gated on fleetnode:manage.

Review fixes folded in:
- Migration 000065 widens discovered_device.url_scheme from VARCHAR(10) to VARCHAR(32) to match the gateway proto's advertised max_len. Schemes of 11-32 chars (e.g. "stratum+tcp") passed validation but overflowed the column, failing the whole batch as an internal error.
- UpsertDiscoveredDevices tallies accepted/rejected into per-attempt locals reset on closure entry, so a RunInTx retry after a retryable Postgres/commit failure can no longer double-count a batch.

Adds a unit test for the retry path and a DB-backed test for the 32-char scheme.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
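The registry's publish rule (hold the mutex through a bounded non-blocking send, count drops when the buffer is full) is easier to follow in code. The sketch below is a reduced illustration, not the fleetnodecontrol.Registry implementation: commandRegistry, Event, Open, Close, Publish, and Dropped are hypothetical names, and the done-channel eviction of whole streams is left out. Because Close and Publish take the same lock, Close can never close a channel between Publish's lookup and its send.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for CommandEvent.
type Event struct{ Payload string }

// commandRegistry keeps one buffered event channel per command_id,
// guarded by a single mutex (illustrative reduction only).
type commandRegistry struct {
	mu      sync.Mutex
	events  map[string]chan Event
	dropped int
}

func newCommandRegistry() *commandRegistry {
	return &commandRegistry{events: make(map[string]chan Event)}
}

// Open registers a command and returns the receive side of its channel.
func (r *commandRegistry) Open(id string, buf int) <-chan Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan Event, buf)
	r.events[id] = ch
	return ch
}

// Close unregisters the command and closes its channel under the lock.
func (r *commandRegistry) Close(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if ch, ok := r.events[id]; ok {
		delete(r.events, id)
		close(ch)
	}
}

// Publish holds the mutex across the non-blocking send, so a racing Close
// cannot close the channel mid-send. A full buffer counts as a drop.
func (r *commandRegistry) Publish(id string, ev Event) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch, ok := r.events[id]
	if !ok {
		return false // command is not in flight
	}
	select {
	case ch <- ev:
		return true
	default:
		r.dropped++
		return false
	}
}

// Dropped reports how many events were discarded because a buffer was full.
func (r *commandRegistry) Dropped() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.dropped
}

func main() {
	reg := newCommandRegistry()
	ch := reg.Open("cmd-1", 1) // the PR uses a 64-slot buffer; 1 keeps the demo short
	fmt.Println(reg.Publish("cmd-1", Event{Payload: "a"})) // true
	fmt.Println(reg.Publish("cmd-1", Event{Payload: "b"})) // false (buffer full)
	reg.Close("cmd-1")
	fmt.Println(reg.Publish("cmd-1", Event{Payload: "c"})) // false (not in flight)
	fmt.Println((<-ch).Payload, reg.Dropped())             // a 1
}
```

The send is non-blocking so that a slow operator stream cannot stall a publisher while it holds the lock; the cost is that events can be dropped, which is why a counter is kept.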
PR 1 of a stack. Adds the server-side surfaces operators need to manage fleet-node device pairings and the gateway endpoint agents call to report devices they discovered on their LAN.
Summary
- Service + Store with PairDevice, UnpairDevice, ListPairs, ListDevicesForFleetNode, UpsertDiscoveredDevices. Agent reports are validated for RFC1918/RFC4193 IP, port range, and http/https scheme.
- UpsertDiscoveredDeviceFromFleetNode: NOT EXISTS pairing guard prevents fleet node B from overwriting a device already paired with fleet node A.
- fleetnode:manage / fleetnode:read via middleware.RequirePermission.
- RevocationCleanupStore out of fleetnodeenrollment.Store so RevokeFleetNode deletes fleet_node_device rows for the revoked node in the same transaction.

Deferred to PR 2 (#235)
- fleetnodecontrol.Registry (in-memory ControlStream + per-command_id event dispatch).
- FleetNodeGateway.ControlStream bidi handler.
- FleetNodeAdmin.DiscoverOnFleetNode operator-initiated discovery + proto max_items caps on DiscoverRequest modes.
- command_id correlation hook in ReportDiscoveredDevices that fans batches to the operator's waiting stream.

Test plan
- cd server && go build ./... && go vet ./...
- just lint clean (buf, eslint, golangci-lint)
- DB_PASSWORD=fleet go test ./internal/handlers/middleware/... ./internal/handlers/fleetnodeadmin/... ./internal/handlers/fleetnodegateway/... ./internal/domain/fleetnodepairing/... ./internal/domain/fleetnodeenrollment/... (all green)

🤖 Generated with Claude Code
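For readers unfamiliar with the agent-report checks named in the Summary (RFC1918/RFC4193 IP, port range, http/https scheme), here is a minimal sketch of what such validation might look like. validateReport is a hypothetical helper, not a function from this PR, and the 1-65535 port bounds are an assumption since the description only says "port range". It relies on netip.Addr.IsPrivate from the Go standard library, which reports RFC 1918 and RFC 4193 addresses.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateReport is a hypothetical helper that applies the three checks the
// Summary lists to a single agent-reported endpoint.
func validateReport(ip string, port int, scheme string) error {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return fmt.Errorf("invalid ip %q: %w", ip, err)
	}
	// IsPrivate covers RFC 1918 (IPv4) and RFC 4193 (IPv6 unique local).
	if !addr.IsPrivate() {
		return fmt.Errorf("ip %s is not an RFC1918/RFC4193 address", addr)
	}
	// Assumed bounds: any valid non-zero TCP port.
	if port < 1 || port > 65535 {
		return fmt.Errorf("port %d out of range", port)
	}
	if scheme != "http" && scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", scheme)
	}
	return nil
}

func main() {
	fmt.Println(validateReport("10.1.2.3", 80, "http")) // <nil>
	fmt.Println(validateReport("fd00::1", 443, "https")) // <nil>
	fmt.Println(validateReport("8.8.8.8", 80, "http"))   // rejected: public address
	fmt.Println(validateReport("10.1.2.3", 0, "http"))   // rejected: port out of range
}
```

Restricting reports to private ranges keeps an agent from registering arbitrary public endpoints as devices it "discovered" on its LAN.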