(8) ACLs by daniel-noland · Pull Request #1565 · githedgehog/dataplane

daniel-noland · 2026-05-24T02:48:15Z

Introduces two new crates, scoped narrowly to "1Hz ACL updates" so we can develop and experiment with the cascade and update logic without the 6-10k-line LSM avalanche described in the rfc-lsm branch.

dataplane-cascade: head + frozen[] + tail with multi-writer head absorption via the Upsert trait (commutative+associative folding). Slot-published Cascade with rotate (freeze head, install new) and compact (fuse frozen into tail via MergeInto). Snapshot walks head -> frozen[] -> tail and stops at the first definitive answer. Generation newtype is in place; lookup_at / compact_through exist but aren't load-bearing yet at 1Hz update rate. DrainEvent / subscribe are behind a feature flag, off by default.

dataplane-acl: software ACL classifier wrapping the cascade. AclHead is a Mutex<BTreeMap<Priority, AclRule>>; AclFrozen is a priority-sorted Vec; AclTail is structurally identical (room to swap a richer representation later). Compaction merges sealed layers into the tail with priority dedup.

Deferred for follow-up PRs (already worked out on rfc-lsm but not shipping yet): mat facade crate, mat-runtime (PolicyGenAllocator, ManagedCascade), mat-state-sync (PeerDedup), pressure regimes / try_reset, FlowOrigin / HasOrigin, conntrack-shape cascade.

See .scratch/lsm.md and .scratch/mat-pipeline-rfc/0001-mat-pipeline.md for the broader design this is a thin slice of.

Sweep direct `use std::sync::{Arc, Mutex, RwLock, atomic::*}` imports across the workspace to `concurrency::sync` so loom/shuttle test builds can route through instrumented primitives via one feature flip. Two enforcement layers: * `clippy.toml` extends `disallowed-types` for the lock primitives. parking_lot's lock types are distinct concrete types, so clippy sees through the `concurrency::sync` re-export without flagging legitimate uses. * `.semgrep/rules/no-std-sync-direct.yaml` covers the rest (`Arc`, `Weak`, atomics, `LazyLock`, `OnceLock`, `Once`, `Barrier`, `Condvar`) where clippy's alias resolution can't distinguish the facade re-export from `std::sync`. The `concurrency` crate and its tests are exempt by path. `mgmt/tests/reconcile.rs` keeps a direct `std::sync::Mutex` because bolero's `catch_unwind` needs `RefUnwindSafe`, which parking_lot's `Mutex` doesn't impl. Documented inline with `clippy::disallowed_types` allow + `nosemgrep` annotation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

`std::thread::Builder::spawn_scoped` is inherent on std but missing on the loom and shuttle Builders; both ship `Scope::spawn` instead. Add a `concurrency::thread::BuilderExt` trait with one method: * std: forwards to the inherent `Builder::spawn_scoped` via fully-qualified call (Rust's method resolution prefers the inherent, so the trait impl is dead but kept for symmetry). * shuttle / loom: discards advisory Builder config, delegates to `Scope::spawn`, wraps the infallible return in `Ok` to match std's `io::Result` signature. `use concurrency::thread::BuilderExt;` lets call sites write `builder.spawn_scoped(scope, f)` under every backend. Used by the kernel driver's named scoped threads in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Collapse the chained `shuttle_dfs -> shuttle_pct -> shuttle` features into a single `shuttle` feature backed by `shuttle::PortfolioRunner`. The runner drives `RandomScheduler` and `PctScheduler` in parallel; any scheduler finding a counterexample fails the test (`stop_on_first_failure = true`). `shuttle_dfs` becomes an additive opt-in that adds `DfsScheduler` to the same portfolio. `stress.rs` now has one shuttle arm instead of three, and `#[concurrency::test]` emits one leaf per backend (`loom` / `shuttle`) instead of three shuttle variants. Workspace consumers (`nat`, `flow-entry`) and CI (`dev.yml`) drop the `shuttle_pct` step; the existing `shuttle` step covers Random + PCT in one pass. Tests previously gated `not(feature = "shuttle_pct")` to opt out of single-threaded bodies that PCT panics on are rewritten to `not(feature = "shuttle")` since PCT now runs in every shuttle build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Tier A of the std::thread sweep that complements the `concurrency::sync` facade migration. Swap `use std::thread` to `use concurrency::thread` in two test modules whose tests are candidates for `#[concurrency::test]` conversion: * `routing/src/fib/test.rs` -- prerequisite for the FIB race-test conversion later in the stack. * `dpdk/src/acl/mod.rs::classify_concurrent_arc_shared` -- import swap only; the test runs under real DPDK EAL so the macro conversion is deferred. Production threading sites (`dpdk/src/lcore.rs`, `mgmt/src/processor/ launch.rs`, `routing/src/router/rio.rs`, `dataplane/src/statistics`, `test-utils`) are left alone -- they need real OS threads and never compile under loom/shuttle. A later sweep adds clippy/semgrep enforcement once production sites are also routed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Tier B of the std::thread sweep. `ThreadPortMap` keys a per-thread `RwLock<HashMap<ThreadId, _>>` by `std::thread::current().id()`. Each backend ships its own `ThreadId`, so a std-typed map would silently work in production while loom/shuttle key the table by their own thread identity. Route the import and call sites through `concurrency::thread` so the key tracks the active backend. No behavioural change under the default backend. Prerequisite for any future loom/shuttle exercise of the NAT allocator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Replace the hand-rolled `shuttle::check_random(... 100)` wrappers in `shuttle_tests` with `#[concurrency::test]`, which routes bodies through `concurrency::stress` (loom `model`, shuttle `PortfolioRunner`). The module is renamed `concurrency_tests` and gated to `cfg(any(feature = "shuttle", feature = "loom"))`. `FlowTable::insert` spawns a tokio task for the flow timer, which would panic without a running runtime; the existing `start_timer` bypass under shuttle is extended to loom. Tokio-driven coverage of `insert` stays in `std_tests`. `test_flow_table_timeout` is dropped from the model-checker mod: it's single-threaded (PCT rejects), and `std_tests` already has the authoritative `#[tokio::test(start_paused = true)]` version. Adds `loom = ["concurrency/loom"]` to `flow-entry/Cargo.toml` so the macro-emitted cfg arm resolves to a known feature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Move `test_fib_removals` and `test_leftright_destroy_race_simple` out of the `#[concurrency_mode(std)]` block into a sibling `concurrency_tests` module that runs through `#[concurrency::test]`: default backend smoke run, `loom::model` under loom, shuttle's PortfolioRunner under shuttle. The heavy fuzz loops (`test_concurrency_fib` / `test_concurrency_fibtable`) stay on std -- their 100k+ packet iteration counts are TSAN-calibrated, not for per-iteration model-checking cost. Iteration counts are tuned per backend via `cfg_select!`: 5 rounds under loom/shuttle (vs 1000 on std), with a fixed reader/worker budget under the model checkers so unbounded poll loops don't trip shuttle's `max_steps` ceiling. `test_packet` is inlined because the original lives in the std-gated `mod tests` and is invisible under loom/shuttle. Add `loom`, `shuttle`, `shuttle_dfs` features to `routing/Cargo.toml` so the macro-emitted cfg arms resolve to known features. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Bump `concurrency::stress`'s shuttle `Config::stack_size` from the default 32 KiB to 4 MiB. Shuttle wraps each atomic/lock primitive with bookkeeping that pushes per-instance size into the 100-byte range (an `AtomicBool` is ~100 bytes under shuttle), so any non-trivial atomic-heavy body blows through the default. The historical workaround was per-call `shuttle::Config` overrides at 1 MiB (notably in NAT's allocator tests). One number in the dispatcher kills the per-test knob. 4 MiB carries the heaviest workspace consumer (NAT allocator's per-block atomic arrays) with headroom; the cost is `N workers * 4 MiB` per stress iteration, well below CI memory pressure. `shuttle::Config` is `#[non_exhaustive]`, so written as a mutation of `Config::default()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Now that `concurrency::stress` carries a 4 MiB shuttle stack and the PortfolioRunner runs Random + PCT in parallel, the custom `shuttle_config` / `run_shuttle_random` / `run_shuttle_pct` scaffolding in `tests_shuttle` is redundant. Replace with a single `mod concurrency_tests` that flips each test to `#[concurrency::test]`: * `test_concurrent_allocations_two_ips` (was `..._without_shuttle`) -- two threads against distinct source IPs; smoke run on default backend, full coverage on `--features shuttle`. * `test_concurrent_allocations_three_workers` (was `..._shuttle_random` + `_pct`, collapsed) -- portfolio runs both schedulers in one invocation. * `test_ensure_shuttle_works` -- gated to model-checker backends only; the deliberate race only reaches the failing schedule under a real scheduler, the default backend's one-shot run is non-deterministic. Drops the helpers plus the `Arc` / `thread` imports in `std_tests` that they pulled in. Adds `loom = ["concurrency/loom"]` to `nat/Cargo.toml` so the macro-emitted cfg arm resolves. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Make `just features=loom test` build and run end-to-end across the workspace. Earlier commits had loom working only on the `concurrency` crate; everything else failed to compile or crashed at runtime with stack overflow, Arc leak, or DashMap destructor panics. Bundled into one commit because each fix was discovered by running the previous one. ## concurrency * `fn sleep(_: Duration)` shim under loom (loom 0.7 doesn't model time; yields to the scheduler so the call still acts as a schedule point). * Enumerate `loom::thread` re-exports and shadow `spawn` with a 4 MiB default stack. Loom 0.7's default coroutine stack is 4 KiB, which overflows trivially under atomic-heavy `concurrency::sync` types. * `stress` under loom wraps the body in a 4 MiB `Builder::spawn` so the main 4 KiB coroutine just spawns and joins. Costs one of loom's five thread slots. * `loom_scope::Scope::spawn` routes through `super::spawn` for the same default. * `Slot::load()` / `SlotOption::load()` helpers used by `common::cliprovider` to drop a redundant `load_full()` clone. ## nat * `cfg_attr`-gate `#![feature(arbitrary_self_types)]`: loom wraps `Arc<T>` in a facade newtype that isn't a blessed self-receiver, so `self: Arc<Self>` methods on `AllocatedIp` / `AllocatedPortBlock` need the unstable feature there. * `#[concurrency_mode(loom)]` no-op `shuffle_slice` (loom needs determinism for replay; shuffle is allocation-order heuristic, not correctness). * Gate `concurrency_tests` off loom: the facade's `Weak` shim holds a strong clone of the `Arc`, so the allocator's `Weak::upgrade().is_none()` liveness signal never fires and loom's `Arc leaked` assertion catches it. ## flow-entry * Gate `concurrency_tests` to shuttle only: `FlowTable`'s internal `DashMap` panics in loom's end-of-execution cleanup (sharded `RwLock`s don't fit loom's strict lifecycle accounting). ## routing * Gate `fib::test::concurrency_tests` off loom: the `left_right` epoch state space is too large for exhaustive search to terminate in reasonable time. ## dataplane * The binary builds an `Arc<dyn Fn(...) ...>` trait-object closure in `packet_processor::setup_internal`, which needs `CoerceUnsized` on the concrete `Arc`. Loom 0.7's `Arc` doesn't carry that trait, and the facade newtype can't add it. Gate the bin out of loom builds: extract the body to `dataplane/src/runtime.rs` and leave `main.rs` with a stub `main` under loom that panics if invoked. Library crates still get loom coverage through feature propagation. * `dataplane/src/drivers/dpdk.rs` switches `use crate::CmdArgs` to `use args::CmdArgs;` to follow the new module layout. CI's loom step in `.github/workflows/dev.yml` still scopes to `--package=dataplane-concurrency` because the loom-incompatible tests across the workspace are now cfg-gated rather than package-filtered; the package scope is no longer load-bearing for the test invocation, only for cargo's feature unification. Default and `--features shuttle` builds unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Leftover from the superseded dis-guard exploration; nothing references it. Public-API drift in `concurrency::slot` for no benefit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

The import-only enforcement let `static X: std::sync::LazyLock<T> = std::sync::LazyLock::new(...)` slip through, and one such site already existed in `config/src/external/overlay/vpcpeering.rs`. Extend the rule with a multi-line regex backstop that matches the facade-managed type names in any expression position, with a leading-comment lookahead so rustdoc intra-doc links (`/// [std::sync::Arc]`) don't false-positive. Also expand the grouped-import regex to span multiple lines. Convert the offending FQN to `concurrency::sync::LazyLock`. Move the deliberate `mgmt/tests/reconcile.rs` `nosemgrep:` onto the same line as the `std::sync::Mutex::new` it suppresses, and annotate the intentional `std::sync::Arc` in `concurrency::stress` (shared *across* `loom::model` invocations, so it must remain a std Arc). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

The previous loom step ran `just test concurrency` only, which left the workspace-wide loom compile (the whole point of the facade's local `Weak<T>` shim and `Arc::downgrade`) unprotected. Add a `cargo check`-equivalent step ahead of the test run so a regression in any consumer crate fails CI directly. Tests stay scoped to the concurrency crate -- model-checking the whole workspace under loom is intractable. Update the inline comment to reflect the new reality. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Add the plumbing that lets cargo execute cross-compiled tests under qemu-user when the host architecture doesn't match the target: * `scripts/test-runner.sh` -- thin wrapper. When `MIRI_SYSROOT` is set we delegate to `.cargo-miri-wrapped` so a miri run on a matching host arch still goes through the miri interpreter instead of running natively under qemu. Otherwise: native exec when target == host, else `qemu-${target_machine}`. * `.cargo/config.toml` -- runner entries for the four cross triples we ship (x86_64 / aarch64 × gnu / musl). Explicit triples (not cfg-patterns) because cargo miri injects a `cfg(all())` runner and refuses to disambiguate between two cfg-pattern matches; the explicit form wins method-resolution-style and the script's `MIRI_SYSROOT` branch handles the miri case from inside. * `default.nix` -- `qemu-user` joins the dev shell so the wrapper has `qemu-x86_64` / `qemu-aarch64` on PATH locally and in CI. No CI behaviour change yet -- this just makes `cargo test --target <cross-triple>` work via emulation when run by hand. The CI step that actually exercises the path lands in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Add a `test` step to the cross job that runs the full nextest suite against the cross-compiled archive. Each cross binary dispatches through `scripts/test-runner.sh` (registered as the cargo runner in `.cargo/config.toml`) which delegates to `qemu-${target_machine}` when the host architecture doesn't match. Gates: * Only runs on `pull_request` runs with `ci:+cross` or `ci:+cross/full` on the labels. push / merge_group cross legs stay build-only; ISA emulation pays a real wall-clock cost and we don't want to slow the merge queue. * Gated to `matrix.recipe.args == 'dataplane'` so the test pass isn't duplicated for the `frr.dataplane` row, which differs from the `dataplane` row only in the container recipe, not in the cross target. `ci:+cross` (today's "run the cross job at all" label) implies "include the qemu test pass" -- there is currently no way to schedule cross/full builds without a label, so the same gate serves both modes. Splitting them out (e.g. opting bump.yml into cross/full automatically) is a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

The test hard-codes `3` as a "non-multiple of `RESULTS_MULTIPLIER`" which assumes the multiplier is 4 (x86_64). Under cross-compiled aarch64 builds the wrapper's `RESULTS_MULTIPLIER` constant binds to 1 -- the validator's "not-a-multiple" branch becomes unreachable because every positive integer is trivially a multiple of 1, so the validator returns `Ok` and the test panics on the `matches!` assert. Upstream DPDK defines `RTE_ACL_RESULTS_MULTIPLIER` as `XMM_SIZE / sizeof(uint32_t)`, which should be 4 on every supported ISA. Either bindgen on the cross sysroot doesn't see the right `XMM_SIZE` typedef on aarch64 or the cross headers ship a divergent `rte_vect.h`; needs a follow-up investigation. Skip on aarch64 with a TODO so the rest of the cross-aarch64 test surface can run green under qemu-user. This is the only known target-specific divergence in the dpdk binding; the rest of the sweep passes cleanly under musl+qemu. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Introduce a single `emulated` cfg as the umbrella over both emulation backends in the test surface: * `miri.just`: append `--cfg=emulated` to `RUSTFLAGS` so miri invocations set it. * `nix/profiles.nix`: set `--cfg=emulated` whenever `for-tests` is true and the target arch differs from the build host's (`is-emulated-test`). Today this covers cross-arch test builds that run under qemu-user on the lab runners. `--check-cfg=cfg (emulated)` is set unconditionally so unused branches don't trip `unexpected_cfgs`. * `default.nix`: plumb the build host's arch (`stdenv.hostPlatform.parsed.cpu.name`) through to `profiles.nix` as `host-arch`, so `is-emulated-test` compares target against actual host rather than hard-coded \`"x86_64"\`. * `.cargo/config.toml`: same `--check-cfg=cfg(emulated)` for the native dev build path. Sweep existing `cfg_attr(miri, ignore)` / `cfg(miri)` / `cfg_select! { miri => N }` sites that apply equally to qemu-user: * `routing/src/router/rio.rs`, `routing/src/frr/test.rs`, `routing/src/atable/resolver.rs`, `cli/src/cliproto.rs`: tests that bind Unix domain sockets / read kernel-state files now skip under any emulation backend, not just miri. qemu-user's epoll readiness emulation has the same gaps that justified the original miri skips. * `routing/src/fib/test.rs`, `net/src/packet/hash.rs`, `flow-entry/src/flow_table/nf_lookup.rs`, `left-right-tlcache/src/lib.rs`, `k8s-intf/src/bolero/support .rs`, `config/src/utils/collapse.rs`: per-arm iteration counts in `cfg_select!` are now keyed by `emulated` rather than `miri`. qemu-user runs at ~5-10x slowdown vs native and the original miri count (which targeted miri's much steeper slowdown) is still a sensible upper bound for qemu-user too. Pair this with two CI/runtime side changes: * `.config/nextest.toml`: new `cross-qemu` nextest profile sets a `slow-timeout = { period = "60s", terminate-after = 5 }` so a qemu hang gets killed and surfaces as a TIMEOUT in the report instead of wedging the whole run. `fail-fast` is permissive so we collect the full list of qemu-affected tests in one go. * `.github/workflows/dev.yml`: cross-job `test` step is restricted to `matrix.libc == 'musl'` because gnu's libgcc_s unwinder mis-handles qemu-user's emulated signal frames on aarch64 (every panic-unwind path turns into SIGABRT). musl's LLVM libunwind walks those frames correctly, so musl legs run the full suite clean; gnu legs stay build-only until/unless we fix the gnu unwinder story. * `.gitignore`: ignore `**/qemu_*.core` so the SIGABRT cores qemu drops when running gnu cross binaries locally don't show up in `git status`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

It is much too slow to print to stdout like this under qemu-user or miri. Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Two halves of one logical change: * `.github/workflows/dev.yml`: extend the cross job's `if:` to fire on `ci:+cross/full` in addition to `ci:+cross`, so a PR labeled only with `ci:+cross/full` (no `ci:+cross`) still gets cross. * `.github/workflows/bump.yml`: auto-apply `ci:+cross/full` to the weekly cargo-upgrades PR. The weekly cargo-upgrades PR is the right place to catch cross-arch regressions introduced by transitive dep churn (a crate dropping an aarch64 target, changing alignment, etc.). Without an opt-in label the cross job stays build-only on PR runs, so we add \`ci:+cross/full\` alongside the existing \`automated\` and \`dependencies\` labels. Today that label triggers the qemu-user test step on the existing 4-leg cross matrix; once the matrix scope split lands, the same label will expand the matrix to the full hardware x libc sweep without further changes here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Cross stays advisory (`continue-on-error: true`) on every PR, push, and merge_group run, so leg flakes never block merge. That's the right default for interactive PRs and the merge queue, but it makes genuine cross failures easy to miss on the weekly cargo-bump PR (auto-labeled `ci:+cross/full`), where catching upstream-driven aarch64 regressions is the whole point. Add a sticky-comment surfacing step: * New composite action `.github/actions/sticky-pr-comment` -- create-or-update a PR comment keyed by an HTML-comment marker (so subsequent runs find and update the same comment instead of spamming the thread). * New steps in the `summary` job that read the cross matrix's real per-leg conclusions from the Actions REST API (via `actions/github-script`) and, if any leg failed, post the sticky comment with a link back to the failing run. The summary job grows `pull-requests: write` (for the comment) and `actions: read` (for the API call). We can't gate on `needs.cross.result` here: at the job level, `continue-on-error: true` makes that always report `success` to dependents, which is exactly the property we want for merge gating but which also hides the failure from this step. Reading the underlying job results sidesteps the masking. The alternative -- making cross blocking when `ci:+cross/full` is present -- creates an attribution trap: a non-labeled PR that silently regresses cross would land on main, and the next cross/full-labeled bump PR would inherit the broken main and read as "the bump caused the regression." A sticky comment gives reviewers a loud in-PR signal without that misattribution risk and preserves the "leg-flake doesn't block merge" property uniformly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Signed-off-by: github-actions[bot] <224724778+hedgehog-dataplane-update[bot]@users.noreply.github.com>

name old req compatible latest new req ==== ======= ========== ====== ======= netgauze-bgp-pkt 0.12.0 0.12.1 0.12.1 0.12.1 Signed-off-by: github-actions[bot] <224724778+hedgehog-dataplane-update[bot]@users.noreply.github.com>

name old req compatible latest new req ==== ======= ========== ====== ======= netgauze-bmp-pkt 0.12.0 0.12.1 0.12.1 0.12.1 Signed-off-by: github-actions[bot] <224724778+hedgehog-dataplane-update[bot]@users.noreply.github.com>

name old req compatible latest new req ==== ======= ========== ====== ======= serde_json 1.0.149 1.0.150 1.0.150 1.0.150 Signed-off-by: github-actions[bot] <224724778+hedgehog-dataplane-update[bot]@users.noreply.github.com>

Drop manual_assert_eq, redundant references in formatting, and implicit format-arg captures flagged under `-D warnings`. `#[allow(dead_code)]` the unused egress helpers (kept for the future egress NF rework). Precursor for the threading rewrite stack so `just lint` is green throughout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

The DPDK driver was stubbed out behind `#![cfg(feature = "dpdk")]` and the dispatch in `runtime.rs` already routes the "dpdk" argument through `todo!()`. Remove the empty `drivers/dpdk` module, the `dpdk` cargo feature, the `dpdk-sysroot-helper` build dependency, and the now-empty `build.rs`. The DPDK driver supervisor will be re-introduced from scratch when the DPDK threading model lands; the present scaffolding is misleading because it implies the driver exists in some form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Add a `dataplane-lifecycle` crate with `Shutdown` and `Subsystem` primitives, signal-handler installation, and a process-wide shutdown watchdog. `Shutdown` bundles a root `CancellationToken` and one `Subsystem` per long-lived component (workers, router, mgmt, metrics). Each subsystem exposes a per-subsystem cancel token, a `TaskTracker`, and a shared fatal flag. Subsystems drain in topological order with per-subsystem deadlines; the detached watchdog enforces an absolute upper bound on total shutdown duration. No consumers yet -- wired up in follow-on commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Plumb lifecycle Subsystems into the routing crate as the first step of the threading rewrite. The rest of `main`'s shutdown signaling (ctrlc + mpsc<i32> + start_mgmt + MetricsServer + old DriverKernel::start) stays in place; follow-on commits migrate each. - `Router::new` takes `(mgmt, mgmt_handle, router)` Subsystems + runtime handle. Plumbed through `packet_processor::start_router`. - `start_rio` takes `&Subsystem`; the IO loop observes its cancel between poll cycles (worst-case exit latency = 1s poll). Adds an ExitGuard so panic-unwind or unexpected loop exit reports fatal. - `RouterCtlMsg::Finish` removed; `RioHandle::finish` becomes idempotent. - `bmp::spawn_background` spawns onto the caller-provided runtime handle tracked under `mgmt`; no more leaked runtime. - `runtime.rs` builds a multi-thread mgmt runtime (only BMP tenants it in this commit) and a `Shutdown` for plumbing into Router::new. - `dataplane` and `mgmt` Cargo.toml gain `lifecycle` dep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Move mgmt + metrics from dedicated OS threads with private runtimes to tenants of the multi-thread mgmt runtime introduced in the prior commit. The kernel driver still uses the legacy ctrlc + mpsc<i32> path; the final unification commit migrates that. - `run_mgmt` (renamed from `start_mgmt`): synchronous init on the caller-provided handle, then spawns the three long-lived tasks (config processor, status updater, config watcher) via `Subsystem::spawn_fatal_on_exit` so their unexpected exit flips fatal. Init observes `mgmt.root_token()` so SIGINT during k8s retries returns `LaunchError::Cancelled` within cancel latency. - `LaunchError::Cancelled` is a clean-shutdown signal; the call site in `runtime.rs` forwards the existing mpsc stop channel with code 0 so the legacy shutdown path stays consistent. - `spawn_metrics` replaces `MetricsServer`: HTTP endpoint, upkeep ticker, and stats collector all spawn onto `mgmt_handle` tracked under `metrics`. Uses plain `spawn_on` (not `spawn_fatal_on_exit`) — a dead metrics endpoint should not take down the dataplane. - Drop stale `LaunchError` variants no longer constructed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Replace the last legacy shutdown signaling (ctrlc handler + mpsc<i32> exit code + dedicated controller thread) with the single `lifecycle::Shutdown` path, and migrate the kernel driver to scoped threads with cancellation observation. After this commit there is one signaling path: SIGINT/SIGTERM -> shutdown.root, or any subsystem's report_fatal -> shutdown.root, with the watchdog as the absolute upper bound. - `main`: install `spawn_signal_handler` and `spawn_shutdown_watchdog`, run everything inside `concurrency::thread::scope`, block on `root.cancelled()`, then drain subsystems in canonical order (workers -> router -> metrics -> mgmt). Exit code from `is_fatal()`. - `DriverKernel::start`: takes `&Scope` and `&Subsystem`; workers spawn via `spawn_scoped` with an `ExitGuard` Drop pattern that reports fatal on panic-unwind, early `?`-return, and unexpected normal exit. Reader loops observe cancel between reads. Supervisor joins-and-logs. - Drop `dataplane/src/drivers/tokio_util.rs` and its `run_in_local_tokio_runtime` helper (inlined where needed). - Drop `ctrlc` and `mio` from `dataplane` dependencies; drop `ctrlc` from the workspace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>

Introduces two new crates, scoped narrowly to "1Hz ACL updates" so we can develop and experiment with the cascade and update logic without the 6-10k-line LSM avalanche described in the rfc-lsm branch. `dataplane-cascade`: head + frozen[] + tail with multi-writer head absorption via the `Upsert` trait (commutative+associative folding). Slot-published `Cascade` with `rotate` (freeze head, install new) and `compact` (fuse frozen into tail via `MergeInto`). `Snapshot` walks head -> frozen[] -> tail and stops at the first definitive answer. `Generation` newtype is in place; `lookup_at` / `compact_through` exist but aren't load-bearing yet at 1Hz update rate. `DrainEvent` / `subscribe` are behind a feature flag, off by default. `dataplane-acl`: software ACL classifier wrapping the cascade. `AclHead` is a `Mutex<BTreeMap<Priority, AclRule>>`; `AclFrozen` is a priority-sorted Vec; `AclTail` is structurally identical (room to swap a richer representation later). Compaction merges sealed layers into the tail with priority dedup. Deferred for follow-up PRs (already worked out on rfc-lsm but not shipping yet): mat facade crate, mat-runtime (PolicyGenAllocator, ManagedCascade), mat-state-sync (PeerDedup), pressure regimes / try_reset, FlowOrigin / HasOrigin, conntrack-shape cascade. See `.scratch/lsm.md` and `.scratch/mat-pipeline-rfc/0001-mat-pipeline.md` for the broader design this is a thin slice of. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

daniel-noland and others added 30 commits May 22, 2026 17:22

test: disable trace under emulation

7ee09a9

It is much too slow to print to stdout like this under qemu-user or miri. Signed-off-by: Daniel Noland <daniel@githedgehog.com>

bump(dplane-plugin): accept glibc-ism removal

4ef40cb

Signed-off-by: Daniel Noland <daniel@githedgehog.com>

bump: pins

44bca4f

Signed-off-by: github-actions[bot] <224724778+hedgehog-dataplane-update[bot]@users.noreply.github.com>

bump(cargo)!: bump dependencies (cargo update)

6e51ec3

Signed-off-by: github-actions[bot] <224724778+hedgehog-dataplane-update[bot]@users.noreply.github.com>

bump(cargo)!: bump netgauze-bgp-pkt (cargo upgrade)

980e7e6

name old req compatible latest new req ==== ======= ========== ====== ======= netgauze-bgp-pkt 0.12.0 0.12.1 0.12.1 0.12.1 Signed-off-by: github-actions[bot] <224724778+hedgehog-dataplane-update[bot]@users.noreply.github.com>

bump(cargo)!: bump netgauze-bmp-pkt (cargo upgrade)

93fe7f1

name old req compatible latest new req ==== ======= ========== ====== ======= netgauze-bmp-pkt 0.12.0 0.12.1 0.12.1 0.12.1 Signed-off-by: github-actions[bot] <224724778+hedgehog-dataplane-update[bot]@users.noreply.github.com>

bump(cargo)!: bump serde_json (cargo upgrade)

75bd274

name old req compatible latest new req ==== ======= ========== ====== ======= serde_json 1.0.149 1.0.150 1.0.150 1.0.150 Signed-off-by: github-actions[bot] <224724778+hedgehog-dataplane-update[bot]@users.noreply.github.com>

daniel-noland and others added 3 commits May 23, 2026 15:17

daniel-noland force-pushed the pr/daniel-noland/threading-rewrite branch 6 times, most recently from 800f5dc to 2021414 Compare May 31, 2026 03:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(8) ACLs#1565

(8) ACLs#1565
daniel-noland wants to merge 33 commits into
pr/daniel-noland/threading-rewritefrom
pr/daniel-noland/cascade-acl

daniel-noland commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

daniel-noland commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant