Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
133 changes: 133 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,139 @@ and this file MUST be updated together whenever `__version__` changes.

---

## [0.8.0-dev3] — 2026-06-01

### Added — Subject taxonomy + `DedupStore` + `ReflexContext` (foundation for multi-source sensing)

Third sub-step of the brain refactor. Locks in two things that every
subsequent sub-step depends on: the **NATS subject taxonomy** and the
**dedup contract**. No new publishers and no behavior change in
production yet — that lands in `0.8.0-dev4`. This PR is intentionally a
pure refactor so the foundations can be reviewed in isolation.

#### Subject taxonomy

`docs/architecture/subjects.md` is the new authoritative spec.
Companion machine-readable constants in
`netcortex/contracts/subjects.py`:

* `SENSORY_EVENT_CLASSES` — closed vocabulary (`link_down`, `link_up`,
`bgp_drop`, `bgp_up`, `device_reboot`, `device_unreachable`,
`device_reachable`, `security_alert`, `config_change`,
`topology_change`, `route_advertisement_change`). Adding a class
requires a doc + constant change in the same PR.
* `SENSORY_SOURCES` — `<modality>_<provenance>` tokens
(`snmp_trap`, `snmp_poll`, `meraki_webhook`, `gnmi_dialout`, …).
* `sensory_subject(event_class, source, *target_parts)` — validated
builder. Refuses unknown classes/sources, empty parts, embedded
dots, whitespace.
* `parse_sensory_subject(subject)` — inverse extractor; handlers use
it to derive the `fact_key` from the incoming subject.

**Why event-class-first**: `sensory.<event_class>.<source>.<target>`
lets a single handler subscribe to `sensory.link_down.>` and catch
every source of link-down. The earlier `sensory.<modality>.<source>.<event>.<target>`
ordering forced per-source subscriptions and made the
"same-event-multiple-sources" dedup story awkward.

#### `DedupStore` Protocol + `InMemoryDedupStore`

`netcortex/contracts/dedup_store.py` defines the atomic check-and-record
contract. `netcortex/working/dedup/in_memory.py` ships the only 0.8.0
implementation:

* Asyncio-safe (lock-protected mutations).
* TTL-bounded with lazy expired-entry sweep (bounded budget per call
so tail latency is predictable).
* Size-bounded with LRU eviction so a misbehaving publisher cannot OOM.
* Accepts an injectable clock for fast deterministic unit tests.
* Cap warnings and long-TTL warnings on long-lived state that would
mask flap behavior.

Redis-backed implementation lands in 0.9.0 alongside working memory.
Contract tests (9 cases) parametrize over every registered
implementation — Redis only needs a factory function and a registry
row to gain full coverage when it arrives.

#### `ReflexContext` (handler dependency injection)

`netcortex.reflex.protocol.ReflexContext` is the runtime dependency
bag every handler receives on `handle(event, ctx)`. Frozen dataclass,
all fields optional, new resources added by appending fields so old
handlers are unaffected.

* `ctx.dedup_store: DedupStore | None` — 0.8.0
* `ctx.semantic_memory`, `ctx.working_memory`, `ctx.policy_engine`,
… — appended in later releases

The `ReflexRunner` owns one `ReflexContext` (default-constructed if
not supplied) and threads it through every dispatch. Existing failure-
isolation and lifecycle behavior unchanged.

#### Handler refactor — new patterns + dedup logic

| Handler | Old pattern (dev2) | New pattern (dev3) | Dedup window |
|---|---|---|---|
| `link_down` | `sensory.snmp.trap.link_down.>` | `sensory.link_down.>` | 60 s |
| `bgp_drop` | `sensory.snmp.trap.bgp_backward_transition.>` | `sensory.bgp_drop.>` | 60 s |
| `security_alert` (renamed from `security_webhook`) | `sensory.meraki.webhook.security.>` | `sensory.security_alert.>` | 300 s |

Renaming `security_webhook` → `security_alert` because the new handler
is source-agnostic (Meraki today, Cisco AMP / future SIEM tomorrow);
the file moved from `security_webhook.py` → `security_alert.py`. The
operator-facing handler id is renamed accordingly.

Each handler now constructs a `fact_key = "<event_class>|<target>"`
(plus `event_type` for `security_alert`) and consults `ctx.dedup_store`
when present. Duplicates return `outcome="skipped"` with the dedup
rationale; first arrivals return `outcome="logged"` as before. Severity
is intentionally demoted to `info` on skipped outcomes — they are
corroboration telemetry, not a second incident.

**Known limitation in 0.8.0**: a real flap (down/up/down within one
window) collapses to a single fact. Tracked: the fusion stage in
0.9.0 handles state transitions explicitly. See subjects.md "Dedup
model" section.

#### Tests

* `tests/contracts/dedup_store/test_dedup_store_contract.py` — 9 cases
parametrized over every registered store implementation
(atomicity-under-concurrency, TTL expiry, empty-key rejection,
non-positive-TTL rejection, close idempotency, use-after-close raises).
* `tests/contracts/test_subjects.py` — 13 cases for the taxonomy
builders, parser, vocabulary integrity checks.
* `tests/working/dedup/test_in_memory.py` — 6 cases for the in-memory
store specifics (LRU eviction, lazy sweep, fake clock, ctor
validation, close clears state).
* `tests/reflex/test_handlers.py` — updated for new patterns + 5 new
dedup cases (cross-source dedup for `link_down`, different-target
independence, missing-target skip-dedup, Meraki retry dedup for
`security_alert`, distinct-event-type-no-dedup, trap+gnmi dedup for
`bgp_drop`).
* `tests/reflex/test_runner.py` — updated for new signature + 2 new
cases (default-context wiring, explicit-context threading).
* `tests/reflex/test_registry.py` — updated for new handler signature
on the stub.

### Breaking — pre-release only

* `ReflexHandler.handle(event)` → `ReflexHandler.handle(event, ctx)` —
every handler implementation must take the context. The three
first-party handlers were updated in this PR; no external consumers
exist yet.
* Handler ids: `security_webhook` → `security_alert`. The dev2 release
never persisted these to anywhere stable, so this rename is
cost-free; future renames after publishers exist will require the
dual-publish dance described in subjects.md.

### Not yet wired

* Still no publishers. Pollers continue to call correlator + writeback
directly. The first dual-write publisher lands in `0.8.0-dev4`.
* Outcomes are logged only — Neo4j `:ReflexEvent` persistence + NetBox
journal mirror also land in `0.8.0-dev4`.

## [0.8.0-dev2] — 2026-06-01

### Added — Reflex skeleton: registry + runner + 3 first-party handlers
Expand Down
177 changes: 177 additions & 0 deletions docs/architecture/subjects.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
# NATS Subject Taxonomy

The bus (Thalamus, see [`brain.md`](./brain.md)) carries every observation,
every derived fact, every reflex outcome, and every motor action through a
single shared subject namespace. The shape of that namespace determines
how easily handlers can subscribe to **all sources of one thing** versus
**all things from one source** — those two needs pull in different
directions, and the taxonomy below is what we landed on.

## Top-level namespaces

| Namespace | Who publishes | Who subscribes | Lifetime |
|---|---|---|---|
| `sensory.>` | Receivers (webhook / trap / telemetry) and pollers (SNMP, Meraki API, etc.) | Resolver (Stage 2), fusion (Stage 3), reflex (Stage 4 fallback), episodic memory | Raw observations, retained briefly in JetStream |
| `fact.>` (0.9.0+) | Fusion stage, after dedup + corroboration | Reflex, working memory, semantic memory | Derived facts, retained longer |
| `reflex.>` | Reflex handlers | UI, episodic memory, downstream agents | Handler outcomes |
| `motor.>` (future) | Conductor / agents | Action executors, audit, semantic memory | Outbound actions |
| `consolidation.>` (future) | Consolidation cycles | Semantic memory, pattern memory | Bulk derived state |

Anything published outside these four namespaces will be rejected by the
receiver-side validators once those are in place; for now (0.8.0) it is
strongly discouraged.

## `sensory.<event_class>.<source>.<target_parts...>`

Event-class first. This is the **only** ordering that lets a single
handler subscribe to "all link-down observations regardless of where
they came from" with a NATS wildcard.

### `<event_class>` — a closed vocabulary

| Class | Meaning | Typical sources |
|---|---|---|
| `link_down` | An interface transitioned to operationally-down | SNMP linkDown trap, SNMP poll diff, Meraki webhook, gNMI dial-out |
| `link_up` | An interface transitioned to operationally-up | same |
| `bgp_drop` | A BGP session entered a non-Established state | BGP4-MIB trap, CISCO-BGP4-MIB trap, gNMI BGP neighbor sample |
| `bgp_up` | A BGP session entered Established | same |
| `device_reboot` | A device restarted | coldStart trap, sysUpTime reset on poll, Meraki status webhook |
| `device_unreachable` | A device stopped responding to our reachability probes | poller probe failure, Meraki status webhook |
| `device_reachable` | The inverse | same |
| `security_alert` | A security-class event observed | Meraki webhook (IDS/malware/blocked-URL), Cisco AMP webhook, future SIEM |
| `config_change` | A device configuration changed | Meraki webhook, NETCONF change notification, RANCID-style diff |
| `topology_change` | A new neighbor appeared or an existing one disappeared | LLDP/CDP poll diff |
| `route_advertisement_change` | Advertised prefix set changed on a session | BGP RIB poll diff |

New classes are added by amending this table **and** the
`SENSORY_EVENT_CLASSES` constant in `netcortex/contracts/event_bus.py`
in the same PR. Reviewers reject PRs that grow one without the other.

### `<source>` — `<modality>_<provenance>` joined as one token

Why one token: NATS wildcards match exactly one token (`*`) or trailing
greedy (`>`). Compound sources as separate tokens would make
"all snmp sources" subscriptions require multiple subscriptions.

| Modality | Source token | Notes |
|---|---|---|
| SNMP | `snmp_trap`, `snmp_poll`, `snmp_walk` | `_walk` reserved for bulk MIB walks distinct from per-OID polls |
| HTTP webhook | `meraki_webhook`, `thousandeyes_webhook`, `cisco_amp_webhook`, `catalyst_center_webhook` | Source names match the upstream platform |
| Streaming telemetry | `gnmi_dialout`, `gnmi_dialin`, `netconf_yangpush`, `cisco_mdt` | |
| API poll | `meraki_api`, `intersight_api`, `vsphere_api`, `fmc_api`, `nexus_dashboard_api` | One per platform adapter |
| Synthetic | `netcortex_inference` | When the system itself derives an observation (rare; prefer `fact.*`) |

### `<target_parts...>` — canonical identifiers, dot-separated tokens

Targets are the entities the event is about. Multi-part keys use `|`
(URL-safe, unambiguous) **within** a token, dots between tokens:

| Event class | Target shape | Example |
|---|---|---|
| `link_down`, `link_up` | `<device>\|<interface>` | `sensory.link_down.snmp_trap.cpn-ful-cat9k1\|Gi1/0/12` |
| `bgp_drop`, `bgp_up` | `<device>\|<peer_ip>` | `sensory.bgp_drop.gnmi_dialout.cpn-ful-cat8k1\|10.0.1.5` |
| `device_*` | `<device>` | `sensory.device_reboot.snmp_trap.cpn-ful-cat9k1` |
| `security_alert` | `<network>\|<client_mac_or_ip>` | `sensory.security_alert.meraki_webhook.N_1\|aa:bb:cc:dd:ee:ff` |
| `config_change` | `<device>` or `<device>\|<section>` | `sensory.config_change.meraki_webhook.Q2XX-YYYY-ZZZZ` |
| `topology_change` | `<device_a>\|<iface_a>\|<device_b>\|<iface_b>` | `sensory.topology_change.snmp_poll.r1\|Gi0/1\|r2\|Gi0/24` |
| `route_advertisement_change` | `<device>\|<peer>` | `sensory.route_advertisement_change.snmp_poll.r1\|10.0.1.5` |

**Canonicalization happens before publish.** Receivers publish with the
identifier shape they observed (raw IP, MAC, serial, ifIndex). The
resolver stage (0.9.0+) canonicalizes to NetBox-blessed names by querying
semantic memory and re-publishes under the same subject with the
canonical target. Until then (0.8.0), best-effort canonicalization
happens inline in the receiver; un-canonicalized observations still pass
through, they just may dedup imperfectly.

## `fact.<event_class>.<target_parts...>` (lands in 0.9.0)

After the fusion stage dedups same-event-different-source observations
within a per-class time window, the surviving fact is republished under
`fact.>`. Reflex handlers eventually move from `sensory.*` subscription
to `fact.*` subscription, gaining:

* one-fire-per-real-event semantics for free (no per-handler dedup code)
* corroboration metadata (which sources observed it)
* identity already canonicalized

## `reflex.<handler_id>.<outcome>` (future)

Reflex handlers may emit their own events back to the bus so other
consumers (UI live tail, downstream automation, episodic memory) can
react. Not used in 0.8.0-dev3 but the namespace is reserved.

Examples:

* `reflex.link_down.applied.cpn-ful-cat9k1|Gi1/0/12`
* `reflex.bgp_drop.skipped.cpn-ful-cat8k1|10.0.1.5` (skipped because deduped)
* `reflex.security_webhook.errored.N_1|aa:bb:cc:dd:ee:ff`

## `motor.<target>.<action>.<outcome>` (future)

Reserved for actuation. Not in scope until the policy library + write-gate
land.

## Dedup model (0.8.0)

Same-event-multiple-sources is handled at the **reflex handler layer**
using a `DedupStore` (in-memory in 0.8.0, Redis in 0.9.0+, see
[`netcortex.contracts.dedup_store.DedupStore`](../../netcortex/contracts/dedup_store.py)).
The fact key is intentionally short and scoped by event class so two
handlers reacting to the same target on different conditions (e.g.
`link_down` and `link_up`) do not collide:

```
fact_key = f"{event_class}|{canonical_target}"
```

The store enforces a TTL window per call. First arrival succeeds and
records the key with TTL = `handler.dedup_window_seconds`. Any later
arrival of the same `fact_key` before TTL expiry is treated as a
duplicate and the handler returns a `skipped` outcome (still recorded,
so operators see corroboration in the UI).

**Known limitation in 0.8.0:** a real flap whose down→up→down
transitions all land within one window collapses to one fire. We accept
this because flap detection is a working-memory concern (0.9.0) and
adding a transition counter in 0.8.0 would require publishers to emit
one, which they cannot uniformly do. The fusion stage in 0.9.0 tracks
state transitions explicitly.

Per-handler defaults:

| Handler | `dedup_window_seconds` | Rationale |
|---|---|---|
| `link_down` | 60 | A real flap surfaces as multiple facts; a duplicate-source surfaces once |
| `bgp_drop` | 60 | Same as link_down — BGP sessions don't flap meaningfully faster |
| `security_webhook` | 300 | Meraki retries delivery; same alert may arrive 2-3 times within minutes |

Handlers can override per-event by computing a different `fact_key` (e.g.,
include a `transition_id` from the payload) — see each handler's docstring.

## Wildcards and worked subscriptions

Common reflex subscriptions and what they catch:

```text
sensory.link_down.> # ALL link-down regardless of source
sensory.link_down.snmp_trap.> # link-down ONLY from SNMP traps
sensory.link_down.*.cpn-ful-cat9k1.* # link-down from any source for one device
sensory.bgp_drop.> # ALL bgp drops
sensory.security_alert.meraki_webhook.> # Meraki security alerts only
sensory.*.snmp_trap.> # Everything from any SNMP trap source
sensory.> # Firehose (episodic memory only)
fact.link_down.> # 0.9.0+: post-fusion link-down facts
```

## Versioning and breaking changes

Subject names are part of the **operator-facing contract**. Renaming an
event class or source token is a breaking change and requires:

1. A `BREAKING CHANGE:` note in the changelog
2. Dual-publish under both old and new subjects for one minor version
3. A deprecation warning in the CHANGELOG entry
4. Removal of the old subject in the version after that

Adding a new event class or source token is **not** a breaking change.
2 changes: 1 addition & 1 deletion netcortex/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,4 +22,4 @@
``CHANGELOG.md`` MUST be kept in sync whenever ``__version__`` changes.
"""

__version__ = "0.8.0-dev2"
__version__ = "0.8.0-dev3"
2 changes: 2 additions & 0 deletions netcortex/contracts/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,12 +33,14 @@

from __future__ import annotations

from netcortex.contracts.dedup_store import DedupStore
from netcortex.contracts.event_bus import EventBus, EventBusValidationError, EventMessage
from netcortex.contracts.policy import Decision, Policy, PolicyContext
from netcortex.contracts.sensory_adapter import SensoryAdapter, SensoryEvent

__all__ = [
"Decision",
"DedupStore",
"EventBus",
"EventBusValidationError",
"EventMessage",
Expand Down
Loading
Loading