23 changes: 23 additions & 0 deletions .github/workflows/ci.yaml
@@ -107,6 +107,29 @@ jobs:
  contracts:
    name: Contract tests (Protocol conformance)
    runs-on: ubuntu-latest
    services:
      # Real NATS server so the NatsEventBus contract suite runs against the
      # actual production backend, not just InMemoryEventBus. Core pub/sub
      # only (no JetStream): the EventBus Protocol only covers delivery within
      # a live session, with no replay, which is what NATS core provides
      # (NATS core itself is at-most-once). When 0.9.0 adds durable
      # consumers for episodic memory we will run a separate job against a
      # JetStream-enabled image. Monitoring port (8222) is used by the GH
      # service health check; client port (4222) is what NATS_URL points at.
      nats:
        image: nats:2.11-alpine
        ports:
          - 4222:4222
          - 8222:8222
        options: >-
          --health-cmd "wget -qO- http://localhost:8222/healthz || exit 1"
          --health-interval 5s
          --health-timeout 3s
          --health-retries 10
    env:
      # tests/contracts/conftest.py keys off this: when set, the NATS
      # implementation is included in the parametrize; when unset, it is
      # skipped (so devs without a broker can still run the suite).
      NATS_URL: nats://localhost:4222
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
88 changes: 88 additions & 0 deletions CHANGELOG.md
@@ -24,6 +24,94 @@ and this file MUST be updated together whenever `__version__` changes.

---

## [0.8.0-dev1] — 2026-06-01

### Added — Thalamus: NATS-backed event bus lands

First sub-step of the brain-architecture refactor. Stands up the message
bus that the rest of `0.8.0` will route every signal through. Nothing
**uses** the bus yet — that's `0.8.0-dev3` — but the substrate is in
place, the Protocol has a real production implementation, and CI verifies
it against the same contract suite that covered the in-memory reference.

See `docs/architecture/brain.md` for the role of the thalamus in the
brain-mapped architecture and the rationale for NATS specifically.

#### Code

- `netcortex/thalamus/`: new package.
  - `nats_bus.py`: `NatsEventBus` implementing the
    `EventBus` Protocol against a real NATS server. It uses NATS core pub/sub
    (no JetStream) because the Protocol only promises delivery within a live
    session, with no replay, and that is what NATS core provides (NATS core
    itself is at-most-once: messages are never redelivered).
    JetStream is enabled at the *server* level so future durable
    consumers (episodic memory in 0.9.0; stream bridge for external
    agents) can opt in via extension methods without a redeploy.
  - `__init__.py`: re-exports `NatsEventBus`. The package is intentionally
    small; additional bus implementations (Redis, Kafka) would land as
    siblings here.
- Lifecycle: sync constructor (matches the `Callable[[], EventBus]` shape
the contract tests use as a factory); lazy connect on first
publish/subscribe; idempotent `close()` that drains pending publishes
and cleanly unsubscribes everyone before closing the socket.
- Wire format: JSON-encoded UTF-8 payloads, NATS headers (server 2.2+)
for framing metadata. Malformed payloads surfaced as `{"_raw": ...}`
with a warning rather than crashing the consumer loop.
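
The malformed-payload fallback described in the wire-format bullet can be sketched roughly as follows. This is an illustration, not the `NatsEventBus` code: the helper name `decode_payload`, the logger name, and the choice to store the undecoded bytes under `_raw` are assumptions (the entry above only says `{"_raw": ...}`), as is treating valid JSON that is not an object as malformed.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("thalamus.sketch")  # hypothetical logger name


def decode_payload(data: bytes) -> dict[str, Any]:
    """Decode a JSON-encoded UTF-8 message body without ever raising."""
    try:
        decoded = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Bad bytes or bad JSON: warn and hand the consumer the raw body
        # instead of letting the exception escape into the consumer loop.
        logger.warning("malformed payload (%s); surfacing raw bytes", exc)
        return {"_raw": data}
    if not isinstance(decoded, dict):
        # Assumption: payloads are JSON objects, so anything else is malformed.
        logger.warning("payload is not a JSON object; surfacing raw bytes")
        return {"_raw": data}
    return decoded
```

A subscriber callback would call `decode_payload(msg.data)` and can check for the `_raw` key to decide whether to drop or quarantine the message.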

#### Tests

- `tests/contracts/conftest.py` — `NatsEventBus` registered as a second
`EventBus` implementation. The full contract suite (publish/subscribe
roundtrip, wildcard filtering, no-replay, independent subscribers,
invalid-subject rejection, invalid-payload rejection, idempotent close)
now runs against the real NATS backend in addition to `InMemoryEventBus`.
- `NATS_URL` env-gated: when unset the parametrized cases skip (so devs
without a local broker can still run the suite); when set the same
cases exercise the production code path. CI always sets it.
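
For readers unfamiliar with the wildcard and invalid-subject cases listed above, here is a rough sketch of NATS-style subject matching: `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. The function names are hypothetical, and the validity rule shown (no empty tokens, no whitespace) is NATS's own; the exact rules the contract suite enforces are not shown in this entry.

```python
def is_valid_subject(subject: str) -> bool:
    """Reject empty subjects, empty tokens ("a..b"), and whitespace."""
    if not subject or any(ch.isspace() for ch in subject):
        return False
    return all(token != "" for token in subject.split("."))


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` is matched by a NATS-style `pattern`."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # Full wildcard: must be the last pattern token and must
            # cover at least one remaining subject token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # pattern is longer than the subject
        if token != "*" and token != s_tokens[i]:
            return False  # literal token mismatch
    # No ">" seen, so the token counts must line up exactly.
    return len(p_tokens) == len(s_tokens)
```

With this sketch, `subject_matches("events.*", "events.link")` holds but `subject_matches("events.*", "events.link.down")` does not, while `events.>` matches both.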

#### Infrastructure

- `deploy/helm/templates/{statefulset,service,configmap}-nats.yaml` —
single-node JetStream-enabled NATS StatefulSet matching the existing
Redis/Neo4j pattern. Headless ClusterIP service for stable DNS;
ConfigMap-driven `nats.conf`; PVC-backed `/data/jetstream`. Liveness
probes the bare listener; readiness asserts JetStream subsystem is up.
- `deploy/helm/values.yaml`: `nats:` block (enabled by default, `2.11-alpine`,
2Gi PVC, resource limits in the same class as Redis). HA clustering
(3-node Raft) is explicitly deferred to a later 0.8.x patch.
- `deploy/helm/templates/_helpers.tpl` — `netcortex.natsUrl` template
consistent with `netcortex.redisUrl` and `netcortex.neo4jUri`.
- `deployment-{web,worker}.yaml` — `NATS_URL` env var threaded into both
pods, gated on `nats.enabled` so operators that bring their own NATS
can disable the bundled chart.
- `deploy/helm/Chart.yaml` — chart `version` 0.1.0 → 0.2.0, `appVersion`
0.6.0 → 0.8.0-dev1.

#### Local dev

- `docker-compose.yml` — NATS service added with JetStream enabled,
monitoring port exposed, healthcheck against `/healthz`. `NATS_URL`
wired into the netcortex app and worker containers.

#### CI

- `.github/workflows/ci.yaml` — `contracts` job gains a NATS service
container (`nats:2.11-alpine` core pub/sub, JetStream not needed for
Protocol-surface tests). `NATS_URL=nats://localhost:4222` exported so
the gated NATS contract cases actually execute.
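
The env-gating that this CI change relies on could look something like the sketch below. It is not the contents of `tests/contracts/conftest.py`: `BusCase` and `event_bus_cases` are hypothetical names, and the real fixture may be structured differently. The point is only that the NATS case is always listed, and is marked as skipped when `NATS_URL` is absent.

```python
import os
from typing import List, Mapping, NamedTuple, Optional


class BusCase(NamedTuple):
    name: str
    skip_reason: Optional[str]  # None means the case should run


def event_bus_cases(env: Optional[Mapping[str, str]] = None) -> List[BusCase]:
    """List the EventBus implementations the contract suite should cover."""
    env = os.environ if env is None else env
    # The in-memory reference implementation needs no broker, so it always runs.
    cases = [BusCase("in_memory", None)]
    if env.get("NATS_URL"):
        cases.append(BusCase("nats", None))
    else:
        # Keep the case visible in test reports, but mark it as skipped.
        cases.append(BusCase("nats", "NATS_URL is not set"))
    return cases
```

In a pytest `conftest.py`, a case with a `skip_reason` could be turned into `pytest.param(..., marks=pytest.mark.skip(reason=case.skip_reason))`, so local runs report the NATS cases as skipped rather than silently omitting them.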

#### Dependencies

- `nats-py>=2.6` added to runtime deps. Light dependency (async-only
client, no native code).
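
Because the client is async-only, the lifecycle described under Code (sync constructor, lazy connect, idempotent `close()`) needs a little care. The sketch below shows one way that shape could work. `LazyBus` and `FakeConn` are hypothetical stand-ins, not the real `NatsEventBus`; the fake connection only mimics the `publish` and `drain` coroutines that a `nats-py` client exposes, so the example runs without a broker.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Optional


class LazyBus:
    """Hypothetical illustration of the lifecycle; not the real NatsEventBus."""

    def __init__(self, connect: Callable[[], Awaitable[Any]]) -> None:
        # Sync constructor: remember how to connect, but do no I/O yet.
        self._connect = connect
        self._conn: Optional[Any] = None
        self._closed = False

    async def _ensure_connected(self) -> Any:
        if self._closed:
            raise RuntimeError("bus is closed")
        if self._conn is None:
            # Lazy connect on first use. Not safe against concurrent first
            # calls; a real implementation would guard this with a lock.
            self._conn = await self._connect()
        return self._conn

    async def publish(self, subject: str, payload: dict) -> None:
        conn = await self._ensure_connected()
        await conn.publish(subject, json.dumps(payload).encode("utf-8"))

    async def close(self) -> None:
        if self._closed:
            return  # idempotent: second and later calls are no-ops
        self._closed = True
        if self._conn is not None:
            await self._conn.drain()  # flush pending work before closing


class FakeConn:
    """In-memory stand-in so the sketch runs without a broker."""

    def __init__(self) -> None:
        self.sent: list = []
        self.drains = 0

    async def publish(self, subject: str, data: bytes) -> None:
        self.sent.append((subject, data))

    async def drain(self) -> None:
        self.drains += 1


async def demo() -> FakeConn:
    conn = FakeConn()

    async def connect() -> FakeConn:
        return conn

    bus = LazyBus(connect)  # no connection is made here
    await bus.publish("events.link", {"up": False})
    await bus.close()
    await bus.close()  # safe to call twice
    return conn
```

Running `asyncio.run(demo())` publishes one message and drains the connection exactly once. Against a real server, `connect` could be `lambda: nats.connect(url)`, since `nats.connect` returns a client with `publish` and `drain` coroutines.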

### Not yet wired

- No production code path uses the bus yet. Pollers still call the
correlator and writeback directly. The cutover lands in `0.8.0-dev3`
(first dual-write) and `0.8.0-dev5` (full cutover).
- No `reflex/` handlers yet — those land in `0.8.0-dev2`.

## [0.7.1-dev3] — 2026-06-01

### Fixed — first green CI run
4 changes: 2 additions & 2 deletions deploy/helm/Chart.yaml
@@ -2,8 +2,8 @@ apiVersion: v2
name: netcortex
description: The intelligence layer for your network — MCP server, sync engine, and platform adapters
type: application
-version: 0.1.0
-appVersion: "0.6.0"
+version: 0.2.0
+appVersion: "0.8.0-dev1"
keywords:
- network
- mcp
9 changes: 9 additions & 0 deletions deploy/helm/templates/_helpers.tpl
@@ -102,3 +102,12 @@ Neo4j bolt URI for peer containers.
{{- define "netcortex.neo4jUri" -}}
bolt://{{ include "netcortex.fullname" . }}-neo4j:7687
{{- end }}

{{/*
NATS URL for peer containers — the thalamus (event bus) the brain refactor
introduces in 0.8.0. Headless Service + StatefulSet means the pod is
reachable as <fullname>-nats:4222 from within the cluster.
*/}}
{{- define "netcortex.natsUrl" -}}
nats://{{ include "netcortex.fullname" . }}-nats:4222
{{- end }}
46 changes: 46 additions & 0 deletions deploy/helm/templates/configmap-nats.yaml
@@ -0,0 +1,46 @@
{{- if .Values.nats.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "netcortex.fullname" . }}-nats
  labels:
    {{- include "netcortex.labels" . | nindent 4 }}
    app.kubernetes.io/component: nats
data:
  # NATS server configuration.
  #
  # Single-node JetStream-enabled deployment. HA clustering (3+ nodes with
  # raft-replicated streams) lands in a later 0.8.x patch once the reflex
  # layer has driven enough message volume to justify the operational
  # complexity. Until then a single replica with a PVC is enough: JetStream
  # persists to disk and survives pod restart.
  #
  # The 0.8.0-dev1 NatsEventBus implementation uses NATS core pub/sub, which
  # delivers only within a live subscription session and never replays
  # (at-most-once), matching the in-session scope of the EventBus Protocol.
  # JetStream is enabled at the SERVER level so future
  # durable consumers (episodic memory in 0.9.0, stream bridge for external
  # agents) can opt in without redeploying NATS.
  nats.conf: |
    # Client connections
    port: 4222
    http_port: 8222  # monitoring endpoint (/healthz, /varz, etc.)

    # Server identity
    server_name: {{ include "netcortex.fullname" . }}-nats-0

    # Logging
    debug: false
    trace: false
    logtime: true

    # Connection limits: generous defaults for in-cluster traffic.
    max_connections: 1000
    max_payload: 1MB

    # JetStream: server-side persistence for future durable consumers.
    jetstream {
      store_dir: "/data/jetstream"
      max_memory_store: 256MB
      max_file_store: {{ .Values.nats.jetstream.maxFileStore | default "1GB" }}
    }
{{- end }}
4 changes: 4 additions & 0 deletions deploy/helm/templates/deployment-web.yaml
@@ -38,6 +38,10 @@ spec:
              value: {{ include "netcortex.redisUrl" . | quote }}
            - name: NEO4J_URI
              value: {{ include "netcortex.neo4jUri" . | quote }}
            {{- if .Values.nats.enabled }}
            - name: NATS_URL
              value: {{ include "netcortex.natsUrl" . | quote }}
            {{- end }}
            {{- with .Values.web.extraEnv }}
            {{- toYaml . | nindent 12 }}
            {{- end }}
4 changes: 4 additions & 0 deletions deploy/helm/templates/deployment-worker.yaml
@@ -36,6 +36,10 @@ spec:
              value: {{ include "netcortex.redisUrl" . | quote }}
            - name: NEO4J_URI
              value: {{ include "netcortex.neo4jUri" . | quote }}
            {{- if .Values.nats.enabled }}
            - name: NATS_URL
              value: {{ include "netcortex.natsUrl" . | quote }}
            {{- end }}
            - name: SYNC_BACKEND
              value: celery
            {{- with .Values.worker.extraEnv }}
32 changes: 32 additions & 0 deletions deploy/helm/templates/service-nats.yaml
@@ -0,0 +1,32 @@
{{- if .Values.nats.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: {{ include "netcortex.fullname" . }}-nats
  labels:
    {{- include "netcortex.labels" . | nindent 4 }}
    app.kubernetes.io/component: nats
spec:
  type: ClusterIP
  clusterIP: None  # headless: StatefulSet pods get stable DNS via serviceName
  selector:
    {{- include "netcortex.selectorLabels" . | nindent 4 }}
    app.kubernetes.io/component: nats
  ports:
    - name: client
      port: 4222
      targetPort: 4222
      protocol: TCP
    - name: monitor
      # Used by the readiness/liveness probes against /healthz and by ops
      # for /varz, /connz, /streamsz inspection. NOT exposed via Ingress.
      port: 8222
      targetPort: 8222
      protocol: TCP
    - name: cluster
      # Reserved for future HA clustering (raft replication between NATS
      # nodes). Inactive while replicas: 1.
      port: 6222
      targetPort: 6222
      protocol: TCP
{{- end }}
96 changes: 96 additions & 0 deletions deploy/helm/templates/statefulset-nats.yaml
@@ -0,0 +1,96 @@
{{- if .Values.nats.enabled }}
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ include "netcortex.fullname" . }}-nats
  labels:
    {{- include "netcortex.labels" . | nindent 4 }}
    app.kubernetes.io/component: nats
spec:
  serviceName: {{ include "netcortex.fullname" . }}-nats
  replicas: 1
  selector:
    matchLabels:
      {{- include "netcortex.selectorLabels" . | nindent 6 }}
      app.kubernetes.io/component: nats
  template:
    metadata:
      labels:
        {{- include "netcortex.selectorLabels" . | nindent 8 }}
        app.kubernetes.io/component: nats
      annotations:
        # Forces a rolling restart when nats.conf changes; otherwise the
        # ConfigMap update would not be picked up by an already-running pod.
        checksum/config: {{ include (print $.Template.BasePath "/configmap-nats.yaml") . | sha256sum }}
    spec:
      # The official nats:alpine images run as UID 1000 by default. Same
      # caveat as Redis/Neo4j on OpenShift restricted SCC: set runAsUser
      # to null in a values override when targeting OCP, or grant anyuid.
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      containers:
        - name: nats
          image: "{{ .Values.nats.image.repository }}:{{ .Values.nats.image.tag }}"
          imagePullPolicy: IfNotPresent
          args:
            - "--config"
            - "/etc/nats/nats.conf"
          ports:
            - name: client
              containerPort: 4222
            - name: monitor
              containerPort: 8222
            - name: cluster
              containerPort: 6222
          resources:
            {{- toYaml .Values.nats.resources | nindent 12 }}
          volumeMounts:
            - name: nats-config
              mountPath: /etc/nats
              readOnly: true
            - name: nats-data
              mountPath: /data
          livenessProbe:
            httpGet:
              path: /healthz
              port: monitor
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              # ?js-enabled-only=true asserts JetStream subsystem is up,
              # not just the listener. This matters because reflex handlers
              # and future durable consumers depend on it.
              path: /healthz?js-enabled-only=true
              port: monitor
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 3
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
      volumes:
        - name: nats-config
          configMap:
            name: {{ include "netcortex.fullname" . }}-nats
            items:
              - key: nats.conf
                path: nats.conf
  volumeClaimTemplates:
    - metadata:
        name: nats-data
      spec:
        accessModes:
          - ReadWriteOnce
        {{- with .Values.nats.persistence.storageClass }}
        storageClassName: {{ . | quote }}
        {{- end }}
        resources:
          requests:
            storage: {{ .Values.nats.persistence.size }}
{{- end }}
30 changes: 30 additions & 0 deletions deploy/helm/values.yaml
@@ -145,6 +145,36 @@ neo4j:
    size: 1Gi
    storageClass: ""

# -----------------------------------------------------------------------------
# NATS: the thalamus (event bus) introduced in 0.8.0.
#
# Single-node JetStream-enabled deployment. Adapters publish SensoryEvents
# here; reflex handlers, the correlator, and (later) the stream bridge for
# external agents all subscribe. See docs/architecture/brain.md.
#
# HA clustering (3 raft-replicated nodes) lands in a later 0.8.x patch when
# event volume justifies the operational complexity.
# -----------------------------------------------------------------------------
nats:
  enabled: true
  image:
    repository: nats
    tag: "2.11-alpine"
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
  persistence:
    size: 2Gi
    storageClass: ""  # "" = cluster default
  jetstream:
    # File-backed JetStream store cap. Keep aligned with persistence.size
    # (leave 20-30% headroom for indices and overhead).
    maxFileStore: "1GB"

# -----------------------------------------------------------------------------
# NetCortex app data volume (/app/data)
# -----------------------------------------------------------------------------