This file is updated at the END of every Claude Code session. It is the source of truth for what exists, what works, and what comes next. Read this at the START of every session before doing anything.
Phase 5 — Tooling + release hardening (in progress)
🟢 Phase 5 progressing — tooling has expanded beyond Directory User Lookup and SSL Certificate Checker into Jenkins/GitLab air-gap bundle generation, recursive plugin dependency resolution, offline install bundles, host-mediated bundle transfer, and better transfer/download status. Since the last update the platform has also had a substantial release-hardening pass: local auth email verification, login lockout, self-service password reset for local accounts, shared database-backed auth and abuse throttles across replicas, trusted-origin mutation checks, instance-authenticated server actions, enrolment-token limits, certificate-checker SSRF and parsing hardening, signed-update enforcement, terminal SSH credential enforcement, heartbeat/JWT tightening, agent script limits, ingest gRPC caps, config-file permission validation, digest-pinned customer bundles, installer checksum verification, Password Manager release descriptor pinning, and broader E2E/process coverage.
URL, username, password Ansible setup (apps/ansible-api/server.py, apps/web/lib/automation/ansible-api.ts, apps/web/lib/actions/automation.ts, apps/web/app/(dashboard)/settings/integrations/automation/automation-settings-client.tsx, docker-compose.single.yml)
- Added an Ansible API pairing endpoint that accepts initial environment-backed credentials, generates a service-token secret, persists it in the Ansible API data volume, and uses it for ongoing HMAC verification.
- Allowed later rotation by re-pairing with the same initial credentials; static legacy `ANSIBLE_API_SERVICE_TOKEN_*` environment variables remain supported but are treated as externally managed.
- Simplified the CT-Ops automation settings screen so admins pair an Ansible connection with only a URL, username, and password. CT-Ops stores the generated service secret encrypted and clears the initial password after pairing.
- Added bundled compose/start-script defaults for `ANSIBLE_API_PAIRING_USERNAME` and a generated `ANSIBLE_API_PAIRING_PASSWORD`, plus persistent Ansible token storage and updated docs/tests.
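The pairing step can be sketched as follows. This is a hypothetical TypeScript illustration, not the Python in `apps/ansible-api/server.py`. The `pair` function, the `PairingResult` shape, and the 32-byte secret size are all assumptions, and persistence to the data volume is omitted. The sketch shows the two security-relevant moves. It compares the initial credentials in constant time, and it mints a fresh random secret on every successful pairing, so re-pairing rotates the token.

```typescript
import { createHash, randomBytes, randomUUID, timingSafeEqual } from "node:crypto";

// Hypothetical result shape; the real Ansible API response may differ.
type PairingResult = { tokenId: string; secret: string };

// Hash both sides first so timingSafeEqual always compares equal-length buffers
// and the comparison time does not depend on where the inputs differ.
function constantTimeEquals(a: string, b: string): boolean {
  const ha = createHash("sha256").update(a, "utf8").digest();
  const hb = createHash("sha256").update(b, "utf8").digest();
  return timingSafeEqual(ha, hb);
}

// Assumed: `expected` holds ANSIBLE_API_PAIRING_USERNAME / ANSIBLE_API_PAIRING_PASSWORD.
// Every successful call returns a new secret, so re-pairing rotates the token.
function pair(
  username: string,
  password: string,
  expected: { username: string; password: string },
): PairingResult | null {
  const userOk = constantTimeEquals(username, expected.username);
  const passOk = constantTimeEquals(password, expected.password);
  if (!(userOk && passOk)) return null;
  return { tokenId: randomUUID(), secret: randomBytes(32).toString("base64url") };
}
```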
Validation
- `python3 -m unittest tests/test_contract.py` (from `apps/ansible-api`)
- `node --experimental-strip-types --test lib/automation/ansible-pairing-core.test.mjs lib/automation/ansible-ui-gating.test.mjs lib/actions/mutation-authz.test.mjs` (from `apps/web`)
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web db:validate`
- `pnpm --dir apps/web lint 'app/(dashboard)/settings/integrations/automation/automation-settings-client.tsx' lib/actions/automation.ts lib/automation/ansible-api.ts lib/automation/ansible-pairing-core.ts lib/automation/ansible-pairing-core.test.mjs lib/automation/ansible-ui-gating.test.mjs lib/actions/mutation-authz.test.mjs tests/e2e/settings/automation.spec.ts`
- `pnpm --dir apps/web test:unit`
- `pnpm --dir apps/web test:e2e tests/e2e/settings/automation.spec.ts`
- `bash deploy/scripts/test-ansible-profile-wiring.sh`
External module contract and Ansible conversion (apps/web/lib/modules/*, apps/web/lib/db/schema/module-connections.ts, apps/web/lib/automation/ansible-api.ts, apps/ansible-api/server.py)
- Added a generic `module_connections` contract for optional modules with base URL, contract version, auth mode, service-token metadata, TLS mode, timeout, enabled state, and encrypted token secret storage.
- Converted CT-Ops Ansible calls to resolve the configured module connection instead of assuming the bundled `ansible-api` container URL, while retaining the legacy `ANSIBLE_API_URL` fallback for existing installs.
- Added service-token HMAC signing from CT-Ops to Ansible and optional HMAC verification in `ansible-api` when `ANSIBLE_API_SERVICE_TOKEN_ID` and `ANSIBLE_API_SERVICE_TOKEN_SECRET` are set.
- Removed start-script database probing that auto-started the Ansible Compose profile; Ansible is now treated as a manually run module that can live on the same host, a different host, or behind a reverse proxy.
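Below is a minimal sketch of URL resolution with the legacy fallback and of HMAC request signing. The field names, the canonical string (method, path, timestamp, and body joined by newlines), and the 300-second skew window are assumptions for illustration only. They are not the actual CT-Ops or `ansible-api` wire format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative subset of a module_connections row (field names are assumptions).
type ModuleConnection = { baseUrl: string; enabled: boolean; tokenId: string; tokenSecret: string };

// Prefer an enabled configured connection; otherwise fall back to the legacy env URL.
function resolveAnsibleBaseUrl(conn: ModuleConnection | null, env: Record<string, string | undefined>): string | null {
  if (conn && conn.enabled) return conn.baseUrl;
  return env.ANSIBLE_API_URL ?? null;
}

// Hypothetical canonical string: METHOD \n path \n unix-seconds \n body.
function sign(secret: string, method: string, path: string, timestamp: number, body: string): string {
  const canonical = [method.toUpperCase(), path, String(timestamp), body].join("\n");
  return createHmac("sha256", secret).update(canonical).digest("hex");
}

function verify(
  secret: string,
  method: string,
  path: string,
  timestamp: number,
  body: string,
  signature: string,
  nowSeconds: number,
  maxSkewSeconds = 300,
): boolean {
  // Reject stale or future-dated requests to narrow the replay window.
  if (Math.abs(nowSeconds - timestamp) > maxSkewSeconds) return false;
  const expected = Buffer.from(sign(secret, method, path, timestamp, body), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Signing the body and the timestamp means a captured request cannot be replayed outside the window or modified without the secret.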
Validation
- `node --experimental-strip-types --test apps/web/lib/modules/*.test.mjs`
- `python -m unittest tests/test_contract.py` (from `apps/ansible-api`)
Tabbed host activity detail dialog (apps/web/app/(dashboard)/hosts/[id]/host-calendar-tab.tsx, apps/web/lib/actions/calendar.ts, apps/web/tests/e2e/hosts/host-calendar.spec.ts)
- Made the Hosts -> Activity -> Calendar event dialog wider with viewport bounds for mobile screens.
- Split the dialog into Activity Detail, Hosts, and Participants tabs while preserving the existing detail content and long-description scroll behavior.
- Hydrated host calendar events with linked hosts and participants, including participant role labels and a clear Current host marker in the Hosts tab.
Validation
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web test:e2e tests/e2e/hosts/host-calendar.spec.ts`
Focused validation and dev-origin cleanup (apps/web/lib/password-manager/api-contract/openapi.json, apps/web/lib/password-manager/client.test.mjs, apps/web/lib/e2e/route-warmup.mjs, apps/web/lib/e2e/route-warmup.test.mjs, apps/web/tests/e2e/runner.mjs, apps/web/tests/e2e/auth.setup.ts, apps/web/lib/security/trusted-origins.ts, apps/web/lib/security/trusted-origins.test.mjs, apps/web/next.config.ts)
- Fixed #1414 by pinning the Password Manager API route contract inside `ct-ops`, so the web unit suite no longer depends on a sibling `ct-password-manager` checkout.
- Fixed #1413 by making focused e2e spec runs skip the broad route warmup by default while still seeding the test instance and auth storage; set `E2E_ROUTE_WARMUP=all` to force the previous full warmup.
- Fixed #1400 by deriving the Next.js `allowedDevOrigins` list from trusted dev origins and `CT_OPS_DEV_PUBLIC_HOST`, while keeping the list empty for production builds.
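The two behaviours above can be sketched as small pure functions. The function names are hypothetical, and the sketch assumes `CT_OPS_DEV_PUBLIC_HOST` holds a bare host name. Next.js documents `allowedDevOrigins` as a list of host names (wildcards allowed) rather than full origins, so the sketch extracts host names from the trusted origins.

```typescript
// Hypothetical helper: derive Next.js allowedDevOrigins host names.
function deriveAllowedDevOrigins(
  nodeEnv: string | undefined,
  trustedDevOrigins: string[],
  publicHost: string | undefined, // assumed to be a bare host name
): string[] {
  if (nodeEnv === "production") return []; // production builds keep the list empty
  const hosts = new Set<string>();
  for (const origin of trustedDevOrigins) {
    try {
      hosts.add(new URL(origin).hostname);
    } catch {
      // Skip malformed entries instead of failing the dev server.
    }
  }
  if (publicHost && publicHost.trim() !== "") hosts.add(publicHost.trim());
  return [...hosts].sort();
}

// Hypothetical helper: focused runs (explicit spec paths) skip the broad warmup
// unless E2E_ROUTE_WARMUP=all forces it.
function shouldRunFullWarmup(specArgs: string[], env: Record<string, string | undefined>): boolean {
  if (env.E2E_ROUTE_WARMUP === "all") return true;
  return specArgs.length === 0;
}
```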
Validation
- `node --experimental-strip-types --test apps/web/lib/e2e/route-warmup.test.mjs apps/web/lib/security/trusted-origins.test.mjs apps/web/lib/password-manager/client.test.mjs`
- `pnpm --dir apps/web lint lib/e2e/route-warmup.mjs lib/e2e/route-warmup.test.mjs lib/security/trusted-origins.ts lib/security/trusted-origins.test.mjs lib/password-manager/client.test.mjs tests/e2e/runner.mjs tests/e2e/auth.setup.ts next.config.ts`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web test:unit`
First-class bundle release artifact (release-please-config.json, .release-please-manifest.json, .github/workflows/agent-release.yml, .github/workflows/customer-bundle-check.yml, install.sh, deploy/customer-bundle/upgrade.sh, deploy/customer-bundle/start.sh, deploy/customer-bundle/README.md, apps/docs/docs/deployment/docker-compose.md, deploy/scripts/test-agent-release-bundle.sh, deploy/scripts/test-install.sh, deploy/scripts/test-upgrade.sh)
- Added a dedicated `bundle/*` release-please component for the customer bundle, so full-stack install assets are no longer published only from web releases.
- Updated release packaging to upload bundle zip/checksum/compose descriptors, Password Manager metadata, and bundle SBOM assets to the bundle release.
- Kept customer defaults digest-pinned by using same-run image build outputs when available, otherwise resolving the latest released component version tags for `WEB_IMAGE`, `INGEST_IMAGE`, and `ANSIBLE_API_IMAGE`. The fallback resolver normalizes release-please manifest versions to the published `v<version>` image tags before reading manifest digests, and the release verification renders the `ansible` Compose profile so the optional `ANSIBLE_API_IMAGE` pin is checked.
- Updated installer and upgrade scripts to download the latest bundle release, refresh all three CT-Ops runtime image env refs, and preserve custom operator overrides with explicit warnings.
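The fallback resolution order can be sketched as below. In the repository this logic lives in workflow and shell scripts; TypeScript is used here only for illustration. The function names and the injected `lookupDigest` callback (standing in for a registry manifest read) are hypothetical.

```typescript
// release-please manifests record bare versions such as "1.4.2";
// published image tags carry a "v" prefix.
function toImageTag(manifestVersion: string): string {
  const v = manifestVersion.trim();
  return v.startsWith("v") ? v : `v${v}`;
}

// A digest-pinned reference locks the exact image content regardless of tags.
function pinnedImageRef(repository: string, digest: string): string {
  if (!/^sha256:[0-9a-f]{64}$/.test(digest)) throw new Error(`not a sha256 digest: ${digest}`);
  return `${repository}@${digest}`;
}

// Prefer a same-run build digest; otherwise resolve the released tag's digest.
function chooseImageRef(
  repository: string,
  sameRunDigest: string | undefined,
  manifestVersion: string,
  lookupDigest: (repository: string, tag: string) => string, // hypothetical registry lookup
): string {
  const digest = sameRunDigest ?? lookupDigest(repository, toImageTag(manifestVersion));
  return pinnedImageRef(repository, digest);
}
```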
Validation
- `bash -n install.sh deploy/customer-bundle/upgrade.sh deploy/customer-bundle/start.sh deploy/scripts/test-install.sh deploy/scripts/test-upgrade.sh deploy/scripts/test-agent-release-bundle.sh`
- `bash deploy/scripts/test-agent-release-bundle.sh`
- `bash deploy/scripts/test-install.sh`
- `bash deploy/scripts/test-upgrade.sh`
- `bash deploy/scripts/test-agent-release-password-manager-image-ref.sh`
- `bash deploy/scripts/test-password-manager-compose.sh`
- `bash deploy/scripts/test-password-manager-release-contract.sh`
- `bash deploy/scripts/test-password-manager-nginx.sh`
- `bash deploy/scripts/test-password-manager-startup.sh`
- `bash deploy/scripts/test-start.sh`
- `bash deploy/scripts/test-ansible-profile-wiring.sh`
- `bash deploy/scripts/test-web-entrypoint.sh`
- `bash deploy/scripts/test-support-data.sh`
- `bash deploy/scripts/test-start-restart-port-check.sh`
- `python3 -m json.tool release-please-config.json`
- `python3 -m json.tool .release-please-manifest.json`
- `ruby -e 'require "yaml"; ARGV.each { |p| YAML.load_file(p); puts "#{p} yaml parsed" }' .github/workflows/agent-release.yml .github/workflows/customer-bundle-check.yml`
Read-only vault workspace controls (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/tests/e2e/fixtures/password-manager.ts, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Gated vault mutation affordances on the selected vault role returned by the Password Manager API: only `owner` and `manager` roles can open Settings, create entries, choose entry templates, or open edit-entry forms.
- Kept read-only vault use intact for restricted roles by leaving vault selection, entry listing, reveal, copy, export, and Audit available.
- Added a hosted Password Manager E2E regression that downgrades the selected vault to `viewer` and asserts that the Settings, New entry, template selection, and entry edit controls are hidden.
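Below is a minimal sketch of the role gate. Only `owner`, `manager`, and `viewer` come from this log; the function and field names are hypothetical. Any role outside the mutating set, including an unknown or missing one, falls back to read-only.

```typescript
// Only these roles may mutate a vault; anything else fails closed to read-only.
const MUTATING_ROLES = new Set(["owner", "manager"]);

type VaultAffordances = {
  openSettings: boolean;
  createEntry: boolean;
  chooseTemplate: boolean;
  editEntry: boolean;
  // Read-only actions stay available for every role.
  selectVault: boolean;
  listEntries: boolean;
  reveal: boolean;
  copy: boolean;
  exportVault: boolean;
  viewAudit: boolean;
};

function affordancesFor(role: string | null | undefined): VaultAffordances {
  const canMutate = role != null && MUTATING_ROLES.has(role);
  return {
    openSettings: canMutate,
    createEntry: canMutate,
    chooseTemplate: canMutate,
    editEntry: canMutate,
    selectVault: true,
    listEntries: true,
    reveal: true,
    copy: true,
    exportVault: true,
    viewAudit: true,
  };
}
```

Hiding controls is a usability measure. On its own it does not replace authorization checks in the Password Manager API.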
Validation
- `pnpm --dir apps/web lint 'app/(dashboard)/password-manager/password-manager-client.tsx' 'tests/e2e/fixtures/password-manager.ts' 'tests/e2e/tooling/password-manager.spec.ts'`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --dir apps/web exec node tests/e2e/runner.mjs tests/e2e/tooling/password-manager.spec.ts --grep "read-only vault roles"` (blocked locally: `testcontainers` could not find a working container runtime)
Standards-oriented audit visibility (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/lib/password-manager/client.ts, apps/web/tests/e2e/fixtures/password-manager.ts, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Added Password Manager client support for the standalone API's redacted instance audit event listing and audit integrity status routes.
- Added an unlocked workspace Audit tab with instance-wide filters for vault, actor, event type, object type, outcome, and time range.
- Rendered public-safe audit evidence only: timestamp, actor, action, target, outcome, summary, and integrity status, with E2E assertions that forensic or secret-shaped fields are not displayed.
- Updated the bundled Password Manager API pin to `api/v0.1.4` / `ghcr.io/carrtech-dev/ct-password-manager/api@sha256:6f5cc00a33d5df59cbca9968b178675ea45afff9b8f9c2394c2b3ae0a7d09220`, which contains the audit read API and integrity-chain migration.
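"Public-safe audit evidence only" is easiest to enforce as an allowlist projection rather than a denylist. The sketch below assumes a flat event object. Its field names mirror the rendered columns listed above, not the actual API schema. Fields the sketch does not name, such as forensic metadata or secret-shaped values, never reach the row.

```typescript
// Hypothetical raw event shape; the real API response schema may differ.
type RawAuditEvent = Record<string, unknown>;

const PUBLIC_FIELDS = ["timestamp", "actor", "action", "target", "outcome", "summary", "integrity"] as const;

type PublicAuditRow = Record<(typeof PUBLIC_FIELDS)[number], string>;

// Copy only allowlisted string fields; anything new or unexpected is dropped.
function toPublicRow(event: RawAuditEvent): PublicAuditRow {
  const row = {} as PublicAuditRow;
  for (const field of PUBLIC_FIELDS) {
    const value = event[field];
    row[field] = typeof value === "string" ? value : "";
  }
  return row;
}
```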
Validation
PASSWORD_MANAGER_OPENAPI_CONTRACT_PATH=/Volumes/MacBookStorage-Dev/dev/carrtech/ct-password-manager-audit-view/docs/api-contract/openapi.json node --experimental-strip-types --test apps/web/lib/password-manager/client.test.mjspython3 deploy/scripts/validate-password-manager-release.py deploy/password-manager-release.jsonbash deploy/scripts/test-password-manager-release-contract.shbash deploy/scripts/test-password-manager-compose.shbash deploy/scripts/test-password-manager-nginx.shbash deploy/scripts/test-password-manager-startup.shpnpm --dir apps/web lint 'app/(dashboard)/password-manager/password-manager-client.tsx' 'lib/password-manager/client.ts' 'lib/password-manager/client.test.mjs' 'tests/e2e/fixtures/password-manager.ts' 'tests/e2e/tooling/password-manager.spec.ts'pnpm --dir apps/web type-checkpnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.tsBETTER_AUTH_URL=http://localhost:3000 BETTER_AUTH_SECRET=local-build-secret DATABASE_URL=postgres://test:test@localhost:5432/ctops_test pnpm --dir apps/web build(passes with existing Turbopack NFT warning and expected placeholder-secret Better Auth warnings)pnpm --dir apps/web exec node tests/e2e/runner.mjs tests/e2e/tooling/password-manager.spec.ts(blocked locally:testcontainerscould not find a working container runtime)
Member identity and advanced key material UI (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Changed the vault Settings member list to display recipient name/email when available, with the Password Manager user ID demoted to secondary metadata.
- Triggered the existing member-recipient lookup when Settings is opened so existing vault members can be labelled from recipient metadata.
- Moved the public-key envelope textarea into an advanced disclosure while preserving the manual fallback path used by key rotation.
- Extended the hosted Password Manager E2E flow to assert member name/email display and hidden-by-default public-key envelope controls.
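The member-label fallback can be sketched like this. The `RecipientInfo` shape and the `memberLabel` helper are hypothetical. The sketch shows name/email as the primary label and demotes the Password Manager user ID to secondary metadata when recipient data exists.

```typescript
// Hypothetical recipient metadata returned by the member-recipient lookup.
type RecipientInfo = { name?: string; email?: string };

type MemberLabel = { primary: string; secondary: string | null };

function memberLabel(pmUserId: string, recipient: RecipientInfo | undefined): MemberLabel {
  const name = recipient?.name?.trim();
  const email = recipient?.email?.trim();
  if (name && email) return { primary: `${name} <${email}>`, secondary: pmUserId };
  if (name || email) return { primary: (name || email) as string, secondary: pmUserId };
  // No recipient metadata: the ID is all we have, so it stays primary.
  return { primary: pmUserId, secondary: null };
}
```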
Validation
- `pnpm --dir apps/web lint 'app/(dashboard)/password-manager/password-manager-client.tsx' 'tests/e2e/tooling/password-manager.spec.ts'`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.ts`
Entry dialog copy and masking hardening (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Added copy-to-clipboard controls to saved Password Manager entry view/edit dialog fields, including title, template fields, password values, URLs, and notes, while reusing the existing copy audit hook.
- Changed saved password-style fields in entry dialogs to show a fixed `************` mask while hidden, so the hidden display no longer discloses the secret length. Revealing is required before editing the stored value.
- Extended the hosted Password Manager E2E flow to cover dialog field copying, fixed hidden password masking, and the additional copy-audit events.
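The masking rule fits in a few lines. In this sketch (helper names are hypothetical), the hidden display is a constant string. A mask built as `"*".repeat(secret.length)` would reveal how long each password is.

```typescript
// Fixed-length placeholder shown whenever a saved secret is hidden.
const HIDDEN_MASK = "*".repeat(12);

// Hidden values always render the same mask, so the display does not leak length.
function displaySecret(secret: string, revealed: boolean): string {
  return revealed ? secret : HIDDEN_MASK;
}

// The stored value only becomes editable after it has been revealed.
function isEditable(revealed: boolean): boolean {
  return revealed;
}
```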
Validation
- `pnpm --dir apps/web lint 'app/(dashboard)/password-manager/password-manager-client.tsx' 'tests/e2e/tooling/password-manager.spec.ts'`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --dir apps/web exec node tests/e2e/runner.mjs tests/e2e/tooling/password-manager.spec.ts` (blocked locally: `testcontainers` could not find a working container runtime)
Unlocked vault list safety (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/lib/password-manager/crypto-batch.ts)
- Moved Password Manager browser crypto batching into a shared helper and added a settled batch variant for vault list decryption.
- Updated the unlocked workspace to skip individual vault records whose wrapped key or encrypted metadata cannot be decrypted with the current browser-only unlock profile, instead of failing the whole workspace with the generic vault load error.
- Cleared skipped vault keys from the in-memory cache and surfaced a targeted workspace warning when vault records are skipped.
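A "settled" batch variant can be sketched on top of `Promise.allSettled`, which records each item's outcome instead of rejecting the whole batch on the first failure. `mapSettledInBatches` and `loadVaults` are hypothetical names, not the actual `crypto-batch.ts` exports. The `decrypt` callback stands in for the browser-only unwrap/decrypt step.

```typescript
type Settled<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Run fn over items in fixed-size batches, recording per-item success or failure.
async function mapSettledInBatches<I, O>(items: I[], batchSize: number, fn: (item: I) => Promise<O>): Promise<Settled<O>[]> {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  const out: Settled<O>[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    // Promise.resolve().then(...) also captures synchronous throws as rejections.
    const batch = items.slice(i, i + batchSize).map((item) => Promise.resolve().then(() => fn(item)));
    for (const r of await Promise.allSettled(batch)) {
      out.push(r.status === "fulfilled" ? { ok: true, value: r.value } : { ok: false, error: r.reason });
    }
  }
  return out;
}

// Usage sketch: keep decryptable vaults and collect skipped IDs so their cached keys can be evicted.
async function loadVaults(records: { id: string; blob: string }[], decrypt: (blob: string) => Promise<string>) {
  const results = await mapSettledInBatches(records, 4, (r) => decrypt(r.blob));
  const vaults: { id: string; name: string }[] = [];
  const skipped: string[] = [];
  results.forEach((res, i) => {
    if (res.ok) vaults.push({ id: records[i].id, name: res.value });
    else skipped.push(records[i].id);
  });
  return { vaults, skipped };
}
```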
Validation
- `node --experimental-strip-types --test apps/web/lib/password-manager/*.test.mjs`
- `pnpm --dir apps/web lint 'app/(dashboard)/password-manager/password-manager-client.tsx' 'lib/password-manager/crypto-batch.ts' 'lib/password-manager/crypto-batch.test.mjs'`
- `pnpm --dir apps/web exec tsc --noEmit --pretty false`
- `pnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --dir apps/web exec node tests/e2e/runner.mjs tests/e2e/tooling/password-manager.spec.ts` (blocked locally: `testcontainers` could not find a working container runtime)
Shared planning calendar (apps/web/app/(dashboard)/calendar, apps/web/lib/actions/calendar.ts, apps/web/lib/db/schema/calendar.ts)
- Added an Operations Calendar dashboard page for human planning of maintenance, patching, application work, change windows, meetings, and other operational events. Engineers and admins can create/edit/delete; read-only users can view.
- Added Outlook-style Day, Work Week, Full Week, Month, and Year views using FullCalendar with drag/drop and resize persistence through server actions.
- Added host links and user participants with roles: owner, requester, implementer, approver, reviewer, and observer.
- Added daily/weekly/monthly/yearly recurrence storage with bounded server-side expansion and single-occurrence exceptions for moved recurring instances.
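Bounded expansion with single-occurrence exceptions can be sketched as below. Each occurrence is computed from the series start rather than by repeated addition, which avoids cumulative drift. The iteration cap, the rule shape, and the convention that exceptions are keyed by the ISO string of the original start are all assumptions, and event durations are ignored.

```typescript
type Frequency = "daily" | "weekly" | "monthly" | "yearly";
type Recurrence = { frequency: Frequency; interval: number; until?: Date };

// Assumed hard cap on iterations per expansion (the real bound is not stated here).
const MAX_OCCURRENCES = 500;

// Start of the n-th occurrence, computed from the series start in UTC.
function nthStart(start: Date, rule: Recurrence, n: number): Date {
  const step = n * rule.interval;
  const d = new Date(start.getTime());
  switch (rule.frequency) {
    case "daily":
      d.setUTCDate(d.getUTCDate() + step);
      break;
    case "weekly":
      d.setUTCDate(d.getUTCDate() + 7 * step);
      break;
    case "monthly":
      // Date rolls over month ends (Jan 31 + 1 month lands in March); a real
      // implementation needs an explicit clamping policy.
      d.setUTCMonth(d.getUTCMonth() + step);
      break;
    case "yearly":
      d.setUTCFullYear(d.getUTCFullYear() + step);
      break;
  }
  return d;
}

// Occurrence starts in [windowStart, windowEnd), skipping moved occurrences.
function expandOccurrences(start: Date, rule: Recurrence, windowStart: Date, windowEnd: Date, exceptions: Set<string>): Date[] {
  const out: Date[] = [];
  for (let n = 0; n < MAX_OCCURRENCES; n++) {
    const t = nthStart(start, rule, n).getTime();
    if (t >= windowEnd.getTime()) break;
    if (rule.until && t > rule.until.getTime()) break;
    const occ = new Date(t);
    if (t >= windowStart.getTime() && !exceptions.has(occ.toISOString())) out.push(occ);
  }
  return out;
}
```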
Persistence, security, and audit
- Added `calendar_events`, `calendar_event_hosts`, and `calendar_event_participants` with instance-scoped RLS, indexes, idempotent create support via `client_request_id`, and occurrence-exception uniqueness.
- Calendar mutations validate input on the server, enforce instance ownership for linked hosts/users, rate-limit writes, and emit audit events.
- Added sidebar/command-palette entries, docs, E2E fixture cleanup, and route warmup coverage.
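Idempotent creation via `client_request_id` can be illustrated with an in-memory stand-in for a unique index on `(instance_id, client_request_id)`. This is a sketch only. In the database, the unique constraint is what makes retries race-safe across replicas, not a check-then-insert like this one. `createEventStore` and the event shape are hypothetical.

```typescript
type CalendarEvent = { id: string; instanceId: string; title: string };

function createEventStore() {
  // Keyed like a unique index on (instance_id, client_request_id).
  const byRequest = new Map<string, CalendarEvent>();
  let nextId = 1;
  return {
    // A retried submission with the same clientRequestId returns the original event.
    create(instanceId: string, clientRequestId: string, title: string): { event: CalendarEvent; created: boolean } {
      const key = JSON.stringify([instanceId, clientRequestId]);
      const existing = byRequest.get(key);
      if (existing) return { event: existing, created: false };
      const event: CalendarEvent = { id: String(nextId++), instanceId, title };
      byRequest.set(key, event);
      return { event, created: true };
    },
  };
}
```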
Validation
- `node --experimental-strip-types --test apps/web/lib/calendar/recurrence.test.mjs apps/web/lib/actions/calendar-source.test.mjs`
- `pnpm --dir apps/web db:validate`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web lint 'app/(dashboard)/calendar/page.tsx' 'app/(dashboard)/calendar/operations-calendar-client.tsx' lib/actions/calendar.ts lib/calendar/recurrence.ts lib/db/schema/calendar.ts components/shared/sidebar.tsx components/shared/command-palette/providers.tsx tests/e2e/calendar/calendar.spec.ts tests/e2e/auth.setup.ts tests/e2e/fixtures/db.ts`
- `BETTER_AUTH_URL=http://localhost:3000 BETTER_AUTH_SECRET=local-build-secret DATABASE_URL=postgres://test:test@localhost:5432/ctops_test pnpm --dir apps/web build` (passes with existing Turbopack NFT warnings and expected placeholder-secret Better Auth warnings)
- `pnpm --dir apps/web exec playwright test --list tests/e2e/calendar/calendar.spec.ts`
- `pnpm --dir apps/web test:e2e tests/e2e/calendar/calendar.spec.ts` (blocked locally: `testcontainers` could not find a working container runtime)
Agent task error storage and display (apps/ingest/internal/db/queries/task_runs.sql.go, apps/web/app/api/system/health/route.ts, apps/web/app/(dashboard)/settings/system/system-client.tsx)
- Added `task_run_hosts.error_message` so failed agent task results keep the final error separately from incremental progress output.
- Updated the System Health API to return a short agent-error summary plus the full detail text.
- Updated the Administration → System Agent Errors table to show the summary inline and expose a View more dialog for the full error.
- Extended System page E2E coverage to seed a failed software inventory task and assert summary/detail behavior.
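Below is a minimal sketch of one plausible summary rule: the first non-empty line, truncated to a fixed length. The 120-character limit and the `truncated` flag (which would drive the View more dialog) are assumptions. The log does not specify the actual rule.

```typescript
type ErrorSummary = { summary: string; detail: string; truncated: boolean };

// Hypothetical rule: first non-empty line, clipped to maxLength characters.
function summarizeAgentError(message: string | null, maxLength = 120): ErrorSummary {
  const detail = (message ?? "").trim();
  const firstLine = detail.split(/\r?\n/).find((line) => line.trim() !== "")?.trim() ?? "";
  if (firstLine.length <= maxLength && firstLine === detail) {
    return { summary: firstLine, detail, truncated: false };
  }
  const summary = firstLine.length > maxLength ? firstLine.slice(0, maxLength - 1).trimEnd() + "…" : firstLine;
  // truncated=true means the full detail holds more than the summary shows.
  return { summary, detail, truncated: true };
}
```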
Validation
- `pnpm --filter web db:validate`
- `pnpm --filter web type-check`
- `pnpm --filter web lint -- 'app/(dashboard)/settings/system/system-client.tsx' app/api/system/health/route.ts tests/e2e/settings/system-health.spec.ts`
- `go test ./apps/ingest/internal/db/queries ./apps/ingest/internal/handlers`
- `pnpm --filter web test:e2e tests/e2e/settings/system-health.spec.ts` (blocked locally: `testcontainers` could not find a working container runtime)
Entry form password reveal controls (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Added eye-icon visibility toggles to password-style fields in Password Manager entry dialogs, including login passwords, card security codes, and generated SSH key passphrases.
- Reset reveal state when the entry dialog is opened or closed so newly opened forms start with masked values.
- Extended Password Manager E2E coverage to assert password fields are masked by default, can be shown before creating an encrypted entry, and can be hidden again.
Validation
- `pnpm install --frozen-lockfile`
- `pnpm --dir apps/web lint 'app/(dashboard)/password-manager/password-manager-client.tsx' 'tests/e2e/tooling/password-manager.spec.ts'`
- `pnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web exec node tests/e2e/runner.mjs tests/e2e/tooling/password-manager.spec.ts` (blocked locally: `testcontainers` could not find a working container runtime)
Password Generator embedded layout (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/tests/e2e/password-generator/password-generator.spec.ts)
- Widened the Password Manager entry dialog so template forms have room for inline password-generation controls.
- Made template password fields span the full form width when generation is available, preventing the label action from compressing the password input.
- Widened the embedded shared Password Generator dialog used from Password Manager entries while leaving the standalone generator features unchanged.
- Added E2E layout assertions covering the Password Manager entry dialog, generated-password field wrapper, and embedded generator dialog sizing.
Validation
- `pnpm install --frozen-lockfile`
- `pnpm --filter web type-check`
- `pnpm --filter web lint -- 'app/(dashboard)/password-manager/password-manager-client.tsx' tests/e2e/password-generator/password-generator.spec.ts`
- `pnpm --filter web exec playwright test --list tests/e2e/password-generator/password-generator.spec.ts`
- `git diff --check`
Patch Status check history chart (apps/web/app/(dashboard)/hosts/[id]/checks-tab.tsx, apps/web/lib/checks/history-chart.ts)
- Changed Patch Status history bars so their height is based on `updates_count` instead of check response duration.
- Changed Patch Status history bar colour to reflect patch age against the configured policy threshold: green below five-sixths of the threshold, amber from five-sixths up to and including the threshold, and red beyond it.
- Kept non-Patch Status checks on the existing response-duration chart semantics, and added tooltip copy for available updates, patch age, and the policy threshold.
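The colour bands translate directly into a small function. This sketch assumes patch age and threshold share a unit (days here). The height helper, which scales `updates_count` relative to the largest count in view, is a hypothetical normalization.

```typescript
type BarColour = "green" | "amber" | "red";

// Green below 5/6 of the threshold, amber from 5/6 through the threshold, red beyond it.
function patchAgeColour(patchAgeDays: number, thresholdDays: number): BarColour {
  if (thresholdDays <= 0) throw new RangeError("thresholdDays must be positive");
  const amberFrom = (thresholdDays * 5) / 6;
  if (patchAgeDays < amberFrom) return "green";
  if (patchAgeDays <= thresholdDays) return "amber";
  return "red";
}

// Hypothetical normalization: bar height as a percentage of the largest updates_count in view.
function barHeightPercent(updatesCount: number, maxUpdatesInView: number): number {
  if (maxUpdatesInView <= 0) return 0;
  return Math.round((Math.max(0, updatesCount) / maxUpdatesInView) * 100);
}
```

With a 30-day threshold, the amber band starts at day 25.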
Validation
- `pnpm install --frozen-lockfile`
- `node --experimental-strip-types --test apps/web/lib/checks/history-chart.test.mjs`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web lint` (passes with existing unrelated warnings)
- `pnpm --dir apps/web test:unit` (new tests pass; the full suite still has unrelated failures from the missing local container runtime plus existing issue #1109)
Reusable password generator (apps/web/lib/password-generator.ts, apps/web/components/password-generator/password-generator-tool.tsx, apps/web/app/(dashboard)/password-generator/page.tsx)
- Added a tooling-protected Password Generator page with browser-local password and passphrase generation, presets, length and character-class controls, ambiguous-character exclusion, custom symbols, copy support, and strength feedback.
- Added Password Generator entries to the sidebar and command palette using shared navigation metadata.
- Reused the same generator component inside the Password Manager login entry modal, so template password fields can open the generator and populate the selected encrypted-entry password field without creating a second generator.
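The generator's core can be sketched as follows. The options shape and the ambiguous-character set are assumptions. Node's `crypto.randomInt` stands in for the browser's `crypto.getRandomValues`, which the browser-local tool would use. The sketch takes one guaranteed character from each enabled class, fills the remaining positions from the union, then applies a Fisher-Yates shuffle so the guaranteed characters are not always first.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical options shape; the real component's options may differ.
type GeneratorOptions = {
  length: number;
  lower: boolean;
  upper: boolean;
  digits: boolean;
  symbols: string; // custom symbol set; "" disables symbols
  excludeAmbiguous: boolean;
};

const AMBIGUOUS = new Set("Il1O0o".split("")); // assumed ambiguous set

function generatePassword(opts: GeneratorOptions): string {
  const pools: string[] = [];
  if (opts.lower) pools.push("abcdefghijklmnopqrstuvwxyz");
  if (opts.upper) pools.push("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
  if (opts.digits) pools.push("0123456789");
  if (opts.symbols) pools.push(opts.symbols);
  const filtered = pools
    .map((p) => [...new Set(p)].filter((c) => !opts.excludeAmbiguous || !AMBIGUOUS.has(c)).join(""))
    .filter((p) => p.length > 0);
  if (filtered.length === 0) throw new Error("no character classes enabled");
  if (opts.length < filtered.length) throw new Error("length too short for the enabled classes");

  // One guaranteed character per class, the rest drawn from the union.
  const all = filtered.join("");
  const chars = filtered.map((p) => p[randomInt(p.length)]);
  while (chars.length < opts.length) chars.push(all[randomInt(all.length)]);

  // Fisher-Yates shuffle with a CSPRNG index.
  for (let i = chars.length - 1; i > 0; i--) {
    const j = randomInt(i + 1);
    [chars[i], chars[j]] = [chars[j], chars[i]];
  }
  return chars.join("");
}
```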
Validation
- `pnpm install --frozen-lockfile`
- `node --experimental-strip-types --test lib/password-generator.test.mjs lib/password-generator/navigation.test.mjs`
- `pnpm --filter web type-check`
- `pnpm --filter web lint` (passes with existing unrelated warnings)
- `BETTER_AUTH_URL=http://localhost:3000 BETTER_AUTH_SECRET=local-build-secret DATABASE_URL=postgres://test:test@localhost:5432/ctops_test pnpm --filter web build` (passes; local placeholder secret emits expected Better Auth warnings)
- `pnpm --filter web test:e2e tests/e2e/password-generator/password-generator.spec.ts` (blocked locally: `testcontainers` could not find a working container runtime)
- `pnpm --filter web test:unit` (new generator tests pass; the full suite still has unrelated failures from the missing container runtime plus existing issue #1109)
Authenticator setup QR code (apps/web/app/(dashboard)/profile/profile-client.tsx, apps/web/tests/e2e/profile/profile.spec.ts)
- Added a scan-ready QR code to the profile two-factor setup flow, generated from the Better Auth `totpURI` returned when setup starts.
- Kept the setup key and full authenticator URI visible as manual fallback options for users who cannot scan the QR code.
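The manual fallback fields can be read from the same URI that feeds the QR code. This sketch assumes the URI follows the common `otpauth://totp/Issuer:account?secret=...&issuer=...` key URI layout. `manualSetupFields` is a hypothetical helper, not a Better Auth API.

```typescript
// Extract manual-entry fields from an otpauth:// URI (layout assumed, see above).
function manualSetupFields(totpUri: string): { secret: string; issuer: string | null; account: string } {
  const url = new URL(totpUri);
  if (url.protocol !== "otpauth:") throw new Error("not an otpauth URI");
  const secret = url.searchParams.get("secret");
  if (!secret) throw new Error("missing secret parameter");
  // The label after "/totp/" is "Issuer:account", percent-encoded.
  const label = decodeURIComponent(url.pathname.replace(/^\//, ""));
  const sep = label.indexOf(":");
  const account = sep >= 0 ? label.slice(sep + 1) : label;
  return { secret, issuer: url.searchParams.get("issuer"), account };
}
```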
- Extended the profile 2FA E2E flow to assert the QR code renders before using the setup key to generate and verify the TOTP code.
Validation
- `pnpm --filter web add react-qr-code`
- `pnpm --dir apps/web test:e2e tests/e2e/profile/profile.spec.ts --grep "enable and disable authenticator"`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web lint` (passes with existing unrelated warnings)
Template-driven entry modal (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/lib/password-manager/workspace.ts, apps/web/lib/password-manager/export.ts, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Replaced the inline entry editor with a modal so the Passwords tab gives its full content area to entries in the selected vault.
- Added a split New entry button with a template dropdown. Login remains the default template, and users can also choose Card, Identity, or Secure note templates before opening the modal.
- Extended encrypted entry payloads to carry a template type and template fields while keeping login username/password fields compatible with existing entries, filtering, reveal/copy, and export behavior.
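The backward-compatibility rule can be sketched as a normalizer over decrypted payloads. The payload shape and template identifiers are hypothetical. In this sketch, legacy entries without a `templateType` read as logins, and top-level `username`/`password` values stay authoritative for login entries, so older data keeps working with filtering, reveal/copy, and export.

```typescript
type TemplateType = "login" | "card" | "identity" | "secure_note"; // identifiers assumed

// Hypothetical decrypted payload; pre-template entries only have username/password.
type EntryPayload = {
  title: string;
  templateType?: TemplateType;
  templateFields?: Record<string, string>;
  username?: string;
  password?: string;
};

type NormalizedEntry = { title: string; templateType: TemplateType; fields: Record<string, string> };

function normalizeEntry(payload: EntryPayload): NormalizedEntry {
  const templateType = payload.templateType ?? "login";
  const fields: Record<string, string> = { ...(payload.templateFields ?? {}) };
  if (templateType === "login") {
    // In this sketch the legacy top-level login fields take precedence.
    if (payload.username !== undefined) fields.username = payload.username;
    if (payload.password !== undefined) fields.password = payload.password;
  }
  return { title: payload.title, templateType, fields };
}
```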
- Expanded the hosted Password Manager E2E flow to create both Card and Login entries through the modal and assert plaintext template fields stay out of API requests, console output, and outbound bodies.
Validation
- `pnpm install`
- `pnpm --filter web type-check`
- `cd apps/web && node --experimental-strip-types --test lib/password-manager/export.test.mjs lib/password-manager/workspace.test.mjs`
- `pnpm --filter web lint` (passes with existing unrelated warnings)
- `pnpm --filter web test:e2e tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --filter web test:unit` (one unrelated existing failure tracked as #1109)
Risk-gated vault export flow (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/lib/password-manager/export.ts, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Replaced the one-click vault export with an export dialog that requires fresh unlock-password re-authentication and the typed acknowledgement `I understand the risks` before any download is generated.
- Added an encrypted ZIP export as the default path. The ZIP contains a manifest plus an AES-256-GCM encrypted vault payload whose key is derived from the user's export file password with PBKDF2-SHA256.
- Kept explicit plaintext JSON export available as an advanced option with destructive styling and warning copy, while continuing to audit successful export events without sending plaintext secrets or export passwords to the Password Manager API.
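The encrypted payload step can be sketched with Node's `crypto` module. The browser implementation would use Web Crypto instead, and ZIP packaging and the manifest are omitted. The 600,000-iteration work factor, the 16-byte salt, and the field names are assumptions, not values taken from CT-Ops. GCM's authentication tag makes decryption fail loudly on a wrong password or a modified file.

```typescript
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "node:crypto";

const ACK_PHRASE = "I understand the risks";
const ITERATIONS = 600_000; // assumed work factor

type EncryptedExport = { salt: string; iv: string; tag: string; ciphertext: string; iterations: number };

function encryptExport(plaintextJson: string, exportPassword: string, acknowledgement: string): EncryptedExport {
  if (acknowledgement !== ACK_PHRASE) throw new Error("typed acknowledgement does not match");
  const salt = randomBytes(16);
  const iv = randomBytes(12); // 96-bit nonce, the standard size for GCM
  const key = pbkdf2Sync(exportPassword, salt, ITERATIONS, 32, "sha256"); // 32 bytes -> AES-256
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintextJson, "utf8"), cipher.final()]);
  return {
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
    iterations: ITERATIONS,
  };
}

function decryptExport(exp: EncryptedExport, exportPassword: string): string {
  const key = pbkdf2Sync(exportPassword, Buffer.from(exp.salt, "base64"), exp.iterations, 32, "sha256");
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(exp.iv, "base64"));
  decipher.setAuthTag(Buffer.from(exp.tag, "base64"));
  // final() throws if the password is wrong or the payload was tampered with.
  return Buffer.concat([decipher.update(Buffer.from(exp.ciphertext, "base64")), decipher.final()]).toString("utf8");
}
```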
Validation
- `node --experimental-strip-types --test lib/password-manager/export.test.mjs`
- `pnpm --filter web type-check`
- `pnpm --filter web lint` (passes with existing unrelated warnings)
- `pnpm --filter web test:e2e -- tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --filter web test:unit` (one unrelated existing failure filed as #1109)
- `git diff --check`
Passwords-first vault workspace (apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Reworked the unlocked Password Manager workspace so the selected vault's password entries are the default primary content instead of sitting beneath vault administration and member management.
- Replaced the left-hand vault list and inline create form with a compact vault drop-down plus a create-vault dialog, keeping the first viewport focused on saved credentials.
- Moved vault rename/delete controls, member role management, member addition, and key rotation into a dedicated Settings tab that is hidden until selected.
- Kept the browser-only crypto and API flows intact while updating E2E coverage to exercise the new tab and dialog workflow.
Validation
- `pnpm install`
- `pnpm --dir apps/web lint 'app/(dashboard)/password-manager/password-manager-client.tsx' 'tests/e2e/tooling/password-manager.spec.ts'`
- `pnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --dir apps/web exec node tests/e2e/runner.mjs tests/e2e/tooling/password-manager.spec.ts`
Vault member selector and public-key lookup (apps/web/app/(dashboard)/password-manager/page.tsx, apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/lib/password-manager/client.ts, apps/web/tests/e2e/fixtures/password-manager.ts, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Replaced the manual CT-Ops user ID and pasted public-key JSON add-member controls with a searchable instance-user selector that disables existing members and users who have not completed Password Manager setup.
- Added a Password Manager client call for the new vault-scoped `/member-recipients` lookup, caching returned public-key envelopes in browser memory and using the mapped Password Manager user ID when adding members.
- Pinned the bundled Password Manager API descriptor and compose image refs to `api/v0.1.3`, which contains the recipient-readiness endpoint.
- Extended the hosted Password Manager Playwright mock and seed data so E2E coverage exercises recipient readiness lookup without sending unlock passwords, plaintext entry values, private-key envelopes, or KDF metadata.
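The selector's enable/disable rule can be sketched as a pure function over instance users, current members, and the cached recipient lookup. All shapes and names here are hypothetical. A user with no recipient record has not finished Password Manager setup. Selectable options carry the mapped Password Manager user ID that the add-member call would use.

```typescript
type InstanceUser = { ctOpsUserId: string; name: string; email: string };
type Recipient = { ctOpsUserId: string; pmUserId: string; publicKeyEnvelope: string };

type MemberOption = {
  user: InstanceUser;
  disabled: boolean;
  reason: "already-member" | "setup-incomplete" | null;
  pmUserId: string | null;
};

function buildMemberOptions(
  users: InstanceUser[],
  existingMemberPmIds: Set<string>,
  recipients: Map<string, Recipient>, // keyed by CT-Ops user ID, cached in memory
): MemberOption[] {
  return users.map((user): MemberOption => {
    const recipient = recipients.get(user.ctOpsUserId);
    if (!recipient) return { user, disabled: true, reason: "setup-incomplete", pmUserId: null };
    if (existingMemberPmIds.has(recipient.pmUserId)) {
      return { user, disabled: true, reason: "already-member", pmUserId: recipient.pmUserId };
    }
    return { user, disabled: false, reason: null, pmUserId: recipient.pmUserId };
  });
}
```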
Validation
- `pnpm install --frozen-lockfile`
- `pnpm --dir apps/web type-check`
- `node --experimental-strip-types --test apps/web/lib/password-manager/browser-crypto.test.mjs`
- `node --experimental-strip-types --test apps/web/lib/password-manager/browser-crypto.test.mjs apps/web/lib/password-manager/client.test.mjs`
- `pnpm --dir apps/web lint 'app/(dashboard)/password-manager/password-manager-client.tsx' 'lib/password-manager/client.ts' 'tests/e2e/fixtures/password-manager.ts' 'tests/e2e/tooling/password-manager.spec.ts'`
- `pnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --dir apps/web exec node tests/e2e/runner.mjs tests/e2e/tooling/password-manager.spec.ts`
- `python3 deploy/scripts/validate-password-manager-release.py deploy/password-manager-release.json`
- `bash deploy/scripts/test-password-manager-release-contract.sh`
- `bash deploy/scripts/test-password-manager-compose.sh`
- `git diff --check`
Descriptor-derived bundled API image (docker-compose.single.yml, .env.example, start.sh, deploy/customer-bundle/start.sh, deploy/customer-bundle/upgrade.sh, .github/workflows/agent-release.yml, .github/workflows/customer-bundle-check.yml, deploy/scripts/test-password-manager-compose.sh, deploy/scripts/test-password-manager-release-contract.sh, deploy/scripts/test-password-manager-startup.sh, deploy/scripts/test-upgrade.sh, deploy/customer-bundle/README.md, apps/docs/docs/deployment/docker-compose.md, apps/web/lib/password-manager/client.test.mjs)
- Removed the operator-facing `PASSWORD_MANAGER_API_IMAGE` override from `.env.example`, customer bundle staging, optional env handling, and release bundle docs. Legacy `.env` entries are now removed with a warning during `start.sh` and `upgrade.sh`.
- Stamped `password-manager-api` and `password-manager-migrate` directly to the reviewed digest in `deploy/password-manager-release.json`, with startup guards that fail before Docker starts if compose drifts from the descriptor or reintroduces an env override.
- Added release and bundle checks that assert the pinned image is `api/v0.1.2` at `ghcr.io/carrtech-dev/ct-password-manager/api@sha256:55669d3af9bfc0ab80388ff3c69ac4f75db86d768adf9a35b33402f87feaa033`, and that CT-Ops still sends the browser-envelope/user-key setup payload.
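The startup drift guard can be sketched as a check over already-parsed inputs. The real guards are shell scripts, and the descriptor schema (a single `image` field) is an assumption. The function returns a list of problems, and startup would abort whenever the list is non-empty.

```typescript
// Hypothetical parsed inputs: compose services, the release descriptor, and the .env map.
type ComposeServices = Record<string, { image?: string }>;
type ReleaseDescriptor = { image: string }; // assumed schema: "<repo>@sha256:<digest>"

const PINNED_SERVICES = ["password-manager-api", "password-manager-migrate"];

function checkPasswordManagerPin(services: ComposeServices, descriptor: ReleaseDescriptor, env: Record<string, string>): string[] {
  const problems: string[] = [];
  if (!/@sha256:[0-9a-f]{64}$/.test(descriptor.image)) {
    problems.push(`descriptor image is not digest-pinned: ${descriptor.image}`);
  }
  for (const name of PINNED_SERVICES) {
    const image = services[name]?.image;
    if (image !== descriptor.image) {
      problems.push(`${name} image ${image ?? "(missing)"} does not match descriptor ${descriptor.image}`);
    }
  }
  if ("PASSWORD_MANAGER_API_IMAGE" in env) {
    problems.push("PASSWORD_MANAGER_API_IMAGE override is no longer supported");
  }
  return problems;
}
```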
Validation
- `python3 deploy/scripts/validate-password-manager-release.py deploy/password-manager-release.json`
- `bash deploy/scripts/test-password-manager-release-contract.sh`
- `bash deploy/scripts/test-password-manager-compose.sh`
- `bash deploy/scripts/test-password-manager-nginx.sh`
- `bash deploy/scripts/test-password-manager-startup.sh`
- `bash deploy/scripts/test-upgrade.sh`
- `BETTER_AUTH_SECRET=build-time-placeholder POSTGRES_PASSWORD=build-time-placeholder PASSWORD_MANAGER_CT_OPS_ED25519_PRIVATE_KEY=build-time-placeholder docker compose -f docker-compose.single.yml config`
- `node --experimental-strip-types --test apps/web/lib/password-manager/client.test.mjs`
- `pnpm install --frozen-lockfile`
- `pnpm --dir apps/web type-check`
- `go test ./internal/userkey ./internal/httpapi` in a temporary `ct-password-manager` worktree at `cb714edc1e6d813655fc3dd352c7c0fd92f4699a`
- `git diff --check`
Bundled Password Manager release evidence refresh (PROGRESS.md, apps/docs/docs/deployment/docker-compose.md, apps/web/lib/password-manager/client.ts, apps/web/lib/password-manager/client.test.mjs, apps/web/tests/e2e/runner.mjs, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Closed the hosted-client portability gap that surfaced during the task-16 verification pass: relative CT-Ops launch assertion paths now stay relative instead of being rewritten to `http://localhost/...`, and browser-side idempotency keys now come from runtime Web Crypto rather than `node:crypto`.
- Strengthened the hosted Password Manager regression coverage so the client contract test asserts the relative launch-path behavior, the E2E runner supplies the launch-signing key and instance ID required by the hosted flow, and the Playwright spec uses more stable selectors around unlock and member actions.
- Updated the public deployment guide so the bundled Password Manager services are named explicitly in the single-host compose stack, the digest-pinned `PASSWORD_MANAGER_API_IMAGE` flow is called out in the update path, and backup or restore guidance now includes `password_manager_db_data`, `.env`, and `deploy/password-manager-release.json`.
Validation
- `node --experimental-strip-types --test apps/web/lib/password-manager/client.test.mjs`
- `pnpm install --frozen-lockfile`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --dir apps/web exec node tests/e2e/runner.mjs tests/e2e/tooling/password-manager.spec.ts`
- `python3 deploy/scripts/validate-password-manager-release.py deploy/password-manager-release.json`
- `bash deploy/scripts/test-password-manager-compose.sh`
- `bash deploy/scripts/test-password-manager-nginx.sh`
- `bash deploy/scripts/test-password-manager-startup.sh`
- `bash deploy/scripts/test-support-data.sh`
- `BETTER_AUTH_SECRET=build-time-placeholder POSTGRES_PASSWORD=build-time-placeholder PASSWORD_MANAGER_CT_OPS_ED25519_PRIVATE_KEY=build-time-placeholder docker compose -f docker-compose.single.yml config`
- `git diff --check`
Hosted Password Manager E2E leak assertions (apps/web/tests/e2e/fixtures/test.ts, apps/web/tests/e2e/fixtures/password-manager.ts, apps/web/tests/e2e/tooling/password-manager.spec.ts)
- Added a Password Manager Playwright mock fixture that intercepts the hosted `/password-manager-api/` traffic, simulates session launch/setup/unlock, vault and entry CRUD, member sharing and revocation, key rotation, audit hooks, session expiry, and instance switches, and records outbound request bodies plus headers for leak checks.
- Added a targeted hosted Password Manager E2E spec that covers launch, browser-only setup, relaunch + unlock, vault create/delete, entry create/reveal/copy/export/edit/delete, member add/remove, rotation prompts, session refresh expiry handling, and instance switching from the CT-Ops hosted route shell.
- Added network and browser-surface assertions proving the hosted flow does not send unlock passwords, plaintext entry values, or plaintext-shaped key fields to either the CT-Ops launch-assertion route or the proxied Password Manager API, keeps audit POST bodies empty, reuses credentialed session requests, and avoids leaking secrets through browser console output or page errors.
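A minimal TypeScript sketch of the leak-check idea, written as a pure function over requests the mock fixture has already recorded. It assumes two things:

- The `RecordedRequest` shape, which is hypothetical.
- That audit hooks live under a path containing `/audit`. That path is also an assumption, not the Password Manager route.

```typescript
interface RecordedRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string; // raw request body as captured by the fixture ("" when empty)
}

// Hypothetical helper: report requests whose URL, body, or headers contain a known secret,
// and audit POSTs that carry any body at all.
function findLeaks(requests: RecordedRequest[], secrets: string[]): string[] {
  const leaks: string[] = [];
  for (const req of requests) {
    const surface = [req.url, req.body, ...Object.values(req.headers)].join("\n");
    for (const secret of secrets) {
      if (secret.length > 0 && surface.includes(secret)) {
        leaks.push(`${req.method} ${req.url} contains a secret value`);
      }
    }
    // Audit hooks are expected to send an empty body.
    if (req.method === "POST" && req.url.includes("/audit") && req.body !== "") {
      leaks.push(`${req.method} ${req.url} sent a non-empty audit body`);
    }
  }
  return leaks;
}
```

A spec would call this after the flow with the unlock password and plaintext entry values it typed, then expect an empty result.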
Validation
- `pnpm install --frozen-lockfile`
- `pnpm --dir apps/web exec playwright test --list tests/e2e/tooling/password-manager.spec.ts`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web lint 'tests/e2e/fixtures/test.ts' 'tests/e2e/fixtures/password-manager.ts' 'tests/e2e/tooling/password-manager.spec.ts'`
- `BETTER_AUTH_URL=https://ops.example.test BETTER_AUTH_SECRET=build-time-placeholder POSTGRES_PASSWORD=build-time-placeholder PASSWORD_MANAGER_CT_OPS_ED25519_PRIVATE_KEY=build-time-placeholder pnpm --dir apps/web build`
- `pnpm --dir apps/web exec node tests/e2e/runner.mjs tests/e2e/tooling/password-manager.spec.ts` (blocked locally: `testcontainers` could not find a working container runtime on this machine)
- `git diff --check`
Hosted sharing and revocation workflow (apps/web/app/(dashboard)/password-manager/page.tsx, apps/web/app/(dashboard)/password-manager/password-manager-client.tsx, apps/web/lib/password-manager/browser-crypto.ts, apps/web/lib/password-manager/browser-crypto.test.mjs, apps/web/lib/password-manager/client.test.mjs)
- Extended the hosted `/password-manager` UI with vault-member management: users can list members, add a member by wrapping the current vault key for a pasted Password Manager public-key envelope, update member roles without resending plaintext, and remove members with safe conflict messaging.
- Added browser-only public-key-envelope import so the client can rehydrate a recipient RSA-OAEP key from portable JSON and wrap new vault keys without any CT-Ops-side secret handling.
- Added rotation retry state for membership revocation flows, including browser-only wrapping of the next vault key for all remaining members, idempotency-key reuse on retry, and immediate local purge when the current user loses access to the selected vault.
- Hardened the client-contract test path resolution so the pinned Password Manager OpenAPI artifact is still found from dedicated Git worktrees, which the repo instructions require for every file-changing task.
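The wrapping step during sharing and revocation can be sketched with RSA-OAEP (SHA-256). The hosted client does this with browser Web Crypto; this sketch swaps in Node's `node:crypto` so it runs outside a browser. Two things are assumptions:

- The envelope shape (`keyId` plus a public RSA JWK).
- The 32-byte vault key.

```typescript
import { constants, createPublicKey, generateKeyPairSync, KeyObject, privateDecrypt, publicEncrypt, randomBytes } from "node:crypto";

// Hypothetical portable envelope: a key id plus a public RSA JWK.
interface PublicKeyEnvelope {
  keyId: string;
  jwk: { kty: "RSA"; n: string; e: string };
}

const OAEP = { padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" };

// Wrap a raw vault key for one recipient; only ciphertext leaves this function.
function wrapVaultKey(vaultKey: Buffer, envelope: PublicKeyEnvelope): string {
  const publicKey = createPublicKey({ key: envelope.jwk, format: "jwk" });
  return publicEncrypt({ key: publicKey, ...OAEP }, vaultKey).toString("base64");
}

// Rotation after a revocation: a fresh vault key wrapped for every remaining member.
function rotateForMembers(envelopes: PublicKeyEnvelope[]): { vaultKey: Buffer; wrapped: Record<string, string> } {
  const vaultKey = randomBytes(32);
  const wrapped: Record<string, string> = {};
  for (const envelope of envelopes) wrapped[envelope.keyId] = wrapVaultKey(vaultKey, envelope);
  return { vaultKey, wrapped };
}

// Demo-only helpers: a member keypair and the matching unwrap on the member's side.
function createMemberKeyPair(keyId: string): { envelope: PublicKeyEnvelope; privateKey: KeyObject } {
  const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
  const jwk = publicKey.export({ format: "jwk" });
  return { envelope: { keyId, jwk: { kty: "RSA", n: String(jwk.n), e: String(jwk.e) } }, privateKey };
}

function unwrapVaultKey(wrappedBase64: string, privateKey: KeyObject): Buffer {
  return privateDecrypt({ key: privateKey, ...OAEP }, Buffer.from(wrappedBase64, "base64"));
}
```

A revoked member is simply left out of `envelopes`, so they never receive the next vault key.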
Validation
- `pnpm install --frozen-lockfile`
- `node --experimental-strip-types --test apps/web/lib/password-manager/browser-crypto.test.mjs apps/web/lib/password-manager/client.test.mjs apps/web/lib/password-manager/workspace.test.mjs`
- `pnpm --dir apps/web type-check`
- `pnpm --dir apps/web lint 'app/(dashboard)/password-manager/password-manager-client.tsx' 'lib/password-manager/browser-crypto.ts' 'lib/password-manager/client.test.mjs'`
- `BETTER_AUTH_URL=https://ops.example.test BETTER_AUTH_SECRET=build-time-placeholder POSTGRES_PASSWORD=build-time-placeholder PASSWORD_MANAGER_CT_OPS_ED25519_PRIVATE_KEY=build-time-placeholder pnpm --dir apps/web build`
- `git diff --check`
Reusable Password Manager product layer (apps/web/lib/password-manager/client.ts, apps/web/lib/password-manager/browser-crypto.ts, apps/web/lib/password-manager/client.test.mjs, apps/web/lib/password-manager/browser-crypto.test.mjs)
- Added a configurable Password Manager client abstraction that treats the API base URL and CT-Ops launch path as configuration, covers launch, session refresh/logout, setup, user-key, vault, entry, member, key-rotation, and audit routes, and sends `credentials: include` on every Password Manager request.
- Added request payload helpers that serialize the Password Manager API’s expected JSON field names while rejecting plaintext-shaped fields such as `plaintext`, `password`, `secret`, `private_key`, `vault_key`, and `entry_key` before anything is sent over the wire.
- Added a browser-only crypto module for unlock-profile creation, PBKDF2-based key derivation, encrypted private-key envelopes, RSA-OAEP vault-key wrapping, AES-GCM entry payload encryption/decryption, and member public-key export.
- Added route-coverage tests against the pinned Password Manager OpenAPI contract so client route drift is caught locally when the API contract or client surface changes.
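A minimal TypeScript sketch of a payload helper that refuses plaintext-shaped fields at any depth. The forbidden names come from the list above. Mapping camelCase keys to snake_case wire names is an assumption about the API's field naming, and `toWirePayload` is a hypothetical name.

```typescript
// Field names the client refuses to send, taken from the progress notes.
const FORBIDDEN_FIELDS = new Set(["plaintext", "password", "secret", "private_key", "vault_key", "entry_key"]);

// Hypothetical serializer: camelCase -> snake_case keys, rejecting forbidden names anywhere in the tree.
function toWirePayload(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(toWirePayload);
  if (value === null || typeof value !== "object") return value;
  const out: Record<string, unknown> = {};
  for (const [key, inner] of Object.entries(value)) {
    const wireKey = key.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`);
    if (FORBIDDEN_FIELDS.has(wireKey)) {
      throw new Error(`refusing to send plaintext-shaped field "${wireKey}"`);
    }
    out[wireKey] = toWirePayload(inner);
  }
  return out;
}
```

Matching on exact wire names means ciphertext fields such as `wrapped_vault_key` still pass, while a stray `vault_key` or nested `private_key` is rejected before `fetch` is ever called.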
Validation
- `pnpm install --frozen-lockfile`
- `pnpm --dir apps/web type-check`
- `node --experimental-strip-types --test apps/web/lib/password-manager/client.test.mjs apps/web/lib/password-manager/browser-crypto.test.mjs`
- `git diff --check`
Release workflow Password Manager digest scope (.github/workflows/agent-release.yml, .github/workflows/customer-bundle-check.yml, deploy/scripts/test-agent-release-password-manager-image-ref.sh)
- Fixed the post-merge release workflow so the `Verify bundled compose pins image digests` step resolves `PASSWORD_MANAGER_API_IMAGE_REF` inside the same shell step where it is consumed, instead of relying on a prior step-local export that disappears between GitHub Actions steps.
- Added a dedicated regression script that fails if the release workflow ever stops resolving `PASSWORD_MANAGER_API_IMAGE_REF` before asserting the bundled compose image pins.
- Wired that regression into `Customer bundle check` so future PR CI catches the variable-scope bug before another release run fails on `set -u`.
Validation
- `bash deploy/scripts/test-agent-release-password-manager-image-ref.sh`
- `git diff --check`
- A `python3` workflow whitespace sanity check for `.github/workflows/agent-release.yml` and `.github/workflows/customer-bundle-check.yml`
CT-Ops Password Manager launch route (apps/web/app/api/password-manager/launch-assertion/route.ts, apps/web/lib/password-manager/launch-assertion.ts, apps/web/lib/password-manager/launch-assertion.test.mjs)
- Added `POST /api/password-manager/launch-assertion`, gated by trusted mutation origins, an active instance session, Tooling access, and a per-user launch rate limit.
- Added a shared Password Manager launch-assertion helper that reads the CT-Ops launch config from runtime env, parses the configured Ed25519 PKCS#8 private key, and signs short-lived `EdDSA` JWTs carrying the required Password Manager claims, including issuer, audience, product, CT-Ops instance, instance, user identity, `iat`, `exp`, and `jti`.
- Included optional instance display names in the assertion when CT-Ops can resolve them from the instances table, while keeping failure responses generic and never proxying Password Manager API traffic or session cookies.
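The signing step can be sketched with `node:crypto`: parse an Ed25519 PKCS#8 PEM, build a compact JWS with `alg: "EdDSA"`, and sign it. The claim field names, the 60-second TTL, and the helper names are placeholders; the Password Manager contract defines the real ones. The verifier is included only to show the receiving side.

```typescript
import { createPrivateKey, createPublicKey, randomUUID, sign, verify } from "node:crypto";

// Placeholder claim names; the real contract defines its own.
interface LaunchClaims {
  iss: string;
  aud: string;
  product: string;
  ct_ops_instance: string;
  instance: string;
  sub: string;
  instance_name?: string; // optional display name when CT-Ops can resolve it
}

const b64url = (data: string | Buffer): string => Buffer.from(data).toString("base64url");

function signLaunchAssertion(pkcs8Pem: string, claims: LaunchClaims, ttlSeconds = 60, now = Date.now()): string {
  const key = createPrivateKey(pkcs8Pem); // throws on malformed config
  if (key.asymmetricKeyType !== "ed25519") throw new Error("launch key must be Ed25519");
  const iat = Math.floor(now / 1000);
  const header = b64url(JSON.stringify({ alg: "EdDSA", typ: "JWT" }));
  const payload = b64url(JSON.stringify({ ...claims, iat, exp: iat + ttlSeconds, jti: randomUUID() }));
  const signingInput = `${header}.${payload}`;
  // Ed25519 has no separate digest step, so the algorithm argument is null.
  return `${signingInput}.${b64url(sign(null, Buffer.from(signingInput), key))}`;
}

// Receiving side: returns claims when the signature is valid and the token is unexpired.
function verifyLaunchAssertion(token: string, publicKeyPem: string, now = Date.now()): Record<string, unknown> | null {
  const [header, payload, signature] = token.split(".");
  if (!header || !payload || !signature) return null;
  const key = createPublicKey(publicKeyPem);
  if (!verify(null, Buffer.from(`${header}.${payload}`), key, Buffer.from(signature, "base64url"))) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  return typeof claims.exp === "number" && claims.exp > Math.floor(now / 1000) ? claims : null;
}
```

A fresh `jti` per assertion gives the Password Manager side a value it can record to refuse replays within the short TTL.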
Validation
- `node --experimental-strip-types --test lib/password-manager/launch-assertion.test.mjs`
- `pnpm type-check`
- `git diff --check`
Customer bundle Password Manager pinning (deploy/customer-bundle/upgrade.sh, deploy/customer-bundle/README.md, .github/workflows/customer-bundle-check.yml, .github/workflows/agent-release.yml, deploy/scripts/test-upgrade.sh)
- Stamped `PASSWORD_MANAGER_API_IMAGE` into staged customer bundles from `deploy/password-manager-release.json` so online and air-gapped installs both render against the reviewed Password Manager API digest instead of relying on a floating operator edit.
- Extended `upgrade.sh` so existing installs refresh the release-pinned Password Manager API image reference alongside `WEB_IMAGE` and `INGEST_IMAGE` while still preserving explicit custom overrides in `.env`.
- Tightened bundle documentation to call out that Password Manager starts by default, that its API image pin is refreshed during upgrades, and that operators must include `password_manager_db_data`, `.env`, and `password-manager-release.json` in backup/restore procedures.
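A TypeScript sketch of the "refresh the pin, keep custom overrides" rule as it stood in this entry; a later entry removed the override entirely. How `upgrade.sh` actually tells a custom value apart is not recorded here. This sketch assumes a value counts as release-managed when it is blank or equal to the previous release's pin. `refreshPinnedImage` is a hypothetical name.

```typescript
// Hypothetical upgrade step: refresh the release-pinned image unless the operator set a custom value.
function refreshPinnedImage(
  env: Record<string, string>,
  key: string,
  previousPin: string | undefined,
  nextPin: string,
): { env: Record<string, string>; preservedCustom: boolean } {
  const current = env[key];
  // Assumption: blank or equal-to-previous-pin means the release owns the value.
  const isReleaseManaged = current === undefined || current === "" || current === previousPin;
  if (!isReleaseManaged) return { env, preservedCustom: true };
  return { env: { ...env, [key]: nextPin }, preservedCustom: false };
}
```

Returning `preservedCustom` lets an upgrade script print a warning when it leaves an operator-supplied image in place.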
Validation
- `bash deploy/scripts/test-upgrade.sh` (still fails on the current base with `tar: Write error`, tracked in carrtech-dev/ct-ops#1033)
- `python3 deploy/scripts/validate-password-manager-release.py deploy/password-manager-release.json`
- `bash -n deploy/customer-bundle/upgrade.sh`
- `git diff --check`
First-run Password Manager config (start.sh, deploy/customer-bundle/start.sh, docker-compose.single.yml, .env.example, deploy/customer-bundle/generate_support_data, .github/workflows/customer-bundle-check.yml, deploy/scripts/test-password-manager-startup.sh, deploy/scripts/test-support-data.sh)
- Added first-run generation for `PASSWORD_MANAGER_DB_PASSWORD`, `CT_OPS_INSTANCE_ID`, and a bundled Password Manager Ed25519 launch-signing keypair, writing the generated values back to `.env` with mode `0600`.
- Filled the Password Manager launch assertion config from CT-Ops defaults when blank, including issuer, audience, product, trusted origins, and an https-aware session-cookie secure flag.
- Wired the generated Password Manager signing private key plus related launch config into the `web` container environment so the later launch-assertion route can sign assertions without out-of-band secret copying.
- Extended support-data and CI coverage so the new Password Manager bootstrap secrets are redacted and the customer-bundle workflow exercises the startup generation path directly.
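A TypeScript sketch of the first-run rule: generate a value only when it is blank, so rerunning the installer never rotates existing secrets. `PASSWORD_MANAGER_DB_PASSWORD`, `CT_OPS_INSTANCE_ID`, `PASSWORD_MANAGER_CT_OPS_ED25519_PRIVATE_KEY`, and `BETTER_AUTH_URL` appear in these notes. Two names below are hypothetical:

- `PASSWORD_MANAGER_CT_OPS_ED25519_PUBLIC_KEY`
- `PASSWORD_MANAGER_SESSION_COOKIE_SECURE`

Writing `.env` back with mode `0600`, and how a multi-line PEM is encoded there, are left out.

```typescript
import { generateKeyPairSync, randomBytes, randomUUID } from "node:crypto";

// Fill only blank values so reruns keep existing secrets.
function fillFirstRunConfig(env: Record<string, string>): Record<string, string> {
  const out = { ...env };
  const blank = (k: string) => !out[k] || out[k].trim() === "";
  if (blank("PASSWORD_MANAGER_DB_PASSWORD")) out.PASSWORD_MANAGER_DB_PASSWORD = randomBytes(32).toString("base64url");
  if (blank("CT_OPS_INSTANCE_ID")) out.CT_OPS_INSTANCE_ID = randomUUID();
  if (blank("PASSWORD_MANAGER_CT_OPS_ED25519_PRIVATE_KEY")) {
    const { publicKey, privateKey } = generateKeyPairSync("ed25519");
    out.PASSWORD_MANAGER_CT_OPS_ED25519_PRIVATE_KEY = privateKey.export({ type: "pkcs8", format: "pem" }).toString();
    // Hypothetical variable name for the verifying side.
    out.PASSWORD_MANAGER_CT_OPS_ED25519_PUBLIC_KEY = publicKey.export({ type: "spki", format: "pem" }).toString();
  }
  // Hypothetical name: secure cookies only when CT-Ops is served over https.
  if (blank("PASSWORD_MANAGER_SESSION_COOKIE_SECURE")) {
    out.PASSWORD_MANAGER_SESSION_COOKIE_SECURE = String((out.BETTER_AUTH_URL ?? "").startsWith("https://"));
  }
  return out;
}
```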
Validation
- `bash deploy/scripts/test-password-manager-startup.sh`
- `bash deploy/scripts/test-support-data.sh`
- `bash deploy/scripts/test-start.sh`
- `bash -n start.sh`
- `bash -n deploy/customer-bundle/start.sh`
- `BETTER_AUTH_SECRET=build-time-placeholder POSTGRES_PASSWORD=build-time-placeholder PASSWORD_MANAGER_CT_OPS_ED25519_PRIVATE_KEY=build-time-placeholder docker compose -f docker-compose.single.yml config`
- `git diff --check`
CT-Ops origin routing (deploy/nginx/nginx.conf, .github/workflows/customer-bundle-check.yml, deploy/scripts/test-password-manager-nginx.sh)
- Added a dedicated `ct_password_manager_api` nginx upstream pointing at the bundled `password-manager-api` service on the internal compose network.
- Added `/password-manager-api/` reverse proxy routing that strips only the CT-Ops prefix, preserves Password Manager-owned route suffixes, and clears `Upgrade` and `Connection` headers so normal API traffic cannot smuggle h2c upgrades to the backend.
- Kept `/password-manager` on the existing CT-Ops web route by leaving the broader `location /` proxy in place and handling only the more-specific API prefix separately.
- Added a dedicated nginx regression test and wired it into the customer-bundle CI workflow so upstream target, prefix-stripping semantics, and header clearing remain enforced.
Validation
- `bash deploy/scripts/test-password-manager-nginx.sh`
- `bash deploy/scripts/test-password-manager-compose.sh`
- `BETTER_AUTH_SECRET=build-time-placeholder POSTGRES_PASSWORD=build-time-placeholder docker compose -f docker-compose.single.yml config`
- `bash deploy/scripts/test-start.sh`
- `git diff --check`
Default single-host deployment (docker-compose.single.yml, deploy/customer-bundle/start.sh, .env.example, .github/workflows/customer-bundle-check.yml, deploy/scripts/test-password-manager-compose.sh)
- Added bundled `password-manager-db`, `password-manager-migrate`, and `password-manager-api` services to the single-host compose profile.
- Kept Password Manager off public host ports, added a dedicated `password_manager_db_data` volume, and enforced migration ordering with database health plus `service_completed_successfully` dependencies before the API and web/nginx stack continue.
- Wired the customer start path to pull the Password Manager services by default and documented the temporary env fallbacks used until first-run installer generation lands for the dedicated Password Manager secrets and CT-Ops launch settings.
- Added a rendered-compose regression test in CI that fails if the service set, dependency ordering, dedicated volume, or no-host-port boundary drifts.
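The rendered-compose regression can be sketched in TypeScript over a compose file that has already been rendered and parsed into an object. The real check is `deploy/scripts/test-password-manager-compose.sh`. The `ComposeService` slice is simplified, and the exact rendered field layout is an assumption.

```typescript
// Minimal slice of a rendered compose service (assumed long-syntax output).
interface ComposeService {
  ports?: unknown[];
  depends_on?: Record<string, { condition: string }>;
  volumes?: Array<{ source?: string; target: string }>;
}

const PM_SERVICES = ["password-manager-db", "password-manager-migrate", "password-manager-api"];

function checkPasswordManagerCompose(services: Record<string, ComposeService>): string[] {
  const problems: string[] = [];
  for (const name of PM_SERVICES) {
    const svc = services[name];
    if (!svc) {
      problems.push(`missing service ${name}`);
      continue;
    }
    // No Password Manager service may publish a host port.
    if (svc.ports && svc.ports.length > 0) problems.push(`${name} publishes host ports`);
  }
  const migrateDeps = services["password-manager-migrate"]?.depends_on ?? {};
  if (migrateDeps["password-manager-db"]?.condition !== "service_healthy") {
    problems.push("migrate must wait for a healthy database");
  }
  const apiDeps = services["password-manager-api"]?.depends_on ?? {};
  if (apiDeps["password-manager-migrate"]?.condition !== "service_completed_successfully") {
    problems.push("api must wait for migrations to finish");
  }
  const dbVolumes = services["password-manager-db"]?.volumes ?? [];
  if (!dbVolumes.some((v) => v.source === "password_manager_db_data")) {
    problems.push("db must use the password_manager_db_data volume");
  }
  return problems;
}
```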
Validation
- `bash deploy/scripts/test-password-manager-compose.sh`
- `python3 deploy/scripts/validate-password-manager-release.py deploy/password-manager-release.json`
- `BETTER_AUTH_SECRET=build-time-placeholder POSTGRES_PASSWORD=build-time-placeholder docker compose -f docker-compose.single.yml config`
- `bash -n deploy/customer-bundle/start.sh`
- `bash deploy/scripts/test-start.sh`
- `git diff --check`
Historical note
- The embedded CT-Ops Password Vault direction was superseded before customer release. Password Manager is now planned as a standalone service/API, with CT-Ops as the first UI/client.
- Embedded Password Vault code, routes, UI, schema, migrations, public docs, and active implementation-plan references were removed so future work does not continue the in-app implementation.
Validation
- Validation run: targeted residue searches plus the web validation commands listed in the cleanup PR.
Plugin licensing contract (docs/ct-ops-plugin-entitlement-storage.md, docs/ct-ops-licensing-and-ct-cve-product-decision.md, docs/ct-ops-ct-cve-api-contract.md, docs/ct-cve-migration-plan.md)
- Added the shared CT Ops plugin entitlement storage design covering normalized entitlement records, CT Portal request-token and licence binding, encrypted licence artifact storage, derived subscription-status responses, audit requirements, and safe degradation rules for air-gapped installs.
- Defined the stricter hidden-plugin visibility rule for any future external plugin that needs it: CT Ops must keep the product hidden unless a valid entitlement exists, including on nav, settings, search, and launch paths.
- Linked the new design from the product decision record, the CT-CVE API contract, and the migration plan so later implementation work uses one CT Ops-owned contract.
Validation
- Validation run: `git diff --check` and a targeted Markdown sanity check for balanced fenced code blocks and tab characters.
Plugin identity groundwork (docs/ct-ops-plugin-identity-broker.md, docs/ct-ops-licensing-and-ct-cve-product-decision.md, docs/ct-ops-ct-cve-api-contract.md, docs/ct-cve-migration-plan.md)
- Added the shared CT Ops plugin identity broker design covering installation identity, plugin-instance registration, trust exchange, short-lived signed launch assertions, redirect/iframe/proxy launch modes, plugin-local sessions, session-status revocation checks, and backend authorization responsibilities.
- Linked the new broker document from the CT-CVE product decision record and API contract so future CT-CVE and other external plugin implementation work shares one source of truth.
- Marked the CT-CVE migration plan's plugin-identity-broker phase complete and narrowed the remaining CT-CVE GUI blocker to plugin entitlement and follow-on integration work.
Validation
- Validation run: `git diff --check` and a targeted Markdown sanity check for balanced fenced code blocks and tab characters.
Release key source of truth (deploy/scripts/fetch-licence-public-key.sh, .github/workflows/agent-release.yml, .github/workflows/customer-bundle-check.yml)
- Added a release helper that fetches and validates `ct-ops/current.pem` from `carrtech-dev/licence-public-keys` before packaging a customer bundle.
- Updated release and bundle-check workflows to pull the current licence verifier key from the dedicated public-key repository instead of treating the checked-in bundle key as the source of truth.
- Updated customer docs to describe the public-key repository distribution model while keeping installed keys mounted from `./licence-keys/current.pem`.
- Extended the web image release path so the same fetched verifier key is baked into the CT-Ops web image, with runtime validation falling back to that bundled key if the customer-bundle mount is missing. Air-gapped installs must upgrade CT-Ops before activating licences signed after a CarrTech key rotation.
Licence verifier continuity (apps/web/lib/licence.ts, apps/web/lib/actions/settings.ts, apps/web/lib/actions/licence-guard.ts, apps/web/lib/seat-admission.ts)
- Added persisted licence verifier public key storage on the instance row so each saved licence continues to validate against the public key that originally verified it.
- Added `LICENCE_PUBLIC_KEY_PATH` support and mounted `./licence-keys/current.pem` into the customer bundle for newly activated licences, with the baked production key retained as a fallback.
- Added a migration for `licence_verifier_public_key` and `licence_verifier_public_key_fingerprint`.
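The per-licence verifier continuity can be sketched in TypeScript. The column names come from the migration above. Two choices are assumptions:

- The fingerprint format: SHA-256 over the DER-encoded SubjectPublicKeyInfo, hex-encoded.
- Failing closed on a fingerprint mismatch.

`pickVerifierKey` is a hypothetical helper.

```typescript
import { createHash, createPublicKey } from "node:crypto";

// Assumed fingerprint format: hex SHA-256 of the SPKI DER encoding.
function verifierFingerprint(publicKeyPem: string): string {
  const der = createPublicKey(publicKeyPem).export({ type: "spki", format: "der" });
  return createHash("sha256").update(der).digest("hex");
}

interface InstanceLicenceRow {
  licence_verifier_public_key: string | null;
  licence_verifier_public_key_fingerprint: string | null;
}

// A saved licence keeps validating against the key that first verified it;
// only when nothing is stored does the current (mounted or baked-in) key apply.
function pickVerifierKey(row: InstanceLicenceRow, currentKeyPem: string): string {
  const stored = row.licence_verifier_public_key;
  if (!stored) return currentKeyPem;
  if (row.licence_verifier_public_key_fingerprint !== verifierFingerprint(stored)) {
    throw new Error("stored licence verifier key does not match its fingerprint");
  }
  return stored;
}
```

Pinning the key per instance row is what lets a CarrTech key rotation leave already-saved licences valid.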
Customer bundle backup and key refresh (deploy/customer-bundle/backup.sh, deploy/customer-bundle/refresh_licence_key, deploy/customer-bundle/upgrade.sh)
- Split upgrade backups into a standalone `backup.sh` that operators can run manually or schedule.
- Updated `upgrade.sh` to call `backup.sh` before replacing release files and to carry forward `licence-keys/current.pem`.
- Added `refresh_licence_key` so connected installs can download the latest CarrTech licence verifier public key without a full CT-Ops upgrade.
Validation
- Validation run: `node --experimental-strip-types --test apps/web/lib/licence.test.mjs`, `pnpm --dir apps/web type-check`, `node scripts/validate-migrations.js`, and `bash deploy/scripts/test-upgrade.sh`.
Install-bound seat enforcement (apps/web/lib/seat-admission.ts, apps/web/lib/seat-selection.ts, apps/web/lib/auth/session.ts)
- Removed Pro as a valid CT-Ops tier; Community now carries core CT-Ops access and Enterprise remains the only feature add-on tier.
- Missing, expired, invalid, revoked, or non-CT-Ops licences now degrade to Community with 3 included user seats.
- Added trusted session admission checks so over-capacity users outside the selected seats are blocked from login/API use without being deleted or deactivated.
- Added deterministic included-seat selection that honours admin-pinned users: an active super admin is kept where possible, then pinned/admin users, then the oldest active users.
- Added licence settings controls so admins can pin up to 3 included-seat users for expiry fallback.
- Updated licensing docs and the CT-Ops/CT-CVE product decision record for the 3 included seats, paid extra seats, Enterprise add-on, and expiry behavior.
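A TypeScript sketch of the deterministic included-seat ordering. The note's "pinned/admin users" is read here as pinned users first, then other admins; that reading is an assumption, as are the `SeatCandidate` fields and the id tie-breaker. Sorting on stable fields gives the same seats on every request, so admission checks cannot flap between users.

```typescript
interface SeatCandidate {
  id: string;
  isSuperAdmin: boolean;
  isAdmin: boolean;
  pinned: boolean;
  active: boolean;
  createdAt: Date;
}

// One active super admin first, then pinned users, then admins, then oldest active users.
function selectIncludedSeats(users: SeatCandidate[], seats = 3): string[] {
  const active = users.filter((u) => u.active);
  const rank = (u: SeatCandidate) => (u.pinned ? 0 : u.isAdmin || u.isSuperAdmin ? 1 : 2);
  const byRank = [...active].sort(
    (a, b) => rank(a) - rank(b) || a.createdAt.getTime() - b.createdAt.getTime() || a.id.localeCompare(b.id),
  );
  const superAdmin = byRank.find((u) => u.isSuperAdmin);
  const ordered = superAdmin ? [superAdmin, ...byRank.filter((u) => u !== superAdmin)] : byRank;
  return ordered.slice(0, seats).map((u) => u.id);
}
```

Users outside the returned ids are blocked at session admission but are not deleted or deactivated.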
Validation
- Validation run: `pnpm --dir apps/web type-check`, `pnpm --dir apps/web test:unit`, `pnpm --dir apps/web db:validate`, targeted ESLint for changed web files, and `pnpm --dir apps/web test:e2e tests/e2e/team/invitations.spec.ts`.
Release bundle upgrades (deploy/customer-bundle/upgrade.sh, deploy/customer-bundle/README.md)
- Added a bundled `upgrade.sh` that backs up the current install, downloads or accepts a local release zip, stops the Compose stack without deleting named volumes, installs the new release files in place, preserves `.env` and TLS material, and restarts through `start.sh`.
- Added `--version`, `--from-zip`, and `--no-start` options so online, pinned-version, and air-gapped upgrades use the same safety flow.
- Updated customer bundle and getting-started docs to describe release-pinned images and the upgrade path instead of relying on `:latest`.
Validation
- Added `deploy/scripts/test-upgrade.sh` for the local bundle path, backup creation, stale offline image archive removal, and volume-preserving Compose shutdown.
- Customer bundle CI now packages and syntax-checks `upgrade.sh`; local validation ran installer, upgrade, support-data, staged-bundle syntax, and Compose image rendering checks.
CT-CVE migration audit (docs/ct-cve-final-residue-audit.md, docs/ct-cve-migration-plan.md)
- Added the final CT Ops residue audit for the CT-CVE migration, including the required CVE/feature/licence searches and targeted checks for source-state, affected-package, NVD/CISA, feed, and matcher residue.
- Classified remaining matches as intended connector, imported-finding storage, reporting/display, historical migration/changelog, or seat-licensing code.
- Confirmed no moved CT-CVE feed sync, package matching, source configuration, NVD key storage, source-management UI, or source-only E2E coverage remains in live CT Ops code.
Validation
- Documentation-only change validated by rerunning the recorded `rg`/`find` audit commands and reviewing the remaining match classes.
CT-CVE ownership cleanup (apps/web/lib/actions/vulnerabilities.ts, apps/web/lib/db/schema/vulnerabilities.ts)
- Removed the remaining CT Ops report UI/API dependency on vulnerability source sync status.
- Switched host vulnerability assessment freshness from local feed sync state to the CT-CVE finding-import timestamp stored by the connector.
- Added a migration that drops obsolete CT Ops source-state and affected-package match-rule storage while retaining imported CVE/finding storage and reports.
Validation
- Validation run: web migration generation/validation, full web unit suite, web type-check, targeted lint for changed web files, and targeted E2E coverage for host and report vulnerability displays.
CT-CVE ownership cleanup (apps/web/app/(dashboard)/settings/vulnerabilities/, apps/web/lib/actions/vulnerabilities.ts)
- Removed the legacy admin Settings -> Vulnerabilities page that monitored external vulnerability feed/API sources and managed the CT Ops-stored NVD API key.
- Removed the sidebar entry, NVD key server actions, and E2E coverage tied to that CT Ops-owned source-management surface.
- CT Ops still keeps imported vulnerability finding display in host details and vulnerability reports; CT-CVE owns feed/source configuration and status.
Validation
- Validation run: `pnpm install --frozen-lockfile`, `pnpm --filter web type-check`, `pnpm --filter web db:validate`, targeted `pnpm --filter web lint -- ...`, and `pnpm --filter web test:unit`.
CT-CVE connector setup/status surface (apps/web/app/(dashboard)/settings/integrations/ct-cve/page.tsx, apps/web/lib/integrations/ct-cve/setup-status.ts)
- Added an admin-only Settings -> Integrations -> CT-CVE page showing connector configured state, inbound signed-token scopes/counts, outbound inventory targets, recent health/finding/inventory timestamps, and connector errors without exposing token ids or secrets.
- Added a sanitised setup overview helper that summarises `CT_CVE_SERVICE_TOKENS`, `CT_CVE_INVENTORY_PUSH_TARGETS`, and durable connection status for the active instance.
- Added the CT-CVE tab alongside LDAP and SMTP integration settings.
Validation
- Validation run: CT-CVE integration unit tests, full web unit suite, targeted ESLint for the new connector UI/helper files, `pnpm --filter web type-check`, and `pnpm --filter web db:validate`.
CT-CVE outbound inventory push job (apps/web/lib/integrations/ct-cve/inventory-push-job.ts, apps/web/scripts/push-ct-cve-inventory.mjs)
- Added env-configured CT-CVE inventory push targets for outbound `inventory:write` tokens.
- Added a reusable push job that builds paged full inventory snapshots, signs and pushes every page to CT-CVE, accumulates accepted row counts, and reports per-target failures without stopping later targets.
- Added the `ct-cve:push-inventory` web package script so operators can run the push job from cron, systemd timers, or Kubernetes CronJobs.
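The push job's control flow can be sketched in TypeScript: page through the snapshot per target, sum accepted rows, and record a failure without abandoning later targets. `buildPage` and `pushPage` are hypothetical injected functions standing in for snapshot construction and the signed `inventory:write` request.

```typescript
interface PushTarget { name: string; url: string }
interface InventoryPage { rows: unknown[]; nextCursor: string | null }
interface PushResult { target: string; acceptedRows: number; error?: string }

async function pushInventoryToTargets(
  targets: PushTarget[],
  buildPage: (cursor: string | null) => Promise<InventoryPage>,
  pushPage: (target: PushTarget, page: InventoryPage) => Promise<{ accepted: number }>,
): Promise<PushResult[]> {
  const results: PushResult[] = [];
  for (const target of targets) {
    let acceptedRows = 0;
    try {
      let cursor: string | null = null;
      do {
        const page = await buildPage(cursor);
        acceptedRows += (await pushPage(target, page)).accepted;
        cursor = page.nextCursor;
      } while (cursor !== null);
      results.push({ target: target.name, acceptedRows });
    } catch (err) {
      // Record the failure and continue so one bad target does not block the rest.
      results.push({ target: target.name, acceptedRows, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return results;
}
```

A cron or CronJob wrapper would print `results` and exit non-zero if any entry has an `error`.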
Validation
- Validation run: targeted inventory push job unit test, CT-CVE integration unit tests, targeted ESLint, `pnpm --dir apps/web type-check`, `pnpm --dir apps/web db:validate`, `pnpm --dir apps/web test:unit`, and a no-target `pnpm --dir apps/web ct-cve:push-inventory` smoke test.
Stale session redirect handling (apps/web/lib/auth/redirects.ts, apps/web/lib/auth/session.ts, apps/web/app/(auth)/login/page.tsx)
- Fixed a stale/invalid auth-state redirect loop where a protected route could send the browser to `/login` while the auth page immediately sent session-looking state back to `/dashboard`.
- Protected-page session rejection now redirects to `/login?session=expired`, and auth pages bypass their authenticated redirect on that explicit expired-session path.
- Shared auth-page redirect decisions now require an active, non-deleted user before redirecting away from login/register.
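The loop-breaking rule can be sketched as a small decision function in TypeScript. `authPageRedirect` and `AuthPageState` are hypothetical names; the real logic lives in `apps/web/lib/auth/redirects.ts`.

```typescript
interface AuthPageState {
  sessionUser: { active: boolean; deletedAt: Date | null } | null;
  searchParams: URLSearchParams;
}

// Protected pages send rejected sessions here instead of plain /login.
const EXPIRED_SESSION_LOGIN = "/login?session=expired";

// Returns a redirect target for /login or /register, or null to render the auth page.
function authPageRedirect({ sessionUser, searchParams }: AuthPageState): string | null {
  // A protected page already rejected this session, so never bounce back.
  if (searchParams.get("session") === "expired") return null;
  // Session-looking state alone is not enough; the user must still be usable.
  if (!sessionUser || !sessionUser.active || sessionUser.deletedAt !== null) return null;
  return "/dashboard";
}
```

Both guards break the cycle independently. The explicit query flag covers sessions the protected page distrusts for any reason, and the user check covers cookies that outlive a deleted or deactivated account.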
Validation
- Validation run: `pnpm install --frozen-lockfile`, targeted auth redirect unit test, `npm run type-check`, targeted ESLint for changed auth files, and `npm run test:unit` from `apps/web`.
CT-CVE connector status persistence (apps/web/lib/integrations/ct-cve/connection-status.ts, apps/web/app/api/integrations/ct-cve/v1/connection-health/route.ts)
- Added durable, instance-scoped CT-CVE connector status persisted through `system_config`, including last inventory push, finding ingest, health check, and connector error timestamps.
- Updated signed connection health to return and refresh the stored status instead of returning process-local placeholder timestamps.
- Updated CT-CVE finding ingestion and inventory snapshot pushes to maintain the durable status and clear stale connector errors after successful data flow.
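The status rules can be sketched in TypeScript as a pure reducer whose result would then be persisted. The stored field names are hypothetical, not the actual `system_config` layout. A health check only records its own timestamp, while successful data flow (an inventory push or finding ingest) also clears a stale connector error.

```typescript
// Hypothetical shape of the per-instance status stored in system_config.
interface ConnectorStatus {
  lastInventoryPushAt: string | null;
  lastFindingIngestAt: string | null;
  lastHealthCheckAt: string | null;
  lastErrorAt: string | null;
  lastError: string | null;
}

type ConnectorEvent =
  | { kind: "inventory-push" | "finding-ingest" | "health-check"; at: string }
  | { kind: "error"; at: string; message: string };

function applyConnectorEvent(status: ConnectorStatus, event: ConnectorEvent): ConnectorStatus {
  switch (event.kind) {
    case "error":
      return { ...status, lastErrorAt: event.at, lastError: event.message };
    case "health-check":
      return { ...status, lastHealthCheckAt: event.at };
    case "inventory-push":
      // Successful data flow clears any stale connector error.
      return { ...status, lastInventoryPushAt: event.at, lastError: null, lastErrorAt: null };
    case "finding-ingest":
      return { ...status, lastFindingIngestAt: event.at, lastError: null, lastErrorAt: null };
  }
}
```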
Validation
- Validation run: `pnpm install --frozen-lockfile`, targeted CT-CVE integration unit tests, `npm run type-check`, `npm run db:validate`, and `npm run test:unit` from `apps/web`.
CT-CVE outbound inventory snapshots (apps/web/lib/integrations/ct-cve/inventory-export.ts)
- Added CT Ops inventory snapshot construction for CT-CVE, scoped by instance and limited to active hosts plus current software package rows.
- Included contract metadata, stable package fingerprints, Linux distro/package manager metadata, bounded host/package page sizes, and opaque cursors for follow-up inventory pages.
- Added a signed outbound push helper for `POST /api/v1/ct-ops/inventory-snapshots` using the CT-CVE `inventory:write` service-token contract.
- Corrected CT Ops connection health to require `connection:read` rather than `findings:write`.
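A TypeScript sketch of two snapshot building blocks: a stable package fingerprint and an opaque cursor for follow-up pages. Three choices here are assumptions, not the CT-CVE contract:

- The fingerprint inputs (package manager, name, version, arch).
- The cursor encoding (base64url JSON holding the last host id).
- The helper names.

```typescript
import { createHash } from "node:crypto";

interface PackageRow { name: string; version: string; arch: string; packageManager: string }

// Same package fields always hash the same, regardless of row order or scan time.
function packageFingerprint(pkg: PackageRow): string {
  const canonical = JSON.stringify([pkg.packageManager, pkg.name, pkg.version, pkg.arch]);
  return createHash("sha256").update(canonical).digest("hex");
}

// Opaque to callers: they round-trip it without interpreting the contents.
function encodeCursor(lastHostId: string): string {
  return Buffer.from(JSON.stringify({ h: lastHostId })).toString("base64url");
}

function decodeCursor(cursor: string): string {
  const parsed = JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));
  if (typeof parsed?.h !== "string") throw new Error("invalid cursor");
  return parsed.h;
}

// Keyset paging over hosts sorted by id, with a bounded page size.
function pageHosts<T extends { id: string }>(hostsSortedById: T[], cursor: string | null, pageSize: number) {
  const after = cursor ? decodeCursor(cursor) : null;
  const start = after === null ? 0 : hostsSortedById.findIndex((h) => h.id > after);
  const page = start === -1 ? [] : hostsSortedById.slice(start, start + pageSize);
  const last = page.at(-1);
  const hasMore = start !== -1 && start + pageSize < hostsSortedById.length;
  return { page, nextCursor: hasMore && last ? encodeCursor(last.id) : null };
}
```

Keyset paging on the last seen id, rather than an offset, keeps later pages stable if hosts are added while a push is in progress.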
Validation
- Validation run: `pnpm install --frozen-lockfile`, `pnpm --dir apps/web run type-check`, `pnpm --dir apps/web run test:unit`, and targeted ESLint for the connector files.
CT-CVE connector foundation (apps/web/lib/integrations/ct-cve/service-token.ts, apps/web/app/api/integrations/ct-cve/v1/connection-health/route.ts)
- Added signed CT-CVE service-token verification for inbound CT Ops connector requests, including `Authorization: CT-ServiceToken`, body SHA-256 checks, HMAC signatures, timestamp skew enforcement, nonce replay protection, token revocation, instance binding, and scope checks.
- Added the first signed CT Ops connector endpoint, `GET /api/integrations/ct-cve/v1/connection-health?instanceId=...`, with per-token rate limiting and contract-shaped status output.
- Documented the `CT_CVE_SERVICE_TOKENS` configuration shape for early connector deployments.
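The inbound verification steps can be sketched in TypeScript with `node:crypto`. Several parts are assumptions rather than the CT-CVE contract:

- The canonical string layout.
- The 300-second skew window.
- The token shape.
- The in-memory nonce set.

Parsing the `Authorization: CT-ServiceToken` header is also left out.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

interface SignedRequest { method: string; path: string; timestamp: string; nonce: string; bodySha256: string; signature: string }
interface ServiceToken { secret: string; scopes: string[]; instanceId: string; revoked: boolean }

const MAX_SKEW_SECONDS = 300; // assumed window
const seenNonces = new Set<string>(); // process-local for the sketch; expiry pruning omitted

const sha256Hex = (body: Buffer) => createHash("sha256").update(body).digest("hex");
// Hypothetical canonical layout; the real contract defines its own.
const canonicalString = (r: Omit<SignedRequest, "signature">) =>
  [r.method.toUpperCase(), r.path, r.timestamp, r.nonce, r.bodySha256].join("\n");

// Caller side: what a CT-CVE client would compute before sending.
function signServiceRequest(secret: string, method: string, path: string, body: Buffer, nonce: string, timestampSeconds: number): SignedRequest {
  const unsigned = { method, path, timestamp: String(timestampSeconds), nonce, bodySha256: sha256Hex(body) };
  return { ...unsigned, signature: createHmac("sha256", secret).update(canonicalString(unsigned)).digest("hex") };
}

// Receiver side: null means accepted, otherwise a short rejection reason.
function verifyServiceRequest(
  req: SignedRequest, body: Buffer, token: ServiceToken, requiredScope: string, instanceId: string,
  nowSeconds = Math.floor(Date.now() / 1000),
): string | null {
  if (token.revoked) return "token revoked";
  if (token.instanceId !== instanceId) return "instance mismatch";
  if (!token.scopes.includes(requiredScope)) return "missing scope";
  const ts = Number(req.timestamp);
  if (!Number.isInteger(ts) || Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) return "stale timestamp";
  if (sha256Hex(body) !== req.bodySha256) return "content hash mismatch";
  const expected = createHmac("sha256", token.secret).update(canonicalString(req)).digest();
  const given = Buffer.from(req.signature, "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return "invalid signature";
  // Record the nonce only after the signature checks out, so forged requests cannot burn nonces.
  if (seenNonces.has(req.nonce)) return "replayed nonce";
  seenNonces.add(req.nonce);
  return null;
}
```

Hashing the body into the signed string ties the HMAC to the exact payload. The nonce check then turns a captured request into a one-shot value even inside the skew window.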
Validation
- Added focused unit coverage for valid signatures, stale timestamps, content hash mismatches, invalid signatures, replayed nonces, scope checks, and instance binding.
- Validation run: `node --experimental-strip-types --test lib/integrations/ct-cve/service-token.test.mjs`, targeted ESLint for the new connector files, `pnpm --dir apps/web type-check`, `pnpm --dir apps/web db:validate`, and `pnpm --dir apps/web test:unit`.
Seat-based CT-Ops licensing migration (apps/web/app/(dashboard)/settings/licence/page.tsx, apps/web/app/(dashboard)/settings/settings-client.tsx, apps/docs/docs/licensing.md)
- Updated the licence settings page to show the trusted effective tier, active users, pending invitations, seat limit, licence expiry, and Enterprise capability status.
- Rewrote licensing docs around Community core features, Pro user-seat capacity, Enterprise-only capabilities, offline licence validation, activation, expiry, and revocation.
- Updated air-gap and docs introduction copy so paid licences are described as seat and Enterprise entitlements rather than feature-unlock licences.
Validation
- Validation run: `pnpm --dir apps/web type-check`.
Seat-based CT-Ops licensing migration (apps/web/lib/features.ts, apps/web/lib/actions/, apps/web/app/(dashboard)/, apps/web/components/shared/sidebar.tsx)
- Made former Pro-gated core CT-Ops capabilities available to the Community tier while preserving Enterprise-only feature checks.
- Removed Pro-only page locks, sidebar badges, backend `requireFeature()` checks, and metric-retention clamps from core reports, certificates, service accounts, SSH key inventory, and related report actions.
- Removed stale locked-feature UI components and stopped E2E tests from issuing Pro licences only to access now-open core pages.
Validation
- Added unit coverage proving Community includes core features and Pro does not grant Enterprise-only capabilities.
- Validation run: `node --experimental-strip-types --test lib/features.test.mjs`, `pnpm --dir apps/web type-check`, `pnpm --dir apps/web test:unit`, targeted E2E coverage for reports/certificates/service accounts/local users/certificate tracking, and a clean rerun of the initially flaky software report plus service account specs.
Seat enforcement (apps/web/lib/actions/seat-enforcement.ts, apps/web/lib/licence-seats.ts)
- Added shared seat usage calculation for active non-deleted users plus pending unexpired invitations.
- Enforced `maxUsers` on admin invites, removed-user restoration, invite acceptance, user reactivation, and LDAP auto-provisioning.
- Community seat rules are now finalized: missing `maxUsers` falls back to the 3 included seats.
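The seat arithmetic can be sketched in TypeScript: usage is active non-deleted users plus pending, unexpired invitations, checked against `maxUsers` or the 3 included Community seats. The row shapes and helper names are hypothetical. The sketch does not model invite acceptance, where the accepted invitation would first be excluded from the count so that acceptance stays seat-neutral.

```typescript
interface UserRow { active: boolean; deletedAt: Date | null }
interface InvitationRow { acceptedAt: Date | null; expiresAt: Date }

const INCLUDED_COMMUNITY_SEATS = 3;

function seatUsage(users: UserRow[], invitations: InvitationRow[], now = new Date()): number {
  const activeUsers = users.filter((u) => u.active && u.deletedAt === null).length;
  const pendingInvites = invitations.filter((i) => i.acceptedAt === null && i.expiresAt > now).length;
  return activeUsers + pendingInvites;
}

// Guard for actions that add one seat: invites, restoration, reactivation, LDAP provisioning.
function canConsumeSeat(users: UserRow[], invitations: InvitationRow[], maxUsers: number | undefined, now = new Date()): boolean {
  return seatUsage(users, invitations, now) < (maxUsers ?? INCLUDED_COMMUNITY_SEATS);
}
```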
Validation
- Added unit coverage for seat usage calculations and E2E coverage for invite creation and invite acceptance at the seat limit.
- Validation run: `npm run type-check`, `npm run test:unit`, and `npm run test:e2e -- tests/e2e/team/invitations.spec.ts tests/e2e/auth/invite-acceptance.spec.ts` from `apps/web`.
Licence entitlement model (apps/web/lib/licence.ts, apps/web/lib/actions/licence-guard.ts)
- Added validated `maxUsers` support to signed CT-Ops licence payloads and the effective licence object used by trusted server-side guard paths.
- Tightened capacity claim parsing so `maxUsers` and legacy `maxHosts` are accepted only when they are positive safe integers.
- Kept `maxHosts` as a legacy optional host cap for compatibility while the migration moves commercial CT-Ops licensing toward user-seat capacity.
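A TypeScript sketch of the "positive safe integer" rule for capacity claims. Treating an absent claim as `undefined` and throwing on a present-but-invalid claim are assumptions; the real parser may reject the whole licence differently. `parseCapacityClaim` is a hypothetical name.

```typescript
// Accept only positive safe integers; absent claims stay undefined.
function parseCapacityClaim(value: unknown): number | undefined {
  if (value === undefined) return undefined;
  if (typeof value !== "number" || !Number.isSafeInteger(value) || value <= 0) {
    throw new Error(`invalid capacity claim: ${JSON.stringify(value)}`);
  }
  return value;
}

function parseCapacity(payload: { maxUsers?: unknown; maxHosts?: unknown }) {
  return { maxUsers: parseCapacityClaim(payload.maxUsers), maxHosts: parseCapacityClaim(payload.maxHosts) };
}
```

`Number.isSafeInteger` rejects fractions, `NaN`, `Infinity`, and values above 2^53 - 1 in one call, and the `typeof` check stops string claims such as `"10"` from being coerced.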
Validation
- Added licence validation coverage for `maxUsers`, invalid capacity claims, and expired paid licences.
- Validation run: `node --experimental-strip-types --test lib/licence.test.mjs` from `apps/web`.
RPM match accuracy (apps/ingest/internal/vuln/)
- Tightened RPM source-package matching so upstream-only `source_version` values from older inventory scans no longer cause false positives against full vendor fixed EVRs.
- The matcher now prefers the installed package EVR when it contains release/epoch detail missing from the source version, while preserving source-version matching for non-RPM package managers.
- Added coverage for implicit zero epochs, fixed packages with equal EVR, newer Alma downstream releases, and source-package matches where `source_version` is upstream-only.
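The version-selection rule is sketched below in TypeScript for illustration only. The real matcher is Go code in `apps/ingest/internal/vuln/`. `parseEvr` and `versionForMatching` are hypothetical names. The sketch only chooses which string to compare. It does not implement RPM's `rpmvercmp` ordering.

```typescript
// Hypothetical illustration; the real matcher is Go code in apps/ingest/internal/vuln/.
interface Evr {
  epoch: number;
  hasExplicitEpoch: boolean;
  version: string;
  release: string | null;
}

// Parse "[epoch:]version[-release]"; a missing epoch is an implicit 0.
function parseEvr(value: string): Evr {
  let rest = value;
  let epoch = 0;
  let hasExplicitEpoch = false;
  const colon = rest.indexOf(":");
  if (colon !== -1) {
    epoch = Number.parseInt(rest.slice(0, colon), 10);
    hasExplicitEpoch = true;
    rest = rest.slice(colon + 1);
  }
  const dash = rest.lastIndexOf("-");
  return dash === -1
    ? { epoch, hasExplicitEpoch, version: rest, release: null }
    : { epoch, hasExplicitEpoch, version: rest.slice(0, dash), release: rest.slice(dash + 1) };
}

// Choose the version string to compare against a vendor fixed EVR.
function versionForMatching(
  packageManager: string,
  installedVersion: string,
  sourceVersion: string | null,
): string {
  if (!sourceVersion) return installedVersion;
  if (packageManager !== "rpm") return sourceVersion; // non-RPM keeps source-version matching
  const src = parseEvr(sourceVersion);
  const installed = parseEvr(installedVersion);
  // Prefer the installed EVR when it carries release/epoch detail the source lacks.
  const sourceMissesRelease = src.release === null && installed.release !== null;
  const sourceMissesEpoch = !src.hasExplicitEpoch && installed.hasExplicitEpoch;
  return sourceMissesRelease || sourceMissesEpoch ? installedVersion : sourceVersion;
}
```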
Validation
- Validation run: `go test ./internal/vuln` and `go test ./...` from `apps/ingest`.
Confirmed RPM advisory data (apps/ingest/internal/vuln/)
- Added Red Hat CSAF advisory summary ingestion so `released_packages` are normalised into `vulnerability_affected_packages` fixed-version rows under `redhat-security-data`.
- Derived RHEL major versions from RPM EVR release strings such as `1:3.5.1-7.el9_7`, allowing Alma/Rocky/RHEL-compatible hosts to match advisory rows for RHEL 9.
- Preserved CVE/advisory metadata on generated fixed rows so confirmed findings carry advisory context.
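Deriving the RHEL major version from an EVR's release field amounts to finding the `elN` dist tag. The sketch below assumes that tag appears after a `.` (as in `7.el9_7`) or after a `+` (as in module builds such as `2.module+el8.8.0+...`). `rhelMajorFromEvr` is a hypothetical helper, not the ingest code.

```typescript
// Extract the RHEL major version from the release part of an RPM EVR,
// e.g. "1:3.5.1-7.el9_7" -> 9. Returns null when no elN marker is present.
function rhelMajorFromEvr(evr: string): number | null {
  const dash = evr.lastIndexOf("-");
  if (dash === -1) return null; // no release field at all
  const release = evr.slice(dash + 1);
  // Accept ".el9", ".el9_7", and module-style "+el8.8.0" markers.
  const match = /(?:^|[.+])el(\d+)(?:[._+]|$)/.exec(release);
  return match ? Number(match[1]) : null;
}
```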
RPM inventory accuracy (agent/internal/tasks/)
- Updated RPM software inventory to retain the full installed EVR in `source_version` when the source RPM does not provide an epoch, preventing source-package comparisons from falling back to upstream-only versions.
Validation
- Added unit coverage for Red Hat CSAF released-package parsing, CSAF sync URL windowing, and RPM source-version EVR preservation.
- Validation run: `go test ./internal/vuln` and `go test ./...` from `apps/ingest`; `go test ./internal/tasks` and `go test ./...` from `agent`.
Additive role model (apps/web/lib/auth/, apps/web/lib/actions/, apps/web/lib/db/schema/, apps/web/lib/db/migrations/0053_bright_black_cat.sql)
- Added persisted `roles` arrays to users and invitations while keeping the legacy single `role` column as a derived compatibility value, so existing role-gated flows keep working during the transition.
- Normalised session/auth loading and guard checks to treat permissions as the union of assigned roles, with explicit precedence preserved for the legacy `role` field and `pending` users continuing to carry no assigned roles.
- Updated instance creation, invitation acceptance, invitation restore, and role-update flows to write the new additive role shape and to keep last-super-admin protections working when `super_admin` membership is removed, deactivated, or deleted.
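The additive model is sketched below. Permissions are the union over assigned roles, the legacy single role is derived by precedence, and removing the last `super_admin` is refused. Only `super_admin` comes from the notes. The other role names, every permission string, and the helper names are hypothetical.

```typescript
// Hypothetical role names (only super_admin appears in the notes) and permissions.
type Role = "super_admin" | "admin" | "engineer" | "viewer";

const ROLE_PERMISSIONS: Record<Role, readonly string[]> = {
  super_admin: ["instance:manage", "users:manage", "hosts:write", "hosts:read"],
  admin: ["users:manage", "hosts:write", "hosts:read"],
  engineer: ["hosts:write", "hosts:read"],
  viewer: ["hosts:read"],
};

// Highest privilege first; used to derive the legacy single `role` value.
const ROLE_PRECEDENCE: readonly Role[] = ["super_admin", "admin", "engineer", "viewer"];

// Effective permissions are the union of every assigned role's permissions.
function effectivePermissions(roles: readonly Role[]): Set<string> {
  const perms = new Set<string>();
  for (const role of roles) {
    for (const p of ROLE_PERMISSIONS[role]) perms.add(p);
  }
  return perms;
}

// Pending users carry no roles, so they get no legacy role either.
function legacyRole(roles: readonly Role[]): Role | null {
  return ROLE_PRECEDENCE.find((r) => roles.includes(r)) ?? null;
}

// Refuse a change that would leave the instance without any super_admin.
function wouldRemoveLastSuperAdmin(
  superAdminIds: readonly string[],
  targetUserId: string,
  nextRoles: readonly Role[],
): boolean {
  return (
    superAdminIds.includes(targetUserId) &&
    superAdminIds.length === 1 &&
    !nextRoles.includes("super_admin")
  );
}
```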
Team management UI (apps/web/app/(dashboard)/team/, apps/web/tests/e2e/team/)
- Reworked the People page to display all assigned roles as badges, support multi-role assignment from the member role menu, and support multi-role invitations from the invite dialog.
- Added database migration backfill for existing user and invitation roles and extended coverage for additive-role guard behavior plus database-backed team invitation/member lifecycle flows.
Validation
- Validation run: `node --experimental-strip-types --test lib/auth/guards.test.mjs`, `node --experimental-strip-types --test lib/auth/tooling.test.mjs`, `pnpm --dir apps/web type-check`, `pnpm --dir apps/web db:validate`, `pnpm --dir apps/web test:e2e tests/e2e/auth/invite-acceptance.spec.ts`, and `pnpm --dir apps/web test:e2e tests/e2e/team/members.spec.ts tests/e2e/team/invitations.spec.ts`.
Confirmed finding model and reports (apps/web/lib/db/schema/vulnerabilities.ts, apps/web/lib/actions/vulnerabilities.ts, apps/web/app/(dashboard)/reports/vulnerabilities/)
- Added explicit vulnerability finding confidence and match-reason persistence so confirmed Linux package matches are distinguishable from probable/future best-effort matches.
- Defaulted host and global vulnerability reports to confirmed findings while adding a report confidence filter for operators who want to inspect probable matches.
- Added a migration for `host_vulnerability_findings.confidence` and `match_reason`, plus a supporting instance/status/confidence lookup.
- Added a host Overview vulnerability assessment card showing affected/clear/stale/not-assessed status, confirmed finding counts, critical/high split, last inventory scan time, last vulnerability feed sync time, and a direct link to host findings.
Red Hat and matcher accuracy (apps/ingest/internal/vuln/)
- Hardened Red Hat parsing to prefer structured `affected_release` package data, extracting full RPM EVR from package NEVRA and preserving advisory/product metadata.
- Marked free-text Red Hat `affected_packages` fallback rows as probable rather than confirmed.
- Updated RPM matching to report vendor EVR-specific reasons and to let RHEL major-version advisory rules match minor-version hosts such as `8.9`.
- Added RHEL-compatible distro matching for RPM hosts, so AlmaLinux/Rocky/CentOS/Oracle-style inventory with `ID_LIKE=rhel` can match structured Red Hat advisory rows by full EVR.
- Replaced direct, unbounded post-scan matching goroutines with a bounded ingest-side vulnerability match scheduler used by inventory completion and feed sync.
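RHEL compatibility can be read from the host's `os-release` data by checking `ID` and the space-separated `ID_LIKE` list for `rhel`. This is a sketch only. `parseOsRelease` and `isRhelCompatible` are hypothetical, and the real check runs in the Go ingest service over inventory metadata. Distros whose `ID_LIKE` omits `rhel` would need their own mapping.

```typescript
// Parse KEY=value lines from /etc/os-release, stripping optional quotes.
function parseOsRelease(text: string): Map<string, string> {
  const out = new Map<string, string>();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    const key = trimmed.slice(0, eq);
    let value = trimmed.slice(eq + 1);
    const quote = value.charAt(0);
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    out.set(key, value);
  }
  return out;
}

// A host is RHEL-compatible when ID is "rhel" or ID_LIKE lists "rhel".
function isRhelCompatible(osRelease: Map<string, string>): boolean {
  const ids = [osRelease.get("ID") ?? "", ...(osRelease.get("ID_LIKE") ?? "").split(/\s+/)];
  return ids.includes("rhel");
}
```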
Validation
- Added unit coverage for RPM backport release matching, RHEL-compatible distro matching, fixed RPM packages resolving, Red Hat structured affected-release parsing, confirmed finding persistence, unsupported package sources, and host assessment status derivation.
- Validation run: `go test ./internal/vuln` and `go test ./...` from `apps/ingest`, `node --experimental-strip-types --test lib/vulnerabilities/assessment.test.mjs`, `pnpm --dir apps/web run db:validate`, `pnpm --dir apps/web run type-check`, and `pnpm --dir apps/web run lint` (existing warnings only).
Build Docs editing and exports (apps/web/app/(dashboard)/build-docs/[id]/, apps/web/components/build-docs/, apps/web/lib/build-docs/)
- Replaced section body textareas with a larger reusable rich Markdown editor powered by MDXEditor, including document-style spacing, Markdown shortcuts, source mode, headings, emphasis, links, lists, blockquotes, code blocks, tables, undo/redo, and dark-mode styling aligned with the dashboard.
- Added per-section full-screen editing with live draft preservation between inline and full-screen modes, section title editing, Save/Cancel controls, and read-only behavior that disables editing controls.
- Reworked PDF and DOCX exports to parse Markdown into supported document structures instead of flattening section bodies as plain text. Headings are demoted beneath Build Docs section titles, and bold/italic, inline code, lists, blockquotes, fenced code blocks, and tables now render in exports.
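Heading demotion keeps section-body headings below the Build Docs section title in exports. A minimal text-level sketch follows. It assumes ATX (`#`) headings, caps at level 6, and skips fenced code so `#` comments inside code are untouched. `demoteHeadings` is hypothetical. The real exporter parses Markdown into document structures, and this sketch does not handle setext headings.

```typescript
// Demote ATX headings by `levels` (capped at h6), leaving fenced code untouched.
function demoteHeadings(markdown: string, levels: number): string {
  let fence: { char: string; length: number } | null = null;
  const out: string[] = [];
  for (const line of markdown.split("\n")) {
    const fenceMatch = /^ {0,3}(`{3,}|~{3,})/.exec(line);
    if (fenceMatch) {
      const run = fenceMatch[1] ?? "";
      if (fence === null) {
        fence = { char: run.charAt(0), length: run.length }; // opening fence
      } else if (run.charAt(0) === fence.char && run.length >= fence.length) {
        fence = null; // matching closing fence
      }
      out.push(line);
      continue;
    }
    const heading = fence === null ? /^( {0,3})(#{1,6})(\s|$)/.exec(line) : null;
    if (heading) {
      const indent = heading[1] ?? "";
      const hashes = heading[2] ?? "";
      const level = Math.min(6, hashes.length + levels);
      out.push(indent + "#".repeat(level) + line.slice(indent.length + hashes.length));
    } else {
      out.push(line);
    }
  }
  return out.join("\n");
}
```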
Validation
- Added Markdown export unit coverage for heading demotion, inline formatting, lists, blockquotes, fenced code blocks, tables, and DOCX formatting output.
- Extended Build Docs E2E coverage for rich editor source editing, full-screen editing, save/reload behavior, Markdown preview rendering, image upload, search, and export links.
- Validation run: `pnpm --filter web test:unit`, `pnpm --filter web type-check`, `pnpm --filter web lint` (existing warnings only), `pnpm --filter web db:validate`, `pnpm --filter web test:e2e tests/e2e/build-docs/build-docs.spec.ts`, and `BETTER_AUTH_URL=http://localhost:3000 BETTER_AUTH_SECRET=... DATABASE_URL=postgres://ctops:ctops@127.0.0.1:5432/ctops pnpm --filter web build`.
Vulnerability operations UI (apps/web/app/(dashboard)/settings/vulnerabilities/)
- Added an in-page CVE detail modal to Administration → Vulnerabilities so admins can open a catalog vulnerability without leaving the page or losing current catalog filters.
- The modal shows the full CVE description, title, severity, CVSS score, source, known-exploited/rejected status, affected rule count, open finding count, and published/modified timestamps.
Validation
- Extended database-backed E2E coverage for `/settings/vulnerabilities` to open a filtered CVE detail modal, assert key details, close it, and verify that the filter remains applied.
- Validation run: `pnpm --filter web type-check`, targeted `pnpm --filter web lint -- ...`, and `pnpm --filter web test:e2e tests/e2e/settings/vulnerabilities.spec.ts`.
- This completes the requested modal-based full vulnerability detail view while preserving user filters on the admin page.
System health operations view (apps/web/app/(dashboard)/settings/system/, apps/web/app/api/system/health/route.ts)
- Expanded the admin-only Administration → System page with ingest server status, live processing counts, last-hour received message totals, queue depth, memory usage, goroutines, and DB connection usage.
- Added a central Agent Errors table combining recent certificate signing, agent query, software inventory, and task-run failures by host/agent so admins can inspect operational agent failures in one place.
- Added an agent upgrade summary showing the required agent version and the count of active/offline agents that have not upgraded.
Ingest telemetry persistence (apps/ingest/internal/monitoring/, apps/web/lib/db/schema/ingest.ts)
- Added `ingest_server_snapshots` and ingest-side periodic reporting so each ingest process writes process identity, active request count, received message total, queue stats, Go runtime memory/goroutines, and pgx pool usage.
- Added gRPC interceptors to count unary and stream messages and active requests without changing handler behavior.
Validation
- Added unit coverage for ingest and agent health summary calculations.
- Added database-backed E2E coverage for the System page with seeded ingest snapshots and agent errors.
- Validation run: `pnpm --filter web exec node --experimental-strip-types --test lib/system/health.test.mjs`, `pnpm --filter web type-check`, `pnpm --filter web lint` (existing warnings only), `pnpm --filter web db:validate`, `go test ./apps/ingest/internal/...`, and `pnpm --filter web test:e2e tests/e2e/settings/system-health.spec.ts`.
- This satisfies the requested admin visibility for ingest server status, historical message counts, resource usage, central agent errors, and not-upgraded agent counts.
Vulnerability operations UI (apps/web/app/(dashboard)/settings/vulnerabilities/, apps/web/lib/actions/vulnerabilities.ts)
- Added an admin-only Administration → Vulnerabilities page that shows vulnerability feed/API connection status, upstream API URLs, last attempt/success times, pulled record counts, and recent connection errors without exposing secrets.
- Added a live CVE catalog view backed by `vulnerability_cves`, including pulled CVE counts, severity/KEV summaries, affected-package rule counts, source filtering, and CVE/title search so users can confirm API data is present independently of host findings.
- Added a bounded, rate-limited server action for the management snapshot, with instance admin authorization and instance-scoped open finding counts.
- Added sync policy visibility plus expected feed rows for NVD, CISA KEV, Debian, Ubuntu OSV, Alpine SecDB, and Red Hat, so the page shows the APIs CT-Ops is supposed to contact even before the first sync attempt. Ingest now stores the attempted upstream URL in source metadata for accurate display after environment overrides.
- Added an admin NVD API key control to Administration → Vulnerabilities. The key is stored encrypted in `system_config` as `vulnerability_nvd_api_key`; ingest uses it when `NVD_API_KEY` is not set, and the environment variable remains the deployment-level override.
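The key precedence is simple: environment first, then the admin-managed value. It is shown here in TypeScript for illustration only, since the ingest side is Go. `resolveNvdApiKey` is a hypothetical name, and the sketch assumes the stored key has already been decrypted.

```typescript
// Resolve the NVD API key: NVD_API_KEY overrides the encrypted system_config value.
function resolveNvdApiKey(
  env: Readonly<Record<string, string | undefined>>,
  storedKey: string | null, // assumed already decrypted from system_config
): string | null {
  const fromEnv = env["NVD_API_KEY"]?.trim();
  if (fromEnv) return fromEnv; // deployment-level override wins
  const fromConfig = storedKey?.trim();
  return fromConfig ? fromConfig : null; // null means no key is configured
}
```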
Validation
- Added database-backed E2E coverage for seeded API source states, pulled CVEs, and NVD API key save/clear behavior at `/settings/vulnerabilities`.
- Validation run: `pnpm --filter web type-check`, targeted `pnpm --filter web lint -- ...`, `pnpm --filter web db:validate`, `pnpm --filter web test:e2e tests/e2e/settings/vulnerabilities.spec.ts`, and `go test ./apps/ingest/internal/vuln ./apps/ingest/internal/crypto ./apps/ingest/internal/config`.
- This completes the requested visibility for CVEs pulled from vulnerability APIs and for the status of vulnerability API connections. Manual sync triggering remains outside this interface.
Vulnerability intelligence and matching (agent/internal/tasks/, apps/ingest/internal/vuln/, proto/agent/v1/ingest.proto)
- Added Linux software inventory metadata for distro identity, codename, source package, epoch/release, repository, and origin so CVE matching can use vendor package advisory truth instead of fuzzy package-name or CPE matching.
- Added ingest-side vulnerability feed sync support for CISA KEV, NVD CVE enrichment, Debian Security Tracker, Ubuntu OSV, Alpine SecDB, and Red Hat security data, with source sync state, ETag/hash handling, retry/backoff, and configurable cadence.
- Added distro-aware version comparison plus asynchronous host matching after feed syncs and software inventory ingestion; unsupported package sources are left unassessed rather than treated as safe.
Persistence and reporting (apps/web/lib/db/schema/, apps/web/lib/actions/vulnerabilities.ts, apps/web/app/(dashboard)/reports/vulnerabilities/)
- Added normalized CVE catalog, vulnerability source state, affected package ranges, and per-host finding tables, scoped by instance for host findings.
- Added the Pro-gated Reports → Vulnerabilities page with filters for CVE, package, severity, KEV, fix availability, host group, distro, and package source.
- Added an Inventory → Vulnerabilities host detail tab showing CVE findings for a selected host.
Validation
- Added parser, version comparator, match-rule, agent metadata extraction, and database-backed E2E coverage for report and host vulnerability display.
- V1 scope is Linux OS packages from `dpkg`, `rpm`, and `apk`; Windows, macOS apps, Homebrew, Snap, Flatpak, Pacman/Arch, and third-party application registry matching remain out of scope and unassessed.
Build Docs product area (apps/web/app/(dashboard)/build-docs/, apps/web/lib/actions/build-docs.ts, apps/web/lib/db/schema/build-docs.ts)
- Added a new Build Docs workspace for instance-level build document templates, reusable snippets, structured reorderable sections, screenshot/image assets, browser preview, and PDF/DOCX exports.
- Added immutable template versions, snippet provenance on inserted sections, document/section revision snapshots, Postgres full-text search vectors, and instance-scoped RLS-backed database tables.
- Added filesystem-backed image storage by default plus S3-compatible storage settings and storage adapter support for object storage deployments.
- Replaced the placeholder Runbooks route with a redirect to Build Docs and added Build Docs to navigation and the command palette.
Validation
- Added unit coverage for template required/optional field validation, deterministic section ordering, snippet snapshots, asset validation, and render-model table-of-contents/image attachment.
- Added database-backed E2E coverage for admin template/snippet creation, build doc creation, section editing, image upload, preview, search, and export links.
- Validation run: `pnpm --filter web test:unit`, `pnpm --filter web type-check`, `pnpm --filter web lint`, `pnpm --filter web db:validate`, `pnpm --filter web test:e2e tests/e2e/build-docs/build-docs.spec.ts`, and `BETTER_AUTH_URL=http://localhost:3000 BETTER_AUTH_SECRET=... DATABASE_URL=postgres://ctops:ctops@127.0.0.1:5432/ctops pnpm --filter web build`.
Support diagnostics tooling (deploy/customer-bundle/generate_support_data, .github/workflows/, apps/docs/docs/)
- Added a customer-bundle `generate_support_data` executable that writes a timestamped `ct-ops-support-data-*.tar.gz` beside `docker-compose.yml`.
- The archive includes sanitized `.env`/compose data, Docker status, recent compose logs, host information, file metadata, and TLS certificate fingerprints, while excluding raw `.env` files, private keys, and database dumps.
- Release and customer-bundle checks now stage and syntax-check the executable so it ships with the install bundle at the same level as `docker-compose.yml`.
Validation
- Added `deploy/scripts/test-support-data.sh` to verify support bundle generation and redaction of env secrets, compose-rendered secrets, and log bearer/password values.
Patch status check (agent/internal/checks/patch_status.go, apps/web/lib/actions/checks.ts, apps/web/app/(dashboard)/hosts/[id]/checks-tab.tsx)
- Added a `patch_status` check type with configurable `max_age_days` and capped `max_packages`.
- Linux agents determine last patch age and list available package updates for apt, dnf/yum, zypper, pacman, and apk using read-only package-manager commands.
- Windows agents report patch age from hotfix data and mark package update listing unsupported; macOS reports patch age from `softwareupdate --history` where available and marks update listing unsupported.
- Check pass/fail is based only on patch age; available package updates are reported but do not directly fail the check.
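The evaluation rule can be sketched as follows: the check fails only on age, and pending updates are merely reported up to the cap. The agent itself is Go, so this TypeScript version and its names (`evaluatePatchStatus`, `reportedUpdates`, `truncated`) are hypothetical. The inclusive `<=` comparison and the choice to fail when patch age is unknown are assumptions.

```typescript
interface PatchStatusConfig {
  maxAgeDays: number; // mirrors max_age_days
  maxPackages: number; // mirrors the max_packages cap
}

interface PatchStatusResult {
  passed: boolean;
  ageDays: number | null;
  reportedUpdates: string[];
  truncated: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function evaluatePatchStatus(
  lastPatchedAt: Date | null,
  availableUpdates: readonly string[],
  config: PatchStatusConfig,
  now: Date,
): PatchStatusResult {
  // Updates are reported (up to the cap) but never affect pass/fail.
  const reportedUpdates = availableUpdates.slice(0, config.maxPackages);
  const truncated = availableUpdates.length > reportedUpdates.length;
  if (lastPatchedAt === null) {
    // Assumption: an unknown patch age fails rather than passing silently.
    return { passed: false, ageDays: null, reportedUpdates, truncated };
  }
  const ageDays = Math.floor((now.getTime() - lastPatchedAt.getTime()) / MS_PER_DAY);
  return { passed: ageDays <= config.maxAgeDays, ageDays, reportedUpdates, truncated };
}
```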
Persistence and reporting (apps/web/lib/db/schema/patch-status.ts, apps/ingest/internal/handlers/patch_status.go, apps/web/lib/actions/patch-status.ts)
- Added `host_patch_statuses` for latest per-host/check patch health and `host_package_updates` for current/resolved available package updates.
- Ingest now persists structured `patch_status` check output and refreshes current available update rows on every supported result.
- Added a `/reports/patch-status` management report with estate summary, network patch status, and host-level patch status including network memberships.
Host detail drill-down (apps/web/app/(dashboard)/hosts/[id]/host-detail-client.tsx, apps/web/app/(dashboard)/hosts/[id]/patch-status-tab.tsx)
- Added an Infrastructure → Patch Status sub-tab showing the selected host's patch age, policy threshold, last checked time, package manager, warnings/errors, and available package updates where supported.
Validation
- Added agent parser/evaluator unit coverage for apt, rpm, Windows hotfix JSON, and patch-age-only pass/fail behavior.
- Added database-backed E2E coverage for the host Infrastructure → Patch Status tab with seeded patch status and package updates.
- Validation run: `go test ./agent/internal/checks ./apps/ingest/internal/...`, `pnpm --filter web type-check`, `pnpm --filter web lint -- ...`, `pnpm --filter web db:validate`, and `pnpm --filter web test:e2e -- tests/e2e/hosts/patch-status.spec.ts`.
Self-service password reset (apps/web/app/(auth)/forgot-password/, apps/web/app/reset-password/, apps/web/lib/auth/)
- Added a local-account password reset entry point from the email/password login form so users who forgot their password are no longer stranded at sign-in.
- Wired Better Auth's reset capability into CT-Ops by configuring `sendResetPassword`, revoking existing sessions on reset, and sending captured/SMTP password-reset emails through the shared auth email helper.
- Added a reset request page plus a token-based reset form that validates new passwords, safely constrains post-reset redirects to in-app paths, and returns the user to `/login?reset=1` with a success notice after the password changes.
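Constraining redirects to in-app paths usually means accepting only root-relative paths. That rules out absolute URLs, protocol-relative `//host` URLs, and backslash variants that some browsers normalise to `//`. The sketch below shows that technique. `safeInAppPath` is a hypothetical helper, not the CT-Ops implementation.

```typescript
// Accept only root-relative in-app paths such as "/hosts?tab=checks".
function safeInAppPath(raw: string | null | undefined, fallback: string): string {
  if (typeof raw !== "string" || !raw.startsWith("/")) return fallback; // empty, relative, or absolute URL
  if (raw.startsWith("//")) return fallback; // protocol-relative URL to another host
  if (/[\\\u0000-\u001f\u007f]/.test(raw)) return fallback; // backslashes and control characters
  return raw;
}
```

A reset flow would call it as `safeInAppPath(callbackParam, "/login?reset=1")`, so anything suspicious lands on the success notice instead.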
Validation
- Added targeted E2E coverage for requesting a reset from the login page, consuming the reset email, setting a new password, and signing in with the updated credential.
- Validation run: `pnpm --dir apps/web test:e2e tests/e2e/auth/login.spec.ts` and `pnpm --dir apps/web type-check`.
Shared authz guard layer (apps/web/lib/auth/guards.ts, apps/web/lib/actions/, apps/web/lib/eslint/, SECURITY.md)
- Added shared web authz helpers for active-user, same-instance, admin-role, and writable-role checks so role and instance enforcement lives in one place instead of being reimplemented ad hoc across server actions.
- Migrated the remaining raw `session.user` authorisation checks in instance settings, networks, notes, certificates, terminal, task schedules, software inventory, notifications, users, and security actions onto the shared helpers or the existing `requireInstanceAccess`/`requireInstanceAdminAccess` wrappers.
- Added a local ESLint rule that rejects direct `session.user.instanceId`/`session.user.role` comparisons and role-list `includes()` checks in web auth/action code, and marked security finding `I-10` complete.
Validation
- Added focused unit coverage for the new guards and the new ESLint rule.
- Validation run: `pnpm --filter web exec node --experimental-strip-types --test lib/actions/action-auth.test.mjs lib/auth/guards.test.mjs lib/eslint/no-single-table-select.test.mjs lib/eslint/no-raw-session-checks.test.mjs`, `pnpm --filter web exec eslint lib/actions/users.ts lib/actions/security.ts lib/actions/notification-settings.ts lib/actions/settings.ts lib/actions/networks.ts lib/actions/software-inventory.ts lib/actions/terminal.ts lib/actions/task-schedules.ts lib/actions/notes.ts lib/actions/notes-resolver.ts lib/actions/certificates.ts lib/actions/action-auth-core.ts lib/auth/guards.ts lib/eslint/no-raw-session-checks.mjs lib/eslint/no-raw-session-checks.test.mjs`, and `pnpm --filter web type-check`.
Administration layout (apps/web/components/shared/sidebar.tsx, apps/web/app/(dashboard)/settings/)
- Replaced the old Administration sidebar grouping of Team plus nested Settings with high-level areas: People, Instance, Agents, Monitoring, Integrations, Security, and System.
- Split the settings UI into focused pages with tabs: Instance profile/licence, Agents enrolment/defaults/tag rules/software inventory, Monitoring alert defaults/notification policy/metric retention, Integrations LDAP/SMTP, and Security mTLS/terminal access.
- Kept compatibility redirects for the old Alert Defaults, LDAP, and Tag Rules settings URLs, and updated in-app links plus onboarding/agent docs to point at the new locations.
Validation
- Validation run: `pnpm --filter web type-check` and `pnpm --filter web lint` (lint passed with the repository's existing warnings).
SMTP relay testing UX (apps/web/app/(dashboard)/settings/settings-client.tsx, apps/web/lib/actions/notification-settings.ts, apps/web/lib/notifications/smtp-settings.ts)
- Changed the central SMTP Relay Test button to open a modal that asks which email address should receive the test message.
- The backend test action now validates a single requested recipient and sends the test email there instead of defaulting to the configured sender address.
- The modal remains open after sending and shows a sanitized SMTP test log with recipient, relay endpoint, sender, auth mode, host validation, and final success/error status without exposing credentials.
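Normalising the test recipient means trimming the input, insisting on exactly one address, and applying a shape check. Rejecting whitespace also blocks CR/LF header injection. This is a sketch only. `normalizeTestRecipient` and its error strings are hypothetical, and the regex is a deliberate simplification, not an RFC 5322 parser.

```typescript
type RecipientResult = { ok: true; recipient: string } | { ok: false; error: string };

// Deliberately simple shape check; not a full RFC 5322 parser.
const SIMPLE_EMAIL = /^[^\s@,;<>]+@[^\s@,;<>]+\.[^\s@,;<>]+$/;

function normalizeTestRecipient(input: unknown): RecipientResult {
  if (typeof input !== "string" || input.trim() === "") {
    return { ok: false, error: "Recipient is required" };
  }
  const trimmed = input.trim();
  // Commas, semicolons, or inner whitespace (including CR/LF) mean more than one address.
  if (/[,;\s]/.test(trimmed)) return { ok: false, error: "Enter a single email address" };
  if (!SIMPLE_EMAIL.test(trimmed)) return { ok: false, error: "Enter a valid email address" };
  return { ok: true, recipient: trimmed };
}
```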
Validation
- Added focused unit coverage for SMTP test-recipient normalization.
- Validation run: `pnpm --filter web exec node --experimental-strip-types --test lib/notifications/smtp-settings.test.mjs`, `pnpm --filter web type-check`, and `pnpm --filter web test:unit`.
Drizzle query-style convergence (apps/web/eslint.config.mjs, apps/web/lib/eslint/, apps/web/lib/actions/, apps/web/app/api/admin/hosts/bulk-delete/route.ts)
- Added a local ESLint rule that rejects straightforward single-table `db.select().from(...)` and `tx.select().from(...)` reads, while still allowing query-builder paths that need joins, grouping, or aggregate SQL.
- Normalised the remaining simple single-table reads in licence, tags, notes, tag-rule preview, agent cleanup/collision helpers, software inventory filters, and the admin bulk-delete route onto `db.query.*`/`tx.query.*`.
- This satisfies security finding `I-07` / issue #366 by making the preferred read pattern explicit and enforceable without forcing a risky broad rewrite of aggregate/reporting queries.
Validation
- Added focused unit coverage for the custom ESLint rule so future regressions fail close to the policy.
Security engagement documentation (PENTEST.md)
- Added a repo-root penetration testing scope document alongside `SECURITY.md` and `SECURITY_DISCLOSURE.md`.
- Defined the authorised contact paths, in-scope CT-Ops components, allowed testing categories, rules of engagement, and explicit out-of-scope targets/activities for coordinated assessments.
- This satisfies issue #365 by giving testers and operators a single source of truth for pentest scope and engagement boundaries.
Build state
- Documentation-only change; tests not run
Distributed auth and abuse controls (apps/web/lib/rate-limit.ts, apps/web/lib/auth/, apps/web/lib/db/schema/security-throttles.ts)
- Added a shared `security_throttles` table plus a DB-backed throttle store so security-sensitive rate limits and login lockouts survive restarts and apply consistently across multiple web replicas.
- Converted the reusable rate limiter, password login lockout guard, and verification-resend throttles to use shared persisted state instead of process-local maps.
- Applied the shared guards to LDAP login, Better Auth email/password sign-in hooks, verification resend, invite lookups/acceptance, agent download/install/latest endpoints, enrolment-token creation, SMTP relay tests, software inventory scans/reports, certificate URL tracking, and alert test notifications.
- Kept the lightweight proxy-level auth limiter as a best-effort local fast-path while moving authoritative enforcement into server routes and auth hooks.
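The key idea is that limiter instances hold no counters themselves. Every hit goes to a shared store, so two replicas see the same count. In the sketch below, an in-memory store stands in for the `security_throttles` table. `ThrottleStore`, `MemoryThrottleStore`, and `RateLimiter` are hypothetical names. A database-backed `hit` would need to be one atomic upsert so concurrent replicas cannot lose increments.

```typescript
// Hypothetical store interface; the real one is backed by the security_throttles table.
interface ThrottleStore {
  // Increment the hit count for `key` within a fixed window and return the new count.
  hit(key: string, windowMs: number, now: number): Promise<number>;
}

// In-memory stand-in for tests; a DB store would do this as one atomic upsert.
class MemoryThrottleStore implements ThrottleStore {
  private readonly buckets = new Map<string, { count: number; resetAt: number }>();

  async hit(key: string, windowMs: number, now: number): Promise<number> {
    const bucket = this.buckets.get(key);
    if (!bucket || bucket.resetAt <= now) {
      this.buckets.set(key, { count: 1, resetAt: now + windowMs }); // new window
      return 1;
    }
    bucket.count += 1;
    return bucket.count;
  }
}

// Limiter instances keep no local counters, so replicas sharing a store agree.
class RateLimiter {
  private readonly store: ThrottleStore;
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(store: ThrottleStore, limit: number, windowMs: number) {
    this.store = store;
    this.limit = limit;
    this.windowMs = windowMs;
  }

  async allow(key: string, now: number = Date.now()): Promise<boolean> {
    const count = await this.store.hit(key, this.windowMs, now);
    return count <= this.limit;
  }
}
```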
Test and migration coverage
- Added focused unit coverage proving rate-limit and login-lockout state is shared across separate guard instances when backed by the same store.
- Added the `security_throttles` migration and E2E harness cleanup coverage so auth throttling state is truncated between Playwright specs.
- Validation run: `pnpm --filter web type-check`, `pnpm --filter web db:validate`, `node --experimental-strip-types --test apps/web/lib/rate-limit.test.mjs apps/web/lib/auth/login-attempts.test.mjs apps/web/lib/auth/email-verification-rate-limit.test.mjs`, and `pnpm --filter web test:e2e -- tests/e2e/auth/login.spec.ts tests/e2e/auth/register.spec.ts`.
Notification email delivery (apps/web/lib/actions/notification-settings.ts, apps/web/app/(dashboard)/settings/settings-client.tsx, apps/web/lib/notifications/)
- Moved SMTP relay configuration out of per-alert channels and into instance Settings as a central SMTP relay.
- Added admin-only save/test actions for the relay; SMTP passwords are encrypted before storage and redacted from client responses.
- Backend validation now restricts SMTP ports and rejects private/reserved relay hosts when the relay is enabled.
- Alert email channels now store only channel name and recipient addresses; relay host, credentials, encryption, and sender identity come from central settings.
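Relay validation combines a port allow-list with a rejection of private and reserved targets. A partial sketch for IPv4 literals follows. The port set is an assumption, since the notes only say ports are restricted. The range list is not exhaustive. Hostnames would still need DNS resolution and a re-check before connecting, and IPv6 is not covered. `relayTargetError` and its helpers are hypothetical.

```typescript
// Assumed allow-list; the notes only say SMTP ports are restricted.
const ALLOWED_SMTP_PORTS: ReadonlySet<number> = new Set([25, 465, 587, 2525]);

function ipv4Octets(host: string): number[] | null {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) return null;
  const octets = host.split(".").map(Number);
  return octets.every((n) => n <= 255) ? octets : null;
}

// Partial list of private/reserved IPv4 ranges.
function isPrivateOrReservedIPv4(octets: readonly number[]): boolean {
  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;
  return (
    a === 0 || // "this" network
    a === 10 || // RFC 1918
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // carrier-grade NAT
    (a === 169 && b === 254) || // link-local, including cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918
    (a === 192 && b === 168) || // RFC 1918
    a >= 224 // multicast and reserved
  );
}

// Returns an error message, or null when the target passes these static checks.
function relayTargetError(host: string, port: number): string | null {
  if (!ALLOWED_SMTP_PORTS.has(port)) return `SMTP port ${port} is not allowed`;
  const normalized = host.trim().toLowerCase();
  if (normalized === "localhost" || normalized.endsWith(".localhost")) {
    return "Relay host must not be a loopback name";
  }
  const octets = ipv4Octets(normalized);
  if (octets !== null && isPrivateOrReservedIPv4(octets)) {
    return "Relay host must not be a private or reserved address";
  }
  // Hostnames pass here; resolve them and re-check the addresses before connecting.
  return null;
}
```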
Alert dispatch (apps/web/lib/actions/alerts.ts, apps/ingest/internal/handlers/notify.go, apps/ingest/internal/db/queries/alerts.sql.go)
- Web test notifications and ingest alert delivery now combine central relay settings with each email channel's recipients.
- Ingest keeps backward-compatible fallback support for legacy SMTP channel rows that still contain relay details.
Test coverage
- Added web unit coverage for SMTP recipient normalisation, relay redaction, and delivery through an in-process mock SMTP server.
- Added ingest Go coverage for SMTP delivery through an in-process mock SMTP server.
Agent operating rules (AGENTS.md)
- Added an E2E Database Harness section that points agents to `apps/docs/docs/development/testing.md` as the source of truth for the web E2E harness, avoiding duplicated setup details that can drift.
- Added rules requiring the database-backed harness when tests depend on real SQL, migrations, constraints, auth/session rows, instance scoping, cascades, or persisted state.
- Added rules requiring explicit, relevant seed data for tests instead of relying on leaked state.
- Added Progress Tracking guidance: when new feature work satisfies all or part of a requirement, update `PROGRESS.md` and state what is complete versus outstanding.
- Added Completion Cleanup guidance: remove temporary worktrees only after commit, push, release, and any relevant image/artifact publication, and never delete worktrees with uncommitted user work.
- Added unrelated-finding guidance: when an agent identifies an unrelated error, bug, or security risk, it should create a focused GitHub issue so the finding is tracked instead of forgotten.
E2E harness docs (apps/docs/docs/development/testing.md)
- Expanded the Playwright/Testcontainers documentation with agent-ready detail on how the tmpfs TimescaleDB/Postgres database is created, how `DATABASE_URL` is injected before Next.js starts, and how migrations run.
- Documented `getTestDb()` for direct SQL seeding, when to seed through SQL versus UI/API paths, and how the baseline `seedInstanceAndUser()` fixture works.
- Added guidance for feature-specific seed helpers, isolation maintenance, updating `APP_TABLES` for new app tables, single-spec runs, and common troubleshooting.
Build state
- Documentation-only change; tests not run
Email verification policy (apps/web/lib/auth/env.ts, apps/web/lib/auth/index.ts, apps/web/app/(auth)/register/)
- Local email/password sign-up now reads `REQUIRE_EMAIL_VERIFICATION`, defaulting to `true` for the secure deployment posture.
- When verification is intentionally disabled (`REQUIRE_EMAIL_VERIFICATION=false`), registration signs the new user in and continues to the callback/onboarding flow instead of routing through the verification email gate.
- Invalid boolean values are rejected early by the shared auth env helper.
- Root `.env.example`, `apps/web/.env.example`, `docker-compose.single.yml`, the customer-bundle README, and the VuePress install/configuration/deployment docs now expose the variable and explain the default.
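The env parsing described above maps `true`/`1` and `false`/`0` to booleans, treats an unset variable as the secure default, and throws on anything else. This is a sketch. `parseBooleanEnv` and `requireEmailVerification` are hypothetical names. Treating an empty string as unset and matching case-insensitively are assumptions.

```typescript
// Parse a boolean env var; unset uses the default, unknown values fail fast.
function parseBooleanEnv(name: string, raw: string | undefined, defaultValue: boolean): boolean {
  if (raw === undefined || raw.trim() === "") return defaultValue;
  switch (raw.trim().toLowerCase()) {
    case "true":
    case "1":
      return true;
    case "false":
    case "0":
      return false;
    default:
      throw new Error(`${name} must be one of true, false, 1, 0 (got "${raw}")`);
  }
}

// Verification is required unless explicitly disabled.
function requireEmailVerification(env: Readonly<Record<string, string | undefined>>): boolean {
  return parseBooleanEnv("REQUIRE_EMAIL_VERIFICATION", env["REQUIRE_EMAIL_VERIFICATION"], true);
}
```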
Test coverage
- `apps/web/lib/auth/env.test.mjs` covers the default, explicit `true`/`false`/`1`/`0`, and invalid values.
- `apps/web/tests/e2e/auth/register.spec.ts` now verifies both modes: default verification-required behaviour and the no-verification opt-out path.
- New npm script: `pnpm --filter web test:e2e:no-email-verification`.
Authentication and account controls (apps/web/lib/auth/, apps/web/app/(auth)/)
- Local email/password sign-ups require email verification before dashboard access by default.
- Per-account login lockout added for repeated failed password attempts.
- Production auth config now validates `BETTER_AUTH_SECRET`, `BETTER_AUTH_URL`, and trusted origins instead of silently accepting unsafe defaults.
- LDAP login links are scoped to instance context; team-management mutations derive the actor from the session.
Mutation and tenancy boundaries (apps/web/lib/actions/, apps/web/lib/security/trusted-origins.ts)
- Server actions now require an authenticated instance context across agents, alerts, certificates, checks, host groups/settings, LDAP, networks, notes, notifications, service accounts, settings, software inventory, tag rules, tasks, terminal, and users.
- Trusted-origin validation added for mutation routes such as agent bundles, host queries, bundle transfer, and certificate checker calls.
- Notification test targets block private/internal addresses.
Agent, ingest, and tool security
- Terminal access now requires explicit SSH credentials and no longer depends on the agent-side terminal implementation.
- Heartbeat JWT handling tightened: initial heartbeat sends JWT, invalid fallback is rejected, and unsigned agent self-update is disabled.
- Enrolment tokens now enforce expiry and usage caps, with policy tests and migration support.
- Certificate checker surface hardened against SSRF, and untrusted certificate parsing now uses Node's `X509Certificate`.
Build/test coverage added
- Unit tests added around auth env validation, login attempts, trusted origins, enrolment token policy, certificate fetch/policy, heartbeat JWT handling, and agent update policy.
- E2E coverage expanded for login, registration, authenticated redirects, settings, and activation-token flows.
Agent task execution (agent/internal/tasks/)
- Custom script tasks are constrained with OS-specific process-limit helpers and kill handling.
- Script timeout behaviour now preserves inherited/default limits instead of accidentally dropping them.
- Patch/task execution tests cover the new boundaries.
Agent config safety (agent/internal/config/, agent/internal/install/install.go)
- Agent config loading validates file ownership and permissions on Unix, with Windows-specific fallback handling.
- Installer writes config files with stricter mode expectations and tests verify the new guardrails.
Ingest transport limits (apps/ingest/internal/grpc/server.go)
- gRPC server now caps message and stream limits, backed by a new server test suite.
- JWKS/health HTTP server has a ReadHeaderTimeout to remove slowloris exposure.
Customer release artifacts (.github/workflows/, docker-compose.single.yml, deploy/)
- Customer compose bundles now ship digest-pinned image references instead of mutable tags.
- Release workflow exports customer-bundle image references and the bundle check workflow validates them.
- Installer bundle checksums are generated and verified by release checks; deploy/scripts/test-install.sh exercises install output integrity.
- Web image startup no longer runs as root; Docker build reliability improved with retry handling for flaky pnpm install runs.
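A bundle check that enforces digest pinning can scan every image: reference in the compose file and flag any that lack an @sha256: digest. The sketch below works on raw YAML text with a regular expression, an assumption made to keep it dependency-free. The actual workflow may use a YAML parser or Docker tooling instead, and findUnpinnedImages is a hypothetical name.

```typescript
// A pinned reference ends in "@sha256:" followed by 64 lowercase hex characters.
const DIGEST_RE = /@sha256:[0-9a-f]{64}$/;

// Return every `image:` value in the compose text that is not digest-pinned.
export function findUnpinnedImages(composeYaml: string): string[] {
  const unpinned: string[] = [];
  for (const line of composeYaml.split(/\r?\n/)) {
    // Capture the reference, tolerating optional quotes and trailing comments.
    const match = line.match(/^\s*image:\s*["']?([^"'\s#]+)["']?/);
    if (match && !DIGEST_RE.test(match[1])) unpinned.push(match[1]);
  }
  return unpinned;
}
```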
Dependency and CI upkeep
- BuildKit image pinned for Docker builds.
- Go, GitHub Actions, production npm dependencies, dev dependencies, and @types/archiver were bumped through Dependabot/release PRs.
- Release-please continued producing web/agent/ingest tags through the hardening work.
Jenkins bundler (apps/web/app/(dashboard)/bundlers/jenkins-bundler.tsx, apps/web/app/api/tools/jenkins-bundler/route.ts, apps/web/lib/jenkins/update-center.ts)
- Jenkins LTS metadata is queried instead of relying on hard-coded Java compatibility baselines.
- Offline-script bundle flow added for air-gap installation.
- Recursive plugin dependency traversal added so required plugin dependencies are pulled into the archive.
- A dependency pull action and clearer plugin download status make long bundle generation easier to monitor.
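Recursive plugin dependency traversal can be written as a depth-first walk with a visited set. The walk below is iterative, which is equivalent to recursion but has no stack-depth limit. The sketch assumes the plugins map shape of the Jenkins update-center JSON (each plugin lists dependencies with name, optional and version fields) and skips optional dependencies. Whether the CT-Ops bundler includes optional dependencies is not stated here, and resolvePluginClosure is a hypothetical name.

```typescript
// Assumed subset of the update-center plugin metadata.
interface PluginDependency { name: string; optional: boolean; version: string }
interface PluginInfo { name: string; version: string; dependencies: PluginDependency[] }
type UpdateCenterPlugins = Record<string, PluginInfo>;

export function resolvePluginClosure(
  requested: string[],
  plugins: UpdateCenterPlugins,
): { resolved: string[]; missing: string[] } {
  const resolved = new Set<string>();
  const missing = new Set<string>();
  const stack = requested.slice();
  while (stack.length > 0) {
    const name = stack.pop() as string;
    // Skip anything already visited; this also terminates dependency cycles.
    if (resolved.has(name) || missing.has(name)) continue;
    const info = plugins[name];
    if (!info) {
      missing.add(name);
      continue;
    }
    resolved.add(name);
    for (const dep of info.dependencies) {
      if (!dep.optional) stack.push(dep.name);
    }
  }
  return { resolved: Array.from(resolved).sort(), missing: Array.from(missing).sort() };
}
```

Reporting missing names separately lets the UI show which requested plugins (or required dependencies) are absent from the update center instead of silently producing an incomplete archive.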
GitLab bundler (apps/web/app/(dashboard)/bundlers/gitlab-bundler.tsx, apps/web/app/api/tools/gitlab-bundler/route.ts)
- GitLab air-gap bundler added alongside Jenkins.
- Target GitLab version can be fetched from upstream metadata instead of typed manually.
- Bundle archives can be streamed server-side and transferred to selected hosts through the existing agent/task path.
Repository operating rules (CLAUDE.md / agent instructions)
- Security and PR workflow rules added to keep future changes aligned with the hardening work.
- Conventional PR title requirements documented.
- Completion criteria now require relevant tests to pass before a task is considered done.
Verification posture
- The project now has broader E2E coverage around auth/settings, more focused unit coverage for security-sensitive helpers, and CI/release checks for customer bundle integrity.
GitLab/Jenkins bundle transfer (apps/web/app/api/tools/bundle-transfer/route.ts, apps/web/app/(dashboard)/bundlers/)
- Removed the direct SSH/SFTP transfer path from the web container; transfers now prepare the zip server-side and dispatch a custom_script task to the selected online host.
- The host receives the task through the existing agent heartbeat/task channel, creates the destination directory, downloads the prepared zip with a short-lived per-job token, and writes it to the requested path.
- Transfer modal is now single-step: select host, optional owner, and destination directory. No SSH password is requested or sent from the browser.
- Status panel keeps the existing download phase and then shows the separate host transfer phase while the agent task is pending/running/completed/failed.
- Removed the direct ssh2 dependency from the web package; the remaining ssh2 lockfile entries are transitive dev dependencies of testcontainers.
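One way to build a short-lived per-job download token is to compute an HMAC over the job ID and an expiry timestamp and verify it with a constant-time comparison. The sketch below uses node:crypto. The token format, the createJobToken/verifyJobToken names and the TTL are assumptions, not the bundle-transfer route's actual scheme. It also assumes job IDs contain no "." characters.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical token format: "<jobId>.<expiresAtMs>.<base64url HMAC-SHA256>".
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

export function createJobToken(jobId: string, secret: string, ttlMs: number, now = Date.now()): string {
  const payload = `${jobId}.${now + ttlMs}`;
  return `${payload}.${sign(payload, secret)}`;
}

export function verifyJobToken(token: string, jobId: string, secret: string, now = Date.now()): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [tokenJobId, expiresAt, signature] = parts;
  // A token minted for one job must not unlock another job's download.
  if (tokenJobId !== jobId) return false;
  const expected = Buffer.from(sign(`${tokenJobId}.${expiresAt}`, secret));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return false;
  return Number(expiresAt) > now;
}
```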
Build state
- pnpm --filter web type-check: zero errors ✅
- pnpm --filter web lint -- app/api/tools/bundle-transfer/route.ts 'app/(dashboard)/bundlers/bundle-transfer-dialog.tsx' 'app/(dashboard)/bundlers/bundle-transfer-status.tsx': zero errors ✅
New tooling page (apps/web/app/(dashboard)/certificate-checker/, apps/web/app/api/tools/certificate-checker/route.ts)
- Interactive X.509 cert inspector; no openssl needed locally
- Supply the cert three ways on one tab: drag-and-drop a file, click to browse, or paste PEM text directly. PEM drops auto-populate the textarea; binary drops (DER/PKCS#12) are sent as base64
- Check URL tab: server-side TLS connect to any host:port with optional SNI override — internal hosts reachable from the web container are inspectable
- Private key validation is upfront on both tabs; match result returned in the same API call as parse/fetch
- Download the leaf cert in PEM, DER, or PKCS#7
- Supports PEM, DER, PKCS#7 (.p7b), and PKCS#12 (.pfx/.p12 with password) input formats
- Full detail rendering: subject/issuer DN, validity, SHA-1/256/512 fingerprints, key algo/size/curve, all extensions (KU, EKU, SANs, policies, basic constraints, AKI/SKI), OCSP/CRL/CA-issuer URLs, chain table, raw PEM with copy
- node-forge added for PKCS#7/PKCS#12 parsing; new tabs.tsx shadcn primitive from Radix
- Docs: new apps/docs/docs/features/certificate-checker.md (updated for paste + drag-drop + inline key validation)
- PRs: #252 (initial tool), #257 (paste + drag-drop)
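The PEM and DER download formats differ only in encoding. PEM is the base64 of the DER bytes, wrapped at 64 characters between BEGIN/END lines (RFC 7468). The dependency-free conversion sketch below handles a single CERTIFICATE block only, and the helper names are illustrative.

```typescript
// Extract the first CERTIFICATE block and decode its base64 body to DER bytes.
export function pemToDer(pem: string): Buffer {
  const match = pem.match(/-----BEGIN CERTIFICATE-----([\s\S]*?)-----END CERTIFICATE-----/);
  if (!match) throw new Error("no CERTIFICATE block found");
  // Remove line breaks and other whitespace before decoding.
  return Buffer.from(match[1].replace(/\s+/g, ""), "base64");
}

// Encode DER bytes as PEM with 64-character base64 lines and a trailing newline.
export function derToPem(der: Buffer): string {
  const body = der.toString("base64").match(/.{1,64}/g) ?? [];
  return ["-----BEGIN CERTIFICATE-----", ...body, "-----END CERTIFICATE-----", ""].join("\n");
}
```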
Build state
- pnpm run build: zero TypeScript errors ✅
Repo/image rename — the repository moved to carrtech-dev/ct-ops, so container images now publish to ghcr.io/carrtech-dev/ct-ops/*
- Updated hardcoded references in docker-compose.single.yml, .env.example, the customer-bundle README, and apps/docs/docs/deployment/docker-compose.md
- PR: #253
Enrolment URL env var (apps/web/app/(dashboard)/settings/agents/, apps/web/.env.example, docker-compose.single.yml)
- The agent enrolment curl install command was showing localhost when the UI was reached via port-forward/proxy
- New getAppOrigin() helper, initially backed by NEXT_PUBLIC_APP_URL, then unified onto the existing AGENT_DOWNLOAD_BASE_URL env var so one variable drives both the ingest service and the web UI's install command
- AGENT_DOWNLOAD_BASE_URL also propagated to the web service in docker-compose.single.yml (it was previously wired only to ingest, so process.env had no value for it in the Next.js server component)
- Falls back to window.location.origin when unset, preserving zero-config local dev
- PRs: #254, #258
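The fallback order for the install-command origin is: a configured AGENT_DOWNLOAD_BASE_URL wins, otherwise the browser's origin. It can be expressed as a small pure function that takes both inputs explicitly, so it runs anywhere. This is a sketch. resolveAppOrigin is a hypothetical name, and the real getAppOrigin() helper may differ in signature and validation.

```typescript
// Hypothetical sketch of the origin fallback order used for the install command.
export function resolveAppOrigin(
  configuredBaseUrl: string | undefined,
  browserOrigin: string | undefined,
): string {
  const configured = configuredBaseUrl?.trim();
  if (configured) {
    // Strip trailing slashes so callers can append a path without doubling "/".
    return configured.replace(/\/+$/, "");
  }
  if (browserOrigin) return browserOrigin;
  throw new Error("AGENT_DOWNLOAD_BASE_URL is unset and no browser origin is available");
}
```

In a server component the first argument would come from process.env and the second would be undefined. On the client, window.location.origin supplies the fallback.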
Build state
- pnpm run build: zero TypeScript errors ✅
Ingest-side dedup (apps/ingest/internal/handlers/register.go, apps/ingest/internal/db/queries/hosts.sql.go, agent/internal/registration/registrar.go)
- Two live hosts in the same instance cannot share a hostname or IP; the guard now runs at Register and at approveAgent
- Online or revoked match → reject with ALREADY_EXISTS so the admin deletes the stale record first
- Offline / unknown match → adopt the existing agents/hosts rows, rotate the new public key onto the existing agent, and preserve approval state. This covers reinstall-with-wiped-data-dir cases that previously produced a duplicate "Offline" record
- Agents now report non-loopback IPs in PlatformInfo at register time so the server can run the overlap check
- Key rotation is appended to agent status history with reason "adopted re-registration (keypair rotated; matched by hostname or IP)" for audit
- apps/web/lib/actions/agents.ts also hardened against approving a pending agent that now collides with a live host
Docs
- apps/docs/docs/architecture/ingest.md + apps/docs/docs/features/hosts.md: new Duplicate-Host Protection section
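The duplicate-host rules reduce to a decision based on the status of an existing record that matches the registering agent by hostname or IP. The TypeScript sketch below mirrors that decision table for illustration only. The enforcement described above lives in the Go ingest handler and in approveAgent. The type names, status strings and case-insensitive hostname comparison are assumptions.

```typescript
// Hypothetical status names; the mapping follows the duplicate-host rules above.
type ExistingStatus = "online" | "revoked" | "offline" | "unknown";

interface ExistingHost { id: string; hostname: string; ips: string[]; status: ExistingStatus }
interface Registration { hostname: string; ips: string[] } // ips: non-loopback only

type Decision =
  | { action: "create" }
  | { action: "reject"; code: "ALREADY_EXISTS"; conflictId: string }
  | { action: "adopt"; hostId: string };

export function decideRegistration(reg: Registration, existing: ExistingHost[]): Decision {
  const hostname = reg.hostname.toLowerCase();
  const match = existing.find(
    (h) => h.hostname.toLowerCase() === hostname || h.ips.some((ip) => reg.ips.includes(ip)),
  );
  if (!match) return { action: "create" };
  if (match.status === "online" || match.status === "revoked") {
    // The admin must delete the stale record before this agent can register.
    return { action: "reject", code: "ALREADY_EXISTS", conflictId: match.id };
  }
  // Offline/unknown: reuse the existing rows and rotate the new public key onto them.
  return { action: "adopt", hostId: match.id };
}
```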
Build state
- pnpm run build: zero TypeScript errors ✅
- go build ./...: zero errors ✅
New download route (apps/web/app/api/agent/bundle/route.ts, apps/web/lib/agent/bundle.ts, apps/web/lib/agent/binary.ts)
- Settings → Agent Enrolment → Download Install Bundle produces a per-OS/arch zip containing the agent binary, an install helper (install.sh on Linux/macOS, install.ps1 on Windows), a pre-populated agent.toml, SHA256SUMS, and a README.md
- Three token options: generate a fresh single-use token (default 7-day expiry), embed an existing active token, or ship without a token (the operator exports CT_OPS_ENROLMENT_TOKEN before install)
- Gated to super_admin/instance_admin and scoped by instanceId; single-use tokens are persisted via agent_enrolment_tokens with metadata.source = 'install-bundle' and metadata.os/metadata.arch for audit
- Shared binary resolver extracted to apps/web/lib/agent/binary.ts so the new route reuses the download route's cache / GitHub-release / baked-binary fallback
- Zip built with jszip
- Closes #244; PR #250
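The SHA256SUMS file follows the familiar sha256sum layout (hex digest, two spaces, file name), so node:crypto alone can both produce and check it. The sketch below shows both sides. It assumes text-mode lines without the asterisk binary marker and is not the bundle builder's actual code. The function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (data: Buffer): string => createHash("sha256").update(data).digest("hex");

// Build sha256sum-style lines: "<64 hex chars>  <file name>", sorted by name.
export function buildSha256Sums(files: Record<string, Buffer>): string {
  return Object.keys(files)
    .sort()
    .map((name) => `${sha256Hex(files[name])}  ${name}`)
    .join("\n") + "\n";
}

// Return the names listed in the manifest whose contents are missing or do not match.
export function verifySha256Sums(manifest: string, files: Record<string, Buffer>): string[] {
  const failures: string[] = [];
  for (const line of manifest.split("\n")) {
    const m = line.match(/^([0-9a-f]{64})  (.+)$/);
    if (!m) continue; // blank or malformed lines are skipped in this sketch
    const [, expected, name] = m;
    const data = files[name];
    if (!data || sha256Hex(data) !== expected) failures.push(name);
  }
  return failures;
}
```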
Docs
- New apps/docs/docs/getting-started/agent-install-bundle.md (full install walk-through, token audit trail, troubleshooting)
- Cross-links added to apps/docs/docs/deployment/air-gap.md, apps/docs/docs/architecture/agent.md, and the top-level README.md so the bundle is discoverable as the air-gap enrolment path
Build state
- pnpm run build: zero TypeScript errors ✅
Root cause (apps/web/components/terminal/terminal-panel-context.tsx)
- The useState initialiser was reading sessionStorage on the client, producing HTML that differed from the server render whenever the terminal panel had persisted tabs
- The resulting React #418 hydration mismatch aborted hydration for the entire dashboard tree, leaving client components (notably the directory-lookup typeahead) non-interactive
Fix
- Start with DEFAULT_STATE on both server and client; load persisted state in a post-mount effect gated by a hasHydrated flag so the initial empty state doesn't wipe sessionStorage
- The set-state-in-effect lint rule is disabled around the one-shot hydration effect, with a comment explaining the canonical pattern
- PRs: #243, #246
Build state
- pnpm run build: zero TypeScript errors ✅
New tooling page (apps/web/app/(dashboard)/directory-lookup/, apps/web/lib/actions/ldap-lookup.ts)
- Tooling → Directory User Lookup — on-demand queries against any configured LDAP/AD server; nothing is synced or stored
- Username typeahead (debounced, prefix-matched via {{username}}* on the configured userSearchFilter), with a directory picker when multiple configs exist
- Selecting a user fetches the full attribute set (including operational attributes via +) and renders:
- Summary: display name, username, lock/active status, email, sAMAccountName, UPN, copyable DN
- Password: expiry, last changed, and lock status; parses both AD (msDS-UserPasswordExpiryTimeComputed, accountExpires, pwdLastSet, lockoutTime, userAccountControl) and OpenLDAP/shadow (shadowLastChange, shadowMax, pwdChangedTime, pwdAccountLockedTime) attributes
- Groups: full memberOf list with CN + DN, a copy button, and a client-side filter visible whenever the user has any groups
- All LDAP Attributes: searchable table of every returned attribute, with Windows file-time / LDAP generalized-time values rendered as human-readable dates and the raw value shown below; binary values shown as [binary NB]; password-hash attributes excluded for safety
- Removed unused sync scaffolding from ldap_configurations (lastSyncAt, syncIntervalMinutes, etc.) and LDAP-sourced columns from domain_accounts (ldapConfigurationId, distinguishedName, groups); the Service Accounts register is now manual-only, and live queries go through this tool
- Any authenticated instance user can run a lookup; managing LDAP configs remains restricted to instance_admin/super_admin
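Because the typeahead substitutes typed text into the configured userSearchFilter as a {{username}}* prefix match, the text must be escaped per RFC 4515. That way characters such as * ( ) and backslash are matched literally instead of changing the filter. The sketch below is illustrative. buildPrefixFilter is a hypothetical name, and the placeholder handling is inferred from the description above.

```typescript
// RFC 4515: escape \ * ( ) and NUL as a backslash plus two hex digits.
export function escapeLdapFilterValue(value: string): string {
  return value.replace(/[\\*()\0]/g, (ch) => "\\" + ch.charCodeAt(0).toString(16).padStart(2, "0"));
}

// Hypothetical: substitute the escaped text plus a literal * wildcard for every placeholder.
export function buildPrefixFilter(userSearchFilter: string, typed: string): string {
  // Only the trailing * is left unescaped, so the server performs a prefix match.
  return userSearchFilter.split("{{username}}").join(`${escapeLdapFilterValue(typed)}*`);
}
```

Without escaping, typing "*" alone would match every user, and a stray ")" could close the filter early and alter its logic.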
Follow-up fixes during the same afternoon
- Server-action error surfacing: searchLdapDirectory/lookupDirectoryUser are wrapped in try/catch/finally so a stale client bundle (e.g. after a deploy) no longer leaves the typeahead silently stuck on the spinner; users see a "please reload" message (PR #242)
- Portal the suggestions dropdown: the Card ancestor's overflow-hidden clipped the absolutely positioned dropdown, so it is now rendered via createPortal on document.body, with the input's getBoundingClientRect tracked on resize/scroll (PR #247)
- Group filter always visible + humanised LDAP timestamps: the filter shows whenever there is any group (was >5); the attribute table humanises Windows file-time / LDAP generalized-time values (PR #249)
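The humanised timestamps depend on two encodings. Windows file time counts 100-nanosecond intervals since 1601-01-01 UTC and is used by pwdLastSet, accountExpires and lockoutTime. LDAP GeneralizedTime looks like 20240115103000Z. The conversion sketch below uses illustrative function names. It assumes the full YYYYMMDDHHMMSS form, ignores fractional seconds, and handles only the Z and ±HHMM zone forms.

```typescript
const FILETIME_EPOCH_OFFSET_MS = 11644473600000; // ms from 1601-01-01 to 1970-01-01 (UTC)
const NEVER = new Set(["0", "9223372036854775807"]); // AD sentinels for "not set" / "never"

export function fileTimeToDate(raw: string): Date | null {
  if (NEVER.has(raw) || !/^\d+$/.test(raw)) return null;
  // Dropping the last four digits is exact integer division by 10,000 (100 ns -> ms),
  // which keeps the number well inside the safe-integer range.
  const ms = Number(raw.length > 4 ? raw.slice(0, -4) : "0") - FILETIME_EPOCH_OFFSET_MS;
  return new Date(ms);
}

export function generalizedTimeToDate(raw: string): Date | null {
  const m = raw.match(/^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:[.,]\d+)?(Z|[+-]\d{4})$/);
  if (!m) return null;
  const [, y, mo, d, h, mi, s, zone] = m;
  let ms = Date.UTC(+y, +mo - 1, +d, +h, +mi, +s);
  if (zone !== "Z") {
    const sign = zone[0] === "+" ? 1 : -1;
    const offsetMin = Number(zone.slice(1, 3)) * 60 + Number(zone.slice(3, 5));
    ms -= sign * offsetMin * 60000; // local time minus its UTC offset gives UTC
  }
  return new Date(ms);
}
```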
Docs
- New apps/docs/docs/features/directory-lookup.md (+ sidebar)
- apps/docs/docs/features/service-accounts.md updated to direct live lookups to the new tool
- Earlier the same day, apps/docs/docs/features/networks.md gained a Graph View section covering the Table/Graph toggle, dashed-bezier edges, dark mode, and the right-click host-node context menu shipped in Session 45; CLAUDE.md's stale Docusaurus reference was replaced with the VuePress path (PR #239)
Build state
- pnpm run build: zero TypeScript errors ✅
Terminal preferences (apps/web/components/terminal/)
- New terminal-preferences.ts stores a global default text size in localStorage with a live change-event bus, so every open pane reacts without a reload
- A settings gear in the terminal panel toolbar opens a popover with a default text-size slider and presets
- Right-click a terminal tab to pick a per-tab text-size override or revert to the default
- terminal-session.tsx subscribes to preference changes and resizes xterm live
- Docs updated at apps/docs/docs/features/terminal.md
- Branch: feat/terminal-font-size (commit 11635d4)
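A global preference with a live change bus can be built from a storage key plus a set of subscribers that open panes register with. The sketch below injects the storage object (a subset of the Web Storage API) so it runs outside a browser. The key name, size bounds, default and function names are assumptions, not terminal-preferences.ts itself.

```typescript
// Minimal subset of the Web Storage API, so localStorage or an in-memory stub both fit.
interface KeyValueStorage { getItem(key: string): string | null; setItem(key: string, value: string): void }

const KEY = "terminal-default-font-size"; // hypothetical storage key
const MIN_SIZE = 8;                        // assumed bounds and default
const MAX_SIZE = 32;
const DEFAULT_SIZE = 14;

type Listener = (size: number) => void;

export function createFontSizePreference(storage: KeyValueStorage) {
  const listeners = new Set<Listener>();
  const clamp = (n: number) => Math.min(MAX_SIZE, Math.max(MIN_SIZE, Math.round(n)));

  return {
    get(): number {
      const stored = Number(storage.getItem(KEY)); // missing key -> Number(null) === 0
      return Number.isFinite(stored) && stored > 0 ? clamp(stored) : DEFAULT_SIZE;
    },
    set(size: number): void {
      const next = clamp(size);
      storage.setItem(KEY, String(next));
      // Notify every subscriber (e.g. each open pane) synchronously.
      listeners.forEach((fn) => fn(next));
    },
    subscribe(fn: Listener): () => void {
      listeners.add(fn);
      return () => { listeners.delete(fn); }; // call on unmount
    },
  };
}
```

A per-tab override, as described above, would sit on top of this value: a pane uses its tab's override when set and the subscribed global default otherwise.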
Build state
- pnpm run build: zero TypeScript errors ✅
Terminal panel overhaul (apps/web/components/terminal/)
- Right-click a tab for rename, colour presets, split right / split down, or close
- Tabs are draggable for reorder
- New terminal-pane-tree.tsx: each tab holds a recursive pane tree so a single host session can be split into multiple independent shells with draggable dividers
- terminal-panel-context.tsx extended with tab colours, rename, reorder, and split operations
- terminal-session.tsx updated for multi-pane rendering per tab
- Docs updated at apps/docs/docs/features/terminal.md
- PR: #234
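A recursive pane tree maps naturally onto a discriminated union. A leaf holds one shell session; a split holds a direction, a divider ratio and two child trees. The sketch below shows splitting a leaf in place. The type and field names are assumptions, not terminal-pane-tree.tsx.

```typescript
// Hypothetical pane-tree types.
export type PaneNode =
  | { kind: "leaf"; id: string; sessionId: string }
  | { kind: "split"; id: string; direction: "right" | "down"; ratio: number; first: PaneNode; second: PaneNode };

// Replace the target leaf with a split whose first child is the original pane.
export function splitPane(
  node: PaneNode,
  targetId: string,
  direction: "right" | "down",
  newLeaf: { id: string; sessionId: string },
  splitId: string,
): PaneNode {
  if (node.kind === "leaf") {
    if (node.id !== targetId) return node;
    return { kind: "split", id: splitId, direction, ratio: 0.5, first: node, second: { kind: "leaf", ...newLeaf } };
  }
  // Rebuild only the nodes on the path to the target; untouched subtrees keep their identity.
  const first = splitPane(node.first, targetId, direction, newLeaf, splitId);
  const second = splitPane(node.second, targetId, direction, newLeaf, splitId);
  return first === node.first && second === node.second ? node : { ...node, first, second };
}

export function countLeaves(node: PaneNode): number {
  return node.kind === "leaf" ? 1 : countLeaves(node.first) + countLeaves(node.second);
}
```

Dragging a divider then only needs to update the ratio on one split node, and closing a pane can replace its parent split with the surviving sibling.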
Build state
- pnpm run build: zero TypeScript errors ✅
Network graphs (apps/web/app/(dashboard)/hosts/networks/)
- Table/Graph toggle on both individual network page and all-networks page
- Individual network: network node with hosts in a grid below, smoothstep edges
- All-networks: networks in a row, hosts below in columns, cross-network hosts shown once with multiple edges
- New listNetworksWithHosts server action for an efficient single-query join, lazy-loaded only when graph view is active
- NetworkNodeComponent and HostNodeComponent (memo-wrapped) with status dots and CIDR badges
- Uses @xyflow/react (MIT) for pan/zoom/minimap/controls
- PR: #220
Edge animation & dark-mode polish (apps/web/app/(dashboard)/hosts/networks/components/)
- Custom AnimatedFlowEdge: getBezierPath curves with a slow stroke-dashoffset CSS animation (10s cycle), subtle in the style of the React Flow homepage
- Endpoint dots at source/target handles; strokes use var(--muted-foreground) for light/dark
- React Flow Controls and MiniMap restyled via globals.css to use the --card, --border, --muted, and --background theme variables
- An earlier iteration with SVG animateMotion moving dots was replaced for being too busy
- PRs: #221 (animated), #224 (dashed bezier)
Host-node context menu (apps/web/app/(dashboard)/hosts/networks/components/)
- Right-click a host node to open in-app terminal session (with username prompt) or navigate to host detail
- HostNodeContextMenu: custom fixed-position overlay fired from React Flow's onNodeContextMenu (shadcn ContextMenu didn't work because React Flow intercepts contextmenu at the pane level)
- HostNodeTerminalDialog lifted to the parent graph level so the terminal dialog survives context-menu unmount
- CSS override restoring pointer-events: all on .react-flow__node-hostNode (xyflow sets pointer-events: none when nodes are non-draggable/non-connectable)
- PRs: #226, #228, #230, #232
Build state
- pnpm run build: zero TypeScript errors ✅
Schema & migrations (apps/web/lib/db/schema/networks.ts, migration 0031)
- New networks table: named IP subnets with CIDR range, instance-scoped
- New host_network_memberships join table with an is_auto_assigned flag
Server actions & UI (apps/web/lib/actions/networks.ts, apps/web/app/(dashboard)/hosts/networks/)
- Full CRUD and membership management with RBAC (admin/engineer gating)
- Networks list page, network detail page, Networks tab in host detail under Management
- "Networks" nav item added to sidebar under Hosts
Ingest auto-assignment (apps/ingest/internal/)
- SyncHostNetworks matches heartbeat IPs against instance network ranges and syncs auto-assignments; stale assignments are removed when IPs change
- Called from the heartbeat handler on every tick
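Auto-assignment is a set comparison. First compute which networks contain any of the host's heartbeat IPs, then diff that set against the host's current auto-assigned memberships. The TypeScript sketch below illustrates the logic for IPv4 only. The real SyncHostNetworks runs in Go in the ingest service, and the function names here are hypothetical. Manual memberships (is_auto_assigned = false) are assumed to be excluded from currentAutoAssigned and therefore left untouched.

```typescript
// Parse dotted-quad IPv4 into an unsigned integer, or null if malformed.
function ipv4ToNumber(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) return null;
    n = n * 256 + Number(part);
  }
  return n;
}

export function cidrContains(cidr: string, ip: string): boolean {
  const [base, prefixText] = cidr.split("/");
  const prefix = Number(prefixText);
  const baseNum = ipv4ToNumber(base);
  const ipNum = ipv4ToNumber(ip);
  if (baseNum === null || ipNum === null || !Number.isInteger(prefix) || prefix < 0 || prefix > 32) return false;
  const blockSize = 2 ** (32 - prefix);
  // Same network when both addresses fall in the same block of 2^(32 - prefix).
  return Math.floor(baseNum / blockSize) === Math.floor(ipNum / blockSize);
}

export function diffAutoAssignments(
  hostIps: string[],
  networks: { id: string; cidr: string }[],
  currentAutoAssigned: string[],
): { add: string[]; remove: string[] } {
  const wanted = networks
    .filter((net) => hostIps.some((ip) => cidrContains(net.cidr, ip)))
    .map((net) => net.id);
  return {
    add: wanted.filter((id) => !currentAutoAssigned.includes(id)),
    remove: currentAutoAssigned.filter((id) => !wanted.includes(id)), // stale after an IP change
  };
}
```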
Docs
- New apps/docs/docs/features/networks.md
- PR: #218
Build state
- pnpm run build: zero TypeScript errors ✅
- go build ./...: zero errors ✅
Migration to VuePress 2 (apps/docs/)
- Full migration from Docusaurus v3 to VuePress 2 — smaller, faster, simpler config
- New apps/docs/docs/.vuepress/config.ts, custom palette and index SCSS
- Full-text body search enabled (air-gap compatible)
- Dockerfile updated for VuePress build → nginx serve
- GitHub Actions workflow updated; Pages deployment source switched to GitHub Actions workflow
- pnpm version pinned in the deploy-docs workflow to match the packageManager field
- System architecture diagram reworked (an image replacing the hand-drawn diagram), with a fixed aspect ratio
- PRs: #213 (pnpm pin), #217 (VuePress migration), #216 (diagram fix)
Build state
- pnpm run build: zero TypeScript errors ✅
Documentation site (apps/docs/)
- New apps/docs/ workspace: Docusaurus v3 with dark mode, indigo theme, VS Dark code blocks, and full-text local search (@easyops-cn/docusaurus-search-local, air-gap compatible)
- 15 documentation pages covering all features, architecture, and deployment profiles
- Sidebar structure in apps/docs/sidebars.ts; "Edit this page" GitHub links on every page
- Dockerfile for local development: Node build → nginx serve
- GitHub Actions workflow (.github/workflows/deploy-docs.yml) auto-deploys to GitHub Pages on push to main
- Documentation Rules section added to CLAUDE.md: docs must be updated in the same PR as any feature change
- webpack pinned to 5.99.0 in workspace overrides for Docusaurus compatibility
- pnpm version pinned to 10.6.5 in the CI workflow to match the packageManager field
Build state
- pnpm run build: zero TypeScript errors ✅
- PR: #212 (feat/docusaurus-docs-site)
Software report (apps/web/app/(dashboard)/reports/software/software-report-client.tsx)
- Unified table: all software search results combined into a single sortable table (was separate sections per version) — PR #202
- Clickable hostnames: clicking a host in the results opens its detail page — PR #204
- First-seen column: added to results table to show when a package was first observed — PR #206
- OS distribution chart: pie/bar breakdown of hosts by OS type per package — PR #206
- Version breakdown chart: for the selected package shows distribution of installed versions — PR #206
- Dark mode chart labels: axis and legend labels now visible in dark mode — PR #208
- Export rate limiting: sliding window 3-per-10-seconds limit; export errors now shown in a modal dialog — PR #210
- CSV/PDF export fixes: correct parameter passing for filters; export respects OS family and version filters — PR #204
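The export limit of 3 requests per 10 seconds is a sliding-window log. Keep the timestamps of recent exports, and allow a new one only if fewer than three fall inside the last 10 seconds. The in-memory sketch below is illustrative. createSlidingWindowLimiter is a hypothetical name, and because its state lives in one process it would not be shared across replicas.

```typescript
// Sliding-window log limiter: at most `limit` events per `windowMs` for each key.
export function createSlidingWindowLimiter(limit: number, windowMs: number) {
  const log = new Map<string, number[]>();
  return function tryAcquire(key: string, now: number = Date.now()): { allowed: boolean; retryAfterMs: number } {
    // Drop timestamps that have slid out of the window.
    const recent = (log.get(key) ?? []).filter((t) => now - t < windowMs);
    if (recent.length >= limit) {
      log.set(key, recent);
      // A slot frees up when the oldest in-window event expires.
      return { allowed: false, retryAfterMs: windowMs - (now - recent[0]) };
    }
    recent.push(now);
    log.set(key, recent);
    return { allowed: true, retryAfterMs: 0 };
  };
}
```

Unlike a fixed window, this never allows a burst of six exports straddling a window boundary. The retryAfterMs value gives the error dialog something concrete to show.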
Build state
- pnpm run build: zero TypeScript errors ✅
deleteHost cascade (apps/web/lib/actions/agents.ts)
- Full FK deletion order: notifications → alert_instances → software_scans → task_run_hosts → remaining FKs → host record
- PRs: #196 (notifications), #198 (all FKs), #200 (software_scans)
JWT signing key persistence (apps/ingest/internal/)
- Ingest service now persists its JWT signing key in the database (instance settings table) on first start
- Survives Docker volume resets — agents no longer get 401s after a volume wipe
- PR: #194
Inventory scan reliability (apps/ingest/, apps/web/)
- Inventory tab polls for scan completion and shows live status while a scan is running — PR #187
- Per-collector logging and ingest scan-start diagnostics added — PR #189
- Failed scan errors surfaced in the host Inventory tab (was silently ignored) — PR #191
- Ingest accepts expired agent JWTs in the inventory stream handler to prevent scan failures during token rotation — PR #193
Build state
- pnpm run build: zero TypeScript errors ✅
- go build ./...: zero errors ✅
Software report UX overhaul (apps/web/app/(dashboard)/reports/software/software-report-client.tsx)
- Replaced paginated table with per-package detail view: typeahead combobox selects a package; results show all hosts grouped by version with hostname, OS version, source, architecture, last seen
- Exact version filter shows a dropdown of versions from the DB when a specific package is selected
- Source filter replaced with an OS type filter (Linux / macOS / Windows) using the hosts.os field
- getPackageDetails and getPackageVersions server actions added
- Export route updated to pass the osFamily filter
- Inventory wipe bug fixed: if collectPackages returns an error, the task now fails immediately rather than streaming 0 packages; streaming 0 packages caused MarkRemovedPackages to wipe the host's entire inventory
- PR: #185
Monitoring reliability (agent/internal/heartbeat/heartbeat.go, apps/ingest/internal/db/queries/alerts.sql.go, apps/web/app/(dashboard)/hosts/[id]/alerts-tab.tsx)
- CPU spike elimination: resultsReady heartbeats now send a cached hostMetricsSnapshot collected on the regular 30s tick rather than re-sampling CPU, which prevents near-zero delta windows from inflating readings to 100%
- Alert double-evaluation fix: GetAlertRulesForHost now filters on is_global_default = false, because global defaults (templates) were being evaluated alongside their host-specific clones
- Global defaults visible: the Alerts tab on host detail shows a read-only "Instance-wide Default Rules" section linking to Settings → Alerts
- PR: #183
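CPU utilisation is derived from the difference between two samples of cumulative busy/total counters, as found in /proc/stat on Linux. When the two samples are only moments apart, the total advances by just a few ticks, so a single busy tick can read as 100%. The sketch below shows the calculation and a minimum-window guard that falls back to a cached value. It is illustrative only: the agent's Go fix instead reuses the 30-second tick snapshot for resultsReady heartbeats, and the names and threshold here are assumptions.

```typescript
// Cumulative CPU counters in clock ticks (e.g. summed from /proc/stat).
interface CpuSample { busy: number; total: number }

const MIN_TOTAL_TICKS = 100; // hypothetical guard, about 1 s of one core at 100 ticks/s

export function cpuPercent(prev: CpuSample, curr: CpuSample): number | null {
  const totalDelta = curr.total - prev.total;
  const busyDelta = curr.busy - prev.busy;
  // Too few ticks elapsed: tick granularity dominates the ratio, so report nothing.
  if (totalDelta < MIN_TOTAL_TICKS) return null;
  return Math.min(100, Math.max(0, (busyDelta / totalDelta) * 100));
}

export function metricForHeartbeat(prev: CpuSample, curr: CpuSample, cachedPercent: number): number {
  // Reuse the last regular-tick value instead of publishing a near-zero-window reading.
  return cpuPercent(prev, curr) ?? cachedPercent;
}
```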
Build state
- pnpm run build: zero TypeScript errors ✅
- go build ./...: zero errors ✅
Agent (agent/internal/tasks/)
- New software_inventory task handler with cross-platform package collection:
- Linux: dpkg → rpm → pacman → apk (ordered by availability)
- macOS: system_profiler SPApplicationsDataType + Homebrew
- Windows: registry walk (HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall)
- Snap/Flatpak/Windows Store sources toggleable via instance settings
- Streams packages in 500-package chunks via the new SubmitSoftwareInventory gRPC endpoint
- task_id injected into context from runner.go so handlers use it as scan_id
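Streaming in fixed-size chunks with a final is_last marker fits a generator. The 500-package default comes from the text above. The chunk shape is an assumption, and the real agent is written in Go. This sketch always emits at least one chunk, so the receiver sees isLast even for a genuinely empty list. A collection error should still fail the task before any chunk is sent, because a final chunk is what triggers removed-package marking.

```typescript
export interface InventoryChunk<T> { packages: T[]; isLast: boolean }

// Yield fixed-size chunks; exactly one chunk (the final one) has isLast = true.
export function* chunkInventory<T>(packages: T[], size = 500): Generator<InventoryChunk<T>> {
  if (size <= 0) throw new Error("chunk size must be positive");
  if (packages.length === 0) {
    yield { packages: [], isLast: true };
    return;
  }
  for (let start = 0; start < packages.length; start += size) {
    yield { packages: packages.slice(start, start + size), isLast: start + size >= packages.length };
  }
}
```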
Ingest (apps/ingest/internal/handlers/)
- inventory.go: JWT-authenticated client-streaming RPC; bulk UNNEST upsert, marks removed rows on is_last, completes the task_run_hosts row
- software_sweeper.go: 60s ticker creates software_inventory tasks for hosts overdue per the instance intervalHours setting
- software.sql.go: bulk UNNEST upsert, removed-package marking, scan tracking queries
Database (apps/web/lib/db/schema/software.ts, migrations 0028, 0029)
- software_packages: per-host package rows with name, version, source, architecture, first_seen, last_seen, is_removed
- software_scans: per-scan metadata (started_at, completed_at, package_count, status)
- saved_software_reports: per-user saved filter presets
Web (apps/web/)
- Host Inventory tab (hosts/[id]/inventory-tab.tsx): last scan banner, Rescan button, CSV export, Compare button, stale-scan alert, show-removed toggle, client-side search
- Host Compare page (hosts/[id]/compare/): side-by-side diff of packages between two hosts
- Reports → Installed Software (reports/software/): URL-synced filters (name typeahead, version modes, source, host group), new-in-window, package drift, compare two hosts, saved report filters
- Export route (api/reports/software/export/route.ts): CSV (injection-safe field escaping) + PDF (@react-pdf/renderer server-side)
- Settings card (settings/settings-client.tsx): enable/disable inventory, interval hours, Snap/Flatpak/Windows Store source toggles
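Injection-safe CSV output needs two things. The first is standard RFC 4180 quoting. The second is neutralising cells a spreadsheet would evaluate as formulas (text starting with =, +, -, @, tab or carriage return), commonly by prefixing a single quote as OWASP's CSV-injection guidance suggests. The sketch below is illustrative. csvCell and csvRow are hypothetical names, not the export route's functions.

```typescript
const FORMULA_TRIGGERS = ["=", "+", "-", "@", "\t", "\r"];

// Escape one CSV field: neutralise formula triggers, then apply RFC 4180 quoting.
export function csvCell(value: string | number | null | undefined): string {
  let text = value === null || value === undefined ? "" : String(value);
  // Only strings are prefixed, so numeric values such as -1 stay numeric.
  if (typeof value === "string" && FORMULA_TRIGGERS.some((c) => text.startsWith(c))) {
    text = "'" + text;
  }
  // Always quote and double any embedded quotes.
  return `"${text.replace(/"/g, '""')}"`;
}

export function csvRow(values: Array<string | number | null | undefined>): string {
  return values.map((v) => csvCell(v)).join(",");
}
```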
Proto (proto/agent/v1/ingest.proto)
- SubmitSoftwareInventory client-streaming RPC added to IngestService
- Generated Go bindings updated
Build state
- pnpm run build: zero TypeScript errors ✅
- go build ./...: zero errors ✅
- PRs: #175 (feature/software-inventory), #177, #179, #181
Ingest (apps/ingest/internal/handlers/notification_purge_sweeper.go, apps/ingest/internal/db/queries/notifications.go)
- New notification purge sweeper starts with ingest, runs immediately, then repeats daily
- Permanently deletes notifications whose deleted_at is older than 90 days
- The query is covered by a focused unit test using a fake pgx executor
Database / docs
- Added a partial notifications_deleted_at_idx index for efficient purge scans
- Notification retention docs now describe the fixed 90-day soft-delete purge window
Build state
- go test ./... from apps/ingest: pass
- pnpm --filter web db:validate: pass
- pnpm --filter web type-check: pass
- pnpm --filter web lint: exits 0 with existing warnings outside this change
- BETTER_AUTH_URL=http://localhost:3000 DATABASE_URL=postgres://postgres:postgres@localhost:5432/ct_ops BETTER_AUTH_SECRET=<32-char-local-build-secret> pnpm --filter web build: pass, with an existing Turbopack trace warning
Agent (agent/internal/tasks/uninstall.go, uninstall_unix.go, uninstall_windows.go)
- New agent_uninstall task type: the agent returns a scheduled result, then spawns a detached child process to run the existing -uninstall flow
- The detached process survives the service manager terminating the original agent
- Linux: uses systemd-run --no-block --collect to place the uninstaller in its own transient cgroup (this prevents systemd KillMode=control-group from killing it when the agent service is stopped); falls back to setsid on non-systemd Linux
- macOS: setsid-style process detach (launchd tracks by PID, not cgroup)
- Windows: CREATE_NEW_PROCESS_GROUP flag
Web (apps/web/app/(dashboard)/hosts/[id]/host-detail-client.tsx, apps/web/lib/actions/agents.ts)
- Host delete dialog adds "Also uninstall agent from the remote host" checkbox — visible only when the agent is online
- On confirm, dispatches the agent_uninstall task before deleting the host record
- deleteHost action fixed: task_run_hosts rows are now cleaned up before deleting the host (latent FK violation)
Build state
- pnpm run build: zero TypeScript errors ✅
- go build ./agent/...: zero errors ✅
- PRs: #171 (feature/host-delete-uninstall-agent), #173
Dashboard layout (apps/web/app/(dashboard)/layout.tsx)
- SidebarProvider container changed from min-h-svh to h-svh overflow-hidden, bounding the entire dashboard to the viewport height
- The main content area already uses overflow-auto, so page content still scrolls internally; the terminal panel stays pinned at the bottom on all pages regardless of content length
- PR: #165 (feature/terminal-fixed-bottom)
Build state
- pnpm run build: zero TypeScript errors ✅
Session 35 — Notification enhancements: bulk actions, charts, soft-delete, and host metrics integration
Database (apps/web/lib/db/schema/alerts.ts, migration 0027_ambiguous_manta)
- deleted_at column added to the notifications table, bringing it in line with the project's universal soft-delete convention (it was the only major table missing it)
TypeScript actions (apps/web/lib/actions/notifications.ts)
- deleteNotification / deleteNotifications: converted from hard delete to soft delete (sets deleted_at)
- All inbox queries (getNotifications, getUnreadCount, markAsRead, markAllAsRead, markBatchReadStatus) now filter WHERE deleted_at IS NULL
- deleteNotifications(instanceId, userId, ids[]): new batch soft-delete action
- markBatchReadStatus(instanceId, userId, ids[], read): new batch read/unread toggle
- getNotificationStats(instanceId, userId, hostId?): counts per severity for the pie chart; optional hostId scopes to a specific host
- getNotificationsOverTime(instanceId, userId, range, hostId?): daily or hourly aggregation for the line chart; intentionally omits the deleted_at filter so deleting from the inbox never affects historical trend data; optional hostId scopes to a specific host; enforces a 90-day maximum retention window
- TrendRange type exported: '1h' | '6h' | '12h' | '24h' | '7d' | '30d' | '90d'
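The trend query's rules are hourly buckets for short ranges, daily buckets otherwise, and a 90-day ceiling. A small helper can capture them by turning a TrendRange into a window start and a bucket unit. The sketch below repeats the TrendRange union listed above. The helper names are hypothetical, and treating 24h itself as hourly is an assumption.

```typescript
export type TrendRange = "1h" | "6h" | "12h" | "24h" | "7d" | "30d" | "90d";

const HOUR_MS = 3600000;
const RANGE_MS: Record<TrendRange, number> = {
  "1h": HOUR_MS, "6h": 6 * HOUR_MS, "12h": 12 * HOUR_MS, "24h": 24 * HOUR_MS,
  "7d": 7 * 24 * HOUR_MS, "30d": 30 * 24 * HOUR_MS, "90d": 90 * 24 * HOUR_MS,
};
const MAX_RETENTION_MS = 90 * 24 * HOUR_MS;

export function trendWindow(range: TrendRange, now: Date): { since: Date; bucket: "hour" | "day" } {
  // Never look back further than the 90-day retention ceiling.
  const spanMs = Math.min(RANGE_MS[range], MAX_RETENTION_MS);
  // Assumption: 24h and shorter use hourly buckets; 7d and longer use daily buckets.
  const bucket = RANGE_MS[range] <= 24 * HOUR_MS ? "hour" : "day";
  return { since: new Date(now.getTime() - spanMs), bucket };
}

// UTC bucket key used to group rows, e.g. "2024-03-05T14:00" or "2024-03-05".
export function bucketKey(at: Date, bucket: "hour" | "day"): string {
  const iso = at.toISOString();
  return bucket === "hour" ? iso.slice(0, 13) + ":00" : iso.slice(0, 10);
}
```

Because the range is part of the TanStack Query key, each dropdown change maps to a fresh call with a new since value and bucket unit.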
UI — Notifications page (apps/web/app/(dashboard)/notifications/notifications-client.tsx)
- Bulk selection: checkbox on every notification card + select-all checkbox with indeterminate state
- Bulk action toolbar: appears when ≥1 item selected — "Mark as read", "Mark as unread", "Delete", "Clear selection"
- Per-card mark as unread: expanded card now shows "Mark as unread" for already-read notifications (was read-only before)
- Severity breakdown pie chart (donut): critical / warning / info distribution with percentage tooltips; updates on query refetch; "No data" placeholder when empty
- Notification trend line chart: critical & warning daily/hourly counts with a time-range dropdown; the description subtitle updates dynamically; fill: currentColor plus a parent text-muted-foreground wrapper fixes SVG axis label visibility in dark mode
- Time-range dropdown on trend chart: 1h · 6h · 12h · 24h · 7d · 30d · 3 months; sub-24h ranges aggregate per hour (HH:mm labels), longer ranges per day (MMM d labels); the TanStack Query key includes the range, so switching triggers a fresh fetch
- Selection is cleared on filter tab change and pagination
UI — Host detail / Metrics tab (apps/web/app/(dashboard)/hosts/[id]/host-notification-charts.tsx, host-detail-client.tsx)
- New HostNotificationCharts component renders both charts below the Heartbeat Interval chart on the Monitoring → Metrics tab
- Charts are scoped to the specific host via the hostId filter on both server actions
hostIdfilter on both server actions - Same time-range dropdown and dark-mode-safe axis labels as the global notifications page
- "No notifications for this host" / "No data for this period" placeholders shown when empty
Build state
- pnpm run build: zero TypeScript errors ✅
- PRs: #159, #161, #163, #165
Database (apps/web/lib/db/schema/alerts.ts, auth.ts, instances.ts, migration 0026_youthful_anthem)
- notifications table: per-user rows with subject, body, severity, resourceType, resourceId, read flag, and alertInstanceId FK
- notificationsEnabled column added to the user table (default true)
- InstanceNotificationSettings added to the InstanceMetadata JSONB: inAppEnabled, inAppRoles, allowUserOptOut
- NotificationChannelType expanded to 'webhook' | 'smtp' | 'slack' | 'telegram'
- SlackChannelConfig { webhookUrl } and TelegramChannelConfig { botToken; chatId } interfaces added
Go ingest service (apps/ingest/internal/)
- alerts.sql.go: GetEnabledSlackChannels, GetEnabledTelegramChannels, GetInstanceNotificationSettings, GetAlertTargetUsers (role + opt-out filter), InsertNotificationBatch (pgx.Batch)
- notify.go: postSlack (Block Kit JSON), dispatchSlack, postTelegram (Bot API HTML mode), dispatchTelegram, dispatchInApp (instance settings → user targeting → batch insert)
- alerts.go: notifChannels struct expanded; all evaluators (check_status, metric_threshold, cert_expiry) call Slack + Telegram + in-app dispatch on fire and resolve
TypeScript actions (apps/web/lib/actions/)
- alerts.ts: Zod discriminated union extended for Slack/Telegram; NotificationChannelSafe union updated (Telegram masks botToken as hasBotToken); updateNotificationChannel and sendTestNotification handle all four types
- notifications.ts: getNotifications, getUnreadCount, markAsRead, markAllAsRead, deleteNotification
- notification-settings.ts: getInstanceNotificationSettings (with defaults), updateInstanceNotificationSettings (admin-only, Zod-validated)
- profile.ts: updateNotificationPreference (respects the instance allowUserOptOut setting)
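Returning a safe channel shape to the browser means replacing secrets with presence flags, as in the Telegram botToken → hasBotToken masking above. Only the Slack and Telegram config fields in the sketch below come from this file. The Channel/SafeChannel wrapper types and toSafeChannel are hypothetical and do not reproduce NotificationChannelSafe. The Slack webhook URL passes through unchanged because the channels table displays it.

```typescript
// Config shapes as listed above; the channel wrapper types are assumptions.
interface SlackChannelConfig { webhookUrl: string }
interface TelegramChannelConfig { botToken: string; chatId: string }

type Channel =
  | { id: string; type: "slack"; config: SlackChannelConfig }
  | { id: string; type: "telegram"; config: TelegramChannelConfig };

type SafeChannel =
  | { id: string; type: "slack"; config: SlackChannelConfig }
  | { id: string; type: "telegram"; config: { chatId: string; hasBotToken: boolean } };

export function toSafeChannel(channel: Channel): SafeChannel {
  if (channel.type === "slack") return channel; // webhook URL is shown in the channels table
  const { botToken, chatId } = channel.config;
  // Never serialise the token; the UI only needs to know whether one is stored.
  return { id: channel.id, type: "telegram", config: { chatId, hasBotToken: botToken.length > 0 } };
}
```

On edit, an empty token field can then mean "keep the stored token", so the secret never has to round-trip through the client.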
UI — Alerts page (apps/web/app/(dashboard)/alerts/alerts-client.tsx)
- AddSlackDialog, EditSlackDialog, AddTelegramDialog, EditTelegramDialog components following the existing dialog pattern
- "Add Slack" and "Add Telegram" buttons in the channels card header
- Type badges (MessageSquare for Slack, Send for Telegram); details column shows webhookUrl/chatId
UI — Notification bell (apps/web/components/shared/notification-bell.tsx, topbar.tsx)
- Topbar Bell icon with absolute-positioned red badge showing unread count (capped at 99+)
- Dropdown: 10 most recent notifications with severity dot, bold subject (unread), relative timestamp, blue dot indicator
- Click: markAsRead + navigate to the resource (/hosts/{id} or /certificates/{id})
- Footer: "View all notifications" → /notifications
- Polls every 20 s via TanStack Query refetchInterval
UI — Notifications page (apps/web/app/(dashboard)/notifications/)
- Server component fetches initial 25 notifications + unread count for SSR
- Filter tabs: All / Unread (with badge)
- Cards: severity dot, subject, severity badge, relative + absolute timestamps; click to expand body + resource link + mark-read + delete
- "Mark all read" header button; Previous/Next pagination (PAGE_SIZE=25)
- Polls every 30 s
UI — Settings (apps/web/app/(dashboard)/settings/settings-client.tsx)
- "Notification Settings" card: Enable in-app toggle, role checkboxes (super_admin/instance_admin/engineer/read_only), Allow user opt-out toggle; admin-only
UI — Profile (apps/web/app/(dashboard)/profile/profile-client.tsx)
- "Notifications" card: toggle visible when the instance has inAppEnabled set; disabled with explanatory text when the instance disallows opt-out
Sidebar (apps/web/components/shared/sidebar.tsx)
- "Notifications" entry added to Monitoring group (BellPlus icon, /notifications)
Build state
- pnpm run build — zero TypeScript errors ✅
- go build ./... — zero errors ✅
- PR: #155
Username memory (apps/web/components/terminal/host-terminal-launcher.tsx, terminal-session.tsx, host-selector-dialog.tsx)
- Last-used terminal username per host per user saved to localStorage (terminal-username-{hostId}-{userId})
- Pre-fills the username input on subsequent connections to the same host
- Uses useMemo (not useEffect) to read the saved value — avoids unnecessary re-renders
Reconnect on exit (apps/web/components/terminal/terminal-session.tsx)
- When a terminal session ends (e.g. typing exit), displays a "Press any key to reconnect" prompt
- Reconnects with the same host and username on keypress instead of leaving a dead terminal
Build state
- pnpm run build — zero TypeScript errors ✅
Session storage persistence (apps/web/components/terminal/terminal-panel-context.tsx)
- Open terminal tabs (host ID, hostname, username, panel height, active tab index) saved to sessionStorage on every state change
- On page load, restores tabs with fresh session IDs — triggers automatic reconnection to the same hosts
- Correctly scoped to the browser tab (sessionStorage, not localStorage) — tabs don't survive closing the browser tab, which is correct since PTY sessions are dead at that point
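The save and restore cycle works like this: persist everything except the session ID, then mint a new ID per tab on load so that each tab reconnects. The storage key, the type names and the injected ID generator are hypothetical. In the browser you would pass window.sessionStorage and something like crypto.randomUUID.

```typescript
// Minimal Storage subset so the sketch also runs outside a browser (sessionStorage satisfies it).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface PersistedTab { hostId: string; hostname: string; username: string }
interface LiveTab extends PersistedTab { sessionId: string }
interface PanelState<T> { tabs: T[]; height: number; activeIndex: number }

const STORAGE_KEY = "terminal-panel"; // hypothetical key name

function savePanel(store: KeyValueStore, state: PanelState<LiveTab>): void {
  // Session IDs are dropped on purpose: the PTY behind them does not survive a reload.
  const persisted: PanelState<PersistedTab> = {
    tabs: state.tabs.map(({ hostId, hostname, username }) => ({ hostId, hostname, username })),
    height: state.height,
    activeIndex: state.activeIndex,
  };
  store.setItem(STORAGE_KEY, JSON.stringify(persisted));
}

function restorePanel(store: KeyValueStore, newSessionId: () => string): PanelState<LiveTab> | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return null;
  const saved = JSON.parse(raw) as PanelState<PersistedTab>;
  // A fresh session ID per tab is what triggers a new connection to the same host.
  return { ...saved, tabs: saved.tabs.map((tab) => ({ ...tab, sessionId: newSessionId() })) };
}
```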
Provider scope fix (apps/web/app/(dashboard)/layout.tsx)
- TerminalPanelProvider moved above the sidebar component so the terminal trigger button in the sidebar nav has provider context
Build state
- pnpm run build — zero TypeScript errors ✅
Panel architecture (apps/web/components/terminal/)
- New terminal-panel.tsx — VS Code-style resizable bottom panel visible on all dashboard pages
- terminal-panel-context.tsx — React context managing tab state (add/remove/switch tabs), panel visibility and height
- terminal-layout-wrapper.tsx — wraps page content and renders the panel below
- terminal-session.tsx — individual xterm.js session component, one per tab
- host-selector-dialog.tsx — searchable host picker with username input, accessible from the sidebar nav and host detail page
- Old terminal-tab.tsx removed from host detail page — replaced by the global panel
Sidebar integration (apps/web/components/shared/sidebar.tsx)
- "Terminal" entry added under Tooling section in the sidebar nav
- Opens the host selector dialog; selected host opens as a new tab in the persistent panel
Host detail launcher (apps/web/app/(dashboard)/hosts/[id]/host-terminal-launcher.tsx)
- "Open Terminal" button on host detail page opens a tab in the global panel for that specific host
Build state
- pnpm run build — zero TypeScript errors ✅
Per-user authentication (apps/web/lib/db/schema/terminal-sessions.ts, agent/internal/terminal/session.go)
- New username column on terminal_sessions table — migration 0025_luxuriant_smasher.sql
- Agent launches PTY via su -l <username> with dropped privileges (not login, which varies across distros)
- Instance-level "Direct Access" toggle (terminalDirectAccess in instance metadata) allows bypassing the username requirement
- UI shows username input on terminal tab; direct access mode skips it
Shell environment hardening (agent/internal/terminal/session.go)
- Agent sets TERM=xterm-256color and HOME, and prefers bash over the default shell for PTY sessions
- Cross-distro compatibility: tested with Ubuntu, AlmaLinux, CentOS patterns
Auth fallback (apps/ingest/)
- Falls back to session_id auth when agent JWT signature verification fails (handles key rotation gracefully)
- Accepts expired agent JWTs for Terminal gRPC streams — terminal sessions shouldn't break during token rotation windows
Build state
- pnpm run build — zero TypeScript errors ✅
- go build ./agent/... ./apps/ingest/... — zero errors ✅
Terminal protocol (proto/agent/v1/terminal.proto, proto/agent/v1/heartbeat.proto)
- New TerminalSession message: session ID, host ID, status, input/output/resize frames
- HeartbeatResponse gains pending_terminal_sessions field — ingest pushes pending sessions to the agent on every heartbeat
- New TerminalStream RPC on ingest service for bidirectional terminal I/O
Database schema (apps/web/lib/db/schema/terminal-sessions.ts, migration 0024_flat_blade.sql)
- terminal_sessions table: session ID, host ID, user ID, instance ID, status (pending/connected/disconnected/failed), timestamps
Ingest: session routing (apps/ingest/)
- Pending terminal sessions included in every heartbeat response so agent picks them up
- Terminal data streamed over existing gRPC connection — no additional ports required
- Diagnostic messages added during development: session state tracking, push counters, agent status reverse-lookup
Agent: PTY management (agent/internal/terminal/)
- session.go — opens PTY, reads/writes terminal frames, handles resize events
- Integrated with heartbeat response handler — agent starts a terminal session when it receives a pending session
Web: terminal UI (apps/web/app/(dashboard)/hosts/[id]/terminal-tab.tsx)
- xterm.js terminal embedded in host detail page tab
- WebSocket connection from browser → Next.js API route → ingest gRPC stream
- Container shown during "connecting" state to avoid 0x0 dimension bug with xterm
Instance settings (apps/web/app/(dashboard)/settings/settings-client.tsx)
- Terminal enable/disable toggle and port configuration in instance settings
Build state
- pnpm run build — zero TypeScript errors ✅
- go build ./agent/... ./apps/ingest/... — zero errors ✅
Metrics chart improvements (apps/web/components/charts/, apps/web/hooks/use-chart-zoom.ts)
- Extracted Recharts into reusable HostMetricsLineChart and HostHeartbeatBarChart components under components/charts/
- useChartZoom hook: click-drag zoom on any chart, reset button to restore the original range
- Adaptive time_bucket sizing: capped at 300 data points regardless of time range — prevents chart overload on 30d views
- New 6h and 30d time range presets
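Capping a chart at 300 points amounts to dividing the range by 300 and rounding the bucket width up. The whole-minute rounding and the 60 s floor below are assumptions. The result could be passed to TimescaleDB's time_bucket as an interval such as '120 seconds'.

```typescript
const MAX_POINTS = 300;

// Bucket width in seconds such that ceil(range / width) never exceeds maxPoints.
// Rounding up to whole minutes (minimum 60 s) is an assumed readability choice.
function bucketSeconds(rangeSeconds: number, maxPoints: number = MAX_POINTS): number {
  const raw = Math.ceil(rangeSeconds / maxPoints);
  return Math.max(60, Math.ceil(raw / 60) * 60);
}
```

With these rules a 30d range gets 8640 s buckets (exactly 300 points) and a 6h range gets 120 s buckets (180 points).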
Custom script runner (agent/internal/tasks/script.go, apps/web/lib/actions/task-runs.ts)
- New custom_script task type: agent receives script content, writes it to a temp file, executes with streaming output
- triggerCustomScriptRun / triggerGroupCustomScriptRun server actions
- Script input dialog on host detail and group detail pages with multiline editor
- Task monitor page shows script content in results panel
Service management (agent/internal/tasks/service.go, apps/web/lib/actions/task-runs.ts)
- New service_action task type: start / stop / restart / status operations on systemd services
- triggerServiceAction / triggerGroupServiceAction server actions
- Service action dialog with autocomplete: "Query server" button fetches running services from the host via the list_services agent query and shows a clickable dropdown
- Task monitor page shows service-specific result formatting
Interactive terminal on host detail (apps/web/app/(dashboard)/hosts/[id]/terminal-tab.tsx)
- Terminal tab on host detail page: each command creates a custom_script task run; output streams at 1.5 s poll intervals
- Up/down arrow command history recall, Ctrl+C cancels running command, Clear button wipes session
Task history management (apps/web/app/(dashboard)/hosts/[id]/tasks-tab.tsx, apps/web/lib/actions/task-runs.ts)
- Checkbox selection on task history tables (host and group views)
- Select-all header checkbox, bulk Delete button
- Soft-deletes selected task_run and task_run_hosts rows
Build state
- pnpm run build — zero TypeScript errors ✅
- go build ./agent/... ./apps/ingest/... — zero errors ✅
User-persisted theme (apps/web/lib/db/schema/auth.ts, apps/web/lib/actions/profile.ts)
- New theme column ('light' | 'dark' | 'system', default 'system') on the users table — migration 0023_high_loki.sql
- updateTheme(userId, theme) server action: validates with Zod, writes to DB, and sets a 1-year theme cookie (path /, sameSite lax, not httpOnly) so subsequent SSR reads work without an extra DB query
Root layout: SSR dark class + FOUC prevention (apps/web/app/layout.tsx)
- Root layout is now async; reads the theme cookie server-side via cookies() and adds the dark class to <html> when theme === 'dark'
- Injects a tiny inline <script> in <head> that runs before React hydrates: reads the cookie, and for system or missing values uses window.matchMedia('(prefers-color-scheme: dark)') — prevents any flash of wrong theme for returning users and handles OS dark preference on first load
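The decision the inline script makes can be written as two pure functions: one parses the cookie and one picks the class. This is a typed sketch, and the names are hypothetical. The real pre-hydration script has to be plain JavaScript embedded as a string.

```typescript
type Theme = "light" | "dark" | "system";

// Extracts the theme value from a document.cookie-style string; unknown values count as missing.
function readThemeCookie(cookieHeader: string): Theme | null {
  for (const part of cookieHeader.split(";")) {
    const [name, ...rest] = part.trim().split("=");
    if (name === "theme") {
      const value = rest.join("=");
      if (value === "light" || value === "dark" || value === "system") return value;
    }
  }
  return null;
}

// Explicit light/dark wins; "system" or a missing cookie defers to the OS preference.
function shouldUseDark(theme: Theme | null, prefersDark: boolean): boolean {
  if (theme === "dark") return true;
  if (theme === "light") return false;
  return prefersDark;
}
```

In the browser the result would drive document.documentElement.classList.toggle("dark", shouldUseDark(readThemeCookie(document.cookie), window.matchMedia("(prefers-color-scheme: dark)").matches)).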
Profile page Appearance card (apps/web/app/(dashboard)/profile/profile-client.tsx)
- New "Appearance" card below the 2FA section with three buttons: Light / Dark / System (Sun / Moon / Monitor icons)
- Selected option highlighted with border-primary bg-primary/10; on click: applies the class to document.documentElement immediately (no reload), then saves to DB and sets the cookie via the updateTheme mutation in the background
- Current theme initialised from user.theme so the correct button is pre-selected
Build state
- pnpm run build — zero TypeScript errors ✅
- Migration 0023_high_loki.sql generated and applied ✅
Database schema (apps/web/lib/db/schema/tasks.ts, migrations 0021–0022)
- task_runs table: type, status, config jsonb, max_parallel, instance/created-by FKs, started/completed timestamps
- task_run_hosts table: per-host execution state (pending → running → completed/failed/cancelled/skipped), raw_output text accumulator, exit_code, reboot_required, packages_updated jsonb
- max_parallel enforced at query level — SQL counts active rows before dispatching so concurrent ingest instances cannot over-dispatch
Protocol additions (proto/agent/v1/heartbeat.proto)
- AgentTask (server→agent): task_run_host_id, task_type, config_json
- AgentTaskProgress (agent→server): incremental stdout/stderr chunk per heartbeat cycle
- AgentTaskResult (agent→server): final status, exit code, reboot flag, packages list
- HeartbeatResponse gains cancel_task_ids (field 10) for agent-side cancellation signals
Ingest: task dispatch and output streaming (apps/ingest/internal/handlers/heartbeat.go)
- 2-second ticker polls GetPendingTasksForHost respecting max_parallel; pushes AgentTask messages in each HeartbeatResponse
- Appends output chunks with raw_output || chunk on every AgentTaskProgress message
- Marks task_run_hosts terminal on AgentTaskResult; closes the parent task_runs row when all hosts reach a terminal state
- GetCancellingTasksForHost pushes cancel_task_ids so the agent can kill in-flight processes
- TimeoutStuckTaskRunHosts marks running hosts as failed after 60 minutes; fires every 5 minutes from the heartbeat handler
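Two rules from the ingest handler can be expressed as small predicates: when a parent run can be closed, and which hosts count as stuck. This TypeScript sketch stands in for SQL and Go that the notes do not show. The type and function names are hypothetical, and the status set follows the states listed in the schema and the Cancel button notes.

```typescript
type HostStatus = "pending" | "running" | "cancelling" | "completed" | "failed" | "cancelled" | "skipped";

const TERMINAL = new Set<HostStatus>(["completed", "failed", "cancelled", "skipped"]);
const STUCK_AFTER_MS = 60 * 60 * 1000; // 60-minute auto-fail

interface TaskRunHost { id: string; status: HostStatus; startedAt: Date | null }

// The parent task_runs row can be closed once every host has reached a terminal state.
function isRunComplete(hosts: TaskRunHost[]): boolean {
  return hosts.every((h) => TERMINAL.has(h.status));
}

// Running hosts older than 60 minutes; the periodic sweep would mark these as failed.
function stuckHostIds(hosts: TaskRunHost[], now: Date): string[] {
  return hosts
    .filter((h) => h.status === "running" && h.startedAt !== null &&
      now.getTime() - h.startedAt.getTime() > STUCK_AFTER_MS)
    .map((h) => h.id);
}
```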
Agent: task runner (agent/internal/tasks/)
- runner.go — registry pattern: RegisterHandler(taskType, HandlerFunc); routes by task_type, stores each task's context.CancelFunc in a sync.Map; cancellation via handleResponse when cancel_task_ids arrive
- patch.go — first registered handler; detects package manager (apt/dnf/yum/zypper), supports all and security modes, streams real output via io.Pipe, checks /var/run/reboot-required and needs-restarting, parses the updated package list from output
- 45-minute context timeout per task; RunPatch returns "cancelled by user" or "task timed out" as distinct error messages
- io.Pipe deadlock fixed: scanner goroutine reads from the pipe while the command writes; pw.Close() deferred until after cmd.Wait()
Web: task monitoring UI (apps/web/app/(dashboard)/tasks/[id]/, apps/web/app/(dashboard)/hosts/[id]/tasks-tab.tsx)
- /tasks/[id] monitor page: host list (left column) with pending/running/done status icons, scrolling terminal-style output panel (right), live elapsed timer in panel header, auto-scroll while task is running, 3-second poll stops when task completes
- Amber warning after 5 minutes with no output, noting the 60-minute auto-fail
- Host detail "Tasks" tab: "Run Patch" button (Linux hosts only), patch mode dialog (All / Security), task history table with links to monitor page
- Group detail "Patch Group" button: mode selection + parallel host selector (1 / 2 / 5 / 10 / Unlimited), non-Linux skip warning, task history
- "Cancel" button on monitor page sets host to cancelling state; ingest sends cancel_task_ids to the agent on the next heartbeat
Build state
- pnpm run build — zero TypeScript errors ✅
- go build ./agent/... ./apps/ingest/... — zero errors ✅
Database schema (apps/web/lib/db/schema/host-groups.ts, migration 0021_chilly_tomas.sql)
- host_groups table: name, description, instance FK, standard timestamps
- host_group_members table: group FK, host FK, agent FK, instance FK — join table with audit timestamps
Server actions (apps/web/lib/actions/host-groups.ts)
- createHostGroup, updateHostGroup, deleteHostGroup — full CRUD, Zod-validated, instance-scoped
- getHostGroups(instanceId) — returns groups with member count
- getHostGroup(instanceId, groupId) — returns group + full member list
- addHostToGroup, removeHostFromGroup — membership management
Groups UI (apps/web/app/(dashboard)/hosts/groups/)
- /hosts/groups list page: create dialog, edit inline, delete with confirmation, member count badge
- /hosts/groups/[id] detail page: group metadata header, member list table with remove button, "Add Host" dialog with search/filter over instance hosts not already in the group
- Host detail page "Groups" tab: shows current group memberships with inline add and remove
Sidebar restructure (apps/web/components/shared/sidebar.tsx)
- Collapsible parent/child navigation: Hosts → All Hosts + Groups; Settings → Instance + Agent Enrolment + Alert Defaults + LDAP + System Health
- CollapsibleSidebarItem component with chevron indicator; auto-expands when a child route is active
- Added shadcn textarea and form UI components
Build state
- pnpm run build — zero TypeScript errors ✅
- Migration 0021_chilly_tomas.sql generated and applied ✅
LDAP TLS certificate preview (apps/web/app/(dashboard)/settings/ldap/)
- Fixed TLS certificate textarea overflowing the modal width across three iterations: added break-all word-wrap, then capped at 5 visible lines with vertical scroll (max-h-20 overflow-y-auto)
Agent enrollment token list (apps/web/app/(dashboard)/settings/agents/)
- Added copy-to-clipboard actions for token value and install command on each row
- Replaced dual copy icons with a single "View" button opening a modal showing the full token and a ready-to-run curl install command in a code block
Development tooling (start.sh)
- Replaced dev.sh with a unified start.sh supporting both production Docker mode and local development
- Single entry point reduces onboarding friction
Build state
- pnpm run build — zero TypeScript errors ✅
Agent -uninstall flag (agent/internal/install/uninstall.go, agent/cmd/agent/main.go)
- New -uninstall CLI flag: stops the running service, removes the binary, service files, config, and data directories
- Cross-platform: systemd (Linux), launchd (macOS), Windows SCM
Agent auto re-registration after host deletion (agent/internal/heartbeat/heartbeat.go, apps/web/lib/actions/hosts.ts)
- Agent detects gRPC NotFound / PermissionDenied / Unauthenticated on the heartbeat stream and returns ErrAgentDeregistered rather than retrying indefinitely
- runAgent outer loop: on ErrAgentDeregistered, clears agent_state.json and re-registers with the same keypair — host reappears in the UI automatically without reinstall
- deleteHost server action now also deletes agent_status_history and the agent record so the running agent is rejected on its next heartbeat
Ingest heartbeat stream close (apps/ingest/internal/handlers/heartbeat.go)
- Heartbeat streaming goroutine now closes within 30 seconds when the associated agent is deleted, preventing orphaned open streams from keeping the host falsely "online"
Build state
- go build ./agent/... ./apps/ingest/... — zero errors ✅
- pnpm run build — zero TypeScript errors ✅
LDAP post-login flows (apps/web/app/(setup)/setup-email/, apps/web/app/(setup)/pending-approval/)
- LDAP users provisioned with a placeholder @ldap.local email are redirected to /setup-email to capture a real address before accessing the dashboard
- New LDAP users are provisioned with role: 'pending' and redirected to /pending-approval; an admin assigns a role from the Team page to grant access
- Session cookie signing fixed to match Better Auth's HMAC format; LDAP search base resolution improved
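The two post-login gates can be sketched as a single routing function. The helper name is hypothetical. Checking the placeholder email before the pending role is also an assumption, because the notes do not say which gate runs first for a brand-new LDAP user who matches both.

```typescript
interface SessionUser { email: string; role: string }

// Hypothetical helper: returns the path to redirect to, or null to continue to the dashboard.
function postLoginRedirect(user: SessionUser): string | null {
  // Placeholder address assigned at LDAP provisioning time (order of checks is assumed).
  if (user.email.endsWith("@ldap.local")) return "/setup-email";
  // Unapproved users wait until an admin assigns a role from the Team page.
  if (user.role === "pending") return "/pending-approval";
  return null;
}
```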
Migration timestamp validation (apps/web/scripts/validate-migrations.js, package.json)
- db:generate script now runs validate-migrations.js after drizzle-kit generate: verifies that all journal entries have strictly increasing when timestamps and fails with a clear error if not
- Prevents the silent "already applied" skip that burned us in Session 13
- ESLint config updated to ignore the validator script
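The core check in validate-migrations.js can be sketched as follows. The script itself is JavaScript and is not shown in these notes. The sketch reads only the idx, when and tag fields of drizzle-kit's _journal.json entries; the function name and error wording are hypothetical.

```typescript
interface JournalEntry { idx: number; when: number; tag: string }
interface Journal { entries: JournalEntry[] }

// Returns one message per entry whose "when" does not strictly exceed its predecessor's.
function findNonMonotonic(journal: Journal): string[] {
  const errors: string[] = [];
  const entries = [...journal.entries].sort((a, b) => a.idx - b.idx);
  for (let i = 1; i < entries.length; i++) {
    const prev = entries[i - 1];
    const cur = entries[i];
    if (cur.when <= prev.when) {
      errors.push(`${cur.tag}: when ${cur.when} is not greater than ${prev.tag} (${prev.when})`);
    }
  }
  return errors;
}
```

A CLI wrapper would read the journal with fs.readFileSync, print any messages, and exit non-zero so that db:generate fails.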
LDAP edit dialog with TLS certificate upload (apps/web/app/(dashboard)/settings/ldap/)
- Pencil icon on each LDAP config row opens a pre-filled edit dialog
- TLS certificate field accepts paste or file upload; stored in the ldap_configurations.tls_certificate column
- updateLdapConfiguration server action validates and persists changes
Build state
- pnpm run build — zero TypeScript errors ✅
Service account / directory account split (apps/web/app/(dashboard)/service-accounts/, apps/web/app/(dashboard)/hosts/[id]/)
- Local OS users moved from top-level service accounts page to per-host "Users" and "Settings" tabs; these are host-scoped, not instance-level inventory
- New "Service Accounts" top-level page targets network/domain accounts sourced from LDAP/AD
- Per-host "Settings" tab: collection toggles (CPU, Memory, Disk on by default; Local Users opt-in); instance-level defaults applied to newly enrolled hosts
LDAP / Active Directory integration (apps/web/lib/ldap/client.ts, apps/web/app/api/auth/ldap/route.ts, apps/web/app/(dashboard)/settings/ldap/)
- ldap_configurations table: host, port, bind DN/password (AES-256-GCM encrypted at rest using LDAP_ENCRYPTION_KEY), base DN, user/group filters, TLS mode — migration 0016
- domain_accounts table: synced directory accounts with last-seen, locked, password-age, group memberships — migration 0016
- LDAP client (ldapts): testConnection, syncUsers, authenticateUser functions
- POST /api/auth/ldap route: authenticates domain credentials, upserts Better Auth user + session; dual-mode login form (email/password tab + domain username/password tab)
- Settings → LDAP page: create config, test connection, trigger sync, view synced user count
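For reference, AES-256-GCM encryption at rest with Node's built-in crypto module can look like the sketch below. Two details are assumptions, because the notes do not document them: LDAP_ENCRYPTION_KEY is taken to be 64 hex characters, and the stored format is taken to be base64 "iv.tag.ciphertext".

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumption: the key is provided as 64 hex characters (32 bytes).
function loadKey(hex: string | undefined): Buffer {
  const key = Buffer.from(hex ?? "", "hex");
  if (key.length !== 32) throw new Error("LDAP_ENCRYPTION_KEY must be 32 bytes (64 hex chars)");
  return key;
}

// Assumed storage format: base64(iv) "." base64(authTag) "." base64(ciphertext).
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, data].map((b) => b.toString("base64")).join(".");
}

function decryptSecret(stored: string, key: Buffer): string {
  const [iv, tag, data] = stored.split(".").map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the key, tag, or ciphertext is wrong
  return Buffer.concat([decipher.update(data), decipher.final()]).toString("utf8");
}
```

Usage would be along the lines of encryptSecret(bindPassword, loadKey(process.env.LDAP_ENCRYPTION_KEY)) before the row is written.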
Agent check delivery resilience (agent/internal/heartbeat/heartbeat.go, agent/internal/checks/executor.go)
- hostID resolution failures (pre-approval agent) no longer crash the stream; check results are buffered and retried on the next heartbeat
- Check executor survives stream reconnects: goroutines are kept alive; pending results accumulate and drain on the next successful stream
Build state
- pnpm run build — zero TypeScript errors ✅
- go build ./agent/... ./apps/ingest/... — zero errors ✅
- Migrations 0014–0017 generated and applied ✅
Problem: operational data was on the wrong page
- Certificates and Active Alerts were displayed on the System Health page (/settings/system), which is an admin view for CT-Ops's own platform internals. The Overview page (/dashboard) was an empty placeholder.
- Engineers had to navigate into Settings to see whether alerts were firing or certificates were expiring — the wrong mental model.
Fix: split data by audience
- System Health now shows only CT-Ops platform internals: version, licence tier, database connection status, metric retention, and agent pipeline counts. Description updated to "Platform status and configuration".
- Overview now shows the operational state of infrastructure: Agents (online/offline), Certificates (valid/expiring/expired), Active Alerts (firing/acknowledged), and a Summary panel. All cards link through to their respective detail pages.
New /api/overview endpoint (apps/web/app/api/overview/route.ts)
- Returns agents, certificates, and alerts counts scoped to the user's instance.
- /api/system/health stripped of certificate and alert queries — now only queries agents and instance config.
New DashboardClient component (apps/web/app/(dashboard)/dashboard/dashboard-client.tsx)
- Polls /api/overview every 30 seconds (matching System Health behaviour).
- Overview page.tsx delegates to this client component, replacing the static placeholder.
Build state
- pnpm run build (apps/web) — zero TypeScript errors ✅
- /api/overview route appears in build output ✅
Problem 1: Failed to find Server Action on certificates list and detail pages
- Root cause: getCertificates, getCertificateCounts, and getCertificate were all called from TanStack Query queryFn inside client components. Server actions use POST and are identified by a build-time hash; in production standalone Docker builds, client and server bundles can have drifted action IDs across deployments, causing Next.js to reject the requests.
- Fix (list page): added GET /api/certificates and GET /api/certificates/counts route handlers (apps/web/app/api/certificates/route.ts, apps/web/app/api/certificates/counts/route.ts). CertificatesClient queryFn now uses fetch() against these routes. Added initialData and staleTime: 30s so SSR data is used on first render without an immediate refetch.
- Fix (detail page): removed useQuery from CertificateDetailClient entirely — the server page already SSR-fetches the certificate and passes it as props, so no client-side refetch was needed.
- deleteCertificate mutation continues to use the server action (correct pattern — mutations are the intended use case for server actions).
Problem 2: RangeError: Invalid time value crash on certificate chain table
- Root cause: the CertificateChainEntry TypeScript interface in apps/web/lib/db/schema/certificates.ts declared fields as camelCase (notAfter, notBefore, fingerprintSha256), but the Go ingest handler serialises certChainEntry with snake_case JSON tags (not_after, not_before, fingerprint_sha256). Every read of entry.notAfter in the chain table returned undefined → new Date(undefined) → Invalid Date → date-fns format() threw RangeError: Invalid time value during SSR, crashing the detail page.
- Fix: corrected CertificateChainEntry to use snake_case keys matching the Go JSON output; updated the chain table in certificate-detail-client.tsx to use entry.not_after and entry.fingerprint_sha256.
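The corrected interface mirrors the Go JSON tags. A defensive parse can also keep a missing or malformed date from ever reaching date-fns format(). The parseChainDate helper is a hypothetical addition, not something these notes say was implemented.

```typescript
// Keys match the Go ingest JSON tags exactly.
interface CertificateChainEntry {
  not_before: string;
  not_after: string;
  fingerprint_sha256: string;
}

// Hypothetical guard: returns null instead of an Invalid Date that would make format() throw.
function parseChainDate(value: string | undefined): Date | null {
  if (!value) return null;
  const d = new Date(value);
  return Number.isNaN(d.getTime()) ? null : d;
}
```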
Other fixes on this branch (not yet merged to main at session start)
- fix(ci): tracked agent-dist directory so Docker COPY succeeds in CI (288291d)
- feat(checks): added cert_file check type to Add Check dialog and fixed cert JSON display (9ad2882)
- fix(ci): bumped ingest Dockerfile to golang:1.25-alpine (3c5ef71)
- fix(web): fixed EACCES on agent-dist volume mount by switching to root entrypoint with su-exec privilege drop (909494d)
Build state
- pnpm run build (apps/web) — zero TypeScript errors ✅
- GET /api/certificates and GET /api/certificates/counts routes appear in build output ✅
Database schema (apps/web/lib/db/schema/certificates.ts, migration 0013_certificates.sql)
- New certificates table with composite unique index on (instance_id, host, port, server_name, fingerprint_sha256), expiry and status indexes, soft delete, source column (discovered | imported | issued) for future CA work, and discoveredByHostId field (semantically scoped to discovery, not deployment)
- New certificate_events table as an append-only event spine: discovered, renewed, expiring_soon, expired, restored, removed
- CertificateStatus, CertificateSource, CertificateEventType TypeScript types
Web: check type extension (apps/web/lib/db/schema/checks.ts, apps/web/lib/actions/checks.ts)
- Added 'certificate' to the CheckType union and CertificateCheckConfig to CheckConfig
- Zod schema in createCheck / updateCheck accepts the new type
Web: server actions (apps/web/lib/actions/certificates.ts, apps/web/lib/certificates/expiry.ts)
- getCertificates(instanceId, filters) — paginated, filterable by status/host, sortable
- getCertificate(instanceId, certId) — returns cert + events
- getCertificateCounts(instanceId) — valid/expiring_soon/expired/invalid tallies
- deleteCertificate(instanceId, certId) — soft delete
- computeExpiryStatus(notAfter, warnDays) and formatDaysUntil(date) helpers
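Here is a sketch of what computeExpiryStatus(notAfter, warnDays) plausibly computes. The injectable now parameter and the boundary handling (expired at or after notAfter, expiring_soon within warnDays) are assumptions. The invalid status is not modelled here.

```typescript
type ExpiryStatus = "valid" | "expiring_soon" | "expired";

const DAY_MS = 24 * 60 * 60 * 1000;

// Classifies a certificate by time remaining until notAfter; boundaries are assumed.
function computeExpiryStatus(notAfter: Date, warnDays: number, now: Date = new Date()): ExpiryStatus {
  const remaining = notAfter.getTime() - now.getTime();
  if (remaining <= 0) return "expired";
  if (remaining <= warnDays * DAY_MS) return "expiring_soon";
  return "valid";
}
```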
Web: UI (apps/web/app/(dashboard)/certificates/, apps/web/components/certificates/)
- CertificatesClient — summary cards, host filter, status/sort selects, sortable table defaulting to soonest-expiry-first
- /certificates/[id] detail page — summary cards, fingerprint copy, SANs chips, chain table, event timeline
- CertificateStatusBadge — valid/expiring_soon/expired/invalid with correct color coding
- Replaced placeholder page.tsx
Web: alert rule extension (apps/web/lib/db/schema/alerts.ts, apps/web/lib/actions/alerts.ts, apps/web/app/(dashboard)/hosts/[id]/alerts-tab.tsx)
- CertExpiryConfig interface and 'cert_expiry' added to AlertConditionType and AlertRuleConfig
- Zod certExpiryConfigSchema added to create/update schemas
- AddRuleDialog extended: scope radio (All / Specific), cert picker, days-before-expiry input; ruleConditionSummary handles cert_expiry display
Agent: certificate check (agent/internal/checks/certificate.go, agent/internal/checks/executor.go)
- runCertificateCheck(cfg) — dials with TLS verification skipped (intentional; the check does its own validation), parses leaf + chain, builds CertificateReport JSON
- Returns pass (valid), fail (expired/not-yet-valid), or error (dial failure)
- Dispatcher wired in executor.go
Ingest: certificate persistence (apps/ingest/internal/handlers/certificates.go, apps/ingest/internal/db/queries/certificates.sql.go)
- persistCertificateResult — unmarshals report, computes status, upserts via natural key, detects renewal (new fingerprint for same endpoint emits renewed event on both old and new rows), writes discovered/status-change events
- Wired into heartbeat handler via per-heartbeat GetChecksForHost type map
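The renewal rule can be sketched as a function that receives the existing rows for one endpoint plus the fingerprint just observed, and returns the events to append. The shapes and the function name are hypothetical. Status-change events are left out, and the real logic lives in the Go handler.

```typescript
interface StoredCert { id: string; fingerprint: string }
interface CertEvent { certId: string; type: "discovered" | "renewed" }

// existing: rows already stored for the same (host, port, server_name) endpoint.
function eventsForObservation(existing: StoredCert[], observedFingerprint: string, newCertId: string): CertEvent[] {
  // Same certificate seen again: nothing new to record.
  if (existing.some((c) => c.fingerprint === observedFingerprint)) return [];
  // First certificate on this endpoint.
  if (existing.length === 0) return [{ certId: newCertId, type: "discovered" }];
  // New fingerprint on a known endpoint: a renewal, recorded on both old and new rows.
  return [
    ...existing.map((c): CertEvent => ({ certId: c.id, type: "renewed" })),
    { certId: newCertId, type: "renewed" },
  ];
}
```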
Ingest: cert expiry alert evaluator + sweeper (apps/ingest/internal/handlers/alerts.go, apps/ingest/cmd/ingest/main.go)
- evaluateCertExpiryForCert — called immediately after persist; loads the instance's cert_expiry rules and evaluates each
- evaluateCertExpiryRule — fires/resolves an alert_instances row keyed by ruleID + metadata.certificateId; uses the cert's discovered_by_host_id as an FK-safe host_id; dispatches via the existing webhook + SMTP pipeline
- RunCertExpirySweeper goroutine — ticks every 15 min, sweeps all instances with cert_expiry rules
- Sweeper started from main.go
Go queries (apps/ingest/internal/db/queries/certificates.sql.go)
- UpsertCertificate, FindCertsForEndpoint, InsertCertificateEvent, GetActiveCertAlertInstance, InsertCertAlertInstance, GetCertExpiryRulesForInstance, GetAllInstancesWithCertExpiryRules, ListCertificatesExpiringWithin, GetCertificateByID
Build state
- pnpm run build (apps/web) — zero TypeScript errors ✅
- go build ./apps/ingest/... ./agent/... — zero errors ✅
- Migration 0013_certificates.sql generated ✅
Alert history: pagination + date/severity filters (apps/web/lib/actions/alerts.ts, apps/web/app/(dashboard)/alerts/alerts-client.tsx, apps/web/app/(dashboard)/alerts/page.tsx)
- getAlertInstances now accepts offset, dateFrom, dateTo, severity filters in addition to the existing status / hostId / limit params
- getAlertInstanceCount added — returns the total count matching the same filters (used for pagination metadata)
- Recent History section replaced with a fully paginated Alert History table (25 rows/page)
- Filter controls: severity dropdown + date-from / date-to inputs in the card header; "Clear" button appears when any filter is active; page resets to 0 when any filter changes
- Page count and "X–Y of Z alerts" summary shown in the card description; Previous/Next buttons shown only when there is more than one page
- Table dims with an opacity-60 transition while fetching (TanStack Query placeholderData: prev)
- Server no longer pre-fetches initialRecent — history is entirely client-driven so SSR doesn't block on a potentially large query
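The pagination summary depends only on the zero-based page index, the total from getAlertInstanceCount and the page size. Here is a sketch. The function name and the wording for an empty result are hypothetical. The offset sent to getAlertInstances would be page * pageSize.

```typescript
const PAGE_SIZE = 25;

// Builds the "X–Y of Z alerts" text, page count, and whether Previous/Next should render.
function pageSummary(page: number, total: number, pageSize: number = PAGE_SIZE) {
  const pageCount = Math.max(1, Math.ceil(total / pageSize));
  if (total === 0) return { pageCount, text: "0 alerts", showPager: false };
  const from = page * pageSize + 1;
  const to = Math.min(total, (page + 1) * pageSize);
  return { pageCount, text: `${from}–${to} of ${total} alerts`, showPager: pageCount > 1 };
}
```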
TimescaleDB hypertable fix (migration 0012_massive_dormammu.sql)
- Root cause: migration 0005 called create_hypertable, but the table had a single-column PK on id; TimescaleDB requires the partition column (recorded_at) to be part of any unique constraint, so the call silently failed via the EXCEPTION handler
- Fix: changed the host_metrics schema to a composite PK (id, recorded_at) in apps/web/lib/db/schema/metrics.ts; migration 0012 drops the old host_metrics_pkey and adds host_metrics_id_recorded_at_pk (id, recorded_at), then calls create_hypertable with migrate_data => true
- Migration is fully idempotent (uses IF EXISTS / IF NOT EXISTS guards in a DO block) so it applies cleanly even if partially applied previously
TimescaleDB continuous aggregates (migrations 0011_overrated_mongu.sql, 0012_massive_dormammu.sql)
- host_metrics_hourly — 1-hour bucket CAGG, refresh policy: every hour, covering last 3 hours
- host_metrics_daily — 1-day bucket CAGG, refresh policy: every day, covering last 3 days
- getHostMetrics (in apps/web/lib/actions/agents.ts) now queries host_metrics_hourly for the 24h range and host_metrics_daily for the 7d range using raw SQL via db.execute(sql`...`); falls back to the raw host_metrics table if the view doesn't exist (graceful degradation for plain PostgreSQL)
Metric retention setting (apps/web/lib/db/schema/instances.ts, apps/web/lib/actions/settings.ts, apps/web/app/(dashboard)/settings/settings-client.tsx)
- New metricRetentionDays integer column (default 30) on instances table — migration 0011_overrated_mongu.sql
- updateMetricRetention(instanceId, days) server action validates 1–3650 days; admin-only
- New "Metric Retention" card in Settings UI with a Select (7 / 14 / 30 / 60 / 90 / 180 days / 1 year); Save button disabled when value matches current DB value
Build state
- pnpm run build — zero TypeScript errors ✅
- 13 migrations applied, all with monotonically increasing when timestamps ✅
- host_metrics hypertable confirmed ✅
- host_metrics_hourly + host_metrics_daily continuous aggregates confirmed ✅
Heartbeat backoff reset after stable stream (agent/internal/heartbeat/heartbeat.go)
- The reconnect backoff now resets to 1 s when a stream ran stably for at least 10 s (minStableTime)
- Prevents a transient blip (e.g. firewall state expiry) from locking the agent into a slow 60 s retry cycle on the next failure
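The reset rule is easiest to see as a pure "next delay" function. This is a TypeScript sketch of the behaviour, not the agent's Go code. Doubling between attempts is an assumption, and the 1 s floor, 10 s minStableTime and 60 s ceiling come from the notes.

```typescript
const INITIAL_BACKOFF_MS = 1_000;
const MAX_BACKOFF_MS = 60_000;
const MIN_STABLE_MS = 10_000; // minStableTime

// currentMs: delay used before the last attempt; streamUptimeMs: how long that stream stayed up.
function nextBackoff(currentMs: number, streamUptimeMs: number): number {
  // A stream that stayed healthy for minStableTime means this failure is a fresh blip: start over.
  if (streamUptimeMs >= MIN_STABLE_MS) return INITIAL_BACKOFF_MS;
  // Otherwise keep growing (assumed doubling), capped at the slow 60 s cycle.
  return Math.min(currentMs * 2, MAX_BACKOFF_MS);
}
```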
Agent self-update: live version refresh in ingest (apps/ingest/internal/config/version_poller.go, apps/ingest/internal/handlers/heartbeat.go, apps/ingest/cmd/ingest/main.go)
- Root cause: ingest read latestVersion from .release-please-manifest.json once at startup and cached it for the process lifetime. The UI's "available version" display uses the /api/agent/latest endpoint, which queries GitHub live, so UI and ingest diverged whenever a new release was cut without an ingest restart, producing step-wise upgrades (v0.9.0 → v0.11.0 on first restart, v0.11.0 → v0.11.1 on a second).
- Added VersionPoller struct: seeds from the startup config value, then re-reads the manifest from disk every 5 minutes in a background goroutine, using atomic.Value for lock-free reads
- HeartbeatHandler replaced its latestVersion string field with *config.VersionPoller; calls versionPoller.Get() on each heartbeat so agents are notified of new releases within 5 minutes of the manifest being updated — no service restart required
- Version changes logged at Info level ("agent latest version updated") for observability
Docker multi-arch builds: native runners instead of QEMU (.github/workflows/docker-publish.yml)
- Root cause: both web and ingest jobs used a single ubuntu-latest runner with QEMU emulating arm64; pnpm install under QEMU was taking 60+ minutes.
- Replaced with a platform matrix — ubuntu-latest (linux/amd64) and ubuntu-24.04-arm (linux/arm64) — running in parallel as native builds; arm64 build time drops to ~2–3 minutes
- Each platform job builds and pushes by digest (push-by-digest=true) and uploads the digest as an artifact; a downstream merge-web / merge-ingest job downloads both digests and runs docker buildx imagetools create to produce the final multi-arch manifest list
- GHA cache scoped per platform (scope=web-amd64, scope=web-arm64) to prevent cross-arch cache collisions
- No more QEMU step required in any job
Build state
- go build ./apps/ingest/... — compiles ✅
- go build ./agent/... — compiles ✅
Alert silencing feature (apps/web/lib/db/schema/alerts.ts, apps/web/lib/actions/alerts.ts, apps/web/app/(dashboard)/alerts/alerts-client.tsx, apps/web/app/(dashboard)/hosts/[id]/alerts-tab.tsx, apps/ingest/internal/db/queries/alerts.sql.go, apps/ingest/internal/handlers/alerts.go)
- New alert_silences table: host-scoped or instance-wide time windows that suppress alert evaluation; migration 0010_eager_chameleon.sql generated via db:generate
- Server actions: getSilences, getActiveSilencesForHost, createSilence, deleteSilence
- Go ingest: the IsHostSilenced query short-circuits evaluateAlerts, so silenced hosts skip rule evaluation entirely
- UI: dedicated Silences card on the /alerts page with Active/Upcoming/Expired badges and an add dialog; on the host detail Alerts tab, a per-host "Silence Host" button and an amber active-silence banner with one-click remove
Migration runner root-cause fix (apps/web/lib/db/migrations/meta/_journal.json, apps/web/lib/db/migrations/0009_global_alert_defaults.sql, apps/web/Dockerfile, start.sh)
- The recurring "migrations not applied" symptom was traced to _journal.json. drizzle-orm's migrator decides which migrations are pending by comparing each entry's when timestamp against MAX(created_at) in __drizzle_migrations. Migration 0008 had been hand-crafted with when: 1775900000000 (artificially in the future), so 0009 and 0010, which had smaller when values, were silently classified as already applied and skipped with no error.
- Fix: bumped 0009 → 1775900000001 and 0010 → 1775900000002 so timestamps are strictly monotonic
- 0009 SQL rewritten with IF NOT EXISTS guards, because its column had already been applied to live DBs without being tracked
- Dockerfile: migration SQL files are now copied directly from the build context (COPY --chown=nextjs:nodejs apps/web/lib/db/migrations …), so the layer is always invalidated when migrations change, regardless of builder-stage cache hits
- start.sh: explicit DOCKER_DB_URL constant and fail-fast on pnpm db:migrate failure, so silent skips can never happen again
- Going forward: never hand-craft migration files or _journal.json entries; always run pnpm run db:generate so when is Date.now() and remains monotonic
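The failure mode is easiest to see as code. This Go sketch models only the comparison described above: an entry counts as pending when its when value is greater than the newest recorded created_at. It also adds a monotonicity check that could run in CI. It is not drizzle-orm's implementation, and the JournalEntry type and both function names are hypothetical.

```go
package main

import "fmt"

// JournalEntry models the two fields of a _journal.json entry that matter here.
type JournalEntry struct {
	Tag  string
	When int64 // milliseconds since epoch
}

// pendingMigrations applies the rule described above: an entry is treated as
// pending only if its When is strictly greater than the newest created_at
// recorded in __drizzle_migrations.
func pendingMigrations(entries []JournalEntry, maxCreatedAt int64) []string {
	var pending []string
	for _, e := range entries {
		if e.When > maxCreatedAt {
			pending = append(pending, e.Tag)
		}
	}
	return pending
}

// checkMonotonic returns an error for the first entry whose When does not
// strictly increase. This is the condition that silently hid 0009 and 0010.
func checkMonotonic(entries []JournalEntry) error {
	for i := 1; i < len(entries); i++ {
		if entries[i].When <= entries[i-1].When {
			return fmt.Errorf("%s (when=%d) is not after %s (when=%d)",
				entries[i].Tag, entries[i].When, entries[i-1].Tag, entries[i-1].When)
		}
	}
	return nil
}
```

Consider a journal whose 0008 entry sits at 1775900000000, followed by entries with smaller values. Once 0008 is recorded, pendingMigrations returns nothing, while checkMonotonic rejects the journal at 0009.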
Build state
- pnpm run build: zero TypeScript errors ✅
- go build ./apps/ingest/... compiles ✅
- alert_silences table verified present; __drizzle_migrations has all 11 rows
Test notification button (apps/web/app/(dashboard)/alerts/alerts-client.tsx, apps/web/lib/actions/alerts.ts)
- Flask icon button per channel row; sends a real test payload and shows a TestLogDialog with a success confirmation or the exact error string (e.g. SMTP auth failure, HTTP 401, TLS version mismatch)
- Webhook test: POSTs an alert.test JSON payload with an HMAC-SHA256 signature when a secret is set; 10 s timeout
- SMTP test: sends via nodemailer using the stored channel config; nodemailer installed as a new web dependency
- Button shows a Loader2 spinner while in flight; the result dialog stays open until dismissed
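Below is a Go sketch of HMAC-SHA256 webhook signing with matching receiver-side verification. The hex encoding, the sha256= prefix, the X-Signature-256 header name, and the helper names are placeholders chosen for illustration; the header and encoding CT-Ops actually sends are not recorded here. The shared client mirrors the 10 s test timeout.

```go
package main

import (
	"bytes"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"net/http"
	"time"
)

// webhookClient is shared across sends; 10 s matches the stated test timeout.
var webhookClient = &http.Client{Timeout: 10 * time.Second}

// signPayload returns the hex-encoded HMAC-SHA256 of body under secret.
func signPayload(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body) // hash.Hash.Write never returns an error
	return hex.EncodeToString(mac.Sum(nil))
}

// verifySignature is the receiver's side; hmac.Equal compares in constant time.
func verifySignature(secret, body []byte, sigHex string) bool {
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

// newSignedRequest builds the POST; the header name is a hypothetical placeholder.
// Send it with webhookClient.Do(req).
func newSignedRequest(url string, secret, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if len(secret) > 0 {
		req.Header.Set("X-Signature-256", "sha256="+signPayload(secret, body))
	}
	return req, nil
}
```

The signature covers the exact bytes sent. A receiver must therefore verify against the raw request body, not a re-serialised copy of the parsed JSON.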
Edit notification channel (apps/web/app/(dashboard)/alerts/alerts-client.tsx, apps/web/lib/actions/alerts.ts)
- Pencil icon button opens a type-specific edit dialog (EditWebhookDialog / EditSmtpDialog) pre-filled from the safe config
- Secret/password fields are labelled "leave blank to keep existing" when a value is already stored; an empty submission preserves the existing credential
- updateNotificationChannel server action merges with the existing DB row so secrets are never lost
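A Go sketch of the blank-means-keep merge rule and the matching redaction. The WebhookConfig and WebhookSafe types are hypothetical stand-ins for the real TypeScript types.

```go
package main

// WebhookConfig is a hypothetical Go mirror of the stored webhook config.
type WebhookConfig struct {
	URL    string
	Secret string
}

// WebhookSafe is what the edit dialog receives: the secret is replaced by a
// flag so it never leaves the server.
type WebhookSafe struct {
	URL       string
	HasSecret bool
}

func redact(c WebhookConfig) WebhookSafe {
	return WebhookSafe{URL: c.URL, HasSecret: c.Secret != ""}
}

// mergeUpdate applies an edit: a blank submitted secret keeps the stored one.
func mergeUpdate(existing WebhookConfig, submittedURL, submittedSecret string) WebhookConfig {
	merged := existing
	merged.URL = submittedURL
	if submittedSecret != "" {
		merged.Secret = submittedSecret
	}
	return merged
}
```

One consequence of this rule is that a stored secret cannot be cleared by submitting an empty field. Removing it would need a separate, explicit control.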
SMTP encryption field (apps/web/lib/db/schema/alerts.ts, apps/web/lib/actions/alerts.ts, apps/web/app/(dashboard)/alerts/alerts-client.tsx)
- SmtpChannelConfig.secure: boolean replaced with encryption: 'none' | 'starttls' | 'tls'; new SmtpEncryption type exported from the schema
- Zod schemas and the NotificationChannelSafe type updated throughout
- normaliseSmtpConfig() backward-compat function converts legacy secure: bool rows on read (true → 'tls', false → 'starttls'); no migration needed (JSONB)
- sendTestNotification maps encryption to nodemailer options: tls → secure: true, starttls → requireTLS: true, none → plain
- UI: checkbox replaced with a labelled SmtpEncryptionSelect (three options with descriptions); selecting a mode auto-fills the conventional port (465/587/25)
- Channel details column now shows the encryption mode, e.g. smtp.eu.mailgun.org:587 (STARTTLS) → ...
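The legacy mapping and the port defaults are small enough to state as code. The Go ingest's effectiveEncryption(), described in the SMTP dispatch entry, handles the same two shapes. This sketch uses pointer fields so an absent secure key can be told apart from secure: false. Treating a config that has neither field as starttls is an assumption, and all names here are hypothetical.

```go
package main

import "encoding/json"

// SmtpEncryption mirrors the 'none' | 'starttls' | 'tls' union.
type SmtpEncryption string

const (
	EncNone     SmtpEncryption = "none"
	EncStartTLS SmtpEncryption = "starttls"
	EncTLS      SmtpEncryption = "tls"
)

// smtpConfig holds both the new field and the legacy boolean; pointers let us
// tell "absent" apart from "false" when decoding JSONB.
type smtpConfig struct {
	Encryption *SmtpEncryption `json:"encryption,omitempty"`
	Secure     *bool           `json:"secure,omitempty"`
}

func parseSmtpConfig(data []byte) (smtpConfig, error) {
	var c smtpConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

// effectiveEncryption prefers the new field and otherwise maps the legacy flag:
// secure=true → tls, secure=false (or missing, by assumption) → starttls.
func (c smtpConfig) effectiveEncryption() SmtpEncryption {
	if c.Encryption != nil {
		return *c.Encryption
	}
	if c.Secure != nil && *c.Secure {
		return EncTLS
	}
	return EncStartTLS
}

// defaultPort is the conventional port the UI auto-fills for each mode.
func defaultPort(e SmtpEncryption) int {
	switch e {
	case EncTLS:
		return 465
	case EncStartTLS:
		return 587
	default:
		return 25
	}
}
```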
SMTP alert dispatch — Go ingest service (apps/ingest/internal/db/queries/alerts.sql.go, apps/ingest/internal/handlers/alerts.go, apps/ingest/internal/handlers/notify.go)
- SmtpChannelRow struct and GetEnabledSmtpChannels query added; both were previously absent, so SMTP channels were silently never fetched
- smtpChannelConfig struct with effectiveEncryption() handles both the new encryption field and legacy secure: bool
- sendSmtpEmail implements all three modes: tls (direct TLS via crypto/tls + smtp.NewClient), starttls (smtp.Dial + StartTLS), none (smtp.SendMail); an smtpSend helper handles the MAIL/RCPT/DATA sequence
- dispatchSmtp fans out to all SMTP channels in goroutines, logging failures (best-effort, same pattern as webhooks)
- notifChannels struct bundles the webhooks and smtps slices; evaluateCheckStatusRule and evaluateMetricThresholdRule updated to accept and dispatch to both
- All four fire/resolve points in both rule evaluators now call both dispatchWebhooks and dispatchSmtp
Build state
- pnpm run build: zero TypeScript errors ✅
- go build ./apps/ingest/... compiles ✅
HTTP check resource leak (agent/internal/checks/http.go)
- Shared a single http.Client (with its Transport) across all HTTP check goroutines instead of creating a new one per check; this prevents file-descriptor exhaustion from accumulated idle transports on hosts with many HTTP checks
- Response bodies are now always drained before close so TCP connections are cleanly returned to the pool
Stream dedup map reset on reconnect (agent/internal/heartbeat/heartbeat.go)
- The seenQueryIDs map is cleared at the start of each new stream session, so ad-hoc queries that were pending when a stream died are re-executed on the new stream rather than silently dropped
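The reset amounts to replacing the set at each session boundary. Here is a Go sketch with a hypothetical queryDeduper type standing in for the map held by the heartbeat loop.

```go
package main

// queryDeduper tracks which ad-hoc query IDs have already been executed on the
// current stream. Names are hypothetical.
type queryDeduper struct {
	seen map[string]struct{}
}

// resetForNewStream is called at the start of every stream session so queries
// that were in flight when the previous stream died run again.
func (d *queryDeduper) resetForNewStream() {
	d.seen = make(map[string]struct{})
}

// shouldRun returns true the first time an ID is seen on this stream.
func (d *queryDeduper) shouldRun(id string) bool {
	if d.seen == nil {
		d.seen = make(map[string]struct{})
	}
	if _, ok := d.seen[id]; ok {
		return false
	}
	d.seen[id] = struct{}{}
	return true
}
```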
Build state
- go build ./agent/... compiles ✅
SMTP notification channel (apps/web/lib/db/schema/alerts.ts, apps/web/lib/actions/alerts.ts, apps/web/app/(dashboard)/alerts/alerts-client.tsx)
- New SmtpChannelConfig interface: host, port, secure, optional username/password, fromAddress, fromName, toAddresses (array)
- NotificationChannelType union type 'webhook' | 'smtp' replaces the hard-coded 'webhook' literal on notificationChannels.type
- notificationChannels.config is now typed as WebhookChannelConfig | SmtpChannelConfig
- createNotificationChannel action handles both channel types; SMTP passwords are redacted (hasSecret) the same way webhook secrets are
- Alerts page updated: the Add Channel dialog has a type selector that switches between webhook and SMTP field sets; the channel list renders a type badge and appropriately masked credentials
Global alert defaults (apps/web/lib/db/schema/alerts.ts, migration 0009_global_alert_defaults.sql, apps/web/lib/actions/alerts.ts, apps/web/lib/actions/agents.ts)
- New isGlobalDefault boolean column on alertRules (default false); migration 0009 adds it
- getGlobalAlertDefaults(instanceId): fetches all global-default rules for the instance
- createGlobalAlertDefault(instanceId, input): creates a rule with isGlobalDefault = true; only the metric_threshold type is allowed for defaults
- deleteGlobalAlertDefault(instanceId, ruleId): soft-deletes a global default rule
- applyGlobalDefaultsToHost(instanceId, hostId): clones each active global-default rule as a host-scoped rule; called from approveAgent immediately after manual approval
- getAlertRules now excludes global defaults from regular host/instance rule listings (prevents duplicates in the Alerts tab)
Global Alert Defaults settings page (apps/web/app/(dashboard)/settings/alerts/)
- New page.tsx: admin-only server component; fetches the initial defaults and passes them to the client
- alerts-client.tsx: table of default metric threshold rules with Add/Delete; the Add dialog collects metric (cpu/memory/disk), operator, threshold %, and severity
- Sidebar link "Global Alert Defaults" added under Administration
Build state
- pnpm run build: zero TypeScript errors ✅
Schema (apps/web/lib/db/schema/alerts.ts, migration 0008_alert_rules.sql)
- alertRules table: instance-scoped, hostId nullable (null = instance-wide), conditionType, config JSONB, severity, enabled
- alertInstances table: ruleId, hostId, instanceId, status (firing/resolved/acknowledged), message, triggeredAt, resolvedAt, acknowledgedAt/By
- notificationChannels table: instanceId, name, type='webhook', config JSONB (url + optional secret), enabled
Ingest alert evaluation (apps/ingest/internal/db/queries/alerts.sql.go, apps/ingest/internal/handlers/alerts.go, apps/ingest/internal/handlers/notify.go)
- GetAlertRulesForHost, GetActiveAlertInstance, InsertAlertInstance, ResolveAlertInstance, GetRecentCheckResults, GetEnabledWebhookChannels: Go query functions
- evaluateAlerts is called from processHeartbeat after check results are persisted; it evaluates both check_status and metric_threshold rules
- check_status: fetches the last N results (N = failureThreshold); fires when all of them are failing, resolves when the latest passes; guards against insufficient history
- metric_threshold: compares the current heartbeat metric (float32 → float64 cast) against the threshold; fires/resolves each heartbeat cycle
- notify.go: postWebhook with HMAC-SHA256 signing, 5 s timeout, best-effort goroutine fan-out
- processHeartbeat signature extended with a hostname string param; both call sites updated
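Below is a Go sketch of the two rule evaluators as pure decision functions. It assumes results arrive newest first, that both fail and error count as failing, and that metric rules support the four comparison operators shown. The types and names are hypothetical. The real evaluators also insert or resolve alertInstances rows and dispatch notifications.

```go
package main

// CheckResult is a hypothetical slice of a check_results row.
type CheckResult struct {
	Status string // "pass", "fail" or "error"
}

type Decision int

const (
	NoChange Decision = iota
	Fire
	Resolve
)

// evaluateCheckStatus looks at the most recent results (newest first).
// firing says whether an alert instance is already active for this rule and host.
func evaluateCheckStatus(recent []CheckResult, failureThreshold int, firing bool) Decision {
	if len(recent) == 0 {
		return NoChange
	}
	if firing {
		if recent[0].Status == "pass" {
			return Resolve // latest result passes
		}
		return NoChange
	}
	if failureThreshold < 1 || len(recent) < failureThreshold {
		return NoChange // insufficient history: never fire on partial data
	}
	for _, r := range recent[:failureThreshold] {
		if r.Status == "pass" {
			return NoChange
		}
	}
	return Fire // the last N results are all non-passing
}

// evaluateMetricThreshold widens the float32 heartbeat metric before comparing.
func evaluateMetricThreshold(value float32, op string, threshold float64, firing bool) Decision {
	v := float64(value)
	var breached bool
	switch op {
	case ">":
		breached = v > threshold
	case ">=":
		breached = v >= threshold
	case "<":
		breached = v < threshold
	case "<=":
		breached = v <= threshold
	}
	switch {
	case breached && !firing:
		return Fire
	case !breached && firing:
		return Resolve
	}
	return NoChange
}
```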
Server actions (apps/web/lib/actions/alerts.ts)
- getAlertRules(instanceId, hostId?): uses or(eq, isNull) to include instance-wide rules
- createAlertRule, updateAlertRule, deleteAlertRule (soft delete), getAlertInstances, acknowledgeAlert
- getActiveAlertCountsForHosts(instanceId, hostIds[]): GROUP BY for the inventory badge
- getNotificationChannels (redacts secret → hasSecret: boolean), createNotificationChannel, deleteNotificationChannel
Alerts page (apps/web/app/(dashboard)/alerts/page.tsx, alerts-client.tsx)
- Replaced placeholder; the server component fetches initial data and passes it to the client with currentUserId
- Active alerts table: SeverityBadge, host link, rule name, message, triggered-at, Acknowledge button
- History table: last 50 resolved/acknowledged
- Severity filter dropdown
- Notification channels section: webhook table + Add Webhook dialog (URL + optional secret; secret masked as hasSecret after save)
Host detail Alerts tab (apps/web/app/(dashboard)/hosts/[id]/alerts-tab.tsx)
- Host-specific rules section with Add Rule dialog (conditionType selector, check picker / metric config, severity)
- Enable/disable <Switch> and delete per rule; instance-wide rules shown read-only in a separate card
- Active alert count badge pulled via TanStack Query; shown in red if > 0
- Host detail page.tsx now passes currentUserId to HostDetailClient
- host-detail-client.tsx: new 'alerts' tab with a red count badge; getAlertInstances query for the badge count
Host inventory alert badge (apps/web/app/(dashboard)/hosts/hosts-client.tsx)
- getActiveAlertCountsForHosts query (enabled when the hosts list is non-empty)
- New "Alerts" column: red badge if count > 0, otherwise a "—" placeholder
Build state
- npm run build: zero TypeScript errors ✅
- go build github.com/carrtech-dev/ct-ops/ingest/... compiles ✅
- go build github.com/carrtech-dev/ct-ops/agent/... compiles ✅
--tls-skip-verify in install flow (agent/cmd/agent/main.go, agent/internal/install/install.go, apps/web/app/api/agent/install/route.ts, apps/web/app/(dashboard)/settings/agents/agents-client.tsx)
- New --tls-skip-verify CLI flag threaded through install.Run() → mergeConfig() → writeConfig(); when set it writes tls_skip_verify = true into agent.toml, so no manual config editing is required after install in self-signed cert environments
- Install script route accepts a skip_verify=true query param and appends the flag to the generated agent command
- Token creation UI adds an "Accept self-signed certificates" checkbox (checked by default) that controls whether the generated curl command includes the param
Pinned required agent version (apps/web/lib/agent/version.ts, apps/web/app/api/agent/download/route.ts, apps/web/lib/agent/cache-prewarm.ts)
- lib/agent/version.ts: new module with a REQUIRED_AGENT_VERSION constant; the single source of truth for which agent version a given server release requires
- Cache-prewarm fetches the specific GitHub release tagged agent/<version> rather than latest; logs a clear message when the release doesn't exist yet (local dev)
- Download route serves the pinned versioned binary, falling back to an unversioned locally built binary (make agent) for development
- The 503 error message names the missing release tag explicitly
Version derived from release-please manifest (apps/web/lib/agent/version.ts, apps/web/Dockerfile)
- REQUIRED_AGENT_VERSION is now read from .release-please-manifest.json at the repo root, which release-please updates automatically on every agent release; no manual version bumping required
- Dockerfile updated to copy the manifest into both builder and runner stages so it is available at container runtime
- Falls back to a hardcoded version with a console warning if the manifest cannot be found
Cross-platform agent build fixes (Makefile, .gitignore, agent/internal/heartbeat/disk_linux.go, agent/internal/heartbeat/disk_other.go)
- make agent now builds for all six platforms (linux/darwin/windows × amd64/arm64) and outputs to apps/web/data/agent-dist/ using the naming convention the download route expects
- GOCACHE/GOPATH fixed so the Docker build container can write its cache under --user
- readAllDisks() extracted into disk_linux.go (build tag linux) and a stub disk_other.go (build tag !linux) so the agent cross-compiles cleanly for Windows/macOS
- apps/web/data/ added to .gitignore (generated binaries)
Build state
- npm run build: zero errors ✅
- go build ./agent/... compiles ✅
- go build ./apps/ingest/... compiles ✅
agent_queries schema (apps/web/lib/db/schema/agent-queries.ts)
- agent_queries table: instance_id, host_id, query_type (list_ports|list_services), status (pending|complete|error), result jsonb, error_message, expires_at (2-minute TTL), requested/completed timestamps
API routes (apps/web/app/api/hosts/[id]/queries/)
- POST /api/hosts/[id]/queries: creates a pending query and returns its ID; auth-guarded with an instance membership check
- GET /api/hosts/[id]/queries/[queryId]: returns the query status, plus the result once complete; the client polls it every second
Ingest: push pending queries to open streams (apps/ingest/internal/handlers/heartbeat.go)
- Polls DB every 2 s for pending queries scoped to the connected host
- Pushes queries into HeartbeatResponse.PendingQueries; the agent processes them and returns results in ~2–3 s rather than waiting for the 30 s heartbeat
- Saves completed results back to agent_queries, updating status from pending → complete (or error)
- Normalises the agent-returned status "ok" → "complete" so the UI renders correctly
Agent query executor (agent/internal/queries/)
- Handles list_ports: reads /proc/net/tcp, /proc/net/tcp6, /proc/net/udp, /proc/net/udp6; resolves inode → process name via /proc/<pid>/fd/; returns port, protocol, optional process name
- Handles list_services: reads systemd unit files and status via systemctl list-units; returns service name, description, status
- Responses drain into each heartbeat request alongside check results
UI — "Query server" in Add Check dialog (apps/web/app/(dashboard)/hosts/[id]/checks-tab.tsx)
- "Query server" button in the Add Check dialog triggers list_ports or list_services depending on the check type
- Polls the GET endpoint every second until complete; renders a clickable list of discovered ports/services
- Clicking a result auto-populates the port/name field in the check config form
Build state
- npm run build: zero errors ✅
- go build ./agent/... compiles ✅
- go build ./apps/ingest/... compiles ✅
(Built between Sessions 3 and 4; not previously documented)
Automated releases via release-please + GitHub Actions (.github/workflows/agent-release.yml, release-please-config.json)
- Conventional commits on agent/ paths trigger release-please PRs
- On merge, GitHub Actions builds agent binaries for all platforms (linux/darwin/windows × amd64/arm64) with build-time version injection (-ldflags "-X main.version=<tag>")
- Binaries uploaded as release artifacts under agent/vX.Y.Z tags
Server-hosted binaries and version-aware cache (apps/web/app/api/agent/download/route.ts)
- GET /api/agent/download?os=X&arch=Y: fetches from GitHub Releases on first request and caches locally in AGENT_DIST_DIR
- Filenames are versioned (ct-ops-agent-linux-amd64-v0.5.0), so new releases are picked up automatically without cache invalidation
- 5-minute TTL check against GitHub for the latest version
- Enables air-gapped deployments — server is the single binary source
Prewarm agent binary cache on server startup (apps/web/instrumentation.ts, apps/web/lib/agent/cache-prewarm.ts)
- instrumentation.ts server hook downloads all platform binaries (6 combinations) in parallel on startup
- Already-cached versions are skipped; failures are logged but never prevent server start
- Fresh servers are immediately ready to serve installs without a cold-cache delay on first request
Bootstrap install script (apps/web/app/api/agent/install/route.ts, agent/cmd/agent/main.go)
- curl -fsSL "https://server/api/agent/install" | sh downloads the installer without placing enrolment tokens into URLs
- The shell script detects OS/arch, downloads the versioned binary, and installs when CT_OPS_ENROLMENT_TOKEN is present in the runtime environment
- Agent --install accepts either --token or CT_OPS_ENROLMENT_TOKEN, then copies the binary to the system path, writes the TOML config, installs the service unit, and starts the service
- Also supports the -address CLI flag and the CT_OPS_ENROLMENT_TOKEN / CT_OPS_INGEST_ADDRESS env vars for config-less operation
- The enrolment token dialog now keeps the token separate from the installer command so the secret is not reflected into copied URLs
Multi-platform service install (agent/internal/install/install.go, agent/cmd/agent/service_windows.go)
- Linux: systemd unit written to /etc/systemd/system/ct-ops-agent.service, then systemctl enable --now
- macOS: launchd plist written to /Library/LaunchDaemons/dev.carrtech.ct-ops.agent.plist, then launchctl load
- Windows: binary copied to C:\Program Files\ct-ops\; service installed via sc.exe with proper Stop/Shutdown signal handling
Agent self-update (agent/internal/updater/updater.go, apps/ingest/internal/handlers/heartbeat.go)
- Ingest compares agent version in each heartbeat against configured latest version
- When a newer version exists, ingest sets update_available + download_url in HeartbeatResponse
- Agent calls updater.Update(version, downloadURL): downloads to a temp file, atomically replaces the running binary, and re-execs with the same args; cleans up the temp file on failure
Other fixes in this period
- tls_skip_verify config option added to the agent for self-signed ingest certs in dev
- Fixed: re-inviting a soft-deleted user restores them rather than attempting re-registration
Build state
- npm run build: zero errors ✅
- go build ./agent/... compiles ✅
- go build ./apps/ingest/... compiles ✅
Proto additions (proto/agent/v1/heartbeat.proto + proto/gen/go/agent/v1/messages.go)
- New CheckDefinition message: check_id, check_type, config_json (string), interval_seconds
- New CheckResult message: check_id, status (pass/fail/error), output, duration_ms, ran_at_unix
- HeartbeatRequest gains a check_results field; HeartbeatResponse gains a checks field
checks + check_results schema (apps/web/lib/db/schema/checks.ts)
- checks table: instance/host scoped, check_type, config jsonb, enabled, interval_seconds, soft delete, metadata
- check_results hypertable: check_id, host_id, instance_id, ran_at (partition key), status, output, duration_ms
- Migration 0006_icy_trish_tilby.sql: creates the TimescaleDB hypertable with 30-day retention, wrapped in a DO $$ block for graceful degradation
Ingest handler updates (apps/ingest/internal/handlers/heartbeat.go)
- Resolves hostID once at stream start via GetHostByAgentID (cached for the stream lifetime)
- Persists each CheckResult from HeartbeatRequest into check_results via InsertCheckResult
- Queries GetChecksForHost(hostID) on every heartbeat and pushes active check definitions in HeartbeatResponse.Checks
Agent check executor (agent/internal/checks/)
- executor.go: manages per-check goroutines with an agent-level context (survives stream reconnects); reconciles definitions on each heartbeat; accumulates results for drain
- port.go: TCP dial with a 5 s timeout
- process.go: scans /proc/<pid>/comm and /proc/<pid>/cmdline to find a process by name
- http.go: GET with a 10 s timeout; checks the expected status code (default 200)
- heartbeat.go updated to drain check results into each request and update definitions from each response
Web server actions (apps/web/lib/actions/checks.ts)
- getChecks, createCheck, updateCheck, deleteCheck, getCheckResults: all Zod-validated, instance-scoped
Checks tab (apps/web/app/(dashboard)/hosts/[id]/checks-tab.tsx)
- Expandable check rows: name, type badge, status badge, last run time
- Enable/disable toggle, delete button, inline result history
- "Add Check" dialog with type-specific config fields
- shadcn Select + Switch components added
Build state
- npm run build: zero errors ✅
- go build ./agent/... compiles ✅
- go build ./apps/ingest/... compiles ✅
host_metrics TimescaleDB hypertable (apps/web/lib/db/schema/metrics.ts)
- New table: id, instance_id, host_id, recorded_at, cpu_percent, memory_percent, disk_percent, uptime_seconds, created_at
- Migration 0005_wet_photon.sql creates the table, converts it to a TimescaleDB hypertable on recorded_at, and adds a 30-day retention policy. It is wrapped in a DO $$ block for graceful degradation if TimescaleDB is not available.
Ingest: persist metric rows (apps/ingest/internal/db/queries/metrics.sql.go)
- InsertHostMetricByAgentID: inserts into host_metrics via a subquery on hosts.agent_id; no extra round-trip needed
- Called from processHeartbeat on every heartbeat alongside the existing UpdateHostVitals
Fix newCUID() (apps/ingest/internal/db/queries/hosts.sql.go)
- Replaced math/rand with crypto/rand; IDs are now cryptographically random
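A Go sketch of a crypto/rand-backed ID generator. rand.Int with a big.Int bound draws uniformly, which avoids the modulo bias of byte % 36. The length, the base-36 alphabet, and the "c" prefix are assumptions, and newRandomID is a hypothetical name; this is not the actual newCUID() format.

```go
package main

import (
	"crypto/rand"
	"math/big"
)

const idAlphabet = "0123456789abcdefghijklmnopqrstuvwxyz"

// newRandomID returns "c" followed by n characters drawn uniformly from
// idAlphabet using the OS CSPRNG.
func newRandomID(n int) (string, error) {
	buf := make([]byte, n)
	base := big.NewInt(int64(len(idAlphabet)))
	for i := range buf {
		k, err := rand.Int(rand.Reader, base) // uniform in [0, 36)
		if err != nil {
			return "", err
		}
		buf[i] = idAlphabet[k.Int64()]
	}
	return "c" + string(buf), nil
}
```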
getHostMetrics server action (apps/web/lib/actions/agents.ts)
- getHostMetrics(instanceId, hostId, range: '1h'|'24h'|'7d'): queries host_metrics with a computed cutoff timestamp and returns rows ordered by recorded_at asc
Metrics tab on host detail page (apps/web/app/(dashboard)/hosts/[id]/host-detail-client.tsx)
- Fourth tab: Metrics
- Range selector buttons: Last hour / Last 24 hours / Last 7 days
- Recharts LineChart with three lines: CPU (blue), Memory (green), Disk (amber); Y-axis 0–100 %
- Empty state when there is no data yet; loading state while fetching
- Refetches every 60 s; only fetches when the Metrics tab is active
Offline period visualisation (apps/web/app/(dashboard)/hosts/[id]/host-detail-client.tsx)
- getAgentOfflinePeriods(instanceId, agentId, range) server action: walks agent_status_history to build {start, end} offline windows within the visible time range; looks back one extra hour to capture periods that started before the window
- Chart X-axis domain always extends to Date.now() via a sentinel null point, so time advances even when no new rows are arriving
- ReferenceArea rendered for each offline window: light gray tint (fillOpacity: 0.15), dark readable "Offline" label
- Zero-value boundary points injected at each offline start/end so lines visually drop to 0% during the outage and rise again on reconnect
Build state
- npm run build: zero errors ✅
- go build ./apps/ingest/... compiles ✅
Agent metrics collection (agent/internal/heartbeat/heartbeat.go)
- CPU %: two-sample /proc/stat delta (the first call returns 0 as a baseline; accurate from the second sample onward)
- Memory %: from /proc/meminfo, (MemTotal − MemAvailable) / MemTotal × 100
- Disk %: /proc/mounts + syscall.Statfs per mount; pseudo-filesystems (tmpfs, devtmpfs, cgroup, proc, etc.) excluded
- Uptime: first field of /proc/uptime, converted to seconds
- OS version: PRETTY_NAME field from /etc/os-release
- Per-disk inventory: DiskInfo structs (mount point, device, fs_type, total/used/free bytes, percent_used) sent in every heartbeat
- Network interfaces: NetworkInterface structs (name, MAC, IP addresses, is_up) via net.Interfaces(); loopback excluded
- OS / arch sent via heartbeat (runtime.GOOS, runtime.GOARCH)
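A Go sketch of the CPU and memory formulas. It reads the aggregate cpu line of /proc/stat and counts idle + iowait as idle time, a common convention; the agent's exact choice is not documented here. The first sample has no predecessor, which is why the agent reports 0 as a baseline. All names are hypothetical.

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

// cpuSample holds counters from the aggregate "cpu" line of /proc/stat.
type cpuSample struct{ idle, total uint64 }

// parseCPULine reads "cpu user nice system idle iowait irq softirq steal ...".
// guest/guest_nice are skipped because the kernel already counts them in user/nice.
func parseCPULine(line string) (cpuSample, error) {
	f := strings.Fields(line)
	if len(f) < 5 || f[0] != "cpu" {
		return cpuSample{}, errors.New("not an aggregate cpu line")
	}
	var s cpuSample
	for i, field := range f[1:] {
		if i >= 8 { // stop before guest, guest_nice
			break
		}
		v, err := strconv.ParseUint(field, 10, 64)
		if err != nil {
			return cpuSample{}, err
		}
		s.total += v
		if i == 3 || i == 4 { // idle + iowait
			s.idle += v
		}
	}
	return s, nil
}

// cpuPercent is the busy share of the interval between two samples.
func cpuPercent(prev, cur cpuSample) float64 {
	if cur.total <= prev.total {
		return 0 // no elapsed interval
	}
	dTotal := cur.total - prev.total
	var dIdle uint64
	if cur.idle > prev.idle { // iowait can occasionally go backwards
		dIdle = cur.idle - prev.idle
	}
	if dIdle > dTotal {
		dIdle = dTotal
	}
	return float64(dTotal-dIdle) / float64(dTotal) * 100
}

// memPercent applies (MemTotal − MemAvailable) / MemTotal × 100.
func memPercent(memTotalKB, memAvailableKB uint64) float64 {
	if memTotalKB == 0 {
		return 0
	}
	return float64(memTotalKB-memAvailableKB) / float64(memTotalKB) * 100
}
```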
Ingest: HeartbeatHandler updates (apps/ingest/internal/handlers/heartbeat.go)
- Persists disks and network_interfaces into hosts.metadata (JSONB) on every heartbeat
- Writes os and arch back to the hosts table (previously missing)
- Syncs host status to offline when the gRPC stream closes
- Allows offline agents to reconnect and transition back to active
Host detail page (apps/web/app/(dashboard)/hosts/[id]/)
- page.tsx: server component; fetches the host via getHost(instanceId, hostId), 404 if not found
- host-detail-client.tsx: tabbed UI:
  - Overview tab: CPU / memory / disk gauges (green ≤ 70 %, amber ≤ 90 %, red > 90 %); system info panel (hostname, OS, version, arch, uptime, IPs); agent info panel (status badge, version, agent ID, last heartbeat, registration date)
  - Storage tab: per-disk table (mount point, device, filesystem, total/used/free, usage %) from host.metadata.disks
  - Network tab: interface table (name, MAC, IPs extracted from CIDR, Up/Down badge) from host.metadata.network_interfaces
SSE streaming (apps/web/app/api/hosts/[id]/stream/route.ts)
- GET /api/hosts/{id}/stream: requires a valid session and instance membership
- Sends an initial snapshot immediately on connection
- Polls the DB every 5 s and pushes update events as SSE JSON
- Sends an error event if the host is not found; closes cleanly on client disconnect (abort signal)
useHostStream hook (apps/web/hooks/use-host-stream.ts)
- 'use client' hook consumed by HostDetailClient
- Opens an EventSource to /api/hosts/{hostId}/stream
- Writes each update event directly into the React Query cache key ['host', instanceId, hostId]
- Closes on unmount; reconnects on remount
Server action added
- getHost(instanceId, hostId) in lib/actions/agents.ts: single-host fetch with an agent LEFT JOIN
Agent example config (agent/examples/agent.toml)
- Reference config file for operators
Bug fixes
- OS and architecture now correctly populated on the hosts table at registration and heartbeat
- Host status synced to offline when the heartbeat stream closes
- Offline agents can reconnect without manual intervention
Build state
- npm run build: zero errors ✅
- go build ./agent/... compiles ✅
- go build ./apps/ingest/... compiles ✅
- End-to-end smoke test passed: agent registers, is approved, heartbeats, and the web UI updates live ✅
Proto definitions (proto/agent/v1/)
- agent.proto: PlatformInfo, AgentInfo messages
- registration.proto: RegisterRequest, RegisterResponse
- heartbeat.proto: HeartbeatRequest, HeartbeatResponse (bidirectional stream)
- ingest.proto: IngestService with Register (unary) + Heartbeat (bidi stream)
Generated Go proto stubs (proto/gen/go/agent/v1/)
- messages.go: all message types as plain Go structs
- codec.go: JSON codec registered as "proto" (development stub; replace with make proto output)
- ingest_grpc.go: full gRPC client/server interfaces, stream types, ServiceDesc
- proto/gen/go/go.mod: module github.com/carrtech-dev/ct-ops/proto
Go workspace (go.work at repo root)
- References ./proto/gen/go, ./agent, ./apps/ingest
- replace directives in each go.mod for the local proto module
Go agent (agent/)
- internal/config/: TOML config + CT_OPS_ env overrides
- internal/identity/keypair.go: Ed25519 key generation + persistence to data_dir
- internal/identity/token.go: agent state (ID + JWT) persistence to agent_state.json
- internal/grpc/: gRPC connection builder + TLS credentials (server-side, structured for mTLS)
- internal/registration/registrar.go: Register RPC, polls every 30 s while pending
- internal/heartbeat/heartbeat.go: bidi stream, reconnects with exponential backoff
- cmd/agent/main.go: full startup sequence with SIGTERM/SIGINT shutdown
Ingest service (apps/ingest/)
- internal/config/: YAML config + INGEST_ env overrides
- internal/db/: pgxpool setup + hand-written queries (agents, hosts, enrolment tokens)
- internal/auth/jwt.go: RS256 JWT issuance + JWKS HTTP endpoint
- internal/queue/: Publisher interface + in-process buffered channel implementation
- internal/handlers/register.go: full registration flow (validate token → idempotent check → insert → auto-approve)
- internal/handlers/heartbeat.go: JWT validation → active check → update vitals → publish to queue → mark offline on close
- internal/grpc/: gRPC server wiring with unary + stream interceptors (logging, panic recovery)
- internal/tls/: TLS credential builder (structured for mTLS)
- cmd/ingest/main.go: startup with DB connect, JWT issuer, queue, gRPC + JWKS HTTP servers
- Dockerfile: multi-stage Go build
Drizzle schema additions
- lib/db/schema/agents.ts: agents, agent_status_history, agent_enrolment_tokens tables
- lib/db/schema/hosts.ts: hosts table (stores latest vitals per heartbeat)
- lib/db/schema/resource_tags.ts: universal key:value tag join table (two indexes)
- lib/db/schema/index.ts: updated with three new exports
Server actions (lib/actions/agents.ts)
- listPendingAgents(instanceId): pending agents for admin approval
- approveAgent(instanceId, agentId, actorId): sets status active + appends history
- rejectAgent(instanceId, agentId, actorId): sets status revoked + appends history
- listHosts(instanceId): left join with agents, returns HostWithAgent[]
- createEnrolmentToken(instanceId, userId, input): creates a token with label/auto-approve/maxUses/expiry
- listEnrolmentTokens(instanceId): active tokens for the instance
- revokeEnrolmentToken(instanceId, tokenId): soft delete
Web UI
- app/(dashboard)/hosts/page.tsx: server component (replaced placeholder, fetches initial data)
- app/(dashboard)/hosts/hosts-client.tsx: TanStack Query, hosts inventory table, pending agents panel with Approve/Reject, auto-refresh every 30 s
- app/(dashboard)/settings/agents/page.tsx: admin-only server component
- app/(dashboard)/settings/agents/agents-client.tsx: enrolment token list + create dialog (with full token reveal on creation) + revoke
- components/shared/sidebar.tsx: added "Agent Enrolment" link under Administration
shadcn components added
- components/ui/table.tsx: added via shadcn add table
Dependencies added
- date-fns ^4.1.0: date formatting in host/agent UI
Build & deploy
- Makefile: proto, go-build, go-test, agent, ingest, dev-tls, clean targets
- Root package.json: added proto, go:build, go:test scripts
- docker-compose.single.yml: added ingest service (port 9443 gRPC, 8080 JWKS)
- deploy/scripts/gen-dev-tls.sh: generates a self-signed cert into deploy/dev-tls/
- .gitignore: added deploy/dev-tls/, go.work.sum
Build state
- npm run build: zero errors ✅
- go build ./agent/... compiles ✅
- go build ./apps/ingest/... compiles ✅
Auth middleware / proxy
- proxy.ts: Next.js 16 renamed middleware.ts to proxy.ts; already implemented. It checks the better-auth.session_token cookie, and unauthenticated requests to protected routes redirect to /login. Full session verification (including the instance check) is done in server components.
Auth session helper
- lib/auth/session.ts: getRequiredSession() fetches the Better Auth session + full DB user row; redirects to /login if unauthenticated
Feature flags
- lib/features.ts: hasFeature(tier, feature) checks the licence tier against the feature map
- components/shared/feature-gate.tsx: client component that gates UI behind a licence tier; renders a fallback/upgrade message if not entitled
Licence validation
- lib/licence.ts: offline RS256 JWT validation with a bundled dev public key; validates tier, instance, expiry, and signature
Server actions
- lib/actions/users.ts: getInstanceUsers, inviteUser (7-day token), updateUserRole, deactivateUser, cancelInvite
- lib/actions/settings.ts: updateInstanceName, saveLicenceKey (validates via licence.ts, persists tier)
Pages (real UI, not placeholders)
- app/(dashboard)/team/page.tsx + TeamClient: member table, role management, invite dialog, pending invites, deactivation
- app/(dashboard)/settings/page.tsx + SettingsClient: instance name editor, licence key entry with tier badge and error feedback
- app/(dashboard)/profile/page.tsx + ProfileClient: name editor, password change
Database schema additions
- lib/db/schema/invitations.ts: email, role, token, instance/user refs, 7-day expiry, soft delete
- instances table extended: licenceTier, licenceKey, slug, logo
- users table extended: instanceId, role, isActive, twoFactorEnabled
Monorepo
- Turborepo root with pnpm-workspace.yaml, turbo.json, .gitignore, .npmrc
- Full directory skeleton: apps/, packages/, proto/, deploy/, docs/, agent/, consumers/
apps/,packages/,proto/,deploy/,docs/,agent/,consumers/ - pnpm workspaces with root-level turbo tasks
apps/web (Next.js 16.2.1)
- TypeScript strict mode + `noUncheckedIndexedAccess` + `noImplicitOverride`
- Tailwind CSS v4 with shadcn/ui (Radix preset, Nova theme)
- Turbopack enabled for dev (`next dev --turbopack`)
- Standalone output enabled for Docker
Database
- Drizzle ORM with `postgres` driver
- Schema: `instances`, `users`, `sessions`, `accounts`, `verifications`, `totp_credential`
- `drizzle.config.ts` pointing at `lib/db/schema/index.ts`
- All tables follow CLAUDE.md conventions (id/createdAt/updatedAt/deletedAt/metadata, soft deletes)
- Migration scripts: `db:generate`, `db:migrate`, `db:push`, `db:studio`
Auth
- Better Auth v1 with email/password and TOTP two-factor plugin
- `lib/auth/index.ts` — server-side auth config with Drizzle adapter
- `lib/auth/client.ts` — client-side auth hooks (`signIn`, `signOut`, `signUp`, `useSession`)
- API route: `app/api/auth/[...all]/route.ts`
Pages
- `/` → redirects to `/login`
- `(auth)/login` — login form with Zod validation, React Hook Form
- `(auth)/register` — register form, posts to Better Auth
- `(setup)/onboarding` — instance creation wizard; creates the instance and links the user as `super_admin`
- `(dashboard)/dashboard` — overview placeholder
Components
- `components/shared/sidebar.tsx` — shadcn Sidebar with all nav sections (Monitoring, Tooling, Administration)
- `components/shared/topbar.tsx` — header with user dropdown + sign out
- `components/shared/query-provider.tsx` — TanStack Query provider
- `components/ui/` — shadcn primitives
Infrastructure
- `docker-compose.single.yml` — TimescaleDB/PostgreSQL + Next.js
- `apps/web/Dockerfile` — multi-stage build (deps → builder → runner), `node:22-alpine`
- Next.js 16.2.1 — latest stable at the time of the session. In Next.js 16, `middleware.ts` is renamed to `proxy.ts` and the export is `proxy()` instead of `middleware()`.
- Tailwind CSS v4 — create-next-app installs this; the shadcn Nova preset works with it
- Better Auth v1 Drizzle adapter — `better-auth/adapters/drizzle` (built-in)
- Zod v4 — uses `.issues`, not `.errors`, on `ZodError`
- Proto JSON codec stub — `codec.go` in `proto/gen/go/agent/v1/` registers a JSON codec as "proto" so the Go code compiles and works without running protoc. Run `make proto` (requires protoc + plugins) to replace it with proper protobuf encoding, then remove `codec.go`.
- Agent JWT flow — the agent stores the JWT as a string; ingest issues an RS256 JWT keyed to `agent_id`. The agent's `HeartbeatRequest.agent_id` field carries the JWT for stream authentication on the first message.
- ID generation in ingest — uses a simple random ID generator (`newCUID()` in `hosts.sql.go`). This should be replaced with a proper cuid2 equivalent before production.
- `golang.org/x/sys` — removed from the agent module (unused after switching from `unix.Gethostname` to `os.Hostname()`).
- `codec.go` is a development stub ✅ Resolved — real protoc-generated `.pb.go` files are in place
- `newCUID()` in ingest DB queries now uses `crypto/rand` ✅
- Docker Compose does not auto-run migrations on startup ✅ Resolved — `entrypoint.sh` runs `node migrate.js && node server.js`, so migrations apply before the web server starts
- `gen_cuid()` SQL function does not exist in PostgreSQL ✅ Resolved — `InsertAgent` now generates the ID in Go via `newCUID()` directly, removing the failed-query fallback path
- mTLS client certificates — deliberately deferred; the TLS builder is structured for it
- The `go.work.sum` file is gitignored — developers must run `go work sync` after cloning
- CPU % on the first heartbeat is always 0 — by design (two-sample baseline); accurate from the second heartbeat onward
- Metric retention `add_retention_policy` not wired dynamically ✅ Resolved — the `updateMetricRetention` server action already calls `drop_retention_policy` + `add_retention_policy` via ``db.execute(sql`...`)``
- Soft-deleted notifications accumulate indefinitely ✅ Resolved — ingest now hard-purges notifications soft-deleted for more than 90 days
None.
Session 39 — Phase 4 Service Accounts & Identity
Notification data hygiene is complete. Suggested next steps:
- Phase 4: Service Accounts & Identity — schema (`service_accounts` table with name, type, owner, expiry); list UI with expiry countdown badges; soft delete
- SSH key inventory — track SSH public keys linked to service accounts; fingerprint, last-used-at, expiry
- Expiry tracking + alerting — new alert condition type `service_account_expiry`; ingest evaluator + sweeper (mirrors the `cert_expiry` pattern)
- LDAP/AD integration — community-tier LDAP sync; imports users and service accounts from the directory
Outstanding technical debt (carry forward):
- mTLS client certificates deferred — TLS builder is structured for it
- `go.work.sum` is gitignored — developers must run `go work sync` after cloning
- Monorepo scaffold (Turborepo)
- Next.js app with shadcn/ui + Tailwind
- PostgreSQL + Drizzle + migrations pipeline
- Docker Compose single-node
- CI pipeline (GitHub Actions) — pr-checks.yml: lint, type-check, build, go test
- Better Auth — email/password + TOTP
- Instance + user schema
- Basic RBAC (roles + permissions)
- User management UI
- Feature flag system
- Licence key validation scaffold
- Auth middleware (route protection)
- System health / about page — /settings/system with live agent/cert/alert counts
- Go agent scaffold
- Proto definitions
- gRPC ingest service
- Agent registration + approval flow (UI + ingest handler)
- Heartbeat + online/offline detection
- Host inventory UI
- mTLS identity (deferred)
- Agent self-update mechanism (ingest-signalled, atomic binary swap)
- Agent one-command install (curl | sh, systemd/launchd/SCM)
- Server-hosted binaries with version-aware cache + prewarm
- Automated agent releases (release-please + GitHub Actions)
- Redpanda integration (deferred)
- Metrics consumer (deferred)
- Real-time status indicators (SSE stream + useHostStream hook)
- Integration smoke test (end-to-end agent → UI)
- Check definition system
- Check types — port, process, http (shell/file deferred)
- Ad-hoc agent queries (list_ports, list_services — used in check creation UI)
- TimescaleDB continuous aggregates (host_metrics_hourly + host_metrics_daily)
- Metric retention policies (configurable per-instance in settings, default 30 days)
- Metric graphs (Recharts)
- Alert rule builder (check_status + metric_threshold, per-host + instance-wide)
- Alert state machine (fire/resolve in ingest; acknowledge in web)
- Notification channels (webhook HMAC-SHA256, SMTP, Slack, Telegram, in-app)
- In-app notification bell, dropdown, and /notifications page
- Instance notification settings (roles, opt-out) + per-user opt-out
- Notification bulk actions (select-all, bulk mark read/unread, bulk delete)
- Notification severity pie chart + trend line chart on /notifications page
- Notification charts on host detail Metrics tab (host-scoped)
- Soft-delete on notifications (preserves trend history through deletions)
- Hard-purge soft-deleted notifications after 90 days
- Trend chart time-range selector (1h → 3 months, hourly/daily auto-granularity)
- Alert silencing
- Alert acknowledgement
- Alert history pagination + date/severity filter
- Agent-side cert discovery — `certificate` check type in agent; returns structured CertificateReport JSON
- Certificate parser — leaf + chain parsing in agent; upsert with renewal detection in ingest
- Certificate inventory UI — /certificates list + /certificates/[id] detail with SANs, chain, event timeline
- Expiry alerting — `cert_expiry` alert condition; per-cert evaluator + 15-min sweeper in ingest
- CSR generation wizard
- Approval workflow
- Internal CA management
- Service account inventory
- Expiry tracking + alerting
- SSH key inventory
- LDAP/AD integration
- Custom script runner — run arbitrary scripts on hosts/groups with streaming output
- Service management — start/stop/restart/status with live service autocomplete
- Ansible automation MVP — feature-gated SSH private-key credential profiles plus host/group `ansible_ping` task runs via the optional `ansible-api` container
- Interactive terminal — WebSocket PTY terminal with persistent panel, tabs, per-user auth
- Chart zoom and smart bucketing — click-drag zoom, adaptive time_bucket, reusable chart components
- Jenkins plugin bundler (port existing)
- Docker image bundler
- Ansible collection bundler
- Terraform provider bundler
- Runbook library
- Scheduled task runner
- SAML 2.0
- OIDC
- Advanced RBAC + resource scoping
- Audit log
- Compliance packs
- White labelling
- Instance-scoped hardening
- Usage metering
- Billing (Stripe)
- Customer portal
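The webhook notification channel in the roadmap signs payloads with HMAC-SHA256. A minimal sketch of signing and receiver-side verification follows. It assumes the signed string is `<timestamp>.<body>` and the header value is `sha256=<hex>`; both are hypothetical conventions, not the channel's confirmed wire format. The replay window and constant-time comparison are standard hardening for this pattern.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical scheme: HMAC-SHA256 over "<timestamp>.<body>", sent as "sha256=<hex>".
function signWebhook(secret: string, timestamp: number, body: string): string {
  const mac = createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
  return `sha256=${mac}`;
}

function verifyWebhook(
  secret: string,
  timestamp: number,
  body: string,
  header: string,
  nowSec: number,
  toleranceSec = 300,
): boolean {
  // Reject stale or future-dated deliveries to limit replay.
  if (Math.abs(nowSec - timestamp) > toleranceSec) return false;
  const expected = Buffer.from(signWebhook(secret, timestamp, body));
  const received = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Binding the timestamp into the MAC means an attacker cannot reuse an old signature with a fresh timestamp.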