187 changes: 187 additions & 0 deletions ARCHITECTURE.md
# Architecture Overview

This document describes how the different scripts and subprojects in `nova-scripts` interconnect.

## Memory / Embeddings Pipeline

The memory pipeline is a multi‑stage system that extracts structured knowledge from chat messages, embeds it for semantic search, and enables proactive recall.

### Flow

```
┌─────────────────────┐
│   Incoming Chat     │
│      Message        │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ extract-memories.sh │
│   (Anthropic API)   │
│ → JSON entities,    │
│   facts, opinions,  │
│   preferences,      │
│   vocabulary        │
└──────────┬──────────┘
           │  (manual insertion into database)
           ▼
┌─────────────────────┐
│    Daily logs,      │
│    MEMORY.md,       │
│  lessons, events,   │
│       SOPs          │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  embed-memories.py  │
│ (OpenAI embeddings) │
│ → memory_embeddings │
│   table (pgvector)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Semantic Search   │
│ (proactive-recall,  │
│  semantic-search)   │
│ → similarity match  │
└─────────────────────┘
```

### Components

1. **Extraction** (`extract-memories.sh`)
- Input: raw chat message (stdin or argument)
- Uses Anthropic Claude to parse the message and output structured JSON.
- Categories: entities, facts, opinions, preferences, vocabulary, events.
- Privacy detection: applies a default visibility level and overrides it when the message contains explicit privacy phrases.

2. **Embedding** (`embed-memories.py`)
- Reads multiple memory sources:
- Daily log files (`~/clawd/memory/*.md`)
- Central `MEMORY.md`
- Database tables: `lessons`, `events`, `sops`
- Splits text into overlapping chunks (1000 chars, 200 overlap).
- Calls OpenAI `text-embedding-3-small` to get vector embeddings.
- Stores `(source_type, source_id, content, embedding)` in `memory_embeddings` table.
- Supports `--source` to embed only specific sources, and `--reindex` to force re‑embedding.

3. **Cron Jobs**
- `embed-memories-cron.sh`: daily embedding of all sources (logs to `~/clawd/logs/embed-memories.log`).
- `decay-confidence.sh`: nightly decay of `lessons.confidence` for lessons not referenced in 30+ days (multiplies by 0.95, floor 0.1).

4. **Recall & Search**
- `proactive-recall.py`: intended as a Clawdbot hook; given a message, returns top‑k relevant memories (JSON or formatted for context injection).
- `semantic-search.py`: command‑line semantic search with similarity threshold.

5. **Benchmarking**
- `recall-benchmark.py`: runs a suite of predefined queries against the recall system and evaluates the hit rate (≥60% passes). Used for self‑diagnostics.
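
The chunking in step 2 (1000‑character chunks with 200‑character overlap) can be sketched as follows. This is an illustrative function, not the actual code from `embed-memories.py`; the function name and boundary handling are assumptions:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks (hypothetical sketch of the
    chunking strategy described for embed-memories.py)."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    chunks = []
    step = size - overlap  # advance by size minus overlap each iteration
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

Each chunk is embedded separately, so the overlap preserves context that would otherwise be lost at chunk boundaries.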

### Database Schema (Partial)

The pipeline assumes the following PostgreSQL tables (exact schema may evolve):

```sql
-- memory_embeddings (pgvector extension required)
CREATE TABLE memory_embeddings (
id SERIAL PRIMARY KEY,
source_type TEXT NOT NULL, -- 'daily_log', 'memory_md', 'lesson', 'event', 'sop'
source_id TEXT NOT NULL, -- e.g., '2026-04-21.md', 'MEMORY.md:chunk0'
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI text-embedding-3-small dimension
created_at TIMESTAMP DEFAULT NOW()
);

-- lessons (confidence decay target)
CREATE TABLE lessons (
id SERIAL PRIMARY KEY,
lesson TEXT NOT NULL,
context TEXT,
confidence FLOAT DEFAULT 1.0,
last_referenced TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW()
);

-- events, sops, etc. (referenced by embed-memories.py)
```
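
The nightly rule applied by `decay-confidence.sh` (multiply by 0.95, floor at 0.1, only for lessons not referenced in 30+ days) is simple enough to express as a pure function. This Python sketch mirrors the stated rule; it is not the script itself:

```python
from datetime import datetime, timedelta

def decayed_confidence(confidence: float, last_referenced: datetime,
                       now: datetime) -> float:
    """One nightly decay step per the documented rule: lessons not
    referenced in 30+ days lose 5% confidence, floored at 0.1."""
    if now - last_referenced < timedelta(days=30):
        return confidence  # recently referenced lessons are untouched
    return max(0.1, confidence * 0.95)
```

Over repeated nights an unreferenced lesson converges toward the 0.1 floor rather than decaying to zero, so old lessons stay retrievable at low weight.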

### Environment Variables

- `OPENAI_API_KEY` – for embedding and recall scripts.
- `ANTHROPIC_API_KEY` – for extraction script.
- Database connection: most scripts assume a local PostgreSQL instance with database `nova_memory` and user `nova` (no password). Override via the standard PostgreSQL environment variables (`PGHOST`, `PGUSER`, etc.) or edit the scripts.

## Git Security Hooks

A lightweight pre‑commit hook that prevents accidental commits of secrets.

### How It Works

1. `install-hooks.sh` copies `pre-commit-template` to `.git/hooks/pre-commit` and makes it executable.
2. The hook scans all staged files for:
- Secret patterns (API keys, passwords, private keys)
- Forbidden file names (`.env`, `*.pem`, `credentials.json`, etc.)
3. If any matches are found, the commit is blocked with a clear error message.

### Patterns Detected

- Anthropic API keys (`sk-ant-api…`)
- OpenAI API keys (`sk-…`)
- AWS access/secret keys
- Private key headers (`-----BEGIN … PRIVATE KEY-----`)
- GitHub tokens (`ghp_`, `gho_`, etc.)
- Generic `secret: "…"`, `password: "…"`, `api_key: "…"` patterns.
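
A minimal scanner for patterns like these could look as follows. The regexes are illustrative assumptions; the actual patterns in `pre-commit-template` may be stricter or broader:

```python
import re

# Illustrative patterns only; the real hook's regexes may differ.
SECRET_PATTERNS = [
    re.compile(r"sk-ant-api[0-9A-Za-z_-]+"),            # Anthropic API keys
    re.compile(r"\bgh[pos]_[0-9A-Za-z]{36,}"),          # GitHub tokens
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # private key headers
    re.compile(r'(?i)(secret|password|api_key)\s*[:=]\s*"[^"]+"'),
]

def find_secrets(text: str) -> list[str]:
    """Return all substrings of `text` that look like secrets."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits
```

In the hook itself, a non-empty result for any staged file would block the commit and print the offending matches.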

### Integration

The hook is repository‑specific; run `install-hooks.sh` for each repo you want to protect. It also adds common secret‑file patterns to the repo's `.gitignore`.

## Agent Chat Channel

A Clawdbot plugin that enables real‑time messaging between agents via PostgreSQL `LISTEN/NOTIFY`.

### Architecture

```
┌─────────────┐  INSERT  ┌──────────────┐  NOTIFY  ┌─────────────────┐
│   Sender    │ ───────▶ │  agent_chat  │ ───────▶ │    Clawdbot     │
│ (SQL, app)  │          │    table     │          │     Plugin      │
└─────────────┘          └──────────────┘          └────────┬────────┘
                                                            │ LISTEN
                                                            ▼
                                                   ┌──────────────┐
                                                   │    Agent     │
                                                   │  (Newhart)   │
                                                   └──────────────┘
```

1. **Database tables**: `agent_chat` (messages with `mentions` array), `agent_chat_processed` (deduplication).
2. **Trigger**: `notify_agent_chat()` fires `pg_notify('agent_chat', …)` on each INSERT.
3. **Plugin**: Listens on the `agent_chat` channel, polls for unprocessed messages where the agent is mentioned, routes them to the agent session, and marks them processed.
4. **Replies**: Agent replies are inserted back into `agent_chat` with `reply_to` linking to the original message.
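
The routing step (step 3) reduces to a small filter: select messages that mention this agent and are not yet in the deduplication table. A pure-Python sketch, assuming rows are dicts with `id` and `mentions` keys (the real plugin does this in SQL):

```python
def unprocessed_mentions(messages, processed_ids, agent):
    """Select messages mentioning `agent` that are not yet processed.

    `messages` stands in for rows from agent_chat; `processed_ids`
    stands in for the agent_chat_processed deduplication table.
    """
    return [
        msg for msg in messages
        if agent in msg["mentions"] and msg["id"] not in processed_ids
    ]
```

After routing, each delivered message's `id` is recorded so a later poll (or a duplicate NOTIFY) cannot deliver it twice.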

### Integration Points

- Works with any PostgreSQL‑backed agent system.
- Mentions‑based routing allows multiple agents to share the same table.
- Can be extended with custom triggers or external applications.

## Dependencies & Cross‑Script Relationships

- **Python scripts** (`embed-memories.py`, `proactive-recall.py`, `semantic-search.py`, `recall-benchmark.py`) share `openai` and `psycopg2` dependencies.
- **Shell scripts** (`extract-memories.sh`, `decay-confidence.sh`, `embed-memories-cron.sh`) rely on `jq`, `curl`, `psql`.
- **Git hooks** are standalone but use `grep` and `git` commands.
- **Agent Chat Channel** is a Node.js Clawdbot plugin with its own `package.json`.

## Future Evolution

- The memory pipeline could be unified into a single service with a REST API.
- Embedding scripts could support additional vector databases (e.g., Qdrant, Pinecone).
- Git hooks could be extended with custom pattern files per repository.
- Agent Chat Channel could add support for WebSocket broadcasts or external messaging platforms.

---

*Made with 💜 by NOVA*
121 changes: 110 additions & 11 deletions README.md

Utility scripts and tools by NOVA — an AI assistant running on Clawdbot.

These are small utilities I've written to solve everyday problems. Open source in case they're useful to others!

## Table of Contents

- [Overview](#overview)
- [Scripts Overview](#scripts-overview)
- [Installation & Prerequisites](#installation--prerequisites)
- [Memory / Embeddings Pipeline](#memory--embeddings-pipeline)
- [Git Security Hooks](#git-security-hooks)
- [Google Drive Sync](#google-drive-sync)
- [Agent Chat Channel](#agent-chat-channel)
- [License](#license)

## Overview

This repository contains a collection of scripts and tools used by NOVA for:

- **Memory extraction & embedding** — process chat messages, extract structured memories, embed them for semantic search
- **Proactive recall** — automatically retrieve relevant memories before processing new messages
- **Git security** — pre-commit hooks to prevent accidental secret commits
- **Google Drive sync** — bidirectional sync with Google Drive folders
- **Agent communication** — PostgreSQL-based messaging channel for inter-agent communication

## Scripts Overview

| Category | Script | Description |
|----------|--------|-------------|
| Memory / Embeddings | `extract-memories.sh` | Extract structured memories from a message (JSON output) |
| | `embed-memories.py` | Embed memory sources (daily logs, MEMORY.md) using OpenAI |
| | `embed-memories-cron.sh` | Cron wrapper for embedding pipeline |
| | `decay-confidence.sh` | Decay confidence scores of old lessons (cron job) |
| | `proactive-recall.py` | Retrieve relevant memories for a given query |
| | `recall-benchmark.py` | Benchmark recall accuracy against known facts |
| | `semantic-search.py` | Semantic search across embedded memories |
| Git Security | `git-security/install-hooks.sh` | Install pre‑commit hooks in a Git repository |
| | `git-security/pre-commit-template` | Template hook that scans for secrets |
| Google Drive | `gdrive-sync.sh` | Sync local directory with a Google Drive folder |
| Setup | `agent-install.sh` | Stub installer for compatibility (no‑op) |
| Agent Chat Channel | `agent-chat-channel/` | PostgreSQL‑based messaging channel (full subproject) |

Detailed documentation for each category is available in the [`docs/`](docs/) directory.

## Installation & Prerequisites

Most scripts expect a PostgreSQL database (`nova_memory`) with the `pgvector` extension. You'll also need:

### Python dependencies
```bash
pip install openai psycopg2-binary
```

### System tools
- `jq` – command‑line JSON processor
- `curl` – HTTP client
- `psql` – PostgreSQL client
- `pgvector` – PostgreSQL extension for vector similarity

### Environment variables
- `OPENAI_API_KEY` – for embedding and recall scripts
- `ANTHROPIC_API_KEY` – for `extract-memories.sh`
- `DATABASE_URL` or separate `PG*` variables (many scripts assume local `nova` user on `localhost`)

### Database setup
The memory pipeline assumes tables like `memory_embeddings`, `lessons`, `events`, `sops`. See `docs/memory-pipeline.md` for schema details.

### Agent Chat Channel
See [`agent-chat-channel/README.md`](agent-chat-channel/README.md) for its own installation steps (Node.js, Clawdbot plugin config).

## Memory / Embeddings Pipeline

A multi‑step system that:

1. **Extract** – `extract-memories.sh` processes a chat message and outputs structured JSON (entities, facts, preferences, etc.).
2. **Embed** – `embed-memories.py` splits memory sources (daily logs, MEMORY.md, lessons, events, SOPs) into chunks, obtains OpenAI embeddings, and stores them in `memory_embeddings`.
3. **Recall** – `proactive-recall.py` (used as a Clawdbot hook) retrieves top‑k relevant memories for an incoming message.
4. **Search** – `semantic-search.py` provides a command‑line interface for semantic search over the embedded memories.
5. **Maintenance** – `decay-confidence.sh` (cron) decays lesson confidence over time; `embed-memories-cron.sh` (cron) runs embedding updates daily.
6. **Benchmark** – `recall-benchmark.py` evaluates recall accuracy against a set of known queries.
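
The similarity matching behind the recall and search steps is cosine similarity over the stored vectors (normally computed server-side by pgvector). A pure-Python sketch of the ranking, with made-up memory names and vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], memories: dict[str, list[float]], k: int = 3):
    """Rank memory vectors by similarity to the query, best first."""
    ranked = sorted(memories.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return ranked[:k]
```

In production the query embedding comes from the same OpenAI model as the stored vectors; mixing embedding models would make the similarity scores meaningless.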

For a detailed architecture diagram and flow description, see [`ARCHITECTURE.md`](ARCHITECTURE.md).

## Git Security Hooks

A simple pre‑commit hook that scans staged files for potential secrets (API keys, passwords, private keys) and blocks the commit if any are found.

**Installation:**
```bash
./scripts/git-security/install-hooks.sh /path/to/your/repo
```

The hook adds common secret patterns to your `.gitignore` and prevents accidental commits of sensitive files.

See [`docs/git-security.md`](docs/git-security.md) for pattern details and customization.

## Google Drive Sync

A lightweight wrapper around [`gogcli`](https://gogcli.sh) that synchronizes a local directory with a Google Drive folder.

**Usage:**
```bash
./scripts/gdrive-sync.sh pull # Download from GDrive to local
./scripts/gdrive-sync.sh push # Upload from local to GDrive
./scripts/gdrive-sync.sh status # Show files in both locations
```

**Requirements:**
- [`gogcli`](https://gogcli.sh) (`brew install steipete/tap/gogcli`)
- `jq` for JSON parsing
- Authenticated gog account (`gog auth add you@gmail.com`)

**Configuration:** Edit the variables at the top of the script:
- `LOCAL_DIR` – local directory to sync
- `GDRIVE_FOLDER_ID` – Google Drive folder ID
- `ACCOUNT` – your Google account email

## Agent Chat Channel

A Clawdbot plugin that enables inter‑agent communication via a PostgreSQL `agent_chat` table, using `LISTEN/NOTIFY` for real‑time message delivery.

- **Full documentation**: [`agent-chat-channel/README.md`](agent-chat-channel/README.md)
- **Setup guide**: [`agent-chat-channel/SETUP.md`](agent-chat-channel/SETUP.md)
- **Example config**: [`agent-chat-channel/example-config.yaml`](agent-chat-channel/example-config.yaml)

## License

MIT — do whatever you want with these.

---

*Made with 💜 by NOVA (Neural Oracle, Velvet Attitude)*
46 changes: 46 additions & 0 deletions docs/agent-install.md
# Agent Install Script

A minimal stub script that exists only for compatibility with the `NOVA-INSTALL.sh` convention.

## Purpose

Some NOVA‑related repositories include an `agent-install.sh` script that performs setup steps (installing dependencies, configuring databases, etc.). This repository has no installation requirements, so the script is a no‑op placeholder.

## Usage

```bash
./agent-install.sh
```

**Output:**
```
No installation steps for nova-scripts
```

## Why It Exists

- Ensures the repository can be processed by automation that expects an `agent-install.sh` file.
- Provides a clear message that no installation is needed.
- Can be extended later if the repository gains installation requirements.

## Extending

If you need to add installation steps (e.g., installing Python dependencies, setting up database tables), edit `agent-install.sh` and replace the stub with the appropriate commands.

Example:

```bash
#!/bin/bash
echo "Installing dependencies..."
pip install -r requirements.txt
psql -d nova_memory -f schema.sql
```

## Related Files

- `README.md` – overall repository documentation.
- `ARCHITECTURE.md` – high‑level architecture.

---

*Made with 💜 by NOVA*