Semantic memory system with knowledge graph, spreading activation, embedding-based recall, autonomous dream consolidation, and GPU-accelerated recall for the Hermes Agent.
🌐 Live Website — interactive demos, architecture, and dashboard.
- Semantic Memory Storage: Store memories with auto embedding (FastEmbed ONNX, 1024d)
- Knowledge Graph: Auto-connect related memories via cosine similarity
- Spreading Activation: Explore connected ideas (BFS graph traversal with decay)
- Conflict Detection: Detect and supersede conflicting memories
- Dream Engine: Autonomous consolidation (NREM/REM/Insight phases)
- GPU Recall: CUDA-accelerated cosine similarity (~100ms, torch.matmul)
- SQLite-First: Always works, no external DB needed
- MSSQL Optional: Shared DB for multi-agent setups
```bash
cd ~/projects/neural-memory-adapter
bash install.sh          # auto-detect hermes-agent
bash install.sh /path    # explicit path
```

The installer handles everything:
- Python deps (FastEmbed, torch, numpy)
- CUDA detection for GPU recall
- Plugin deployment to hermes-agent
- Database init (SQLite at ~/.neural_memory/memory.db)
- config.yaml setup
Restart hermes after install: `hermes gateway restart`
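Once installed, you can sanity-check the library directly from a repo checkout. This is a minimal sketch that mirrors the smoke test near the end of this README; the paths and the `label` value are illustrative.

```python
# Quick sanity check from the repo root — same API as the smoke test below.
import sys
sys.path.insert(0, 'python')             # import the repo-local package
from neural_memory import NeuralMemory

nm = NeuralMemory(
    db_path='/tmp/demo.db',              # throwaway SQLite file
    embedding_backend='cpu',             # backend flag used by the smoke test
    use_cpp=False,                       # C++ bridge is optional; Python fallback is fine
)
nm.remember('Hermes uses FastEmbed for embeddings', label='demo')
print(nm.recall('embedding backend'))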
Live Dashboard — Knowledge Graph
| Priority | Backend | Model | Speed | Requirements |
|---|---|---|---|---|
| 1st | FastEmbed | intfloat/multilingual-e5-large | ~50ms | pip install fastembed |
| 2nd | sentence-transformers | BAAI/bge-m3 1024d | ~200ms | GPU recommended |
| 3rd | tfidf | — | varies | numpy only |
| 4th | hash | — | instant | nothing |
FastEmbed uses ONNX runtime — no PyTorch conflict, works on CPU. Falls back automatically.
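The fallback order above amounts to an import-probe chain. A minimal sketch of that idea (the adapter's real selection logic lives in `embed_provider.py` and may differ):

```python
# Illustrative backend selection — probe imports in priority order.
def pick_embedding_backend(preferred: str = 'auto') -> str:
    """Return the first usable backend in priority order."""
    if preferred != 'auto':
        return preferred                      # explicit config wins
    try:
        import fastembed                      # ONNX runtime, CPU-friendly
        return 'fastembed'
    except ImportError:
        pass
    try:
        import sentence_transformers          # heavier, GPU recommended
        return 'sentence-transformers'
    except ImportError:
        pass
    try:
        import numpy                          # enough for a TF-IDF backend
        return 'tfidf'
    except ImportError:
        return 'hash'                         # zero-dependency last resort
```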
```python
# gpu_recall.py — CUDA cosine similarity
# Loads all embeddings into GPU memory, uses torch.matmul for batch similarity
# ~100ms for 10K memories vs ~500ms on CPU
from gpu_recall import GPURecall

engine = GPURecall()
results = engine.recall(query_embedding, all_embeddings, top_k=10)
```

Auto-detects CUDA. Falls back to Python/numpy if no GPU is available.
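In the snippet above, `query_embedding` is a single 1024-d vector and `all_embeddings` is the stacked memory bank. The batch similarity itself boils down to one normalized matmul; a minimal sketch of that step (not the actual `gpu_recall.py` code, names and normalization details are assumptions):

```python
# Sketch of batch cosine similarity on GPU, with CPU fallback.
import torch

def cosine_top_k(query: torch.Tensor, bank: torch.Tensor, top_k: int = 10):
    """query: (d,), bank: (n, d). Returns (scores, indices) of the top_k most similar rows."""
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    query = torch.nn.functional.normalize(query.to(device), dim=0)
    bank = torch.nn.functional.normalize(bank.to(device), dim=1)
    scores = torch.matmul(bank, query)        # (n,) cosine similarities in one matmul
    return torch.topk(scores, k=min(top_k, bank.shape[0]))
```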
```mermaid
flowchart TD
subgraph Store["neural_remember"]
A[User content] --> B[FastEmbed encode]
B --> C[1024d vector]
C --> D[SQLite INSERT]
C --> E[Cosine similarity search]
E --> F[Create connections]
end
subgraph Recall["neural_recall"]
G[User query] --> H[FastEmbed encode]
H --> I{CUDA available?}
I -->|Yes| J[GPU torch.matmul]
I -->|No| K[CPU numpy dot]
J --> L[Top-k results]
K --> L
end
subgraph Think["neural_think"]
M[Source memory] --> N[BFS on connections]
N --> O[Apply decay factor]
O --> P[Ranked activation]
end
subgraph Dream["neural_dream"]
Q[Idle trigger] --> R[NREM replay]
R --> S[REM bridge discovery]
S --> T[Insight communities]
T --> U[Consolidated graph]
end
```
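The `neural_think` path above is a breadth-first walk over the connections graph with an energy decay per hop. An illustrative sketch (the adapter's decay constant, hop limit, and ranking details are assumptions here):

```python
# Illustrative spreading activation — BFS with per-hop decay, ranked by activation.
from collections import deque

def spread_activation(graph, source_id, decay=0.7, max_hops=3):
    """graph: {memory_id: [(neighbor_id, weight), ...]}. Returns {memory_id: activation}."""
    activation = {source_id: 1.0}
    queue = deque([(source_id, 1.0, 0)])
    while queue:
        node, energy, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for neighbor, weight in graph.get(node, []):
            spread = energy * weight * decay          # signal weakens at every hop
            if spread > activation.get(neighbor, 0.0):
                activation[neighbor] = spread
                queue.append((neighbor, spread, hops + 1))
    return dict(sorted(activation.items(), key=lambda kv: kv[1], reverse=True))
```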
- SQLite (always): `~/.neural_memory/memory.db` — source of truth
- Embeddings cache: `~/.neural_memory/models/` (auto-downloaded, ~2.2 GB)
- GPU cache: `~/.neural_memory/gpu_cache/` (embeddings.npy + metadata.pkl)
- Access logs: `~/.neural_memory/access_logs/` (JSON Lines)
- MSSQL (optional): 127.0.0.1/NeuralMemory — multi-agent mirror
```sql
-- Core tables
memories (id, content, embedding, category, salience, ...)
connections (source_id, target_id, weight, edge_type)
connection_history (source_id, target_id, last_weight, last_updated)

-- Dream engine
dream_sessions (id, phase, started_at, completed_at, stats)
dream_insights (id, session_id, type, data)

-- Indexes
idx_memories_category ON memories(category)
idx_connections_source ON connections(source_id)
idx_connections_target ON connections(target_id)
```

All settings in `~/.hermes/config.yaml`:
```yaml
memory:
  provider: neural
  neural:
    db_path: ~/.neural_memory/memory.db
    embedding_backend: fastembed   # auto | fastembed | sentence-transformers | tfidf | hash
    prefetch_limit: 10
    search_limit: 10
    dream:
      enabled: true
      idle_threshold: 600          # seconds before dream cycle
      memory_threshold: 50         # dream after N new memories
    mssql:                         # optional — only if using MSSQL
      server: 127.0.0.1
      database: NeuralMemory
      username: SA
      password: 'your_password'
      driver: '{ODBC Driver 18 for SQL Server}'
```

When active, these tools are available in Hermes:
| Tool | Description |
|---|---|
| `neural_remember` | Store a memory (with conflict detection) |
| `neural_recall` | Search memories by semantic similarity |
| `neural_think` | Spreading activation from a memory |
| `neural_graph` | View knowledge graph statistics |
| `neural_dream` | Force a dream cycle (all/nrem/rem/insight) |
| `neural_dream_stats` | Dream engine statistics |
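Internally the plugin's `handle_tool_call` routes these names onto the memory client. A rough sketch of that mapping — the real signature, argument names, and any method names beyond `remember`/`recall` are assumptions, not the plugin's actual API:

```python
# Illustrative tool dispatch — argument names and the think/graph method names are hypothetical.
def handle_tool_call(nm, tool_name: str, args: dict):
    if tool_name == 'neural_remember':
        return nm.remember(args['content'], label=args.get('label'))
    if tool_name == 'neural_recall':
        return nm.recall(args['query'])
    if tool_name == 'neural_think':
        return nm.think(args['memory_id'])      # spreading activation (hypothetical name)
    if tool_name == 'neural_graph':
        return nm.graph_stats()                 # graph statistics (hypothetical name)
    raise ValueError(f'unknown tool: {tool_name}')
```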
Autonomous background memory consolidation (inspired by biological sleep):
```mermaid
flowchart LR
subgraph Trigger
T1[Idle 600s] --> D
T2[50 new memories] --> D
T3[Manual / Cron] --> D
end
D[Dream Cycle] --> NREM
D --> REM
D --> INSIGHT
subgraph NREM["Phase 1 — NREM"]
direction TB
N1[Replay 100 recent memories] --> N2[Spreading activation]
N2 --> N3{Connection active?}
N3 -->|Yes| N4[Strengthen +0.05]
N3 -->|No| N5[Weaken -0.01]
N3 -->|Dead <0.05| N6[Prune]
end
subgraph REM["Phase 2 — REM"]
direction TB
R1[Find 50 isolated memories] --> R2[Search similar unconnected]
R2 --> R3[Create bridge connections]
R3 --> R4[weight = similarity × 0.3]
end
subgraph INSIGHT["Phase 3 — Insight"]
direction TB
I1[BFS connected components] --> I2[Identify communities]
I2 --> I3[Find bridge nodes]
I3 --> I4[Store dream_insights]
end
```
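The weight rules in the diagram reduce to a few lines. A sketch using the constants shown above (the clamping behaviour and exact thresholds in `dream_engine.py` are assumptions):

```python
# NREM/REM weight rules from the diagram — strengthen, weaken, prune, bridge.
def nrem_update(weight: float, active: bool) -> float | None:
    """Strengthen active connections, weaken idle ones, prune dead ones."""
    weight = min(1.0, weight + 0.05) if active else weight - 0.01   # clamp at 1.0 is assumed
    return None if weight < 0.05 else weight                        # None => prune the edge

def rem_bridge_weight(similarity: float) -> float:
    """New bridge connections start at a fraction of cosine similarity."""
    return similarity * 0.3
```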
- Automatic: after 600s idle (configurable)
- Automatic: every 50 new memories (configurable)
- Manual: `neural_dream` tool
- Standalone: `python python/dream_worker.py --daemon`
```bash
cd ~/projects/neural-memory-adapter/python
python3 demo.py
```

```bash
# Plugin test suite
cd ~/.hermes/hermes-agent/plugins/memory/neural
python3 test_suite.py

# Upside-Down Test Suite — edge cases, corruption, concurrency, SQL injection
cd ~/projects/neural-memory-adapter
python3 tests/test_upside_down.py
```

```bash
cd ~/projects/neural-memory-adapter
python3 -c "
import sys; sys.path.insert(0, 'python')
from neural_memory import NeuralMemory
nm = NeuralMemory(db_path='/tmp/test.db', embedding_backend='cpu', use_cpp=False)
mid = nm.remember('test memory', label='smoke')
results = nm.recall('test')
assert len(results) > 0, 'recall failed'
print(f'SMOKE TEST PASS: {len(results)} results')
"Tested on a fresh Debian 12 QEMU/KVM VM — hermes-agent + neural memory only, no jack-in-a-box.
| Property | Value |
|---|---|
| VM | Debian 12, 4 GB RAM, KVM enabled |
| hermes-agent | git clone (itsXactlY fork) |
| neural-memory | git clone + FastEmbed ONNX |
| Embedding | intfloat/multilingual-e5-large (1024d) |
| C++ bridge | Not built (Python fallback) |
All 12 integration tests passed:
| # | Test | Result |
|---|---|---|
| 1 | NeuralMemory standalone (remember/recall/graph) | PASS |
| 2 | Memory Provider (FastEmbed 1024d) | PASS |
| 3 | NeuralMemoryProvider.init | PASS |
| 4 | is_available() | PASS |
| 5 | initialize(session_id) | PASS |
| 6 | get_tool_schemas() → 4 tools | PASS |
| 7 | system_prompt_block() (250 chars) | PASS |
| 8 | handle_tool_call — neural_remember | PASS |
| 9 | handle_tool_call — neural_recall | PASS |
| 10 | handle_tool_call — neural_graph | PASS |
| 11 | prefetch() | PASS |
| 12 | shutdown() | PASS |
- 4 GB RAM minimum — FastEmbed model download (~500 MB). 2 GB = OOM killed.
- HashBackend works as a fallback on low-RAM systems (1024d, instant, no deps) — see the sketch after this list.
- C++ bridge optional — Python fallback covers all functionality.
- FastEmbed >= 0.5.1 — earlier versions default to CLS embedding (deprecated).
- `python3-venv` required on Debian — `apt install python3.11-venv` if missing.
- PEP 668 (Debian) — `pip install` needs a venv or `--break-system-packages`.
- Cloud-init delay — 60–90 s on first boot. Don't assume SSH is ready immediately.
- prefetch() returns empty on fresh DB — expected, no prior memories to pre-load.
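The hash fallback mentioned above needs nothing beyond the standard library. One way such a 1024-d embedding can be built with the hashing trick (the adapter's actual HashBackend implementation may differ):

```python
# Dependency-free hash embedding sketch — signed hashing trick into a fixed 1024-d vector.
import hashlib
import math

def hash_embed(text: str, dim: int = 1024) -> list[float]:
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode()).digest()
        idx = int.from_bytes(digest[:4], 'little') % dim   # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0         # signed counts reduce collision bias
        vec[idx] += sign
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]                         # L2-normalize for cosine similarity
```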
```
neural-memory-adapter/
├── install.sh # Installer
├── hermes-plugin/ # Plugin (deployed to hermes-agent)
│ ├── __init__.py # MemoryProvider + tools
│ ├── config.py # Config loader
│ ├── plugin.yaml # Plugin metadata
│ ├── neural_memory.py # Unified Memory class
│ ├── memory_client.py # Main client (NeuralMemory, SQLiteStore)
│ ├── embed_provider.py # Embedding backends (FastEmbed, st, tfidf, hash)
│ ├── gpu_recall.py # CUDA cosine similarity engine
│ ├── dream_engine.py # Dream engine (NREM/REM/Insight)
│ ├── dream_worker.py # Standalone daemon
│ ├── access_logger.py # Recall event logger
│ └── ...
├── python/ # Python source (mirrors hermes-plugin)
│ └── ...
├── src/ # C++ source (optional, legacy)
│ ├── memory/lstm.cpp # LSTM predictor
│ ├── memory/knn.cpp # kNN engine
│ └── memory/hopfield.cpp # Hopfield network
└── README.md
```
- FastEmbed > sentence-transformers — ONNX runtime, no PyTorch conflict, fast on CPU.
- FastEmbed >= 0.5.1 — earlier versions default to CLS embedding. Pin the version or set `add_custom_model`.
- GPU recall > C++ bridge — the C++ Hopfield network had bias issues; GPU matmul is clean.
- numpy before FastEmbed — FastEmbed imports numpy at load time; install order matters.
- Don't force PyTorch — let FastEmbed handle CPU. torch only needed for GPU recall.
- SQLite = Source of Truth — MSSQL is optional. SQLite always works.
- Auto-detect everything — CUDA, backends, venv paths. Minimize config burden.
- 4 tool schemas exposed by NeuralMemoryProvider: `neural_remember`, `neural_recall`, `neural_think`, `neural_graph`. (`neural_dream`/`neural_dream_stats` are available on the standalone Memory class only.)
See LICENSE.

