The engine layer for Mnemo — an on-device system that records what you need (from your screen, mic, clipboard, files, or a deliberate "remember this") and expresses it back to you in whatever form you can receive: voice, non-speech sound, screen, haptics, large type, or plain-language simplification. Everything stays on the device. The model that does the remembering and the reasoning is Gemma 4, running on-device.
Phases 1–2 and all of Phase 3 except the MLX text generator have landed. It is not the product yet; it is the architecture, proven and tested:

- the model types;
- the `MemoryStore` actor (in-memory and SQLite-backed on-disk stores);
- a flat-cosine `VectorIndex` plus a real on-device `EmbeddingService` (`NLEmbeddingService`, via Apple's `NaturalLanguage`: no SPM dependency, no download) alongside the stub;
- a `RecallEngine` skeleton, the `ContextBudgeter`, the frozen `RecallFunctionContract`, a tolerant `FunctionCallParser`, a `RecallPromptBuilder`, and `GemmaReasoningOverFunctionCalls` (a complete `GemmaReasoning` built from a `FunctionCallGenerating`; the only thing left external is the model's text generation);
- a stub `GemmaService`, the `AbstentionGate`, and the `FunctionCallOrchestrator` pattern;
- the `SummaryEngine` rollup job (closed-bucket daily → weekly → monthly → yearly summaries);
- the `ExpressionRouter` (the adaptive heart: an explicit precedence lattice) with 6 value-emitting adapters;
- a `Clock` abstraction and a thin `MnemoCoordinator`.

It is pure logic, on-device SQLite, and system NLP, plus `BlackoutPolicy` (Phase 4's pure decision half, landed early).

Two things are not wired here. The first is the on-device transformer runtime, Gemma 4 via MLX. That lives in the separate `MnemoEngineMLX` package, which is written but builds only with Xcode. The second is the platform capture APIs (Phase 4+), which need Xcode, a GUI session, and entitlements. The package compiles with CommandLineTools alone, and 94 swift-testing tests pass.

See `docs/mnemo-implementation-plan.md` for the full plan, the 3-critic loop validation (§10), and the honest deployable-state assessment.
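The "explicit precedence lattice" is easier to picture with a small example. The sketch below is a router that applies the stages listed for `ExpressionRouter.swift` in order.

Everything in it is an assumption made for illustration:

- `Modality`, `Urgency`, `RoutingInput`, and `route(_:)` are hypothetical names.
- The per-stage rules are guesses. This covers what urgency escalation adds, what quiet hours mute, and what the alert threshold drops.
- The real router emits expression plans rather than a set of modalities.

The one property the sketch is meant to show is this: the accessibility floor is merged last, so no suppression stage can remove a modality the user needs.

```swift
// Hypothetical sketch of a precedence lattice; not MnemoEngine's real API.
enum Modality: Hashable {
    case voice, earcon, screen, haptic, largeType, simplified
}

// Declaration order defines the ordering (synthesized Comparable).
enum Urgency: Comparable {
    case low, normal, high
}

struct RoutingInput {
    var accessibilityFloor: Set<Modality>  // modalities the user's needs require
    var profilePreferred: Set<Modality>    // the profile's default modalities
    var queryOverride: Set<Modality>?      // per-query override, if any
    var urgency: Urgency
    var inQuietHours: Bool
    var suggested: Set<Modality>?          // narrowing hint, if any
    var alertThreshold: Urgency            // below this, interruptive channels are dropped
}

func route(_ input: RoutingInput) -> Set<Modality> {
    // Profile, unless the query overrides it.
    var chosen = input.queryOverride ?? input.profilePreferred
    // Urgency escalation (assumed rule): high urgency adds a haptic.
    if input.urgency == .high { chosen.insert(.haptic) }
    // Quiet hours (assumed rule): mute audible output unless urgent.
    if input.inQuietHours && input.urgency < .high {
        chosen.subtract([.voice, .earcon])
    }
    // Narrow to the suggested modalities, unless that would leave nothing.
    if let suggested = input.suggested {
        let narrowed = chosen.intersection(suggested)
        if !narrowed.isEmpty { chosen = narrowed }
    }
    // Alert threshold (assumed rule): below it, drop interruptive channels.
    if input.urgency < input.alertThreshold {
        chosen.subtract([.earcon, .haptic])
    }
    // The accessibility floor is merged last, so no earlier stage can remove it.
    return chosen.union(input.accessibilityFloor)
}
```

Suppose a user's floor requires `.voice` and the request arrives during quiet hours. The quiet-hours stage mutes voice, and then the final union restores it. That is the kind of case the router tests exhaust.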
Sources/MnemoEngine/
MnemoEngine.swift — umbrella + version
Support/Clock.swift — TimeProvider (SystemClock / FixedClock) — injectable time
Models/
CaptureEvent.swift — CaptureEvent (embedding/entities/structure OPTIONAL — deferred enrichment), CaptureSource, StructureTag, EntityMention, SensitivityTag, BlobRef, AppContext; SHA-256 content-fingerprint dedup
RecallTypes.swift — RecallQuery, RecallResult, Urgency, CitationRef (with availability — pruning-degradation contract)
ExpressionTypes.swift — ExpressionModality, UserProfile (rawRetention defaults .textOnly; pruneAfterDays defaults 30), AccessibilityNeed (→ required modality floors), RawRetentionPolicy
Summary.swift — DailySummary, RollupSummary (weekly/monthly/yearly), SummaryTier
Memory/
MemoryStore.swift — MemoryStore protocol (Actor) + InMemoryMemoryStore actor (dedup, retrieval, tombstone deletes)
VectorIndex.swift — VectorIndex protocol + FlatCosineVectorIndex + a SqliteVecVectorIndex stub (seam proven)
EmbeddingService.swift — EmbeddingService protocol + StubEmbeddingService (deterministic hashed-bag-of-words; real MiniLM-class model is Phase 3)
EventEnricher.swift — EventEnriching protocol + StubEventEnricher (embedding is NEVER in the capture write path — deferred pass)
Reason/
AbstentionGate.swift — "recall, don't advise" — medical/legal/financial/immigration/emergency advice-seeking → flag_for_human
GemmaService.swift — GemmaReasoning protocol + StubGemmaService (.real wires Gemma 4 E4B via mlx-swift-lm in Phase 3)
Recall/
RecallFunctionContract.swift — the frozen 5-function contract (recall_events / summarize_period / find_entity_mentions / set_reminder / flag_for_human)
ContextBudgeter.swift — packs the recall context into the model window: top-K raw retrieval (any age) + temporal summary scaffold, overhead budgeted first, slack rolls between them; token-counting injected → pure & deterministic
RecallEngine.swift — query → abstention check → embed → retrieve → budget → Gemma → RecallResult
Express/
ExpressionPlans.swift — VoicePlan / EarconPlan / ScreenPresentation / HapticPattern / LargeTypePresentation / SimplificationRequest; ExpressionPlan union; RoutingDecision
ExpressionAdapter.swift — ExpressionAdapter protocol + 6 adapters (each emits a value; the app performs side effects) + DefaultAdapters
ExpressionRouter.swift — THE ADAPTIVE HEART — the explicit precedence lattice (accessibility floor → profile → query override → urgency escalation → quiet-hours suppression → suggestedModality narrowing → alert-threshold suppression)
MnemoCoordinator.swift — RecallService (query → routed expression); CaptureControlling protocol + NoopCaptureControl (real providers are Phase 4); the thin MnemoCoordinator (ingest → deferred enrich; ask)
Tests/MnemoEngineTests/
ExpressionRouterTests.swift — exhausts the precedence lattice (the most-tested thing)
AbstentionGateTests.swift — recall-allowed vs advice-abstained, per domain
ContextBudgeterTests.swift — overhead-first, capping, slack-rolling, finer-tiers-preferred, ordering
MemoryAndRecallTests.swift — dedup, deferred enrichment, recall happy path, recall defers advice, recall-empty, coordinator end to end
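The packing rule described for `ContextBudgeter.swift` can be sketched as a pure function. That rule has four parts: overhead first, top-K raw retrieval plus a summary scaffold, slack rolling between them, and token counting injected.

This is a hypothetical illustration, not the shipped budgeter. `packContext`, `PackedContext`, the 50/50 `rawShare` default, and the three-pass order are all assumptions. Because `countTokens` is a parameter, the result is deterministic for a given input. That is what makes this kind of logic easy to pin down in tests.

```swift
// Hypothetical sketch of a context packer; names and the split are assumptions.
struct PackedContext {
    var raw: [String]        // retrieved events, best-ranked first
    var summaries: [String]  // temporal scaffold, finest tier first
    var tokensUsed: Int      // overhead plus everything packed
}

func packContext(window: Int, overhead: Int,
                 rawRanked: [String], summariesFinestFirst: [String],
                 rawShare: Double = 0.5,
                 countTokens: (String) -> Int) -> PackedContext {
    // Overhead (preamble, question, instructions) is budgeted first.
    let available = max(0, window - overhead)
    let rawCosts = rawRanked.map(countTokens)

    // Pass 1: raw retrieval fills its share in rank order, stopping at the
    // first item that does not fit so a lower-ranked item never jumps ahead.
    let rawShareTokens = Int(Double(available) * rawShare)
    var rawUsed = 0, nextRaw = 0
    while nextRaw < rawCosts.count, rawUsed + rawCosts[nextRaw] <= rawShareTokens {
        rawUsed += rawCosts[nextRaw]
        nextRaw += 1
    }

    // Pass 2: summaries may use everything raw left over (slack rolls forward).
    // Finer tiers are tried first; a summary that does not fit is skipped.
    var summaries: [String] = []
    var summaryUsed = 0
    for summary in summariesFinestFirst {
        let cost = countTokens(summary)
        if rawUsed + summaryUsed + cost <= available {
            summaries.append(summary)
            summaryUsed += cost
        }
    }

    // Pass 3: slack the summaries left rolls back to raw retrieval.
    while nextRaw < rawCosts.count, rawUsed + summaryUsed + rawCosts[nextRaw] <= available {
        rawUsed += rawCosts[nextRaw]
        nextRaw += 1
    }

    return PackedContext(raw: Array(rawRanked.prefix(nextRaw)), summaries: summaries,
                         tokensUsed: overhead + rawUsed + summaryUsed)
}
```

With `countTokens: { $0.count }` as a stand-in tokenizer, a test can assert exact packings, in the spirit of the overhead-first and slack-rolling cases listed above.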
Sources/MnemoEngine/Memory/
SQLiteSupport.swift — a deliberately tiny wrapper over the system `SQLite3` C module (no SPM dependency — the dependency gate stays clean): open + WAL pragmas + prepared statements + transactions + VACUUM
StorageHardening.swift — invariant #5: `isExcludedFromBackup`, the `.metadata_never_index` Spotlight marker, iOS `FileProtectionType` — hardens whatever path the app layer hands it (the *location* is the app's job, Phase 4/5)
SQLiteMemoryStore.swift — the on-disk `MemoryStore` (one row per event: indexed scalars + the `Codable` event as a JSON blob; the flat-cosine vector index is held in memory, rebuilt from disk on open). Deletes are REAL: payload columns nulled + `deleted = 1`, leaving an `(id, timestamp)` tombstone; `compact()` runs `VACUUM`
SummaryStore.swift — `SummaryStore` protocol + `InMemorySummaryStore` + `SQLiteSummaryStore` (its own hardened `summaries.sqlite3`)
SummaryEngine.swift — the rollup job: walks CLOSED day/week/month/year buckets (ISO-8601, UTC), writes a summary for any bucket that lacks one, NEVER mutates a `CaptureEvent`. Idempotent.
Sources/MnemoEngine/Reason/GemmaService.swift — `GemmaReasoning` gains `summarizeDay` / `summarizeRollup` (the stub is deterministic; the real model is Phase 3)
Tests/MnemoEngineTests/
SQLiteMemoryStoreTests.swift — append/dedup/enrich+retrieve/range/real-delete+reopen/persistence+index-rebuild/compact
StorageHardeningTests.swift — directory excluded-from-backup + Spotlight marker; the store hardens its own paths; idempotent (this is the path-attributes CI test the plan §7 calls for, landing early)
SummaryEngineTests.swift — closed-buckets-only; idempotent; never mutates events; monthly rollup after the month closes; SQLite summary store
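The closed-bucket rule is what makes the rollup idempotent. A bucket is summarized only once it has ended, and only if no summary exists yet, so re-running the job writes nothing new.

The sketch below shows that rule for UTC day buckets only, keyed by days since the Unix epoch. A few things are assumed:

- `utcDay`, `closedDaysNeedingSummary`, and `runDailyRollup` are hypothetical names.
- The dictionary stands in for a `SummaryStore`.
- ISO-8601 week, month, and year buckets would need calendar arithmetic, which this sketch leaves out.

```swift
// Hypothetical sketch: UTC day buckets only, keyed by days since the Unix epoch.

// Floor division, so timestamps before 1970 also land in the correct day.
func utcDay(_ epochSeconds: Int) -> Int {
    let secondsPerDay = 86_400
    return epochSeconds >= 0
        ? epochSeconds / secondsPerDay
        : (epochSeconds - secondsPerDay + 1) / secondsPerDay
}

// A day is closed once "now" falls in a later day; only closed, unsummarized days qualify.
func closedDaysNeedingSummary(eventTimes: [Int], now: Int, summarized: Set<Int>) -> [Int] {
    let today = utcDay(now)
    let days = Set(eventTimes.map(utcDay))
    return days.filter { $0 < today && !summarized.contains($0) }.sorted()
}

// Writes one summary per qualifying day and returns how many were written.
// Event timestamps are only read, never modified, and a re-run writes nothing new.
func runDailyRollup(eventTimes: [Int], now: Int,
                    store: inout [Int: String],
                    summarize: (Int) -> String) -> Int {
    let pending = closedDaysNeedingSummary(eventTimes: eventTimes, now: now,
                                           summarized: Set(store.keys))
    for day in pending { store[day] = summarize(day) }
    return pending.count
}
```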
`GemmaService.real` is a text generator (Gemma 4 E4B-it 4-bit, on-device via MLX) plus the prompt builder, the function-call parser, the answer extractor, and a real embedder. All of that except the text generator ships here. The generator needs MLX (Apple-Silicon-only, Xcode-only to build, ~4 GB of weights) and lives in a separate `MnemoEngineMLX` target / the app layer, mirroring He Was Socrates's `#if canImport(MLXLLM)` split.
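The split hinges on one seam. The engine depends only on a text-generator protocol, and anything that can produce text plugs in: MLX, a test fake, or nothing at all.

Below is a minimal sketch of that seam and of the degrade-on-failure behavior. It assumes `prompt` is a `String` and `maxTokens` an `Int`; only the labels `generate(prompt:maxTokens:)` are given here. The other names are hypothetical stand-ins:

- `UnavailableGenerator` stands in for `UnavailableFunctionCallGenerator`.
- `CannedGenerator` stands in for a test fake.
- `FallbackReasoner` stands in for `GemmaReasoningOverFunctionCalls`, whose real fallback is the deterministic `StubGemmaService`, not a closure.

```swift
// Hypothetical sketch of the generator seam; parameter types are assumptions.
protocol FunctionCallGenerating: Sendable {
    func generate(prompt: String, maxTokens: Int) async throws -> String
}

// Stand-in for an engine that ships no runtime: every call throws.
struct UnavailableGenerator: FunctionCallGenerating {
    struct NoRuntime: Error {}
    func generate(prompt: String, maxTokens: Int) async throws -> String {
        throw NoRuntime()
    }
}

// Test fake: returns canned model text regardless of the prompt.
struct CannedGenerator: FunctionCallGenerating {
    let text: String
    func generate(prompt: String, maxTokens: Int) async throws -> String { text }
}

// If the generator throws (no runtime, model not staged, out of memory),
// fall back to a deterministic answer so a turn never hard-fails.
struct FallbackReasoner: Sendable {
    let generator: any FunctionCallGenerating
    let fallback: @Sendable (String) -> String

    func answer(_ prompt: String, maxTokens: Int = 512) async -> String {
        do {
            return try await generator.generate(prompt: prompt, maxTokens: maxTokens)
        } catch {
            return fallback(prompt)
        }
    }
}
```

Under this design, "wire the real model" reduces to supplying one more conforming type.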
Sources/MnemoEngine/Memory/NLEmbeddingService.swift — a REAL on-device EmbeddingService backed by Apple's NaturalLanguage embeddings (a system framework — no SPM dep, no model download; the language assets ship with the OS): sentence embedding when available (512-dim for English), else averaged word embeddings, else the hashed-bag-of-words stub. L2-normalized. Not the engine default (the stub is — deterministic for tests); the app wires `NLEmbeddingService(locale:)`.
Sources/MnemoEngine/Recall/FunctionCallParser.swift — model text → a typed RecallFunctionCall against RecallFunctionContract's 5 functions. Tolerant of ```fences```, <tool_call> tags, prose around the JSON, alternate key names (`function`/`parameters`/`q`/…), {start,end} vs [a,b] vs "yyyy-MM" ranges, ISO-8601 dates; a `}` inside a JSON string doesn't fool the brace matcher. Genuinely-unparseable output → `.unparseable(rawText:)` so the caller falls back.
Sources/MnemoEngine/Recall/RecallPromptBuilder.swift — assembles the strings the model completes: a recall prompt (system preamble + numbered context events/summaries + the question + an answer-JSON instruction + the `flag_for_human`/`set_reminder` escape hatches), a simplify prompt, day/rollup summary prompts. Pure.
Sources/MnemoEngine/Reason/FunctionCallGenerating.swift — the seam: `FunctionCallGenerating { func generate(prompt:maxTokens:) async throws -> String }` (whoever provides MLX implements it) + the verified model identity (HF `mlx-community/gemma-4-e4b-it-4bit`, `LLMRegistry.gemma4_e4b_it_4bit`, mlx-swift-lm ≥ 3.31.3) + `UnavailableFunctionCallGenerator` (the dependency-free engine ships no runtime — throws)
Sources/MnemoEngine/Reason/GemmaReasoningOverFunctionCalls.swift — a COMPLETE `GemmaReasoning` (recall / simplify / summarizeDay / summarizeRollup) built from a `FunctionCallGenerating` + the prompt builder + the parser + the answer-JSON extractor. With this in place, "wire the real model" = "provide one `generate` method". If the generator throws (no runtime, model not staged, OOM), every method degrades to the deterministic `StubGemmaService` — a recall turn never hard-fails.
Tests/MnemoEngineTests/ — FunctionCallParserTests (all 5 functions; fenced/tagged/prose; alt keys; inlined args; brace-in-string; the 3 range shapes; missing-arg/unknown-fn → unparseable), NLEmbeddingServiceTests (shape/L2-norm/determinism; similar > dissimilar; retrieval end-to-end; unsupported-language fallback), RecallPromptBuilderTests, GemmaReasoningOverFunctionCallsTests (answer-JSON/fenced/flag_for_human/set_reminder/tool-already-run/prose-only/generator-throws-→-fallback/simplify/summarizeDay; end-to-end through RecallEngine with a fake generator; parseAnswerJSON units)
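The parser claims that "a `}` inside a JSON string doesn't fool the brace matcher". That comes down to tracking string and escape state while counting braces.

Below is a minimal sketch of just that extraction step. It assumes the goal is the first balanced `{...}` object in free-form model output, and `firstJSONObject(in:)` is a hypothetical helper. The real `FunctionCallParser` also handles `<tool_call>` tags, alternate keys, and argument normalization, none of which is shown.

```swift
// Hypothetical helper: returns the first balanced `{...}` object in `text`,
// ignoring braces inside JSON string literals (escaped quotes included).
func firstJSONObject(in text: String) -> Substring? {
    var depth = 0
    var start: String.Index?
    var inString = false
    var escaped = false
    var i = text.startIndex
    while i < text.endIndex {
        let ch = text[i]
        if inString {
            // Inside a string: only an unescaped quote ends it.
            if escaped {
                escaped = false
            } else if ch == "\\" {
                escaped = true
            } else if ch == "\"" {
                inString = false
            }
        } else if ch == "\"" {
            // Quotes count only once we are inside an object, so quoted prose
            // before the JSON does not confuse the matcher.
            if start != nil { inString = true }
        } else if ch == "{" {
            if depth == 0 { start = i }
            depth += 1
        } else if ch == "}", depth > 0 {
            depth -= 1
            if depth == 0, let s = start { return text[s...i] }
        }
        i = text.index(after: i)
    }
    return nil  // no complete object: the caller treats the text as unparseable
}
```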
make build # swift build — builds with CommandLineTools alone (no Xcode required)
make test # swift test — 94 swift-testing tests
make lint # swift-format lint -r Sources Tests
make ci-local # build + test + lint: the same gates CI runs

CI (`.github/workflows/ci.yml`, `macos-15`) runs three gates: build-and-test · swift-format lint · gitleaks secret scan.

This repo was extracted from Two-Weeks-Team/he-was-socrates (`packages/MnemoEngine/`). Mnemo is a distinct product that reuses that POC's on-device substrate. See `CLAUDE.md` for working conventions and `docs/mnemo-implementation-plan.md` for the validated plan (§1 invariants, §10 binding revisions).
| Phase | What | Effort (raw eng) | Realistic calendar |
|---|---|---|---|
| 1 ✅ | the engine core | ~1.5 sessions | done |
| 2 ◑ | **Done:** `SQLiteMemoryStore` (tombstone deletes, outside all backup/sync/Spotlight scopes, plus a CI test for the path attributes) and the `SummaryEngine` rollup job. **Still pending:** a real on-disk vector index (today the flat index is rebuilt in memory on open, which is fine to ~10⁶ events) and at-rest encryption (today: FileVault + iOS FileProtection; the Mnemo-vault key is Phase 5) | days | landed; encryption + on-disk ANN are follow-ups |
| 3 ◑ | **Done, here:** `FunctionCallParser`, `RecallPromptBuilder`, `GemmaReasoningOverFunctionCalls` (a complete `GemmaReasoning` lacking only the generator), `NLEmbeddingService` (a real on-device `EmbeddingService` via `NaturalLanguage`; no dependency, no download), and the verified model identity. **Written, not yet built or verified:** the `MnemoEngineMLX` package (`mlx/`), whose `MLXGemmaGenerator` implements `FunctionCallGenerating` over `LLMRegistry.gemma4_e4b_it_4bit`. It mirrors He Was Socrates's `GemmaService.real` and is `#if canImport(MLXLLM)`-guarded, but needs Xcode and ~4 GB of weights (the authoring environment had no Xcode). **Optional:** a stronger embedding model (~25 MB) behind the same protocol | reconcile + build + verify on Xcode | the engine side is done; the MLX side needs one build pass on a Mac with Xcode |
| 4 ◔ | **Done, here:** the pure decision half, `BlackoutPolicy` (global pause / absolute & recurring time windows / app-bundle blocklist, with reasons; `Sources/MnemoEngine/Capture/`). **Platform side:** ScreenCaptureKit (plus the screen-recording entitlement, the TCC flow, and a non-dismissible indicator while the mic is live), AVAudioEngine + VAD + STT (audio default = push-to-capture), clipboard (read-only), files, manual. This needs Xcode, a GUI session, and entitlement provisioning | 1–2 weeks (the platform side) | ~1–2 months with the privacy UX done correctly |
| 5 | the macOS app: the Mnemo mode/window, the query bar, the screen-presentation renderer, the haptic player (degraded on macOS), the timeline view, onboarding (the affirmative privacy framing + the accessibility-needs guided setup + the recording-legality note at audio-enable time + a separate Mnemo vault credential + a panic-wipe reachable without unlocking the app), the privacy-controls UI | 1–2 weeks | ~1–2 months |
| 6 | iOS (where the haptic adapter is strong): iOS app + iOS capture (ReplayKit/RPScreenRecorder, Core Haptics), the iOS LLM runtime | 2–3 weeks | later |
| 7 | hardening + deploy: privacy review; the dependency gate (CI fails on any analytics/crash-reporting SDK and on unpinned deps); the CI network-entitlement gate; the path-attributes CI test (`isExcludedFromBackup`, `.metadata_never_index`); performance (thermal, battery, idle-scheduled rollups; this may force architecture changes); the accessibility audit (a tool for accessibility must itself be accessible: VoiceOver etc.); notarization / App Store review (a bundled-LLM screen recorder gets extra scrutiny and may be Developer-ID-only) | weeks | months |
Honest read: engine prototype (Phases 1–3) ~ weeks; a runnable macOS demo (through Phase 5) ~ 2–3 months for a small team; "actually deployable" (Phase 7 done) ~ 5–9 months.
Reconcile + build + verify `MnemoEngineMLX`. The package now exists at `mlx/` and has two parts:

- `mlx/Package.swift`, which path-depends on `MnemoEngine` and `mlx-swift-lm` ≥ 3.31.3.
- `mlx/Sources/MnemoEngineMLX/MLXGemmaGenerator.swift`, an actor conforming to the engine's `FunctionCallGenerating`. It is `#if canImport(MLXLLM)`-guarded, so it compiles even without MLX.

Its MLX call surface mirrors He Was Socrates's `GemmaService.real`. It has not been built against the real `mlx-swift-lm` here, because there was no Xcode in the authoring environment. On a machine with Xcode and Apple Silicon:

- `cd mlx && swift build`: this resolves `mlx-swift-lm` (a sizeable download) and needs the Metal toolchain.
- Reconcile any API drift (the `loadContainer` signature, `ChatSession.init`, `GenerateParameters`, `streamResponse(to:)`) against the pinned version. He Was Socrates is the reference.
- Verify: feed `RecallPromptBuilder().recallPrompt(...)` to `MLXGemmaGenerator().generate(...)` and parse the output with `FunctionCallParser.parse(...)`. After that, `GemmaReasoningOverFunctionCalls(generator: MLXGemmaGenerator())` is a full `GemmaReasoning`.
- Add a CI job that mirrors He Was Socrates (`macos-15`/`macos-26` + `setup-xcode`). The repo's main CI deliberately stays MLX-free.
Budget for the ~4 GB first-run model download + the E4B latency/thermal questions the plan §7 flags. (A stronger EmbeddingService — an all-MiniLM-class model, ~25 MB — could also live in mlx/; NLEmbeddingService covers the dependency-free case so it's optional.)
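Anything behind the embedding protocol only has to map text to a fixed-length, L2-normalized vector. That is why a stronger model is a drop-in. For reference, here is a sketch of the simplest such embedder, in the spirit of the deterministic hashed-bag-of-words stub.

Several parts are assumed:

- `EmbeddingServiceSketch` and `HashedBagOfWordsEmbedder` are hypothetical. The real protocol's requirements, including whether they are async, are not shown here.
- FNV-1a is an assumed choice of hash. It is used because Swift's `hashValue` is seeded per process and would not give the same vector across runs.

```swift
// Hypothetical protocol shape; the real EmbeddingService may differ (e.g. be async).
protocol EmbeddingServiceSketch {
    var dimension: Int { get }
    func embed(_ text: String) -> [Float]
}

// Deterministic hashed bag of words: each token increments one bucket, then
// the vector is L2-normalized so cosine similarity is a dot product.
struct HashedBagOfWordsEmbedder: EmbeddingServiceSketch {
    let dimension: Int

    init(dimension: Int = 256) {
        precondition(dimension > 0, "dimension must be positive")
        self.dimension = dimension
    }

    // FNV-1a (64-bit). Swift's `hashValue` is seeded per process, so it would
    // not give the same embedding across runs.
    private static func fnv1a(_ token: Substring) -> UInt64 {
        var hash: UInt64 = 0xcbf2_9ce4_8422_2325
        for byte in token.utf8 {
            hash ^= UInt64(byte)
            hash = hash &* 0x0000_0100_0000_01b3
        }
        return hash
    }

    func embed(_ text: String) -> [Float] {
        var vector = [Float](repeating: 0, count: dimension)
        let tokens = text.lowercased().split(whereSeparator: { !$0.isLetter && !$0.isNumber })
        for token in tokens {
            vector[Int(Self.fnv1a(token) % UInt64(dimension))] += 1
        }
        let sumOfSquares: Float = vector.reduce(0) { $0 + $1 * $1 }
        let norm = sumOfSquares.squareRoot()
        return norm > 0 ? vector.map { $0 / norm } : vector  // empty text stays all-zero
    }
}
```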
After that, Phase 4 — real macOS capture: ScreenCaptureKit + the screen-recording entitlement + the TCC flow + the non-dismissible mic indicator, AVAudioEngine+VAD+STT (push-to-capture default), clipboard (read-only), files, manual. The pure decision half — BlackoutPolicy (global pause / absolute & recurring time windows / app-bundle blocklist, with reasons) — already landed (Sources/MnemoEngine/Capture/); the real capture providers consult it before recording a frame. Then the macOS app (Phase 5), iOS (Phase 6), and the hardening/distribution pass (Phase 7). Those are platform work measured in months, not a session — they need Xcode, GUI sessions, entitlement provisioning, and (for Phase 7) an Apple Developer account — see the phase table above and the plan §6/§10.
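Because `BlackoutPolicy` is pure, a capture provider can ask it "may I record now?" and get back either permission or a reason. The sketch below is a hypothetical version of such a decision that checks the three rule kinds in a fixed order.

Several parts are assumed:

- `BlackoutPolicySketch` and `CaptureDecision` are hypothetical names.
- Recurring windows are represented as minutes of the day and may wrap past midnight.
- The check order is a guess.

The caller supplies the local minute of day, which keeps time zones out of the pure decision.

```swift
// Hypothetical sketch of a pure capture decision; not the real BlackoutPolicy API.
enum BlackoutReason: Equatable {
    case globalPause
    case absoluteWindow(label: String)
    case recurringWindow(label: String)
    case blockedApp(bundleID: String)
}

enum CaptureDecision: Equatable {
    case allow
    case block(BlackoutReason)
}

// Half-open [start, end) in epoch seconds.
struct AbsoluteWindow { let label: String; let start: Int; let end: Int }

// Minutes after local midnight; start > end means the window wraps past midnight.
struct RecurringWindow { let label: String; let startMinute: Int; let endMinute: Int }

struct BlackoutPolicySketch {
    var paused = false
    var absoluteWindows: [AbsoluteWindow] = []
    var recurringWindows: [RecurringWindow] = []
    var blockedBundleIDs: Set<String> = []

    // The caller supplies the local minute of day, keeping time zones out of this function.
    func decide(now: Int, minuteOfDay: Int, frontmostBundleID: String?) -> CaptureDecision {
        if paused { return .block(.globalPause) }
        if let w = absoluteWindows.first(where: { $0.start <= now && now < $0.end }) {
            return .block(.absoluteWindow(label: w.label))
        }
        if let w = recurringWindows.first(where: { Self.covers($0, minuteOfDay) }) {
            return .block(.recurringWindow(label: w.label))
        }
        if let id = frontmostBundleID, blockedBundleIDs.contains(id) {
            return .block(.blockedApp(bundleID: id))
        }
        return .allow
    }

    private static func covers(_ w: RecurringWindow, _ minute: Int) -> Bool {
        w.startMinute <= w.endMinute
            ? (w.startMinute <= minute && minute < w.endMinute)
            : (minute >= w.startMinute || minute < w.endMinute)
    }
}
```

Returning a reason rather than a bare `Bool` lets the UI explain why capture is off, for example "paused" versus "night window".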
Smaller follow-ups that round out Phase 2: an on-disk ANN vector index behind the VectorIndex protocol (the flat index is fine to ~10⁶ events but is rebuilt in memory on open today), and at-rest encryption integration (the engine sets isExcludedFromBackup + iOS FileProtection; the Mnemo-vault Keychain key is the app layer's, Phase 5).
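The `VectorIndex` seam is what lets an on-disk ANN index replace the flat one later without touching callers. Below is a minimal sketch of a flat-cosine index behind a protocol, assuming string IDs and `[Float]` vectors. `VectorIndexSketch` and `FlatCosineIndex` are hypothetical names, and the real protocol's methods may differ.

Vectors are normalized on insert, so each query is a brute-force dot product over every stored vector. That costs O(n·d) per query. The cost is why the flat index has a practical ceiling, and it is also why an in-memory copy has to be rebuilt on open.

```swift
// Hypothetical protocol shape; the real VectorIndex may differ.
protocol VectorIndexSketch {
    mutating func upsert(id: String, vector: [Float])
    mutating func remove(id: String)
    func search(_ query: [Float], k: Int) -> [(id: String, score: Float)]
}

struct FlatCosineIndex: VectorIndexSketch {
    private var vectors: [String: [Float]] = [:]

    init() {}

    private static func normalized(_ v: [Float]) -> [Float] {
        let sumOfSquares: Float = v.reduce(0) { $0 + $1 * $1 }
        let norm = sumOfSquares.squareRoot()
        return norm > 0 ? v.map { $0 / norm } : v
    }

    private static func dot(_ a: [Float], _ b: [Float]) -> Float {
        zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    }

    // Stored vectors are unit length, so cosine similarity is a dot product.
    mutating func upsert(id: String, vector: [Float]) { vectors[id] = Self.normalized(vector) }
    mutating func remove(id: String) { vectors[id] = nil }

    // Brute force over every stored vector: O(n·d) per query.
    // Ties break by ID so results are deterministic.
    func search(_ query: [Float], k: Int) -> [(id: String, score: Float)] {
        let q = Self.normalized(query)
        let scored = vectors.map { entry in (id: entry.key, score: Self.dot(q, entry.value)) }
        let ranked = scored.sorted { $0.score != $1.score ? $0.score > $1.score : $0.id < $1.id }
        return Array(ranked.prefix(max(0, k)))
    }
}
```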