48 changes: 48 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,48 @@
name: CI

on:
  push:
    branches: ["main", "review", "review-1"]
  pull_request:

env:
  PIP_DISABLE_PIP_VERSION_CHECK: "1"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Check out
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install toolchain
        run: |
          pip install uv
          uv pip install --system -e .
          uv pip install --system ruff pyright typeguard toml-sort yamllint
      - name: Lint and format checks
        run: make fmt-check
      - name: Docs guard
        env:
          BASE_REF: ${{ github.event.pull_request.base.sha || 'HEAD~1' }}
        run: make docs-guard

  tests:
    runs-on: ubuntu-latest
    steps:
      - name: Check out
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          pip install uv
          uv pip install --system -e .
          uv pip install --system pytest
      - name: Run pytest
        run: make test
18 changes: 18 additions & 0 deletions AGENTS.md
@@ -0,0 +1,18 @@
# Agent Instructions

## Documentation Workflow
- After each batch of changes, add a `CHANGELOG.md` entry with an ISO 8601 date/time stamp and developer-facing detail (files, modules, functions, variables, and rationale). Every commit should correspond to a fresh entry.
- Maintain `README.md` as the canonical description of the project; update it whenever behaviour or workflows change. Archive older versions separately when requested.
- Keep the `docs/` wiki and provisioning guides (`SETUP.md`, `ENVIRONMENT_NEEDS.md`) in sync with code updates; add or revise the
relevant page whenever features, modules, or workflows change.
- After each iteration, refresh `ISSUES.md`, `SOT.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `TODO.md`, and related documentation to stay in sync with the codebase.
- Ensure `TODO.md` retains the `Completed`, `Priority Tasks`, and `Recommended Waiting for Approval Tasks` sections, moving finished items under `Completed` at the end of every turn.
- Update `RESUME_NOTES.md` at the end of every turn so the next session starts with accurate context.
- When beginning a turn, review `README.md`, `PROJECT.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `ISSUES.md`, and `SOT.md` to harvest new actionable work. Maintain at least ten quantifiable, prioritised items in the `Priority Tasks` section of `TODO.md`, adding context or links when needed.
- After completing any task, immediately update `TODO.md`, check for the next actionable item, and continue iterating until all unblocked `Priority Tasks` are exhausted for the session.
- Continuously loop through planning and execution: finish a task, document it, surface new follow-ups, and resume implementation so long as environment blockers allow. If extra guidance would improve throughput, extend these instructions proactively.

## Style Guidelines
- Use descriptive Markdown headings starting at level 1 for top-level documents.
- Keep lines to 120 characters or fewer when practical.
- Prefer bullet lists for enumerations instead of inline commas.
126 changes: 126 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,126 @@
# Changelog

## [2025-10-17T18:45:00Z]
### Added
- Created a Dockerfile for integration workloads and introduced targeted Compose stacks
under `meshmind/tests/docker/` (Memgraph, Neo4j, Redis, full-stack) alongside a
developer-facing provisioning guide in `SETUP.md` to document service bootstrapping
commands and environment requirements.

### Changed
- Expanded `pyproject.toml` to install optional dependencies (`fastapi`,
`uvicorn[standard]`, `neo4j`, `mgclient`, `redis`) by default and defined extras
(`dev`, `docs`, `testing`); updated the `Makefile` `install` target accordingly and
regenerated setup documentation across `README.md`, `docs/`, `PROJECT.md`, `PLAN.md`,
`SOT.md`, `NEEDED_FOR_TESTING.md`, `ENVIRONMENT_NEEDS.md`, `FINDINGS.md`,
`RECOMMENDATIONS.md`, and `RESUME_NOTES.md` to reference the new workflow and
credentials.
- Reworked the root `docker-compose.yml` to provision Memgraph, Neo4j, and Redis with
health checks and volumes, added Compose variants in `meshmind/tests/docker/`, and
refreshed onboarding materials (`SETUP.md`, `README.md`, `docs/configuration.md`,
`docs/operations.md`, `docs/testing.md`) to call out the new ports, credentials, and
teardown guidance.
- Replaced references to `pymgclient` with `mgclient` throughout dependency notes and
environment files to match the updated driver import.

### Fixed
- Patched `meshmind/cli/admin.py` to import `argparse`, restoring CLI admin command
registration after the module refactor.
- Updated `.github/workflows/ci.yml` to pass `--system` to `uv pip install`, resolving
the "No virtual environment found" failure during lint/test setup.

## [2025-10-16T18:30:00Z]
### Fixed
- Adjusted `meshmind/tests/test_service_interfaces.py::test_memory_service_ingest_and_search` to return a hydrated `Memory`
instance from the monkey-patched `list_memories` stub, ensuring pagination-aware search paths remain asserted while avoiding
empty result sets during verification.

## [2025-10-16T12:00:00Z]
### Added
- Introduced pagination-aware graph access by adding `search_entities` and `count_entities` to every `GraphDriver` implementation, wiring a new `meshmind admin counts` CLI subcommand and REST `/memories/counts` route through `MemoryManager`, `MemoryService`, and the MeshMind client.
- Added `scripts/check_docs_sync.py` plus a Makefile target, CI step, and pytest coverage to guard documentation updates whenever code under mapped modules changes.
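
The docs guard described above can be pictured as a check over the list of changed files. The following is only a minimal sketch under stated assumptions: `DOC_MAP`, its prefixes, and `missing_doc_updates` are hypothetical names chosen for illustration, and the real `scripts/check_docs_sync.py` may map modules and gather changed files differently.

```python
# Hypothetical mapping from code path prefixes to the docs expected to change with them.
DOC_MAP = {
    "meshmind/db/": ["docs/persistence.md"],
    "meshmind/retrieval/": ["docs/retrieval.md"],
}


def missing_doc_updates(changed_files, doc_map=DOC_MAP):
    """Return {code_prefix: [docs]} for mapped code changes that lack a doc update."""
    changed = set(changed_files)
    missing = {}
    for prefix, docs in doc_map.items():
        code_touched = any(path.startswith(prefix) for path in changed)
        docs_touched = any(doc in changed for doc in docs)
        if code_touched and not docs_touched:
            missing[prefix] = list(docs)
    return missing
```

A caller would pass the output of something like `git diff --name-only` against the base ref and fail the build when the returned mapping is non-empty.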

### Changed
- Extended `MemoryManager.list_memories`, MeshMind client helpers, retrieval graph wrappers, and service adapters to forward `offset`, `limit`, and `query` hints, delegating filtering to the active driver before in-memory scoring.
- Updated examples and tests (`meshmind/tests/test_db_drivers.py`, `test_service_interfaces.py`, `test_graph_retrieval.py`, `test_cli_admin.py`, `test_client.py`, `test_docs_guard.py`) to cover pagination, counts, and driver-side search semantics.
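
To make the `offset`, `limit`, and `query` hints concrete, here is a small sketch of how a driver-side listing might apply them before any in-memory scoring. The function name `list_page` and the dictionary-shaped records with a `name` key are assumptions for illustration, not the project's actual types.

```python
def list_page(items, query=None, offset=0, limit=None):
    """Filter by a case-insensitive substring, then apply offset and limit."""
    if query:
        needle = query.lower()
        items = [item for item in items if needle in item["name"].lower()]
    # A limit of None means "no upper bound", so the slice runs to the end.
    end = None if limit is None else offset + limit
    return items[offset:end]
```

Filtering before slicing matters: applying `offset` and `limit` first would paginate over rows the query later discards, producing short or empty pages.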

### Documentation
- Refreshed `README.md`, `PROJECT.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `ISSUES.md`, `SOT.md`, `FINDINGS.md`, `AGENTS.md`, `TODO.md`, and the developer wiki (`docs/api.md`, `docs/development.md`, `docs/operations.md`, `docs/persistence.md`, `docs/retrieval.md`, `docs/troubleshooting.md`) to describe pagination, counts, docs-guard workflows, and updated service interfaces.

## [2025-10-15T15:30:00Z]
### Added
- Created a developer wiki under `docs/` covering architecture, pipelines, persistence, retrieval, configuration, testing, operations, telemetry, and development workflows so code changes stay synchronized with reference material.
- Authored `ENVIRONMENT_NEEDS.md` to request optional dependency installs and external services, plus `RESUME_NOTES.md` for session-to-session continuity.

### Changed
- Expanded the `GraphDriver` contract to accept namespace and entity-label filters when listing entities, updating the in-memory, SQLite, Neo4j, and Memgraph drivers to push filtering into their native query layers.
- Propagated the new filtering through `MemoryManager`, `MeshMind.list_memories`, graph-backed retrieval wrappers, and service interfaces (REST/gRPC), ensuring hybrid searches hydrate only the required entity types.
- Updated tests (`meshmind/tests/test_graph_retrieval.py`, `test_pipeline_preprocess_store.py`, `test_service_interfaces.py`) to cover entity-label filtering across client, REST, and gRPC paths.
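
Pushing namespace and entity-label filters into a native query layer could look roughly like the SQLite sketch below. The `entities` table, its columns, and the `list_entities` signature are assumptions made for this example; the actual schema and `GraphDriver` contract may differ.

```python
import sqlite3


def list_entities(conn, namespace=None, entity_labels=None):
    """Build a parameterised WHERE clause so SQLite performs the filtering."""
    clauses, params = [], []
    if namespace is not None:
        clauses.append("namespace = ?")
        params.append(namespace)
    if entity_labels:
        placeholders = ", ".join("?" for _ in entity_labels)
        clauses.append(f"entity_label IN ({placeholders})")
        params.extend(entity_labels)
    sql = "SELECT uuid, namespace, entity_label FROM entities"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY uuid"
    return conn.execute(sql, params).fetchall()


# Hypothetical schema and rows, used only to exercise the query builder.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities (uuid TEXT PRIMARY KEY, namespace TEXT, entity_label TEXT)"
)
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?)",
    [("a", "team", "Person"), ("b", "team", "Place"), ("c", "other", "Person")],
)
```

Only placeholders are interpolated into the SQL string; the filter values themselves travel as bound parameters.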

### Documentation
- Refreshed `README.md`, `PROJECT.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `ISSUES.md`, `SOT.md`, `DISCREPANCIES.md`, `FINDINGS.md`, `TODO.md`, and `AGENTS.md` to describe the new driver filtering, documentation workflow, environment checklist, and wiki requirements.

## [2025-02-15T00:45:00Z]
### Added
- Introduced `meshmind/retrieval/graph.py` with hybrid/vector/regex/exact/BM25/fuzzy wrappers that hydrate candidates from the active `GraphDriver` before delegating to existing scorers, plus `meshmind/tests/test_graph_retrieval.py` to verify namespace filtering and hybrid integration.
- Added `meshmind/cli/admin.py` and wired `meshmind/cli/__main__.py` to expose `admin` subcommands for predicate management, maintenance telemetry, and graph connectivity checks; created `meshmind/tests/test_cli_admin.py` to cover the new flows.
- Created `meshmind/tests/test_neo4j_driver.py` and a `Neo4jGraphDriver.verify_connectivity` helper to exercise driver-level sanity checks without a live cluster.
- Logged importance score distributions via `meshmind/pipeline/preprocess.summarize_importance` so telemetry captures mean/stddev/recency metrics after scoring.
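
As a rough illustration of the distribution summary mentioned above, the sketch below computes count, mean, and population standard deviation for a list of scores. The name `summarize_scores` and the returned keys are hypothetical, and the recency metric is left out because the changelog does not say how it is derived.

```python
from statistics import fmean, pstdev


def summarize_scores(scores):
    """Return count, mean, and population standard deviation of the scores."""
    if not scores:
        # Avoid StatisticsError on empty input by reporting neutral values.
        return {"count": 0, "mean": 0.0, "stddev": 0.0}
    return {
        "count": len(scores),
        "mean": fmean(scores),
        "stddev": pstdev(scores),
    }
```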

### Changed
- Updated `MeshMind` search helpers (`meshmind/client.py`) to auto-load memories from the configured driver when `memories` is `None`, reusing the new graph-backed wrappers.
- Reworked `meshmind/pipeline/consolidate.py` to return a `ConsolidationPlan` with batch/backoff thresholds and skipped-group tracking; `meshmind/tasks/scheduled.consolidate_task` now emits skip counts and returns a structured summary.
- Tuned Python compatibility metadata to `>=3.11,<3.13` in `pyproject.toml` and refreshed docs (`README.md`, `NEEDED_FOR_TESTING.md`, `SOT.md`) accordingly.
- Enhanced `meshmind/pipeline/preprocess.py` to emit telemetry gauges for importance scoring and added `meshmind/tests/test_pipeline_preprocess_store.py::test_score_importance_records_metrics`.
- Expanded retrieval, CLI, and driver test coverage (`meshmind/tests/test_retrieval.py`, `meshmind/tests/test_tasks_scheduled.py`) to account for graph-backed defaults and new return types.

### Documentation
- Updated `README.md`, `PROJECT.md`, `PLAN.md`, `SOT.md`, `FINDINGS.md`, `DISCREPANCIES.md`, `RECOMMENDATIONS.md`, `NEEDED_FOR_TESTING.md`, `ISSUES.md`, and `TODO.md` to describe graph-backed retrieval wrappers, CLI admin tooling, consolidation backoff behaviour, telemetry metrics, and revised Python support.
- Copied the refreshed README guidance into `README_OLD.md` as an archival reference while keeping `README.md` as the primary source.

## [2025-10-14T14:57:47Z]
### Added
- Introduced `meshmind/_compat/pydantic.py` to emulate `BaseModel`, `Field`, and `ValidationError` when Pydantic is unavailable, enabling tests to run in constrained environments.
- Added `meshmind/testing/fakes.py` with `FakeMemgraphDriver`, `FakeRedisBroker`, and `FakeEmbeddingEncoder`, plus a package export and dedicated pytest coverage (`meshmind/tests/test_db_drivers.py`, `meshmind/tests/test_tasks_scheduled.py`).
- Created heuristics-focused test cases for consolidation outcomes, maintenance tasks, and the revised retrieval dispatcher to guarantee behaviour without external services.
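
The compatibility shim idea is the usual "import the real thing, otherwise fall back" pattern. The sketch below is an assumption about its general shape, not the contents of `meshmind/_compat/pydantic.py`; the fallback `BaseModel` here only stores keyword arguments and does no validation.

```python
try:
    from pydantic import BaseModel  # use the real library when it is installed
except ImportError:

    class BaseModel:
        """Tiny stand-in: assigns annotated fields from keyword arguments."""

        def __init__(self, **data):
            annotations = getattr(type(self), "__annotations__", {})
            for name in annotations:
                if name in data:
                    setattr(self, name, data[name])
                elif not hasattr(type(self), name):
                    # No value supplied and no class-level default to fall back on.
                    raise TypeError(f"missing field: {name}")
```

A model declared as `class Point(BaseModel): x: int; y: int = 0` can then be built with `Point(x=1)` whether or not Pydantic is available, although only the real library validates types.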

### Changed
- Replaced the constant importance assignment in `meshmind/pipeline/preprocess.score_importance` with a heuristic that factors token diversity, recency, metadata richness, and embedding magnitude.
- Rebuilt `meshmind/pipeline/consolidate` around a `ConsolidationOutcome` dataclass that merges metadata, averages embeddings, and surfaces removal IDs; `meshmind/tasks/scheduled.consolidate_task` now applies updates and deletes duplicates lazily via `_get_manager`/`_reset_manager` helpers.
- Hardened Celery maintenance tasks by logging driver initialization failures, tracking update counts, and returning deterministic totals; compression counts now reflect the number of persisted updates.
- Updated `meshmind/core/similarity`, `meshmind/retrieval/bm25`, and `meshmind/retrieval/fuzzy` with pure-Python fallbacks so numpy, scikit-learn, and rapidfuzz remain optional.
- Adjusted `meshmind/pipeline/extract.extract_memories` to defer `openai` imports until a default client is required, unblocking DummyLLM-driven tests.
- Reworked `meshmind/retrieval/search.search` to rerank the original (filtered) candidate ordering, prepend reranked results, and append hybrid-sorted fallbacks, preventing index drift when rerankers return relative positions.
- Normalised SQLite entity hydration in `meshmind/db/sqlite_driver._row_to_dict` so JSON metadata is decoded only when stored as strings.
- Refreshed pytest fixtures (`meshmind/tests/conftest.py`, `meshmind/tests/test_pipeline_preprocess_store.py`) to use deterministic encoders and driver doubles, ensuring CRUD and retrieval suites run without live services.
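
The rerank ordering fix can be sketched as follows, assuming the reranker returns positions relative to the candidate list it was given. `merge_reranked` and its parameters are hypothetical names, and candidates are represented by hashable IDs for simplicity.

```python
def merge_reranked(candidates, reranked_positions, fallback_order):
    """Prepend reranked candidates, then append the remaining fallback ordering.

    candidates: the filtered list that was handed to the reranker.
    reranked_positions: indices into `candidates`, best first.
    fallback_order: the same candidates in hybrid-sorted order.
    """
    seen, merged = set(), []
    for pos in reranked_positions:
        # Positions index the list the reranker saw, so no index drift occurs.
        if 0 <= pos < len(candidates):
            item = candidates[pos]
            if item not in seen:
                seen.add(item)
                merged.append(item)
    for item in fallback_order:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged
```

Resolving positions against the exact list the reranker received is what prevents drift; resolving them against a differently sorted list would silently pick the wrong items.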

### Documentation
- Promoted `README.md` as the single source of truth (archiving the previous copy in `README_OLD.md`) and documented the new heuristics, compatibility shims, and test doubles.
- Updated `NEEDED_FOR_TESTING.md` with notes about the compatibility layer, optional dependencies, and fake drivers.
- Reconciled `PROJECT.md`, `ISSUES.md`, `PLAN.md`, `SOT.md`, `RECOMMENDATIONS.md`, `DISCREPANCIES.md`, `FINDINGS.md`, `TODO.md`, and `CHANGELOG.md` to capture the new persistence behaviour, heuristics, fallbacks, and remaining roadmap items.

## [Unreleased] - 2025-02-14
### Added
- Configurable graph driver factory with in-memory, SQLite, Memgraph, and optional Neo4j implementations plus supporting tests.
- REST and gRPC service layers (with FastAPI stub fallback) for ingestion and retrieval, including coverage in the test suite.
- Observability utilities that collect metrics and structured logs across pipelines and scheduled Celery tasks.
- Docker Compose definition provisioning Memgraph, Redis, and a Celery worker for local development.
- Vector-only, regex, exact-match, and optional LLM rerank retrieval helpers with reranker utilities and exports.
- MeshMind client wrappers for hybrid, vector, regex, and exact searches plus driver accessors.
- Example script demonstrating triplet storage and diverse retrieval flows.
- Pytest fixtures for encoder and memory factories alongside new retrieval tests that avoid external services.
- Makefile targets for linting, formatting, type checks, and tests, plus a GitHub Actions workflow running lint and pytest.
- README_LATEST.md capturing the current implementation and CHANGELOG.md for release notes.

### Changed
- Settings now surface `GRAPH_BACKEND`, Neo4j, and SQLite options while README/NEEDED_FOR_TESTING document the expanded setup.
- README, README_LATEST, and NEW_README were consolidated so the promoted README reflects current behaviour.
- PROJECT, PLAN, SOT, FINDINGS, DISCREPANCIES, ISSUES, RECOMMENDATIONS, and TODO were refreshed to capture new capabilities and
re-homed backlog items under a "Later" section.
- Updated `SearchConfig` to support rerank models and refreshed MeshMind documentation across PROJECT, PLAN, SOT, FINDINGS,
DISCREPANCIES, RECOMMENDATIONS, ISSUES, TODO, and NEEDED_FOR_TESTING files.
- Revised `meshmind.retrieval.search` to apply filters centrally, expose new search helpers, and integrate reranking.
- Exposed graph driver access on MeshMind and refreshed retrieval-facing examples and docs.

### Fixed
- Example ingestion script now uses MeshMind APIs correctly and illustrates relationship persistence.
- Tests rely on fixtures rather than deprecated hooks, improving portability across environments without Memgraph/OpenAI.
53 changes: 53 additions & 0 deletions DISCREPANCIES.md
@@ -0,0 +1,53 @@
# README vs Implementation Discrepancies

## Overview
- The legacy README has been superseded by `README.md`, which now reflects the implemented feature set.
- The current codebase delivers extraction, preprocessing, triplet persistence, CRUD helpers, and expanded retrieval strategies
that were missing when the README was written.
- Remaining gaps primarily involve pushing retrieval workloads into the graph backend, exporting observability to external sinks, and automated infrastructure provisioning.

## API Surface
- ✅ `MeshMind` now exposes CRUD helpers (`create_memory`, `update_memory`, `delete_memory`, `list_memories`, triplet helpers)
that the README referenced implicitly.
- ✅ Triplet storage routes through `store_triplets` and `MemoryManager.add_triplet`, calling `GraphDriver.upsert_edge`.
- ⚠️ The README still references `register_entity`, `register_allowed_predicates`, and `add_predicate`; predicate management is
handled automatically but there is no public API matching those method names.
- ⚠️ README snippets showing `mesh_mind.store_memory(memory)` should be updated to call `store_memories([memory])` or the new
CRUD helpers.

## Retrieval Capabilities
- ✅ Vector-only, regex, exact-match, hybrid, BM25, fuzzy, and optional LLM rerank searches exist in `meshmind.retrieval.search`
and are surfaced through `MeshMind` helpers.
- ⚠️ README implies retrieval queries the graph directly. Search helpers now fetch candidates from the configured driver when no
list is supplied but still score results in Python; Memgraph/Neo4j-native search remains future work.
- ⚠️ Named helpers like `search_facts` or `search_procedures` never existed; the README should reference the dispatcher plus
specialized helpers now available.

## Data & Relationship Modeling
- ✅ Predicates are persisted automatically when storing triplets and tracked in `PredicateRegistry`.
- ⚠️ README examples that look up subjects/objects by name still do not match the implementation, which expects UUIDs. Add
documentation explaining how to resolve names to UUIDs before storing edges.
- ⚠️ Consolidation and expiry run via Celery jobs; README narratives should highlight that heuristics require further validation even though persistence is now wired up.
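
Resolving names to UUIDs before storing edges might be documented along these lines. This is a sketch under assumptions: memories are shown as plain dictionaries with `name` and `uuid` keys, and `build_name_index` and `resolve_triplet` are hypothetical helpers rather than MeshMind APIs.

```python
def build_name_index(memories):
    """Map case-folded names to UUIDs; the first occurrence of a name wins."""
    index = {}
    for memory in memories:
        index.setdefault(memory["name"].casefold(), memory["uuid"])
    return index


def resolve_triplet(index, subject, predicate, obj):
    """Translate a (subject name, predicate, object name) triple into UUIDs."""
    resolved = []
    for name in (subject, obj):
        key = name.casefold()
        if key not in index:
            raise LookupError(f"no memory named {name!r}")
        resolved.append(index[key])
    return resolved[0], predicate, resolved[1]
```

The resolved subject and object UUIDs can then be passed to whichever triplet-storing helper the client exposes.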

## Configuration & Dependencies
- ✅ `README.md` and `ENVIRONMENT_NEEDS.md` document required environment variables, dependency guards, and setup steps.
- ⚠️ README still omits optional tooling now required by the Makefile/CI (ruff, pyright, typeguard, toml-sort, yamllint);
highlight these prerequisites more prominently.
- ✅ Python version support in `pyproject.toml` is now constrained to `>=3.11,<3.13`, matching the dependency landscape documented in the README.

## Example Code Paths
- ✅ Updated example scripts demonstrate extraction, triplet creation, and multiple retrieval strategies.
- ⚠️ Legacy README code that instantiates custom Pydantic entities remains inaccurate; extraction returns `Memory` objects and
validates `entity_label` names only.
- ⚠️ Search examples should be updated to show the new helper functions and optional rerank usage instead of nonexistent
`search_facts`/`search_procedures` calls.

## Tooling & Operations
- ✅ Makefile and CI workflows now exist; they will match the README's promises about automation once the README is refreshed.
- ✅ Docker Compose now provisions Memgraph, Redis, and a Celery worker; README sections should highlight the workflow and
caveats for environments lacking container tooling.
- ⚠️ Celery tasks still depend on optional infrastructure; README should clarify that heuristics and scheduling need production hardening even though persistence now works.

## Documentation State
- Continue promoting `README.md` as the authoritative guide and propagate updates to supporting docs
(`SOT.md`, `PLAN.md`, `ENVIRONMENT_NEEDS.md`, `docs/`).
24 changes: 24 additions & 0 deletions Dockerfile
@@ -0,0 +1,24 @@
FROM python:3.11-slim

ENV PIP_NO_CACHE_DIR=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential \
        cmake \
        libssl-dev \
        libkrb5-dev \
        curl \
        git \
    && rm -rf /var/lib/apt/lists/*

COPY . /app

RUN pip install uv \
    && uv pip install --system -e .[dev,docs,testing]

CMD ["bash"]