cpdata · cpdata · Oct 14, 2025
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,51 @@
+name: CI
+
+on:
+  push:
+    branches: ["main", "review", "review-1"]
+  pull_request:
+
+env:
+  PIP_DISABLE_PIP_VERSION_CHECK: "1"
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out
+        uses: actions/checkout@v4
+        # Fetch full history so `make docs-guard` can resolve BASE_REF (PR base SHA or HEAD~1).
+        with:
+          fetch-depth: 0
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install toolchain
+        run: |
+          pip install uv
+          uv pip install --system -e .
+          uv pip install --system ruff pyright typeguard toml-sort yamllint
+      - name: Lint and format checks
+        run: make fmt-check
+      - name: Docs guard
+        env:
+          BASE_REF: ${{ github.event.pull_request.base.sha || 'HEAD~1' }}
+        run: make docs-guard
+
+  tests:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out
+        uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: |
+          pip install uv
+          uv pip install --system -e .
+          uv pip install --system pytest
+      - name: Run pytest
+        run: make test
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,19 @@
+# Agent Instructions
+
+## Documentation Workflow
+- After each batch of changes, add a `CHANGELOG.md` entry with an ISO 8601 date/time stamp in United States Eastern Time (include a timezone label such as `America/New_York` or `ET`, as in `2025-10-14T19:44:39-04:00 (America/New_York)`) and developer-facing detail (files, modules, functions, variables, and rationale). Every commit should correspond to a fresh entry.
+- Maintain `README.md` as the canonical description of the project; update it whenever behaviour or workflows change. Archive older versions separately when requested.
+- Keep the `docs/` wiki and provisioning guides (`SETUP.md`, `ENVIRONMENT_NEEDS.md`) in sync with code updates; add or revise the
+  relevant page whenever features, modules, or workflows change.
+- After each iteration, refresh `ISSUES.md`, `SOT.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `TODO.md`, and related documentation to stay in sync with the codebase.
+- Ensure `TODO.md` retains the `Completed`, `Priority Tasks`, and `Recommended Waiting for Approval Tasks` sections, moving finished items under `Completed` at the end of every turn.
+- Make every task in `TODO.md` atomic: each entry must describe a single, self-contained deliverable with enough detail to execute and verify without cross-referencing additional context.
+- Update `RESUME_NOTES.md` at the end of every turn so the next session starts with accurate context.
+- When beginning a turn, review `README.md`, `PROJECT.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `ISSUES.md`, and `SOT.md` to harvest new actionable work. Maintain at least ten quantifiable, prioritised items in the `Priority Tasks` section of `TODO.md`, adding context or links when needed.
+- After completing any task, immediately update `TODO.md`, check for the next actionable item, and continue iterating until all unblocked `Priority Tasks` are exhausted for the session.
+- Continuously loop through planning and execution: finish a task, document it, surface new follow-ups, and resume implementation so long as environment blockers allow. If extra guidance would improve throughput, extend these instructions proactively.
+
+## Style Guidelines
+- Use descriptive Markdown headings starting at level 1 for top-level documents.
+- Keep lines to 120 characters or fewer when practical.
+- Prefer bullet lists for enumerations instead of inline commas.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,159 @@
+# Changelog
+
+## [2025-10-14T19:44:39-04:00 (America/New_York)]
+### Added
+- Authored `DUMMIES.md` to catalogue compatibility shims (`meshmind/_compat/pydantic.py`), REST/gRPC stubs, Celery fallbacks,
+  and fake drivers with guidance on which artifacts to retire versus keep for offline testing.
+
+### Changed
+- Updated `README.md`, `SOT.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `PROJECT.md`, and `FINDINGS.md` to reference the new
+  compatibility inventory so contributors know where to track shim removal work.
+## [2025-10-14T16:46:48-04:00 (America/New_York)]
+### Changed
+- Swapped the Memgraph dependency in `pyproject.toml` from `mgclient` to `pymgclient` and confirmed optional packages install
+  cleanly with the refreshed network access (`uv pip install`).
+- Updated environment references (`ENVIRONMENT_NEEDS.md`, `NEEDED_FOR_TESTING.md`, `SETUP.md`, `README.md`, `README_OLD.md`, `SOT.md`, `PROJECT.md`,
+  `FINDINGS.md`, `ISSUES.md`, `TODO.md`, and the `docs/` wiki pages) to name `pymgclient` as the installable Memgraph package while
+  keeping `mgclient` as the module name imported at runtime.
+- Revised `AGENTS.md` to require Eastern Time timestamps with timezone codes for every changelog entry and aligned `RESUME_NOTES.md`
+  with the newly installed optional dependencies and confirmed internet availability.
+
+## [2025-10-14T15:53:42-04:00 (America/New_York)]
+### Added
+- Authored `run/install_setup.sh` and `run/maintenance_setup.sh` bash scripts that install system packages (`build-essential`,
+  `cmake`, `libssl-dev`, `libopenblas-dev`, etc.) and synchronize Python dependencies via `uv pip sync` so fresh and cached
+  environments can bootstrap optional tooling (`neo4j`, `mgclient`, `redis`, REST extras) once internet access is available.
+
+### Changed
+- Updated `AGENTS.md` with an atomic-task requirement and refreshed `TODO.md` to prepend granular items for drafting
+  `CLEANUP.md`, introducing a provider-agnostic `meshmind/llm_client.py`, replacing direct OpenAI imports, and wiring cascaded
+  LLM overrides across configuration, CLI, API, tests, and documentation.
+- Extended planning/backlog documents—`ISSUES.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `SOT.md`, `RESUME_NOTES.md`—to capture the
+  upcoming LLM client refactor, dependency sync expectations, and the new automation scripts.
+- Added setup guidance in `README.md` and `SETUP.md` pointing to the `run/` scripts so developers with sudo access can
+  bootstrap environments automatically.
+
+## [2025-10-17T18:45:00Z]
+### Added
+- Created a Dockerfile for integration workloads and introduced targeted Compose stacks
+  under `meshmind/tests/docker/` (Memgraph, Neo4j, Redis, full-stack) alongside a
+  developer-facing provisioning guide in `SETUP.md` to document service bootstrapping
+  commands and environment requirements.
+
+### Changed
+- Expanded `pyproject.toml` to install optional dependencies (`fastapi`,
+  `uvicorn[standard]`, `neo4j`, `mgclient`, `redis`) by default and defined extras
+  (`dev`, `docs`, `testing`); updated the `Makefile` `install` target accordingly and
+  regenerated setup documentation across `README.md`, `docs/`, `PROJECT.md`, `PLAN.md`,
+  `SOT.md`, `NEEDED_FOR_TESTING.md`, `ENVIRONMENT_NEEDS.md`, `FINDINGS.md`,
+  `RECOMMENDATIONS.md`, and `RESUME_NOTES.md` to reference the new workflow and
+  credentials.
+- Reworked the root `docker-compose.yml` to provision Memgraph, Neo4j, and Redis with
+  health checks and volumes, added Compose variants in `meshmind/tests/docker/`, and
+  refreshed onboarding materials (`SETUP.md`, `README.md`, `docs/configuration.md`,
+  `docs/operations.md`, `docs/testing.md`) to call out the new ports, credentials, and
+  teardown guidance.
+- Replaced references to `pymgclient` with `mgclient` throughout dependency notes and
+  environment files to match the updated driver import.
+
+### Fixed
+- Patched `meshmind/cli/admin.py` to import `argparse`, restoring CLI admin command
+  registration after the module refactor.
+- Updated `.github/workflows/ci.yml` to pass `--system` to `uv pip install`, resolving
+  the "No virtual environment found" failure during lint/test setup.
+
+## [2025-10-16T18:30:00Z]
+### Fixed
+- Adjusted `meshmind/tests/test_service_interfaces.py::test_memory_service_ingest_and_search` so the monkey-patched `list_memories`
+  stub returns a hydrated `Memory` instance, keeping the pagination-aware search path under assertion instead of verifying against
+  an empty result set.
+
+## [2025-10-16T12:00:00Z]
+### Added
+- Introduced pagination-aware graph access by adding `search_entities` and `count_entities` to every `GraphDriver` implementation, wiring a new `meshmind admin counts` CLI subcommand and REST `/memories/counts` route through `MemoryManager`, `MemoryService`, and the MeshMind client.
+- Added `scripts/check_docs_sync.py` plus a Makefile target, CI step, and pytest coverage to guard documentation updates whenever code under mapped modules changes.
+
+### Changed
+- Extended `MemoryManager.list_memories`, MeshMind client helpers, retrieval graph wrappers, and service adapters to forward `offset`, `limit`, and `query` hints, delegating filtering to the active driver before in-memory scoring.
+- Updated examples and tests (`meshmind/tests/test_db_drivers.py`, `test_service_interfaces.py`, `test_graph_retrieval.py`, `test_cli_admin.py`, `test_client.py`, `test_docs_guard.py`) to cover pagination, counts, and driver-side search semantics.
+
+### Documentation
+- Refreshed `README.md`, `PROJECT.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `ISSUES.md`, `SOT.md`, `FINDINGS.md`, `AGENTS.md`, `TODO.md`, and the developer wiki (`docs/api.md`, `docs/development.md`, `docs/operations.md`, `docs/persistence.md`, `docs/retrieval.md`, `docs/troubleshooting.md`) to describe pagination, counts, docs-guard workflows, and updated service interfaces.
+## [2025-10-15T15:30:00Z]
+### Added
+- Created a developer wiki under `docs/` covering architecture, pipelines, persistence, retrieval, configuration, testing, operations, telemetry, and development workflows so code changes stay synchronized with reference material.
+- Authored `ENVIRONMENT_NEEDS.md` to request optional dependency installs and external services, plus `RESUME_NOTES.md` for session-to-session continuity.
+
+### Changed
+- Expanded the `GraphDriver` contract to accept namespace and entity-label filters when listing entities, updating the in-memory, SQLite, Neo4j, and Memgraph drivers to push filtering into their native query layers.
+- Propagated the new filtering through `MemoryManager`, `MeshMind.list_memories`, graph-backed retrieval wrappers, and service interfaces (REST/gRPC), ensuring hybrid searches hydrate only the required entity types.
+- Updated tests (`meshmind/tests/test_graph_retrieval.py`, `test_pipeline_preprocess_store.py`, `test_service_interfaces.py`) to cover entity-label filtering across client, REST, and gRPC paths.
+
+### Documentation
+- Refreshed `README.md`, `PROJECT.md`, `PLAN.md`, `RECOMMENDATIONS.md`, `ISSUES.md`, `SOT.md`, `DISCREPANCIES.md`, `FINDINGS.md`, `TODO.md`, and `AGENTS.md` to describe the new driver filtering, documentation workflow, environment checklist, and wiki requirements.
+
+## [2025-02-15T00:45:00Z]
+### Added
+- Introduced `meshmind/retrieval/graph.py` with hybrid/vector/regex/exact/BM25/fuzzy wrappers that hydrate candidates from the active `GraphDriver` before delegating to existing scorers, plus `meshmind/tests/test_graph_retrieval.py` to verify namespace filtering and hybrid integration.
+- Added `meshmind/cli/admin.py` and wired `meshmind/cli/__main__.py` to expose `admin` subcommands for predicate management, maintenance telemetry, and graph connectivity checks; created `meshmind/tests/test_cli_admin.py` to cover the new flows.
+- Created `meshmind/tests/test_neo4j_driver.py` and a `Neo4jGraphDriver.verify_connectivity` helper to exercise driver-level sanity checks without a live cluster.
+- Logged importance score distributions via `meshmind/pipeline/preprocess.summarize_importance` so telemetry captures mean/stddev/recency metrics after scoring.
+
+### Changed
+- Updated `MeshMind` search helpers (`meshmind/client.py`) to auto-load memories from the configured driver when `memories` is `None`, reusing the new graph-backed wrappers.
+- Reworked `meshmind/pipeline/consolidate.py` to return a `ConsolidationPlan` with batch/backoff thresholds and skipped-group tracking; `meshmind/tasks/scheduled.consolidate_task` now emits skip counts and returns a structured summary.
+- Tuned Python compatibility metadata to `>=3.11,<3.13` in `pyproject.toml` and refreshed docs (`README.md`, `NEEDED_FOR_TESTING.md`, `SOT.md`) accordingly.
+- Enhanced `meshmind/pipeline/preprocess.py` to emit telemetry gauges for importance scoring and added `meshmind/tests/test_pipeline_preprocess_store.py::test_score_importance_records_metrics`.
+- Expanded retrieval, CLI, and driver test coverage (`meshmind/tests/test_retrieval.py`, `meshmind/tests/test_tasks_scheduled.py`) to account for graph-backed defaults and new return types.
+
+### Documentation
+- Updated `README.md`, `PROJECT.md`, `PLAN.md`, `SOT.md`, `FINDINGS.md`, `DISCREPANCIES.md`, `RECOMMENDATIONS.md`, `NEEDED_FOR_TESTING.md`, `ISSUES.md`, and `TODO.md` to describe graph-backed retrieval wrappers, CLI admin tooling, consolidation backoff behaviour, telemetry metrics, and revised Python support.
+- Copied the refreshed README guidance into `README_OLD.md` as an archival reference while keeping `README.md` as the primary source.
+
+## [2025-10-14T14:57:47Z]
+### Added
+- Introduced `meshmind/_compat/pydantic.py` to emulate `BaseModel`, `Field`, and `ValidationError` when Pydantic is unavailable, enabling tests to run in constrained environments.
+- Added `meshmind/testing/fakes.py` with `FakeMemgraphDriver`, `FakeRedisBroker`, and `FakeEmbeddingEncoder`, plus a package export and dedicated pytest coverage (`meshmind/tests/test_db_drivers.py`, `meshmind/tests/test_tasks_scheduled.py`).
+- Created heuristics-focused test cases for consolidation outcomes, maintenance tasks, and the revised retrieval dispatcher to guarantee behaviour without external services.
+
+### Changed
+- Replaced the constant importance assignment in `meshmind/pipeline/preprocess.score_importance` with a heuristic that factors token diversity, recency, metadata richness, and embedding magnitude.
+- Rebuilt `meshmind/pipeline/consolidate` around a `ConsolidationOutcome` dataclass that merges metadata, averages embeddings, and surfaces removal IDs; `meshmind/tasks/scheduled.consolidate_task` now applies updates and deletes duplicates through a lazily obtained manager (`_get_manager`/`_reset_manager` helpers).
+- Hardened Celery maintenance tasks by logging driver initialization failures, tracking update counts, and returning deterministic totals; compression counts now reflect the number of persisted updates.
+- Updated `meshmind/core/similarity`, `meshmind/retrieval/bm25`, and `meshmind/retrieval/fuzzy` with pure-Python fallbacks so numpy, scikit-learn, and rapidfuzz remain optional.
+- Adjusted `meshmind/pipeline/extract.extract_memories` to defer `openai` imports until a default client is required, unblocking DummyLLM-driven tests.
+- Reworked `meshmind/retrieval/search.search` to rerank the original (filtered) candidate ordering, prepend reranked results, and append hybrid-sorted fallbacks, preventing index drift when rerankers return relative positions.
+- Normalised SQLite entity hydration in `meshmind/db/sqlite_driver._row_to_dict` so JSON metadata is decoded only when stored as strings.
+- Refreshed pytest fixtures (`meshmind/tests/conftest.py`, `meshmind/tests/test_pipeline_preprocess_store.py`) to use deterministic encoders and driver doubles, ensuring CRUD and retrieval suites run without live services.
+
+### Documentation
+- Promoted `README.md` as the single source of truth (archiving the previous copy in `README_OLD.md`) and documented the new heuristics, compatibility shims, and test doubles.
+- Updated `NEEDED_FOR_TESTING.md` with notes about the compatibility layer, optional dependencies, and fake drivers.
+- Reconciled `PROJECT.md`, `ISSUES.md`, `PLAN.md`, `SOT.md`, `RECOMMENDATIONS.md`, `DISCREPANCIES.md`, `FINDINGS.md`, `TODO.md`, and `CHANGELOG.md` to capture the new persistence behaviour, heuristics, fallbacks, and remaining roadmap items.
+
+## [Unreleased] - 2025-02-14
+### Added
+- Configurable graph driver factory with in-memory, SQLite, Memgraph, and optional Neo4j implementations plus supporting tests.
+- REST and gRPC service layers (with FastAPI stub fallback) for ingestion and retrieval, including coverage in the test suite.
+- Observability utilities that collect metrics and structured logs across pipelines and scheduled Celery tasks.
+- Docker Compose definition provisioning Memgraph, Redis, and a Celery worker for local development.
+- Vector-only, regex, exact-match, and optional LLM rerank retrieval helpers with reranker utilities and exports.
+- MeshMind client wrappers for hybrid, vector, regex, and exact searches plus driver accessors.
+- Example script demonstrating triplet storage and diverse retrieval flows.
+- Pytest fixtures for encoder and memory factories alongside new retrieval tests that avoid external services.
+- Makefile targets for linting, formatting, type checks, and tests, plus a GitHub Actions workflow running lint and pytest.
+- README_LATEST.md capturing the current implementation and CHANGELOG.md for release notes.
+
+### Changed
+- Settings now surface `GRAPH_BACKEND`, Neo4j, and SQLite options while README/NEEDED_FOR_TESTING document the expanded setup.
+- README, README_LATEST, and NEW_README were consolidated so the promoted README reflects current behaviour.
+- PROJECT, PLAN, SOT, FINDINGS, DISCREPANCIES, ISSUES, RECOMMENDATIONS, and TODO were refreshed to capture new capabilities and
+  re-homed backlog items under a "Later" section.
+- Updated `SearchConfig` to support rerank models and refreshed MeshMind documentation across PROJECT, PLAN, SOT, FINDINGS,
+  DISCREPANCIES, RECOMMENDATIONS, ISSUES, TODO, and NEEDED_FOR_TESTING files.
+- Revised `meshmind.retrieval.search` to apply filters centrally, expose new search helpers, and integrate reranking.
+- Exposed graph driver access on MeshMind and refreshed retrieval-facing examples and docs.
+
+### Fixed
+- Example ingestion script now uses MeshMind APIs correctly and illustrates relationship persistence.
+- Tests rely on fixtures rather than deprecated hooks, improving portability across environments without Memgraph/OpenAI.