33 changes: 33 additions & 0 deletions docs/E2E_and_Neo4j_Task_Planning_REVISED.md
# E2E and Neo4j Task Planning (Revised — Interpreter Terminology)

This plan aligns E2E testing and the Neo4j refactor with the Interpreter Management System and current API contracts.

## Story: E2E Testing & Neo4j Integration
- ID: story:e2e-testing
- Objective: Establish reliable E2E scaffolding (pytest + Playwright) to validate SciDK core flows and support the Neo4j persistence refactor.

## Phases
1. Smoke E2E baseline: Validate core flows (Scan, Browse, Interpreters, Map) without Neo4j.
2. Neo4j refactor: Make Neo4j the live graph store (foundational).
3. Expanded E2E: Add Neo4j-specific tests, interpreter workflows, and negatives.

## Success Criteria
- Core MVP flows pass E2E in CI; Neo4j driver integration is solid and tested.
- Interpreter registration and execution validated E2E.

## Tasks
- task:e2e:01-smoke-baseline — Playwright smoke E2E baseline (MVP flows). RICE 999. Status: Ready.
- task:e2e:02-neo4j-refactor — Neo4j as live graph store. RICE 998. Status: Ready.
- task:e2e:03-expanded-e2e — Neo4j-specific E2E + interpreter workflows + negatives. RICE 997. Status: Planned.

## Interpreter Terminology and APIs
- Use Interpreters (not Enrichers) consistently.
- Required endpoints: GET/POST /api/interpreters, GET /api/interpreters/<id>, POST /api/interpreters/<id>/test, POST /api/scans/<id>/interpret.

## E2E Notes
- Prefer BASE_URL injection; keep smoke tests fast (<5s/spec) and independent of external services.
- Add data-testid hooks for Settings, Interpreters, Map, Scan flows.

## References
- MVP_Architecture_Overview_REVISED.md
- SciDK_Interpreter_Management_System.md
83 changes: 83 additions & 0 deletions docs/MVP_Architecture_Overview_REVISED.md
# MVP Architecture Overview (Revised — Interpreter‑centric)

This document aligns the MVP architecture with the Interpreter Management System and current repository terminology. Interpreters are lightweight, read‑only metadata extractors that understand specific file formats.

## Core UI Areas
- Home / Scan: start scans via POST /api/scan (or background via /api/tasks)
- Files / Browse: explore scan snapshot via GET /api/scans/<id>/fs
- Interpreters: render per‑file insights (Python, CSV, IPYNB for MVP)
- Map: view schema and export instances
- Interpreter Settings: configure interpreter assignments/rules and registration
- Rclone Mounts (feature‑flagged): manage safe local FUSE mounts
- Background Tasks: monitor async scan/interpret/commit

## Key APIs (MVP)

### Filesystem providers
- GET /api/providers
- GET /api/provider_roots?provider_id=<id>
- GET /api/browse?provider_id=<id>&root_id=<root>&path=<path>[&recursive=false&max_depth=1&fast_list=false]
- POST /api/scan
- GET /api/datasets, GET /api/datasets/<id>

### Scans
- GET /api/scans/<scanId>/status
- GET /api/scans/<scanId>/fs
- POST /api/scans/<scanId>/interpret
- Body: { include?, exclude?, max_size_bytes?, after_rowid?, max_files?, overwrite? }
- Returns: { status, processed_count, error_count, filtered_by_size, filtered_by_include, filtered_no_interpreter, next_cursor }
- POST /api/scans/<scanId>/commit
- Returns commit summary including optional Neo4j verification fields
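The interpret endpoint's cursor fields (`after_rowid` in, `next_cursor` out) suggest a batch-driving loop. A minimal sketch, with `post` injected (e.g. a thin `requests` wrapper) so it can run without a live server; the helper names are illustrative:

```python
# Sketch of driving POST /api/scans/<scanId>/interpret in batches
# using the documented cursor fields.
def interpret_body(include=None, exclude=None, max_size_bytes=None,
                   after_rowid=None, max_files=None, overwrite=None):
    """Build the request body, omitting unset optional fields."""
    body = {
        "include": include, "exclude": exclude,
        "max_size_bytes": max_size_bytes, "after_rowid": after_rowid,
        "max_files": max_files, "overwrite": overwrite,
    }
    return {k: v for k, v in body.items() if v is not None}

def run_interpret(post, scan_id, batch_size=500, **filters):
    """Loop until next_cursor is exhausted; tally processed/error counts."""
    processed = errors = 0
    cursor = None
    while True:
        body = interpret_body(after_rowid=cursor, max_files=batch_size, **filters)
        resp = post(f"/api/scans/{scan_id}/interpret", json=body)
        processed += resp.get("processed_count", 0)
        errors += resp.get("error_count", 0)
        cursor = resp.get("next_cursor")
        if cursor is None:
            break
    return {"processed": processed, "errors": errors}
```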

### Background tasks
- POST /api/tasks { type: 'scan' | 'commit' | 'interpret', ... }
- GET /api/tasks, GET /api/tasks/<task_id>

### Interpreters: registry and execution
- GET /api/interpreters → list available interpreters { id, name, runtime, supported_extensions, metadata_schema }
- GET /api/interpreters/<interpreter_id>
- POST /api/interpreters → register new interpreter { name, runtime, extensions, script, metadata_schema, ... }
- POST /api/interpreters/<interpreter_id>/test → run test on a sample file { file_path } → { status, result, errors, warnings, execution_time_ms }
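The register-then-test flow above can be sketched with only the stdlib; the payload fields follow the shapes listed, while the helper names and minimal error handling are assumptions:

```python
# Sketch of registering an interpreter via POST /api/interpreters
# and then exercising it via the /test endpoint.
import json
import urllib.request

def registration_payload(name, runtime, extensions, script, metadata_schema=None):
    """Request body for POST /api/interpreters."""
    return {
        "name": name,
        "runtime": runtime,
        "extensions": extensions,
        "script": script,
        "metadata_schema": metadata_schema or {},
    }

def post_json(url, payload):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def register_and_test(base_url, payload, sample_file):
    """Register an interpreter, then run it against one sample file."""
    created = post_json(f"{base_url}/api/interpreters", payload)
    return post_json(
        f"{base_url}/api/interpreters/{created['id']}/test",
        {"file_path": sample_file},
    )
```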

### Graph: schema and instance exports
- GET /api/graph/schema
- GET /api/graph/schema.neo4j (optional; returns 501 if the Neo4j driver is missing or misconfigured)
- GET /api/graph/schema.apoc (optional; returns 502 if APOC is unavailable)
- GET /api/graph/instances.csv?label=<Label>
- GET /api/graph/instances.xlsx?label=<Label> (requires openpyxl)
- GET /api/graph/instances.arrow?label=<Label> (requires pyarrow)
- GET /api/graph/instances.pkl?label=<Label>
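A small URL builder makes the per-format optional dependencies explicit; a sketch assuming the endpoints above, with the helper name and validation behavior as illustrative choices:

```python
# Sketch of building instance-export URLs; the format -> optional
# dependency mapping mirrors the endpoint list above.
from urllib.parse import quote

EXPORT_FORMATS = {
    "csv": None,          # always available
    "xlsx": "openpyxl",   # optional dependency
    "arrow": "pyarrow",   # optional dependency
    "pkl": None,
}

def instances_url(base_url, label, fmt="csv"):
    """URL for GET /api/graph/instances.<fmt>?label=<Label>."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    return f"{base_url}/api/graph/instances.{fmt}?label={quote(label)}"
```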

### Rclone Mount Manager (feature‑flagged)
- GET /api/rclone/mounts, POST /api/rclone/mounts, DELETE /api/rclone/mounts/<id>
- GET /api/rclone/mounts/<id>/logs?tail=N, GET /api/rclone/mounts/<id>/health

## Interpreter Settings
- File type assignments: map extensions → interpreters (e.g., .py → Python Interpreter)
- Pattern rules: conditional selection (e.g., OME‑TIFF for /microscopy/*.tif)
- Custom interpreters: register/upload user interpreters
- Execution config: timeouts, caching, parallelization, sampling
- Neo4j connection: URI/auth; used by optional schema endpoints and commit flows
- Feature flags summary: active providers and enabled features
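The assignment-plus-rules model above can be sketched as a selection function; pattern-rule precedence over plain extension mapping is an assumption here, and the rule shape is illustrative:

```python
# Sketch of interpreter selection: pattern rules (e.g. OME-TIFF for
# /microscopy/*.tif) win over plain extension assignments.
import fnmatch
import os

def select_interpreter(path, assignments, rules=()):
    """Return an interpreter id for a path, or None if nothing matches."""
    for pattern, interpreter_id in rules:
        if fnmatch.fnmatch(path, pattern):
            return interpreter_id
    ext = os.path.splitext(path)[1].lower()
    return assignments.get(ext)
```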

## Mental Model
- Home → POST /api/scan, poll status
- Browse → GET /api/scans/<id>/fs
- Interpreters → POST /api/scans/<id>/interpret, then read results
- Map → GET /api/graph/schema* and /api/graph/instances.*
- Interpreter Settings → GET/POST /api/interpreters
- Rclone Mounts → /api/rclone/mounts*
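The Home flow's "POST /api/scan, poll status" step can be sketched as a polling loop; `fetch` is injected so it runs without a live server, and the terminal state names are assumptions:

```python
# Sketch of polling GET /api/scans/<id>/status until the scan
# reaches a terminal state or a timeout passes.
import time

def poll_status(fetch, scan_id, interval=0.5, timeout=30.0,
                done_states=("completed", "error")):
    """Return the first status payload whose state is terminal."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch(f"/api/scans/{scan_id}/status")
        if status.get("state") in done_states:
            return status
        time.sleep(interval)
    raise TimeoutError(f"scan {scan_id} did not finish within {timeout}s")
```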

## Feature Flags & Env
- SCIDK_PROVIDERS: local_fs,mounted_fs[,rclone]
- SCIDK_RCLONE_MOUNTS or SCIDK_FEATURE_RCLONE_MOUNTS: toggles Mount Manager
- NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD, NEO4J_AUTH
- Optional deps: openpyxl, pyarrow
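Flag parsing for the variables above can be sketched as follows; the variable names come from the list, while the defaults and accepted truthy values are assumptions:

```python
# Sketch of reading the provider list and rclone feature flags
# from the environment.
import os

def enabled_providers(env=os.environ):
    """Parse SCIDK_PROVIDERS into a list, defaulting to the local providers."""
    raw = env.get("SCIDK_PROVIDERS", "local_fs,mounted_fs")
    return [p.strip() for p in raw.split(",") if p.strip()]

def rclone_mounts_enabled(env=os.environ):
    """Either flag variable enables the Mount Manager."""
    for var in ("SCIDK_RCLONE_MOUNTS", "SCIDK_FEATURE_RCLONE_MOUNTS"):
        if env.get(var, "").lower() in ("1", "true", "yes", "on"):
            return True
    return False
```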

## Out of Scope (MVP)
- Persistent graph storage by default (Neo4j planned)
- Full RO‑Crate export and direct file streaming endpoints
- Advanced interpreter features (remote/distributed, full audit trails)

## E2E Next Steps
- Add E2E coverage for interpreter execution pipeline (register → test → apply to scan).
46 changes: 46 additions & 0 deletions docs/e2e-testing.md
# E2E Testing Quickstart

This guide explains how to run the smoke E2E tests and how they map to the active story/phase.

## Prerequisites
- Python environment set up (see README/requirements.txt)
- Playwright installed for the chosen variant (Python or TS). For Python,
run: `pip install pytest pytest-playwright` then `playwright install --with-deps`
- Application can be started locally (default dev port assumed)

## Start the app server
- Local dev: `python -m scidk.app`
- Or use your preferred launcher (Makefile/script). Note the base URL shown in logs (e.g., http://localhost:5000).

## Environment for tests
- Set BASE_URL for E2E to point to the running server, for example:
- Linux/macOS: `export BASE_URL="http://localhost:5000"`
  - Windows (PowerShell): `$env:BASE_URL = "http://localhost:5000"`

## Running E2E smoke tests
- Python Playwright (pytest):
- `pytest -q -m e2e --maxfail=1`
- Or run a single spec: `pytest tests/e2e/test_home_scan.py -q`
- TypeScript Playwright:
- `npx playwright test`

Notes:
- Smoke specs should run in <5s each and not require external services.
- Optional features (Neo4j/APOC/rclone) are gated behind feature flags.
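Alongside the browser specs, a stdlib-only API smoke check can mirror the same constraints (BASE_URL-driven, fast, no external services); the endpoint choice here is an illustrative subset of the MVP API:

```python
# Sketch of an API-level smoke check complementing the Playwright
# specs: a few always-available endpoints, hit within the smoke budget.
import os
import urllib.request

BASE_URL = os.environ.get("BASE_URL", "http://localhost:5000")

SMOKE_ENDPOINTS = [
    "/api/providers",
    "/api/interpreters",
    "/api/graph/schema",
]

def smoke_urls(base_url=BASE_URL):
    """Absolute URLs for the smoke endpoints, trimming any trailing slash."""
    base = base_url.rstrip("/")
    return [base + path for path in SMOKE_ENDPOINTS]

def check(url, timeout=5.0):
    """Return True when the endpoint answers 200 within the smoke budget."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200

if __name__ == "__main__":
    for url in smoke_urls():
        print(url, "ok" if check(url) else "FAIL")
```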

## dev.cli helpers
- Show prioritized Ready Queue: `python -m dev.cli ready-queue`
- Validate a task file: `python -m dev.cli validate task:e2e:01-smoke-baseline`
- Start working on the top task: `python -m dev.cli start`

## Active Story & Phase
- See `dev/cycles.md` for the current Active Story/Phase pointer.
- E2E Story: `dev/stories/e2e-testing/story.md`
- Phases: `dev/stories/e2e-testing/phases/`
- Tasks: `dev/tasks/e2e/`

## Troubleshooting
- Ensure BASE_URL is set and reachable.
- If running headless in CI, confirm browsers are installed (`playwright install`).
- If optional deps (openpyxl/pyarrow) are not installed, the related export tests should be skipped or should target CSV-only paths.
- Check server logs for endpoint errors during tests (`/api/interpreters*`, `/api/scans/<id>/interpret`).