33 changes: 33 additions & 0 deletions docs/E2E_and_Neo4j_Task_Planning_REVISED.md
# E2E and Neo4j Task Planning (Revised — Interpreter Terminology)

This plan aligns E2E testing and the Neo4j refactor with the Interpreter Management System and current API contracts.

## Story: E2E Testing & Neo4j Integration
- ID: story:e2e-testing
- Objective: Establish reliable E2E scaffolding (pytest + Playwright) to validate SciDK core flows and support the Neo4j persistence refactor.

## Phases
1. Smoke E2E baseline: Validate core flows (Scan, Browse, Interpreters, Map) without Neo4j.
2. Neo4j refactor: Make Neo4j the live graph store (foundational).
3. Expanded E2E: Add Neo4j-specific tests, interpreter workflows, and negatives.

## Success Criteria
- Core MVP flows pass E2E in CI; Neo4j driver integration is solid and tested.
- Interpreter registration and execution validated E2E.

## Tasks
- task:e2e:01-smoke-baseline — Playwright smoke E2E baseline (MVP flows). RICE 999. Status: Ready.
- task:e2e:02-neo4j-refactor — Neo4j as live graph store. RICE 998. Status: Ready.
- task:e2e:03-expanded-e2e — Neo4j-specific E2E + interpreter workflows + negatives. RICE 997. Status: Planned.

## Interpreter Terminology and APIs
- Use Interpreters (not Enrichers) consistently.
- Required endpoints: GET/POST /api/interpreters, GET /api/interpreters/<id>, POST /api/interpreters/<id>/test, POST /api/scans/<id>/interpret.

## E2E Notes
- Prefer BASE_URL injection; keep smoke tests fast (<5s/spec) and independent of external services.
- Add data-testid hooks for Settings, Interpreters, Map, Scan flows.

## References
- MVP_Architecture_Overview_REVISED.md
- SciDK_Interpreter_Management_System.md
83 changes: 83 additions & 0 deletions docs/MVP_Architecture_Overview_REVISED.md
# MVP Architecture Overview (Revised — Interpreter‑centric)

This document aligns the MVP architecture with the Interpreter Management System and current repository terminology. Interpreters are lightweight, read‑only metadata extractors that understand specific file formats.

## Core UI Areas
- Home / Scan: start scans via POST /api/scan (or background via /api/tasks)
- Files / Browse: explore scan snapshot via GET /api/scans/<id>/fs
- Interpreters: render per‑file insights (Python, CSV, IPYNB for MVP)
- Map: view schema and export instances
- Interpreter Settings: configure interpreter assignments/rules and registration
- Rclone Mounts (feature‑flagged): manage safe local FUSE mounts
- Background Tasks: monitor async scan/interpret/commit

## Key APIs (MVP)

### Filesystem providers
- GET /api/providers
- GET /api/provider_roots?provider_id=<id>
- GET /api/browse?provider_id=<id>&root_id=<root>&path=<path>[&recursive=false&max_depth=1&fast_list=false]
- POST /api/scan
- GET /api/datasets, GET /api/datasets/<id>

### Scans
- GET /api/scans/<scanId>/status
- GET /api/scans/<scanId>/fs
- POST /api/scans/<scanId>/interpret
- Body: { include?, exclude?, max_size_bytes?, after_rowid?, max_files?, overwrite? }
- Returns: { status, processed_count, error_count, filtered_by_size, filtered_by_include, filtered_no_interpreter, next_cursor }
- POST /api/scans/<scanId>/commit
- Returns commit summary including optional Neo4j verification fields
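The interpret endpoint's cursor fields (`after_rowid` in, `next_cursor` out) suggest a batch-driving loop. A minimal sketch, with `post` injected (e.g. a thin `requests` wrapper) so it can run without a live server; the helper names are illustrative:

```python
# Sketch of driving POST /api/scans/<scanId>/interpret in batches
# using the documented cursor fields.
def interpret_body(include=None, exclude=None, max_size_bytes=None,
                   after_rowid=None, max_files=None, overwrite=None):
    """Build the request body, omitting unset optional fields."""
    body = {
        "include": include, "exclude": exclude,
        "max_size_bytes": max_size_bytes, "after_rowid": after_rowid,
        "max_files": max_files, "overwrite": overwrite,
    }
    return {k: v for k, v in body.items() if v is not None}

def run_interpret(post, scan_id, batch_size=500, **filters):
    """Loop until next_cursor is exhausted; tally processed/error counts."""
    processed = errors = 0
    cursor = None
    while True:
        body = interpret_body(after_rowid=cursor, max_files=batch_size, **filters)
        resp = post(f"/api/scans/{scan_id}/interpret", json=body)
        processed += resp.get("processed_count", 0)
        errors += resp.get("error_count", 0)
        cursor = resp.get("next_cursor")
        if cursor is None:
            break
    return {"processed": processed, "errors": errors}
```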

### Background tasks
- POST /api/tasks { type: 'scan' | 'commit' | 'interpret', ... }
- GET /api/tasks, GET /api/tasks/<task_id>

### Interpreters: registry and execution
- GET /api/interpreters → list available interpreters { id, name, runtime, supported_extensions, metadata_schema }
- GET /api/interpreters/<interpreter_id>
- POST /api/interpreters → register new interpreter { name, runtime, extensions, script, metadata_schema, ... }
- POST /api/interpreters/<interpreter_id>/test → run test on a sample file { file_path } → { status, result, errors, warnings, execution_time_ms }
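The register-then-test flow above can be sketched with only the stdlib; the payload fields follow the shapes listed, while the helper names and minimal error handling are assumptions:

```python
# Sketch of registering an interpreter via POST /api/interpreters
# and then exercising it via the /test endpoint.
import json
import urllib.request

def registration_payload(name, runtime, extensions, script, metadata_schema=None):
    """Request body for POST /api/interpreters."""
    return {
        "name": name,
        "runtime": runtime,
        "extensions": extensions,
        "script": script,
        "metadata_schema": metadata_schema or {},
    }

def post_json(url, payload):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def register_and_test(base_url, payload, sample_file):
    """Register an interpreter, then run it against one sample file."""
    created = post_json(f"{base_url}/api/interpreters", payload)
    return post_json(
        f"{base_url}/api/interpreters/{created['id']}/test",
        {"file_path": sample_file},
    )
```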

### Graph: schema and instance exports
- GET /api/graph/schema
- GET /api/graph/schema.neo4j (optional; returns 501 if the Neo4j driver is missing or misconfigured)
- GET /api/graph/schema.apoc (optional; returns 502 if APOC is unavailable)
- GET /api/graph/instances.csv?label=<Label>
- GET /api/graph/instances.xlsx?label=<Label> (requires openpyxl)
- GET /api/graph/instances.arrow?label=<Label> (requires pyarrow)
- GET /api/graph/instances.pkl?label=<Label>
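A small URL builder makes the per-format optional dependencies explicit; a sketch assuming the endpoints above, with the helper name and validation behavior as illustrative choices:

```python
# Sketch of building instance-export URLs; the format -> optional
# dependency mapping mirrors the endpoint list above.
from urllib.parse import quote

EXPORT_FORMATS = {
    "csv": None,          # always available
    "xlsx": "openpyxl",   # optional dependency
    "arrow": "pyarrow",   # optional dependency
    "pkl": None,
}

def instances_url(base_url, label, fmt="csv"):
    """URL for GET /api/graph/instances.<fmt>?label=<Label>."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    return f"{base_url}/api/graph/instances.{fmt}?label={quote(label)}"
```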

### Rclone Mount Manager (feature‑flagged)
- GET /api/rclone/mounts, POST /api/rclone/mounts, DELETE /api/rclone/mounts/<id>
- GET /api/rclone/mounts/<id>/logs?tail=N, GET /api/rclone/mounts/<id>/health

## Interpreter Settings
- File type assignments: map extensions → interpreters (e.g., .py → Python Interpreter)
- Pattern rules: conditional selection (e.g., OME‑TIFF for /microscopy/*.tif)
- Custom interpreters: register/upload user interpreters
- Execution config: timeouts, caching, parallelization, sampling
- Neo4j connection: URI/auth; used by optional schema endpoints and commit flows
- Feature flags summary: active providers and enabled features
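The assignment-plus-rules model above can be sketched as a selection function; pattern-rule precedence over plain extension mapping is an assumption here, and the rule shape is illustrative:

```python
# Sketch of interpreter selection: pattern rules (e.g. OME-TIFF for
# /microscopy/*.tif) win over plain extension assignments.
import fnmatch
import os

def select_interpreter(path, assignments, rules=()):
    """Return an interpreter id for a path, or None if nothing matches."""
    for pattern, interpreter_id in rules:
        if fnmatch.fnmatch(path, pattern):
            return interpreter_id
    ext = os.path.splitext(path)[1].lower()
    return assignments.get(ext)
```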

## Mental Model
- Home → POST /api/scan, poll status
- Browse → GET /api/scans/<id>/fs
- Interpreters → POST /api/scans/<id>/interpret, then read results
- Map → GET /api/graph/schema* and /api/graph/instances.*
- Interpreter Settings → GET/POST /api/interpreters
- Rclone Mounts → /api/rclone/mounts*
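The Home flow's "POST /api/scan, poll status" step can be sketched as a polling loop; `fetch` is injected so it runs without a live server, and the terminal state names are assumptions:

```python
# Sketch of polling GET /api/scans/<id>/status until the scan
# reaches a terminal state or a timeout passes.
import time

def poll_status(fetch, scan_id, interval=0.5, timeout=30.0,
                done_states=("completed", "error")):
    """Return the first status payload whose state is terminal."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch(f"/api/scans/{scan_id}/status")
        if status.get("state") in done_states:
            return status
        time.sleep(interval)
    raise TimeoutError(f"scan {scan_id} did not finish within {timeout}s")
```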

## Feature Flags & Env
- SCIDK_PROVIDERS: local_fs,mounted_fs[,rclone]
- SCIDK_RCLONE_MOUNTS or SCIDK_FEATURE_RCLONE_MOUNTS: toggles Mount Manager
- NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD, NEO4J_AUTH
- Optional deps: openpyxl, pyarrow
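Flag parsing for the variables above can be sketched as follows; the variable names come from the list, while the defaults and accepted truthy values are assumptions:

```python
# Sketch of reading the provider list and rclone feature flags
# from the environment.
import os

def enabled_providers(env=os.environ):
    """Parse SCIDK_PROVIDERS into a list, defaulting to the local providers."""
    raw = env.get("SCIDK_PROVIDERS", "local_fs,mounted_fs")
    return [p.strip() for p in raw.split(",") if p.strip()]

def rclone_mounts_enabled(env=os.environ):
    """Either flag variable enables the Mount Manager."""
    for var in ("SCIDK_RCLONE_MOUNTS", "SCIDK_FEATURE_RCLONE_MOUNTS"):
        if env.get(var, "").lower() in ("1", "true", "yes", "on"):
            return True
    return False
```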

## Out of Scope (MVP)
- Persistent graph storage by default (Neo4j planned)
- Full RO‑Crate export and direct file streaming endpoints
- Advanced interpreter features (remote/distributed, full audit trails)

## E2E Next Steps
- Add E2E coverage for interpreter execution pipeline (register → test → apply to scan).
46 changes: 46 additions & 0 deletions docs/e2e-testing.md
# E2E Testing Quickstart

This guide explains how to run the smoke E2E tests and how they map to the active story/phase.

## Prerequisites
- Python environment set up (see README/requirements.txt)
- Playwright installed for the chosen variant (Python or TS). For Python,
run: `pip install pytest pytest-playwright` then `playwright install --with-deps`
- Application can be started locally (default dev port assumed)

## Start the app server
- Local dev: `python -m scidk.app`
- Or use your preferred launcher (Makefile/script). Note the base URL shown in logs (e.g., http://localhost:5000).

## Environment for tests
- Set BASE_URL for E2E to point to the running server, for example:
- Linux/macOS: `export BASE_URL="http://localhost:5000"`
  - Windows (PowerShell): `$env:BASE_URL = "http://localhost:5000"`

## Running E2E smoke tests
- Python Playwright (pytest):
- `pytest -q -m e2e --maxfail=1`
- Or run a single spec: `pytest tests/e2e/test_home_scan.py -q`
- TypeScript Playwright:
- `npx playwright test`

Notes:
- Smoke specs should run in <5s each and not require external services.
- Optional features (Neo4j/APOC/rclone) are gated behind feature flags.
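Alongside the browser specs, a stdlib-only API smoke check can mirror the same constraints (BASE_URL-driven, fast, no external services); the endpoint choice here is an illustrative subset of the MVP API:

```python
# Sketch of an API-level smoke check complementing the Playwright
# specs: a few always-available endpoints, hit within the smoke budget.
import os
import urllib.request

BASE_URL = os.environ.get("BASE_URL", "http://localhost:5000")

SMOKE_ENDPOINTS = [
    "/api/providers",
    "/api/interpreters",
    "/api/graph/schema",
]

def smoke_urls(base_url=BASE_URL):
    """Absolute URLs for the smoke endpoints, trimming any trailing slash."""
    base = base_url.rstrip("/")
    return [base + path for path in SMOKE_ENDPOINTS]

def check(url, timeout=5.0):
    """Return True when the endpoint answers 200 within the smoke budget."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200

if __name__ == "__main__":
    for url in smoke_urls():
        print(url, "ok" if check(url) else "FAIL")
```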

## dev.cli helpers
- Show prioritized Ready Queue: `python -m dev.cli ready-queue`
- Validate a task file: `python -m dev.cli validate task:e2e:01-smoke-baseline`
- Start working on the top task: `python -m dev.cli start`

## Active Story & Phase
- See `dev/cycles.md` for the current Active Story/Phase pointer.
- E2E Story: `dev/stories/e2e-testing/story.md`
- Phases: `dev/stories/e2e-testing/phases/`
- Tasks: `dev/tasks/e2e/`

## Troubleshooting
- Ensure BASE_URL is set and reachable.
- If running headless in CI, confirm browsers are installed (`playwright install`).
- If optional deps (openpyxl/pyarrow) are not installed, the related export tests should be skipped or should target CSV-only paths.
- Check server logs for endpoint errors during tests (`/api/interpreters*`, `/api/scans/<id>/interpret`).