All API endpoints are prefixed with /api/v1/.
Authentication is enforced when API_KEY_ENABLED=true or AUTH_TRUST_PROXY_HEADERS=true in environment settings. All endpoints require the get_current_user dependency.
Get documents by source or batch ID.
Query Parameters:
source(string, optional) - Source identifier to filter documents
batch_id(integer, optional) - Batch ID to filter documents
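A minimal Python sketch of building this query. It assumes the http://localhost:8000 base URL used in the examples, and document_query_url is a hypothetical helper, not part of the API. The helper refuses to build a URL with neither filter, mirroring the 400 response listed for this endpoint. How the server combines source and batch_id when both are sent is not stated here.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/api/v1"  # assumption: local dev server from the examples

def document_query_url(source=None, batch_id=None):
    """Build the URL for listing documents by source and/or batch ID."""
    params = {}
    if source is not None:
        params["source"] = source
    if batch_id is not None:
        params["batch_id"] = batch_id
    if not params:
        # The server answers 400 in this case, so fail before sending anything.
        raise ValueError("provide source, batch_id, or both")
    return f"{BASE_URL}/document/?{urlencode(params)}"

print(document_query_url(batch_id=1))  # http://localhost:8000/api/v1/document/?batch_id=1
```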
Response:
200 OK- Array of DocumentURI objects
400 Bad Request- Neither source nor batch_id provided
Example:
curl "http://localhost:8000/api/v1/document/?batch_id=1"

Ingest a new document into the system.
Content-Type: multipart/form-data
Form Parameters:
file(file, optional) - Document file to upload
input_uri(string, optional) - URI to fetch the document from
mime_type(string, optional) - MIME type of the document
source_uri(string, required) - Source URI/path identifier
source(string, required) - Source system identifier
batch_id(integer, required) - Batch ID to assign the document to
doc_meta(string, optional) - JSON string of metadata (default: {})
priority(integer, optional) - Processing priority (default: 0)
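The non-file fields can be assembled as in this Python sketch, where ingest_form_fields is a hypothetical helper. The main point is that doc_meta travels as a JSON-encoded string rather than as nested form fields; the file part itself would be attached by whichever HTTP client you use.

```python
import json

def ingest_form_fields(source_uri, source, batch_id, *, input_uri=None,
                       mime_type=None, doc_meta=None, priority=0):
    """Assemble the non-file multipart fields for POST /document/ingest-document."""
    fields = {
        "source_uri": source_uri,
        "source": source,
        "batch_id": str(batch_id),
        # doc_meta is a JSON *string*; an empty object matches the documented default.
        "doc_meta": json.dumps(doc_meta or {}),
        "priority": str(priority),
    }
    # Optional fields are only included when the caller supplies them.
    if input_uri is not None:
        fields["input_uri"] = input_uri
    if mime_type is not None:
        fields["mime_type"] = mime_type
    return fields
```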
Response:
201 Created- Document ingested successfully (new document)
203 Non-Authoritative Information- Document already exists in a different batch
400 Bad Request- Invalid parameters or metadata
500 Internal Server Error- Processing error
Success Response Body:
{
"batch_id": 1,
"document_uri": "/path/to/doc.pdf",
"document_hash": "sha256-abc123...",
"source": "filesystem",
"uri_id": 42
}
Notes:
- The batch_id in the response reflects the batch where the document URI actually resides.
- If a document with the same hash already exists in a different batch, the response returns 203 with the original batch ID. This prevents duplicate processing while telling the caller that the document was previously ingested.
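A small Python sketch of how a caller might act on the 201/203 distinction described in these notes. describe_ingest_result is hypothetical, and it assumes the 203 body carries at least batch_id (the notes only promise the original batch ID), so uri_id is read only in the 201 branch.

```python
def describe_ingest_result(status_code, body):
    """Summarise an ingest-document response using the documented 201/203 semantics."""
    if status_code == 201:
        return f"new document in batch {body['batch_id']} (uri_id={body['uri_id']})"
    if status_code == 203:
        # Same hash already ingested elsewhere: batch_id is the ORIGINAL batch.
        return f"already ingested in batch {body['batch_id']}; not reprocessed"
    raise RuntimeError(f"ingest failed with HTTP {status_code}")
```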
Example:
curl -X POST "http://localhost:8000/api/v1/document/ingest-document" \
-F "file=@document.pdf" \
-F "source_uri=/documents/report.pdf" \
-F "source=filesystem" \
-F "batch_id=1" \
-F "doc_meta={\"author\":\"John Doe\"}"

Delete orphaned documents with no URI references.
Response:
200 OK- Cleanup successful with statistics
500 Internal Server Error- Processing error
Success Response Body:
{
"message": "Orphaned documents cleaned up",
"statistics": {
"deleted_documents": 5,
"deleted_history": 12
}
}
Example:
curl -X POST "http://localhost:8000/api/v1/document/cleanup-orphans"

Delete a DocumentURI by URI and source with cascading deletion.
If only one DocumentURI references the underlying document, all associated records are deleted, including workflow runs, steps, lifecycle history, artifacts, and the document itself.
If multiple DocumentURIs reference the same document, only the specified DocumentURI and its history are deleted; the document is preserved.
Query Parameters:
uri(string, required) - The document URI to delete
source(string, required) - The source system identifier
Response:
200 OK- Deletion successful with statistics
404 Not Found- DocumentURI not found
500 Internal Server Error- Processing error
Success Response Body:
{
"message": "DocumentURI deleted successfully",
"uri": "/documents/report.pdf",
"source": "filesystem",
"statistics": {
"deleted_document_uris": 1,
"deleted_uri_history": 3,
"deleted_documents": 1,
"deleted_workflow_runs": 2,
"deleted_run_steps": 10,
"deleted_lifecycle_history": 6,
"total_deleted": 23
}
}
Notes:
- When deleted_documents is 0, other DocumentURIs still reference the document.
- All deletions occur within a single transaction for atomicity.
- File artifacts are also deleted from the configured storage (filesystem, S3, or database).
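A Python sketch of calling this endpoint safely, assuming the base URL from the examples; both helper names are hypothetical. Encoding the query protects URIs that contain characters such as &, ? or #, and the second helper applies the deleted_documents rule from the notes above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/api/v1"  # assumption: local dev server from the examples

def delete_by_uri_url(uri, source):
    # urlencode escapes '&', '?', '#', spaces and '/' so a document URI cannot
    # corrupt the query string; the server sees the decoded value.
    query = urlencode({"uri": uri, "source": source})
    return f"{BASE_URL}/document/by-uri?{query}"

def document_was_removed(body):
    """True if the underlying document itself was deleted, not just this URI."""
    # deleted_documents == 0 means other DocumentURIs still reference the document.
    return body["statistics"]["deleted_documents"] > 0
```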
Example:
curl -X DELETE "http://localhost:8000/api/v1/document/by-uri?uri=/documents/report.pdf&source=filesystem"

List all document batches.
Response:
200 OK- Array of DocumentBatch objects
Example:
curl "http://localhost:8000/api/v1/batch/"

Create a new document batch.
Content-Type: application/x-www-form-urlencoded
Form Parameters:
source(string, required) - Source system identifier
name(string, required) - Human-readable batch name
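The form body for this endpoint can be produced with the standard library, as in this Python sketch (create_batch_body is a hypothetical helper). urlencode handles the space in the batch name, which a hand-built body could easily get wrong.

```python
from urllib.parse import urlencode

def create_batch_body(source, name):
    """Encode the form body for POST /batch/ (application/x-www-form-urlencoded)."""
    return urlencode({"source": source, "name": name}).encode("ascii")

print(create_batch_body("filesystem", "Q4 Reports"))  # b'source=filesystem&name=Q4+Reports'
```

When these bytes are passed as data to urllib.request.Request, urllib adds the application/x-www-form-urlencoded Content-Type header by default.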
Response:
201 Created- Batch created successfully
Response Body:
{
"batch_id": 1
}
Example:
curl -X POST "http://localhost:8000/api/v1/batch/" \
-d "source=filesystem" \
-d "name=Q4 Reports"

Start workflow processing for all documents in a batch.
Content-Type: application/x-www-form-urlencoded
Form Parameters:
batch_id(integer, required) - Batch ID to process
workflow_definition_id(string, optional) - Workflow to use (default: from config)
priority(integer, optional) - Processing priority (default: 0)
param_id(string, optional) - Parameter set ID (default: from config)
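A Python sketch of building this form body, where start_workflows_body is hypothetical. It assumes that leaving an optional field out entirely, rather than sending it empty, is what makes the server fall back to its configured defaults.

```python
from urllib.parse import urlencode

def start_workflows_body(batch_id, workflow_definition_id=None, param_id=None, priority=None):
    """Form body for POST /batch/start-workflows, omitting unset optional fields."""
    fields = {"batch_id": batch_id}
    # Assumption: absent fields (not empty strings) trigger the config defaults.
    for key, value in (("workflow_definition_id", workflow_definition_id),
                       ("param_id", param_id),
                       ("priority", priority)):
        if value is not None:
            fields[key] = value
    return urlencode(fields)
```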
Response:
201 Created- Workflows started successfully
404 Not Found- Batch not found
500 Internal Server Error- Processing error
Response Body:
{
"message": "Workflows started",
"workflows": 10,
"run_group": 5
}
Example:
curl -X POST "http://localhost:8000/api/v1/batch/start-workflows" \
-d "batch_id=1" \
-d "workflow_definition_id=batch" \
-d "param_id=default"

Get detailed status for a batch.
Query Parameters:
batch_id(integer, required) - Batch ID
Response:
200 OK- Batch status details
404 Not Found- Batch not found
Response Body:
{
"batch": {
"id": 1,
"name": "Q4 Reports",
"source": "filesystem",
"start_date": "2025-01-15T10:00:00",
"completed_date": null
},
"document_count": 10,
"workflow_count": {
"COMPLETED": 7,
"RUNNING": 2,
"PENDING": 1
},
"workflows": [...],
"parsed": 7,
"remaining": 3
}
Example:
curl "http://localhost:8000/api/v1/batch/status?batch_id=1"

Get all workflow steps for a batch.
Path Parameters:
batch_id(integer, required) - Batch ID
Response:
200 OK- Array of RunStep objects
500 Internal Server Error- Processing error
Example:
curl "http://localhost:8000/api/v1/batch/1/steps"

Get workflow runs with optional pagination.
Query Parameters:
batch_id(integer, optional) - Filter by batch ID
include_steps(boolean, optional) - Include step details (default: false)
include_doc_info(boolean, optional) - Include document info (default: false)
page(integer, optional) - Page number (1-indexed)
rows_per_page(integer, optional) - Results per page (default: 10 when paginated)
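A Python sketch of walking every page, assuming the base URL from the examples. page_url and iter_workflow_runs are hypothetical, and fetch_page stands for whatever function you use to GET a page URL and parse the PaginatedResponse body. Iteration stops once the current page reaches total_pages, which also covers an empty result.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/api/v1"  # assumption: local dev server from the examples

def page_url(page, rows_per_page=10, batch_id=None):
    """URL for one page of GET /workflow/ (page numbers start at 1)."""
    params = {"page": page, "rows_per_page": rows_per_page}
    if batch_id is not None:
        params["batch_id"] = batch_id
    return f"{BASE_URL}/workflow/?{urlencode(params)}"

def iter_workflow_runs(fetch_page, rows_per_page=10):
    """Yield every item across pages.

    fetch_page(page, rows_per_page) is caller-supplied (hypothetical): it should
    request page_url(...) and return the parsed PaginatedResponse dict.
    """
    page = 1
    while True:
        body = fetch_page(page, rows_per_page)
        yield from body["items"]
        if page >= body["total_pages"]:
            break
        page += 1
```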
Response:
200 OK- Array of WorkflowRun objects (unpaginated) or PaginatedResponse (paginated)
Paginated Response Body:
{
"items": [...],
"total": 100,
"page": 1,
"rows_per_page": 10,
"total_pages": 10
}
Example:
curl "http://localhost:8000/api/v1/workflow/?batch_id=1&page=1&rows_per_page=20"

Get workflow runs filtered by status with optional pagination.
Query Parameters:
status(enum, required) - One of: PENDING, RUNNING, COMPLETED, ERROR, FAILED
batch_id(integer, optional) - Filter by batch ID
include_doc_info(boolean, optional) - Include document info (default: false)
page(integer, optional) - Page number (1-indexed)
rows_per_page(integer, optional) - Results per page
Response:
200 OK- Array of WorkflowRun objects or PaginatedResponse
Example:
curl "http://localhost:8000/api/v1/workflow/by-status?status=FAILED"

List all available workflow definitions.
Response:
200 OK- Array of workflow definition summaries
Response Body:
[
{
"id": "batch",
"name": "Batch Workflow"
},
{
"id": "interactive",
"name": "Interactive Workflow"
}
]
Example:
curl "http://localhost:8000/api/v1/workflow/definitions"

Get workflow definition YAML content by ID.
Path Parameters:
workflow_id(string, required) - Workflow definition ID
Response:
200 OK- YAML content (Content-Type: text/yaml)
404 Not Found- Workflow definition not found
Example:
curl "http://localhost:8000/api/v1/workflow/definitions/batch"

List all available parameter sets.
Response:
200 OK- Array of parameter set summaries
Response Body:
[
{
"id": "default",
"name": "Default Parameters",
"source": "app"
},
{
"id": "high_quality",
"name": "High Quality Processing",
"source": "user"
}
]
Example:
curl "http://localhost:8000/api/v1/workflow/param-sets"

Get parameter set YAML content by ID.
Path Parameters:
set_id(string, required) - Parameter set ID
Response:
200 OK- YAML content (Content-Type: text/yaml)
404 Not Found- Parameter set not found
Example:
curl "http://localhost:8000/api/v1/workflow/param-sets/default"

Get parameter sets that target a specific LanceDB directory.
Path Parameters:
target(string, required) - LanceDB data directory path
Response:
200 OK- Array of matching WorkflowParams objects
Example:
curl "http://localhost:8000/api/v1/workflow/param_sets/target/lancedb"

Upload a new parameter set from YAML content.
Content-Type: application/x-www-form-urlencoded
Form Parameters:
yaml_content(string, required) - Raw YAML content
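Because yaml_content is multi-line, it must be sent with real newline characters and form-encoded. This Python sketch shows that urlencode turns the newlines into %0A and that decoding restores the YAML unchanged. The YAML itself is only an illustration built from the example's keys; the full parameter-set schema is not documented here.

```python
from urllib.parse import parse_qs, urlencode

# Illustrative parameter set (keys taken from the curl example for this endpoint).
yaml_content = """\
id: my_params
name: My Parameters
config:
  parse:
    format: markdown
"""

# urlencode percent-encodes the real newlines as %0A, so the YAML line
# structure survives the application/x-www-form-urlencoded body.
body = urlencode({"yaml_content": yaml_content})

# Round-trip check: decoding the body yields the original YAML unchanged.
decoded = parse_qs(body)["yaml_content"][0]
```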
Response:
201 Created- Parameter set created successfully
400 Bad Request- Invalid YAML syntax or format
409 Conflict- Parameter set with same ID already exists
500 Internal Server Error- Processing error
Success Response Body:
{
"message": "Parameter set created successfully",
"id": "my_params",
"file_path": "/path/to/params/my_params.yaml"
}
Notes:
- Uploaded parameter sets have source set to "user".
- The parameter set ID is taken from the YAML content.
Example:
curl -X POST "http://localhost:8000/api/v1/workflow/param-sets" \
--data-urlencode $'yaml_content=id: my_params\nname: My Parameters\nconfig:\n  parse:\n    format: markdown'

Delete a user-uploaded parameter set.
Path Parameters:
set_id(string, required) - Parameter set ID to delete
Response:
200 OK- Parameter set deleted successfully
403 Forbidden- Cannot delete built-in parameter sets
404 Not Found- Parameter set not found
500 Internal Server Error- Processing error
Notes:
- Only parameter sets with source="user" can be deleted.
- Built-in parameter sets cannot be deleted via the API.
Example:
curl -X DELETE "http://localhost:8000/api/v1/workflow/param-sets/my_params"

Get workflow steps filtered by status.
Query Parameters:
status(enum, required) - One of: PENDING, RUNNING, COMPLETED, ERROR, FAILED
Response:
200 OK- Array of RunStep objects
Example:
curl "http://localhost:8000/api/v1/workflow/steps?status=RUNNING"

Get workflow run groups, optionally filtered by batch ID.
Query Parameters:
batch_id(integer, optional) - Filter by batch ID
Response:
200 OK- Array of RunGroup objects
500 Internal Server Error- Processing error
Example:
curl "http://localhost:8000/api/v1/workflow/run-groups?batch_id=1"

Get a specific run group by ID.
Path Parameters:
run_group_id(integer, required) - Run group ID
Response:
200 OK- RunGroup object
500 Internal Server Error- Processing error
Example:
curl "http://localhost:8000/api/v1/workflow/run_groups/5"

Delete a run group and all dependent records.
Path Parameters:
run_group_id(integer, required) - Run group ID to delete
Response:
200 OK- Run group deleted successfully
404 Not Found- Run group does not exist
500 Internal Server Error- Processing error
Response Body:
{
"message": "RunGroup 5 deleted successfully",
"statistics": {
"deleted_runsteps": 150,
"deleted_lifecyclehistory": 45,
"deleted_workflowruns": 10,
"deleted_rungroups": 1,
"total_deleted": 206
}
}
Notes:
- Works with both SQLite and PostgreSQL databases
- Deletes all dependent records: RunSteps, LifecycleHistory, WorkflowRuns, and the RunGroup
- The deletion is performed within a transaction and rolled back if any error occurs
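A Python sketch for sanity-checking a deletion's statistics object, where check_deletion_stats is hypothetical. It assumes total_deleted equals the sum of the other counters; that holds in the sample bodies for this endpoint (150 + 45 + 10 + 1 = 206) and for the DocumentURI deletion endpoint (1 + 3 + 1 + 2 + 10 + 6 = 23), but it is an inference, not a documented guarantee.

```python
def check_deletion_stats(stats):
    """Return the per-table counters from a deletion `statistics` object.

    Assumption: total_deleted is the sum of all other counters.
    """
    counts = {k: v for k, v in stats.items() if k != "total_deleted"}
    if sum(counts.values()) != stats["total_deleted"]:
        raise ValueError(
            f"total_deleted={stats['total_deleted']} but counters sum to {sum(counts.values())}"
        )
    return counts
```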
Example:
curl -X DELETE "http://localhost:8000/api/v1/workflow/run_groups/5"

Get statistics for a run group.
Path Parameters:
run_group_id(integer, required) - Run group ID
Response:
200 OK- Statistics object with status counts
500 Internal Server Error- Processing error
Example:
curl "http://localhost:8000/api/v1/workflow/run_groups/5/stats"

Get workflow runs for a batch.
Query Parameters:
batch_id(integer, required) - Batch ID
Response:
200 OK- Array of WorkflowRun objects
Example:
curl "http://localhost:8000/api/v1/workflow/runs?batch_id=1"

Get a specific workflow run by ID, including its steps.
Path Parameters:
workflow_id(integer, required) - Workflow run ID
Response:
200 OK- WorkflowRun object with steps array
Example:
curl "http://localhost:8000/api/v1/workflow/runs/42"

Get lifecycle history events for a specific workflow run.
Path Parameters:
workflow_id(integer, required) - Workflow run ID
Response:
200 OK- Array of LifecycleHistory objects ordered by start_date
400 Bad Request- Invalid workflow ID
500 Internal Server Error- Processing error
Response Body:
[
{
"id": 1,
"event": "item_start",
"handler_name": null,
"run_group_id": 5,
"workflow_run_id": 42,
"step_id": null,
"start_date": "2025-01-15T10:00:00",
"completed_date": "2025-01-15T10:01:30",
"status": "COMPLETED",
"status_date": "2025-01-15T10:01:30",
"status_message": "Item processing completed successfully",
"status_meta": {}
}
]
Event Types:
group_start/group_end- Run group lifecycle
item_start/item_end/item_failed- Item processing lifecycle
step_start/step_end/step_failed- Individual step lifecycle
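A Python sketch that turns the lifecycle records returned by this endpoint into per-event durations. event_durations is a hypothetical helper; it assumes the timestamps are naive ISO 8601 strings like those in the sample body and reports None for events that have not completed yet.

```python
from datetime import datetime

def event_durations(events):
    """Return (event, step_id, seconds) per lifecycle record; seconds is None while open."""
    rows = []
    for ev in events:
        seconds = None
        if ev.get("start_date") and ev.get("completed_date"):
            start = datetime.fromisoformat(ev["start_date"])
            end = datetime.fromisoformat(ev["completed_date"])
            seconds = (end - start).total_seconds()
        rows.append((ev["event"], ev.get("step_id"), seconds))
    return rows
```

For the sample item_start record, this yields 90.0 seconds.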
Example:
curl "http://localhost:8000/api/v1/workflow/runs/42/lifecycle"

Start a new workflow run for a single document.
Content-Type: application/x-www-form-urlencoded
Form Parameters:
doc_id(string, required) - Document hash to process
workflow_definiton_id(string, optional) - Workflow to use
param_id(string, optional) - Parameter set ID
priority(integer, optional) - Processing priority (default: 0)
Response:
201 Created- Workflow run created
500 Internal Server Error- Processing error
Example:
curl -X POST "http://localhost:8000/api/v1/workflow/" \
-d "doc_id=sha256-abc123..." \
-d "workflow_definiton_id=batch" \
-d "priority=10"

Retry failed workflow steps for a run group.
Content-Type: application/x-www-form-urlencoded
Form Parameters:
run_group_id(integer, required) - Run group ID to retry
Response:
201 Created- Failed steps reset successfully
500 Internal Server Error- Processing error
Example:
curl -X POST "http://localhost:8000/api/v1/workflow/retry" \
-d "run_group_id=5"

Check document status for a source system.
Content-Type: application/x-www-form-urlencoded
Form Parameters:
source(string, required) - Source system identifier
hashes(string, required) - JSON object mapping URIs to hashes
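A Python sketch of producing the hashes field, where build_hashes_field is hypothetical. It assumes the server expects "sha256-" followed by the hex SHA-256 digest of the file bytes; that format is only inferred from the "sha256-abc123..." placeholders in this reference, so confirm it before relying on the new/changed/deleted classification.

```python
import hashlib
import json

def build_hashes_field(files):
    """Return the JSON string for the `hashes` form field.

    files maps each URI to the file's raw bytes.
    ASSUMPTION: hash format is "sha256-" + hex digest (inferred, not documented).
    """
    return json.dumps({
        uri: "sha256-" + hashlib.sha256(data).hexdigest()
        for uri, data in files.items()
    })

form = {"source": "filesystem", "hashes": build_hashes_field({"file1.pdf": b"example"})}
```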
Response:
200 OK- Status object indicating new/changed/deleted documents
Example:
curl -X POST "http://localhost:8000/api/v1/source-status" \
-d "source=filesystem" \
-d 'hashes={"file1.pdf":"sha256-abc","file2.pdf":"sha256-def"}'

Get workflow durations by run group.
Query Parameters:
run_group_id(integer, required) - Run group ID
Response:
200 OK- Duration statistics
500 Internal Server Error- Processing error
Example:
curl "http://localhost:8000/api/v1/stats/durations?run_group_id=5"

Get workflow step statistics by run group.
Query Parameters:
run_group_id(integer, required) - Run group ID
Response:
200 OK- Step statistics
500 Internal Server Error- Processing error
Example:
curl "http://localhost:8000/api/v1/stats/step-stats?run_group_id=5"

List all LanceDB vector databases in the configured directory.
Response:
200 OK- List of databases with metadata
Response Body:
{
"status": "ok",
"lancedb_dir": "/data/lancedb",
"database_count": 2,
"databases": [
{
"name": "default",
"path": "default",
"size_bytes": 1048576,
"size_human": "1.00 MB"
}
]
}
Example:
curl "http://localhost:8000/api/v1/lancedb/list"

Get detailed information about a specific LanceDB database.
Query Parameters:
db(string, required) - Database name relative to lancedb_dir
Response:
200 OK- Database information
404 Not Found- Database does not exist
500 Internal Server Error- Failed to open database
Response Body:
{
"status": "ok",
"path": "/data/lancedb/default",
"versions": {
"lancedb": "0.25.3",
"haiku_rag": "0.25.0",
"stored_version": "0.25.0"
},
"embeddings": {
"provider": "openai",
"model": "text-embedding-3-small",
"vector_dim": 1536
},
"documents": {
"count": 100,
"size_bytes": 512000,
"size_human": "500.00 KB",
"versions": 5
},
"chunks": {
"count": 1500,
"size_bytes": 2048000,
"size_human": "2.00 MB",
"versions": 5
},
"vector_index": {
"exists": true,
"indexed_rows": 1450,
"unindexed_rows": 50
},
"tables": ["documents", "chunks", "settings"]
}
Example:
curl "http://localhost:8000/api/v1/lancedb/info?db=default"
Note: The db parameter supports nested paths (e.g., project/data).
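A Python sketch for this endpoint, assuming the base URL from the examples; both helpers are hypothetical. A literal slash is legal in a query string, but encoding the db value also protects names containing spaces, & or #. The second helper reads the vector_index block of the response; in the sample body, indexed plus unindexed rows equals the chunk count (1450 + 50 = 1500).

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/api/v1"  # assumption: local dev server from the examples

def lancedb_info_url(db):
    # A nested name such as "project/data" is one query value; urlencode escapes
    # the slash as %2F and the server receives the decoded "project/data".
    return f"{BASE_URL}/lancedb/info?{urlencode({'db': db})}"

def indexed_fraction(info):
    """Share of rows covered by the vector index, from the info response body."""
    idx = info["vector_index"]
    total = idx["indexed_rows"] + idx["unindexed_rows"]
    return idx["indexed_rows"] / total if total else 0.0
```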
Optimize and clean up database tables to reduce disk usage.
Query Parameters:
db(string, required) - Database name relative to lancedb_dir
Response:
200 OK- Vacuum completed successfully
404 Not Found- Database does not exist at the resolved path
409 Conflict- The DB's ResourceLock is held by another writer (workflow worker, CLI vacuum, lifecycle vacuum). The endpoint uses max_wait=0 for fail-fast behaviour.
500 Internal Server Error- Vacuum failed for any other reason
Response Bodies:
200 OK:
{ "status": "ok" }
404 Not Found:
{ "status": "not_found", "error": "Database does not exist at ..." }
409 Conflict:
{ "status": "locked", "error": "RAG DB locked by worker:<lease> since ..." }
500 Internal Server Error:
{ "status": "error", "error": "Failed to vacuum database: ..." }
Example:
curl "http://localhost:8000/api/v1/lancedb/vacuum?db=default"
Notes:
- Vacuum holds the cross-subsystem ResourceLock (holder_kind=web) for the duration of the operation, so it cannot race workflow save_to_rag steps, CLI vacuums, or lifecycle vacuums.
- Returns 409 immediately if the lock cannot be acquired. Retry later, or break the lock from the CLI with si-diag lancedb vacuum <db> --force.
- Run vacuum periodically after bulk deletions, or wire the batch_split_vacuum workflow to vacuum at the end of every run group.
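Because the endpoint fails fast with 409 rather than waiting for the lock, retrying is the caller's job. This Python sketch retries with exponential backoff; vacuum_with_retry is hypothetical, and call_vacuum stands for your own wrapper around the vacuum request returning (status_code, parsed_body). Only 409 is retried, since 404 and 500 will not resolve by waiting.

```python
import time

def vacuum_with_retry(call_vacuum, attempts=5, delay=2.0, sleep=time.sleep):
    """Retry a fail-fast vacuum while the ResourceLock is held (HTTP 409)."""
    body = {}
    for attempt in range(1, attempts + 1):
        status, body = call_vacuum()
        if status == 200:
            return body
        if status != 409:
            # not_found / error responses are not lock contention; give up now.
            raise RuntimeError(f"vacuum failed: HTTP {status}: {body.get('error')}")
        if attempt < attempts:
            sleep(delay * 2 ** (attempt - 1))  # exponential backoff: 2s, 4s, 8s, ...
    raise TimeoutError(f"RAG DB still locked after {attempts} attempts: {body.get('error')}")
```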
List documents stored in a LanceDB database.
Query Parameters:
db(string, required) - Database name relative to lancedb_dir
limit(integer, optional) - Maximum number of documents to return
offset(integer, optional) - Number of documents to skip
filter(string, optional) - SQL WHERE clause to filter documents
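A Python sketch of building this URL with a filter, assuming the base URL from the examples (lancedb_documents_url is hypothetical). Using quote_via=quote encodes spaces as %20, matching the curl example, and escapes the % and quote characters inside the WHERE clause.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000/api/v1"  # assumption: local dev server from the examples

def lancedb_documents_url(db, limit=None, offset=None, where=None):
    params = {"db": db}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    if where is not None:
        params["filter"] = where  # raw SQL WHERE clause, passed through verbatim
    # quote_via=quote emits %20 for spaces (not '+') and escapes '%' and quotes.
    return f"{BASE_URL}/lancedb/documents?{urlencode(params, quote_via=quote)}"
```

Since filter is a raw SQL clause, build it from trusted values only rather than interpolating user input.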
Response:
200 OK- List of documents
404 Not Found- Database does not exist
500 Internal Server Error- Query error
Response Body:
{
"status": "ok",
"path": "/data/lancedb/default",
"document_count": 10,
"documents": [
{
"id": "doc-abc123",
"uri": "/documents/report.pdf",
"title": "Q4 Financial Report",
"created_at": "2025-01-15T10:00:00",
"updated_at": "2025-01-15T12:00:00",
"chunk_count": 25,
"metadata": {"author": "John Doe"}
}
]
}
Example:
curl "http://localhost:8000/api/v1/lancedb/documents?db=default&limit=10"
Example with filter:
curl "http://localhost:8000/api/v1/lancedb/documents?db=default&filter=uri%20LIKE%20'%25report%25'"

DocumentBatch:
{
"id": 1,
"name": "Q4 Reports",
"source": "filesystem",
"start_date": "2025-01-15T10:00:00",
"completed_date": null,
"batch_params": {},
"duration": null
}

Document:
{
"hash": "sha256-abc123...",
"mime_type": "application/pdf",
"file_size": 1024000,
"doc_meta": {"author": "John Doe"}
}

DocumentURI:
{
"id": 42,
"doc_hash": "sha256-abc123...",
"uri": "/documents/report.pdf",
"source": "filesystem",
"version": 1,
"batch_id": 1
}

WorkflowRun:
{
"id": 100,
"workflow_definition_id": "batch",
"run_group_id": 5,
"batch_id": 1,
"doc_id": "sha256-abc123...",
"priority": 0,
"created_date": "2025-01-15T10:00:00",
"start_date": "2025-01-15T10:01:00",
"completed_date": null,
"status": "RUNNING",
"status_date": "2025-01-15T10:05:00",
"status_message": null,
"status_meta": {},
"run_params": {},
"duration": null
}

RunStep:
{
"id": 500,
"workflow_run_id": 100,
"workflow_step_number": 2,
"workflow_step_name": "parse",
"step_config_id": 10,
"step_type": "parse",
"is_last_step": false,
"created_date": "2025-01-15T10:01:00",
"priority": 0,
"start_date": "2025-01-15T10:02:00",
"status_date": "2025-01-15T10:05:00",
"completed_date": null,
"retry": 0,
"retries": 1,
"status": "RUNNING",
"status_message": null,
"status_meta": {},
"worker_id": "worker-abc-123",
"duration": null
}

RunGroup:
{
"id": 5,
"name": "Batch 1 Processing",
"workflow_definition_id": "batch",
"param_definition_id": "default",
"batch_id": 1,
"created_date": "2025-01-15T10:00:00",
"start_date": "2025-01-15T10:01:00",
"completed_date": null,
"status": "RUNNING",
"status_date": "2025-01-15T10:30:00",
"status_message": "Processing documents",
"status_meta": {}
}

LifecycleHistory:
{
"id": 1,
"event": "item_start",
"handler_name": null,
"run_group_id": 5,
"workflow_run_id": 42,
"step_id": null,
"start_date": "2025-01-15T10:00:00",
"completed_date": "2025-01-15T10:01:30",
"status": "COMPLETED",
"status_date": "2025-01-15T10:01:30",
"status_message": null,
"status_meta": {}
}

Status values:
PENDING- Not yet started
RUNNING- Currently executing
COMPLETED- Finished successfully
ERROR- Failed but will retry
FAILED- Permanently failed after all retries
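A client-side Python sketch of these status values (Status and count_by_status are hypothetical names). It treats COMPLETED and FAILED as terminal and ERROR as non-terminal, following the descriptions above, and tallies records the way the workflow_count field of the batch status response groups them.

```python
from collections import Counter
from enum import Enum

class Status(str, Enum):
    PENDING = "PENDING"      # not yet started
    RUNNING = "RUNNING"      # currently executing
    COMPLETED = "COMPLETED"  # finished successfully
    ERROR = "ERROR"          # failed, will be retried
    FAILED = "FAILED"        # permanently failed after all retries

    @property
    def is_terminal(self):
        # ERROR is not terminal: the work is expected to be retried.
        return self in (Status.COMPLETED, Status.FAILED)

def count_by_status(records):
    """Tally WorkflowRun/RunStep/RunGroup dicts by their status field."""
    return Counter(Status(r["status"]) for r in records)
```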

Step types:
ingest- Load document
validate- Validate document
parse- Extract text/structure
chunk- Split into chunks
embed- Generate embeddings
store- Save to RAG system
enrich- Add metadata
route- Conditional routing
Error responses generally follow this format (the LanceDB vacuum endpoint uses the status/error bodies listed in its section):
{
"error": "Error message describing what went wrong",
"status_code": 400
}
Common HTTP status codes:
400 Bad Request- Invalid parameters
403 Forbidden- Permission denied
404 Not Found- Resource not found
409 Conflict- Duplicate resource (or, for LanceDB vacuum, a lock held by another writer)
500 Internal Server Error- Server-side error
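A Python sketch of turning an error response into an exception. ApiError and raise_for_api_error are hypothetical names. The helper reads the "error" field, which appears both in the general format and in the vacuum bodies, takes the status from the HTTP response itself, and falls back to the raw body when it is not JSON.

```python
import json

class ApiError(Exception):
    def __init__(self, status_code, message):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.message = message

def raise_for_api_error(http_status, raw_body):
    """Raise ApiError for 4xx/5xx responses; return None otherwise."""
    if http_status < 400:
        return None
    try:
        message = json.loads(raw_body).get("error", raw_body)
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: keep the raw text.
        message = raw_body
    raise ApiError(http_status, message)
```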
Interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc