A microservice for ingesting data into the ApexSigma ecosystem. It provides NLP-powered text parsing, using spaCy transformer models and NLTK to extract Knowledge Graphs from text.
- Python 3.12+
- Poetry 2.2.1+
- 4GB+ RAM (for Transformer model loading)
```bash
# Install dependencies
poetry install

# ⚠️ MANDATORY: Download NLP models (required for Graph Parser)
python -m spacy download en_core_web_trf
python -m nltk.downloader punkt

# Start the server
poetry run uvicorn src.ingest_llm_as.main:app --reload --host 0.0.0.0 --port 8000
```

To run in Docker instead:

```bash
docker build -t ingest-llm-as .
docker run -p 8000:8000 ingest-llm-as
```

The Graph Parser endpoint extracts Knowledge Graphs from text using dependency parsing:
```
POST /graph/parse
```

Request body:

```json
{
  "text": "ApexSigma Solutions is integrating the new Design System. CortexBridge v4.0.2 utilizes React 18 and TailwindCSS. The module was deployed by SigmaDev11 in Cape Town.",
  "config": {}
}
```

Response:

```json
{
  "metadata": {
    "model": "en_core_web_trf",
    "sentences": 3,
    "entities": 7,
    "relations": 3
  },
  "nodes": [
    {"id": "ApexSigma Solutions", "type": "ORG"},
    {"id": "Design System", "type": "PRODUCT"},
    {"id": "CortexBridge v4.0.2", "type": "PRODUCT"},
    {"id": "React 18", "type": "PRODUCT"},
    {"id": "TailwindCSS", "type": "PRODUCT"},
    {"id": "SigmaDev11", "type": "PERSON"},
    {"id": "Cape Town", "type": "GPE"}
  ],
  "edges": [
    {"source": "ApexSigma Solutions", "relation": "integrating", "target": "Design System"},
    {"source": "CortexBridge v4.0.2", "relation": "utilizes", "target": "React 18"},
    {"source": "SigmaDev11", "relation": "deployed_in", "target": "Cape Town"}
  ]
}
```

A health endpoint reports whether the model is loaded:

```
GET /graph/health
```

```json
{
  "status": "ready",
  "model_loaded": true,
  "model_name": "en_core_web_trf"
}
```

See API Reference for Omega Ingest endpoints.
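The nodes/edges payload is straightforward to consume programmatically. A minimal client-side sketch (pure stdlib; the `graph_to_adjacency` helper is illustrative, not part of the service, and the sample mirrors the response shape above):

```python
import json

# Sample payload in the shape returned by POST /graph/parse
sample = json.loads("""
{
  "nodes": [
    {"id": "ApexSigma Solutions", "type": "ORG"},
    {"id": "Design System", "type": "PRODUCT"}
  ],
  "edges": [
    {"source": "ApexSigma Solutions", "relation": "integrating", "target": "Design System"}
  ]
}
""")

def graph_to_adjacency(graph: dict) -> dict[str, list[tuple[str, str]]]:
    """Convert the nodes/edges payload into an adjacency map:
    source node id -> [(relation, target id), ...]."""
    adjacency: dict[str, list[tuple[str, str]]] = {n["id"]: [] for n in graph["nodes"]}
    for edge in graph["edges"]:
        adjacency[edge["source"]].append((edge["relation"], edge["target"]))
    return adjacency

print(graph_to_adjacency(sample)["ApexSigma Solutions"])
# [('integrating', 'Design System')]
```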
```bash
# Run all tests
poetry run pytest

# Run with coverage
poetry run pytest --cov=ingest_llm_as --cov-report=html

# Run specific test
poetry run pytest tests/test_document_parser.py -v
```

```bash
# Lint
poetry run ruff check .

# Format
poetry run ruff format .

# Type check
poetry run mypy src/ingest_llm_as/

# Pre-commit hooks
poetry run pre-commit run --all-files
```

```
src/ingest_llm_as/
├── api/
│   ├── __init__.py          # Router exports
│   ├── graph_parser.py      # Graph Parser endpoints
│   └── omega_ingest.py      # Omega Ingest endpoints
├── parsers/
│   ├── __init__.py          # Parser exports
│   └── document_parser.py   # DocumentParser class
├── main.py                  # FastAPI application
└── ...
```
- FastAPI - Web framework
- spaCy - NLP processing with transformer models
- NLTK - Sentence tokenization
- Pydantic - Data validation
- Uvicorn - ASGI server
- Langfuse - Observability
- Neo4j - Knowledge graph storage
- OpenAI - LLM integration
See pyproject.toml for full dependency list.
Configure the service using environment variables (see .env.example):
```bash
# NLP Model Settings
SPACY_MODEL=en_core_web_trf

# Server Settings
HOST=0.0.0.0
PORT=8000

# Optional: Neo4j Database
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password
```

- Service Status - Current development status
- Deployment Guide - Production deployment
- API Reference - Complete API documentation
The service implements an extractive NLP approach for data provenance:
- Text Preprocessing - Clean and normalize input text
- Sentence Tokenization - Split into sentences using NLTK
- Dependency Parsing - Use spaCy transformer models for high-fidelity parsing
- Entity Extraction - Identify entities with Named Entity Recognition
- Relation Extraction - Extract relationships via dependency traversal
- Knowledge Graph Output - Return structured nodes and edges
This approach ensures all extracted information can be traced back to the original source text.
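The six stages above can be pictured as a pipeline of small functions. The sketch below is a toy, dependency-free illustration of the data flow only: the real service uses NLTK for sentence splitting and spaCy (`en_core_web_trf`) for parsing, NER, and relation extraction, and every function here is a stand-in (capitalized tokens as "entities", lowercase tokens as "relations"), not the actual algorithm:

```python
def preprocess(text: str) -> str:
    # 1. Text Preprocessing: collapse whitespace
    return " ".join(text.split())

def tokenize_sentences(text: str) -> list[str]:
    # 2. Sentence Tokenization (real service: NLTK punkt)
    return [s.strip() for s in text.split(".") if s.strip()]

def extract_entities(sentence: str) -> list[dict]:
    # 3-4. Parsing + NER (real service: spaCy transformer pipeline);
    # faked here by treating capitalized tokens as entities
    return [{"id": t, "type": "ENTITY"} for t in sentence.split() if t[0].isupper()]

def extract_relations(sentence: str, entities: list[dict]) -> list[dict]:
    # 5. Relation Extraction (real service: dependency traversal);
    # faked here by linking the first two entities with the first lowercase token
    if len(entities) < 2:
        return []
    verbs = [t for t in sentence.split() if t.islower()]
    relation = verbs[0] if verbs else "related_to"
    return [{"source": entities[0]["id"], "relation": relation, "target": entities[1]["id"]}]

def build_graph(text: str) -> dict:
    # 6. Knowledge Graph Output: structured nodes and edges,
    # every edge traceable to the sentence it came from
    nodes: list[dict] = []
    edges: list[dict] = []
    for sentence in tokenize_sentences(preprocess(text)):
        entities = extract_entities(sentence)
        nodes.extend(e for e in entities if e not in nodes)
        edges.extend(extract_relations(sentence, entities))
    return {"nodes": nodes, "edges": edges}

print(build_graph("Alice deployed Bob."))
```

Because each edge is derived from a single sentence, the toy version preserves the same provenance property as the real pipeline: every extracted relation points back to a span of the source text.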