Minimal RAG + automation portfolio: CLI, Gradio demos, Docker, CI — no external APIs
A compact, batteries-included AI & automation showcase. It bundles a reusable RAG stub, document Q&A, workflow chatbot, résumé shortlister, and reporting/content automation – each with Gradio demos, CLI entry points, Docker support, and continuous integration.
All examples use synthetic data only. Never commit secrets; use placeholders such as OPENAI_API_KEY=PLACEHOLDER.
- Features
- Quickstart (local)
- CLI usage
- Demos (Gradio)
- Docker (multi-service)
- CI / lint / tests
- Project structure
- Troubleshooting
- FAQ
- Security & data
- Licence
- RAG core – keyword-based
answer()plus a light-weightSimpleIndexwith JSON save/load helpers. - Demos and pipelines – Gradio apps for RAG, document Q&A (TXT and optional PDF), chatbot (ticket/scheduling stubs), résumé ranking, and reporting (Markdown summary, short social post, SVG card, JSON scheduler).
- Developer UX – src-layout, editable install (
pip install -e .), CLI (llm-rag), Docker Compose services, GitHub Actions. - Minimal dependencies – Python stdlib and Gradio. PDF support is optional via
pypdf.
pip install -r requirements.txt
pip install -e .
pytest -q# RAG answer using the sample corpus
llm-rag -q "What is the capital of Norway?" --docs sample_data/capitals.txt -k 1
# Persist and reload an index
llm-rag -q "Capital of Japan" --docs sample_data/capitals.txt --save-index /tmp/idx.json -k 2
llm-rag -q "Capital of Japan" --load-index /tmp/idx.json -k 1Generate sample PDFs without extra dependencies:
python scripts/generate_sample_pdf.py sample_data/capitals.pdf
python scripts/generate_sample_resume_pdf.py sample_data/resume_sample.pdf# src-layout friendly launches
PYTHONPATH=src python app/demo_gradio.py # RAG demo
PYTHONPATH=src python app/doc_qa.py # Document Q&A (TXT/PDF)
PYTHONPATH=src python app/chatbot.py # Chatbot (tickets, scheduling, RAG fallback)
PYTHONPATH=src python app/resume_demo.py # Résumé shortlister
PYTHONPATH=src python app/reporting_demo.py # Reporting & content automationAll demos respect GRADIO_SERVER_NAME and GRADIO_SERVER_PORT.
A single image powers five services on distinct ports.
docker compose up --build -d
for p in 7860 7861 7862 7863 7864; do curl -sf http://localhost:$p >/dev/null && echo "port $p OK" || echo "port $p FAIL"; donePorts: 7860 demo · 7861 document Q&A · 7862 chatbot · 7863 résumé · 7864 reporting.
If you prefer identical container ports, map host → container 7860 for each service in Compose and control the application via GRADIO_SERVER_PORT.
The workflow .github/workflows/ci.yml runs on every push, pull request, and manual dispatch. It caches pip, installs dependencies, runs pytest -q, then executes ruff check --select=E9,F63,F7,F82 ..
Locally:
pytest -q
python -m pip install ruff && ruff check --select=E9,F63,F7,F82 .app/ Gradio demos (RAG, Doc Q&A, Chatbot, Résumé, Reporting)
src/llm_rag/ Reusable modules (rag_stub, index_stub, pipeline, docqa, chatbot, resume, reporting)
sample_data/ Synthetic corpora and generated artefacts (PDFs, SVG, JSON)
tests/ Smoke and integration tests (includes PDF generation)
Dockerfile, docker-compose.yml
.github/workflows/ci.yml
- Service unreachable on 7861/7863 – ensure
GRADIO_SERVER_PORTis set per service; rebuild and rundocker compose up -d. - Pip JSON decode errors during Docker build – the Dockerfile forces the canonical PyPI index and retry logic. Behind a proxy, add trusted-host arguments.
- PDF ingestion empty –
pypdfis optional; without it the loader falls back tosample_data/capitals.txt.
Where does retrieval happen? In SimpleIndex.search() (token-based Jaccard). rag_pipeline() wires the index into rag_stub.answer().
Can I persist the index? Yes – SimpleIndex.save() and SimpleIndex.load(), or the CLI --save-index/--load-index flags.
Are secrets required? No. Use placeholders only; everything operates offline with synthetic content.
Use synthetic or auto-generated data located under sample_data/ exclusively. The project makes no outbound API calls; optional PDF parsing remains local.
MIT. Add a dedicated LICENCE file if required.
For deeper guidance, read docs/ARCHITECTURE.md, docs/DEMO.md, and docs/index.md.