A resume search app that lets you upload PDF/text resumes, index them with embeddings, and ask natural-language questions to find the best candidates. Built with FastAPI, Streamlit, ChromaDB, and OpenAI.
- Upload resumes — PDF or plain text; files are chunked, embedded, and stored in ChromaDB
- Ask questions — e.g. "Who has the most Python experience?" or "Find candidates who know DevOps"; answers are grounded in your documents with ranked candidates
- Document library — List and download uploaded resumes (paginated)
- Storage — Local disk by default, or S3-compatible storage when configured
- Reset — Wipe the vector DB from the Settings tab when you need a fresh start
- Google SSO — When configured, sign in with Google; JWT is used for API auth
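The README does not say how resumes are chunked before embedding. As a rough illustration only, the sketch below assumes fixed-size character windows with a small overlap. `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and values, not taken from `main.py`.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows (assumed strategy)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a window boundary visible in both neighbouring chunks, which may help retrieval; the actual app may split on pages, paragraphs, or tokens instead.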
- Backend (`main.py`): FastAPI app that ingests PDF/text, embeds with OpenAI `text-embedding-3-small`, and stores in ChromaDB; `/ask` does retrieval + GPT-4.1-mini for answers and ranked candidates
- Frontend (`ui.py`): Streamlit app with an Ask tab, a Resumes tab (upload + list + download), and Settings (danger zone)
- Storage (`storage.py`): S3 when bucket + credentials are set; otherwise uses a local directory
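The storage fallback described above can be sketched as a small selection function. This is an assumption about how `storage.py` might decide, not its actual code: `choose_storage_backend` is a hypothetical name, and only the environment variable names come from this README.

```python
import os
from typing import Mapping

# Variable names as listed in this README's S3 settings.
S3_REQUIRED = (
    "DOCUMENT_BUCKET",
    "DOCUMENT_BUCKET_ACCESS_KEY_ID",
    "DOCUMENT_BUCKET_SECRET_ACCESS_KEY",
)


def choose_storage_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return "s3" when bucket and credentials are all set, else "local"."""
    if all(env.get(name, "").strip() for name in S3_REQUIRED):
        return "s3"
    return "local"
```

Passing the environment as a mapping makes the rule easy to check without touching real credentials, e.g. `choose_storage_backend({"DOCUMENT_BUCKET": "resumes"})` falls back to `"local"` because the keys are missing.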
- Python 3.10+
- OpenAI API key
- Clone and create a virtual environment

      git clone <repo-url>
      cd doc-reader
      python -m venv .venv
      source .venv/bin/activate   # Windows: .venv\Scripts\activate
      pip install -r requirements.txt

- Environment variables

  Create a `.env` in the project root:

  | Variable | Description |
  |---|---|
  | `OPEN_API_KEY` | OpenAI API key (embeddings + chat) |
  | `API_URL` | Backend URL for the Streamlit app (e.g. `http://localhost:8000`) |
  | `FRONTEND_URL` | Streamlit app URL for CORS (e.g. `http://localhost:8501`) |
  | `DOCUMENT_LOCAL_DIR` | Local folder for uploaded files when not using S3 (e.g. `document_storage`) |

  Optional (Google SSO):

  When set, the backend requires Google sign-in for all API routes except `/` and `/auth/*`. The frontend shows a "Sign in with Google" page when unauthenticated.

  | Variable | Description |
  |---|---|
  | `GOOGLE_CLIENT_ID` | OAuth 2.0 Client ID from Google Cloud Console |
  | `GOOGLE_CLIENT_SECRET` | OAuth 2.0 Client secret |
  | `JWT_SECRET` | Secret used to sign session JWTs (e.g. 32+ character random string) |

  Configure in Google Cloud Console: create an OAuth 2.0 Client ID (Web application) and add the authorized redirect URI `https://<your-backend-host>/auth/google/callback` (and `http://localhost:8000/auth/google/callback` for local dev).

  Optional (S3-compatible storage):

  | Variable | Description |
  |---|---|
  | `DOCUMENT_BUCKET` | Bucket name |
  | `DOCUMENT_BUCKET_ACCESS_KEY_ID` | Access key |
  | `DOCUMENT_BUCKET_SECRET_ACCESS_KEY` | Secret key |
  | `DOCUMENT_BUCKET_REGION` | Region (e.g. `us-east-1`) |
  | `DOCUMENT_BUCKET_ENDPOINT` | Custom endpoint URL (optional) |

- Run locally

  Terminal 1 (backend):

      uvicorn main:app --reload

  Terminal 2 (frontend):

      streamlit run ui.py

  Set `API_URL=http://localhost:8000` and `FRONTEND_URL=http://localhost:8501` in `.env` so the UI can call the API and CORS allows the origin.

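Before starting either process it can help to confirm that `.env` is complete. The sketch below is not part of the repo: `check_config` and `missing_vars` are hypothetical helpers, and treating the three core variables as required, and SSO as enabled only when all three Google/JWT variables are set, are assumptions rather than documented behaviour.

```python
import os
from typing import Iterable, Mapping

# Assumption: these are required; the README lists them without saying so.
CORE_VARS = ("OPEN_API_KEY", "API_URL", "FRONTEND_URL")
# Assumption: SSO is active only when all three of these are set.
SSO_VARS = ("GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "JWT_SECRET")


def missing_vars(names: Iterable[str], env: Mapping[str, str]) -> list[str]:
    """Return the names that are unset or blank in env."""
    return [name for name in names if not env.get(name, "").strip()]


def check_config(env: Mapping[str, str] = os.environ) -> dict:
    """Summarise which settings are missing and whether SSO looks enabled."""
    sso_missing = missing_vars(SSO_VARS, env)
    return {
        "missing_core": missing_vars(CORE_VARS, env),
        "sso_enabled": not sso_missing,
        # Some but not all SSO variables set usually indicates a typo.
        "sso_partial": 0 < len(sso_missing) < len(SSO_VARS),
    }
```

Calling `check_config()` with no arguments inspects the real process environment, so it only sees `.env` values if something has already loaded that file.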
| Method | Path | Description |
|---|---|---|
| GET | `/` | Health / status |
| GET | `/auth/google` | Redirect to Google sign-in (when SSO configured) |
| GET | `/auth/google/callback` | OAuth callback; redirects to frontend with `?token=...` |
| GET | `/auth/me` | Current user email/name (Bearer token required when SSO on) |
| POST | `/ingest` | Ingest raw text (`{"text": "..."}`) |
| POST | `/ingest_pdf` | Ingest PDF (multipart file) |
| POST | `/ask` | RAG Q&A (`{"question": "..."}`) → answer + ranked candidates + excerpts |
| GET | `/documents/list` | List stored document names |
| GET | `/documents/download?filename=...` | Download file (redirect to S3 presigned URL or file response) |
| POST | `/wipe` | Delete all documents from the ChromaDB collection |

When Google SSO is configured, `/ask`, `/ingest`, `/ingest_pdf`, `/documents/list`, `/documents/download`, and `/wipe` require an `Authorization: Bearer <token>` header (or a `token` query parameter for download).
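As a sketch of what an authenticated client call could look like, the snippet below builds (but does not send) a `/ask` request and a download URL using only the standard library. `build_ask_request` and `build_download_url` are hypothetical helpers; the paths, the JSON body shape, and the `token` query parameter come from the table and note above, while the response format is not assumed.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_ask_request(
    api_url: str, question: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST /ask request, adding a Bearer token when one is given."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/ask", data=body, headers=headers, method="POST"
    )


def build_download_url(
    api_url: str, filename: str, token: Optional[str] = None
) -> str:
    """Build a /documents/download URL, passing the token as a query param."""
    params = {"filename": filename}
    if token:
        params["token"] = token
    query = urllib.parse.urlencode(params)
    return f"{api_url.rstrip('/')}/documents/download?{query}"


# To actually send the request against a running backend:
#   with urllib.request.urlopen(build_ask_request(url, "...", token)) as resp:
#       print(resp.read().decode("utf-8"))
```

When SSO is not configured, `token` can simply be left as `None` and no `Authorization` header is added.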
The repo includes `railway.toml` defining two services:
- backend: `uvicorn main:app --host 0.0.0.0 --port $PORT`
- frontend: `streamlit run ui.py --server.port $PORT --server.address 0.0.0.0`

Set in Railway:
- `OPEN_API_KEY`
- `API_URL` → public URL of the backend service
- `FRONTEND_URL` → public URL of the Streamlit service (no trailing slash)
- `DOCUMENT_LOCAL_DIR`, or the S3 variables if you use object storage
- For Google SSO: `GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET`, `JWT_SECRET` (backend only)

For production, add a `/health` route if your platform expects it (e.g. `healthcheckPath = "/health"` in `railway.toml`).

See repository license.