Resume Search (doc-reader)

A resume search app that lets you upload PDF/text resumes, index them with embeddings, and ask natural-language questions to find the best candidates. Built with FastAPI, Streamlit, ChromaDB, and OpenAI.

Features

Upload resumes — PDF or plain text; files are chunked, embedded, and stored in ChromaDB
Ask questions — e.g. "Who has the most Python experience?" or "Find candidates who know DevOps"; answers are grounded in your documents with ranked candidates
Document library — List and download uploaded resumes (paginated)
Storage — Local disk by default, or S3-compatible storage when configured
Reset — Wipe the vector DB from the Settings tab when you need a fresh start
Google SSO — When configured, sign in with Google; JWT is used for API auth
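
The chunking step in the upload feature can be pictured as a sliding window over the extracted text. The sketch below is only an illustration: the function name `chunk_text`, the character-based window, and the default sizes are assumptions, not the values used in main.py.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that are only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of a few extra embeddings.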

Architecture

Backend (main.py) — FastAPI app: ingest PDF/text, embed with OpenAI text-embedding-3-small, store in ChromaDB; /ask does retrieval + GPT-4.1-mini for answers and ranked candidates
Frontend (ui.py) — Streamlit app: Ask tab, Resumes tab (upload + list + download), Settings (danger zone)
Storage (storage.py) — S3 when bucket + credentials are set; otherwise uses a local directory
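
To make the retrieval half of /ask concrete, here is a toy ranking of stored chunks against a question vector. ChromaDB performs the nearest-neighbour search itself, and the metric it uses depends on how the collection is configured; the cosine similarity, the tiny 2-dimensional vectors, and the names `cosine_similarity` and `top_k` are all assumptions for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 3):
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunks.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Hypothetical chunk ids and embeddings; real embeddings have many more dimensions.
store = {"alice#0": [1.0, 0.0], "bob#0": [0.0, 1.0], "carol#0": [1.0, 1.0]}
best = top_k([1.0, 0.1], store, k=2)
```

In the real pipeline the retrieved excerpts would then be passed to the chat model, which writes the answer and the candidate ranking.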

Requirements

Python 3.10+
OpenAI API key

Setup

Clone and create a virtual environment

git clone <repo-url>
cd doc-reader
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Environment variables

Create a .env in the project root:

| Variable | Description |
| --- | --- |
| `OPEN_API_KEY` | OpenAI API key (embeddings + chat) |
| `API_URL` | Backend URL for the Streamlit app (e.g. `http://localhost:8000`) |
| `FRONTEND_URL` | Streamlit app URL for CORS (e.g. `http://localhost:8501`) |
| `DOCUMENT_LOCAL_DIR` | Local folder for uploaded files when not using S3 (e.g. `document_storage`) |
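
A quick way to catch a missing setting before starting either process is a small startup check like the one below. It is a sketch, not code from the repository: the helper name `missing_vars` is hypothetical, and treating all four variables as mandatory is an assumption (for example, `DOCUMENT_LOCAL_DIR` may be unnecessary when S3 is configured).

```python
import os

# Names taken from the table above; whether each is strictly required is assumed.
REQUIRED_VARS = ("OPEN_API_KEY", "API_URL", "FRONTEND_URL", "DOCUMENT_LOCAL_DIR")


def missing_vars(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Note that `os.environ` only sees values from .env once something has loaded the file into the process environment.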

Optional (Google SSO):

When the variables below are set, the backend requires Google sign-in for all API routes except / and /auth/*. The frontend shows a "Sign in with Google" page when unauthenticated.

| Variable | Description |
| --- | --- |
| `GOOGLE_CLIENT_ID` | OAuth 2.0 Client ID from Google Cloud Console |
| `GOOGLE_CLIENT_SECRET` | OAuth 2.0 Client secret |
| `JWT_SECRET` | Secret used to sign session JWTs (e.g. 32+ character random string) |

Configure in Google Cloud Console: create an OAuth 2.0 Client ID (Web application), add authorized redirect URI https://<your-backend-host>/auth/google/callback (and http://localhost:8000/auth/google/callback for local dev).
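
How auth.py builds its session JWTs is not shown here, so the following is only a standard-library sketch of what signing and verifying a token with `JWT_SECRET` involves. The HS256 algorithm, the `email` claim, and the helper names are assumptions; a real deployment would normally rely on a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: str) -> str:
    """Build header.payload.signature using HMAC-SHA256 (assumed algorithm)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                   hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_jwt(token: str, secret: str) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    expected = hmac.new(secret.encode("utf-8"),
                        f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # compare_digest avoids leaking how many bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

A token signed with one secret fails verification under any other, which is why `JWT_SECRET` should be long, random, and set only on the backend.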

Optional (S3-compatible storage):

| Variable | Description |
| --- | --- |
| `DOCUMENT_BUCKET` | Bucket name |
| `DOCUMENT_BUCKET_ACCESS_KEY_ID` | Access key |
| `DOCUMENT_BUCKET_SECRET_ACCESS_KEY` | Secret key |
| `DOCUMENT_BUCKET_REGION` | Region (e.g. `us-east-1`) |
| `DOCUMENT_BUCKET_ENDPOINT` | Custom endpoint URL (optional) |
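
The rule "S3 when bucket and credentials are set, otherwise a local directory" can be sketched as a small selection function. This is not the code in storage.py: the function name `choose_storage`, the returned dictionary shape, and the `document_storage` fallback are assumptions made for the example.

```python
import os

# S3 is considered configured only when all three of these are non-empty.
S3_REQUIRED = (
    "DOCUMENT_BUCKET",
    "DOCUMENT_BUCKET_ACCESS_KEY_ID",
    "DOCUMENT_BUCKET_SECRET_ACCESS_KEY",
)


def choose_storage(env=None) -> dict:
    """Describe which storage backend the given environment selects."""
    env = os.environ if env is None else env
    if all(env.get(name) for name in S3_REQUIRED):
        return {
            "backend": "s3",
            "bucket": env["DOCUMENT_BUCKET"],
            "region": env.get("DOCUMENT_BUCKET_REGION") or None,
            "endpoint": env.get("DOCUMENT_BUCKET_ENDPOINT") or None,
        }
    # Fall back to local disk; the default folder name is an assumption.
    return {
        "backend": "local",
        "directory": env.get("DOCUMENT_LOCAL_DIR") or "document_storage",
    }
```

Requiring the bucket and both keys together means a half-filled configuration falls back to local disk instead of failing on the first upload.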

Run locally

Terminal 1 — Backend:
```
uvicorn main:app --reload
```
Terminal 2 — Frontend:
```
streamlit run ui.py
```
- API: http://localhost:8000
- UI: http://localhost:8501

Set API_URL=http://localhost:8000 and FRONTEND_URL=http://localhost:8501 in .env so the UI can call the API and CORS allows the origin.

API overview

| Method | Path | Description |
| --- | --- | --- |
| GET | `/` | Health / status |
| GET | `/auth/google` | Redirect to Google sign-in (when SSO configured) |
| GET | `/auth/google/callback` | OAuth callback; redirects to frontend with `?token=...` |
| GET | `/auth/me` | Current user email/name (Bearer token required when SSO on) |
| POST | `/ingest` | Ingest raw text (`{"text": "..."}`) |
| POST | `/ingest_pdf` | Ingest PDF (multipart `file`) |
| POST | `/ask` | RAG Q&A (`{"question": "..."}`) → answer + ranked candidates + excerpts |
| GET | `/documents/list` | List stored document names |
| GET | `/documents/download?filename=...` | Download file (redirect to S3 presigned URL or file response) |
| POST | `/wipe` | Delete all documents from the ChromaDB collection |

When Google SSO is configured, /ask, /ingest, /ingest_pdf, /documents/list, /documents/download, and /wipe require an Authorization: Bearer <token> header (or token query param for download).
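
As a rough illustration of calling /ask from a script, the helper below builds the POST request with the JSON body from the table and an optional Bearer header. The function name `build_ask_request` is hypothetical, and the exact response fields are not specified here, so the sketch stops at constructing the request.

```python
import json
import urllib.request
from typing import Optional


def build_ask_request(api_url: str, question: str,
                      token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /ask request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when Google SSO is configured on the backend.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/ask", data=body, headers=headers, method="POST"
    )


req = build_ask_request(
    "http://localhost:8000", "Who has the most Python experience?", token="abc"
)
```

Passing `req` to `urllib.request.urlopen` sends it; with the backend running locally, the response body should be JSON containing the answer, ranked candidates, and excerpts.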

Deployment (Railway)

The repo includes railway.toml defining two services:

backend — uvicorn main:app --host 0.0.0.0 --port $PORT
frontend — streamlit run ui.py --server.port $PORT --server.address 0.0.0.0

Set in Railway:

OPEN_API_KEY
API_URL → public URL of the backend service
FRONTEND_URL → public URL of the Streamlit service (no trailing slash)
DOCUMENT_LOCAL_DIR or the S3 variables if you use object storage
For Google SSO: GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, JWT_SECRET (backend only)

For production, add a /health route if your platform expects it (e.g. healthcheckPath = "/health" in railway.toml).

License

See repository license.
