| 1 | +# Product Requirements Document (PRD) |
| 2 | + |
| 3 | +Project: Text-to-SQL Platform (FastAPI + Groq) |
| 4 | +Codebase root: `/Users/a/Documents/DataScience_World/LLM_project/TextToSQLapp` |
| 5 | + |
| 6 | +## 1) Overview |
| 7 | +The Text-to-SQL platform converts natural language questions into SQL, executes the SQL against an SQLite database (`school.db`), and exposes results via a REST API. |
| 8 | + |
| 9 | +- Backend: FastAPI |
| 10 | +- NL→SQL: Groq (OpenAI-compatible API) |
| 11 | +- DB: SQLite (demo), path configurable via `.env` or env var `SQLITE_PATH` |
| 12 | + |
| 13 | +## 2) Goals |
| 14 | +- Reliable API endpoints for health, data access, direct SQL execution, and NL→SQL. |
| 15 | +- Reproducible local environment via Conda. |
| 16 | +- Clear docs and demo script. |
| 17 | +- Maintainable code with tests and useful coverage (≥75%). |
| 18 | + |
| 19 | +## 3) Non-Goals |
| 20 | +- Full production-grade UI (future). |
| 21 | +- RBAC/SSO and multi-tenant auth (future). |
| 22 | +- Non-SQLite backends (future). |
| 23 | + |
| 24 | +## 4) Users |
| 25 | +- Data analysts, educators, and developers needing quick NL→SQL. |
| 26 | + |
| 27 | +## 5) Success Metrics |
| 28 | +- P0: Endpoints return the response shapes listed in section 9 and actionable error messages. |
| 29 | +- P0: Local tests pass with coverage ≥ 75%. |
| 30 | +- P1: NL2SQL produces valid SQL for common demo questions. |
| 31 | +- P1: p95 NL2SQL latency < 3s (network and LLM dependent). |
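
The p95 latency target above can be checked against measured request timings. The following is a minimal sketch using the nearest-rank percentile method; the function names and the sample values in the usage note are illustrative assumptions, not part of the codebase.

```python
def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of a non-empty list (nearest-rank method)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be an integer in (0, 100]")
    ordered = sorted(samples)
    # Integer form of ceil(pct * n / 100), which avoids float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_target(samples_s, target_s=3.0):
    """True if the p95 of the latencies (in seconds) is strictly below the target."""
    return percentile_nearest_rank(samples_s, 95) < target_s
```

For example, twenty requests of which one took 10 s and the rest 0.5 s would still meet the target, because nearest-rank p95 over 20 samples picks the 19th smallest value.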
| 32 | + |
| 33 | +## 6) Architecture |
| 34 | +- App entry: `backend/app/main.py` |
| 35 | +- Routes: `backend/app/api/v1/routes.py` |
| 36 | +- Config: `backend/app/core/config.py` (Pydantic Settings) |
| 37 | +- Services: `backend/app/services/` |
| 38 | + - `db.py` (SQLite operations) |
| 39 | + - `nl2sql.py` (Groq calls) |
| 40 | +- Utils: `backend/app/utils/sql_cleaner.py` |
| 41 | +- Models/schemas: `backend/app/models/schemas.py` |
| 42 | +- Tests: `backend/tests/` |
| 43 | +- Demo: `backend/api_demo.py` |
| 44 | + |
| 45 | +## 7) Functional Requirements |
| 46 | +- GET `/api/v1/health` → `{ "status": "ok" }` |
| 47 | +- GET `/api/v1/students` → returns rows from `STUDENT` table |
| 48 | +- POST `/api/v1/sql` with `{ "sql": "..." }` → returns `rows` or `rowcount` |
| 49 | +- POST `/api/v1/nl2sql` with `{ "question": "..." }` → returns `{ "sql": "..." }` |
| 50 | + |
| 51 | +## 8) Non-Functional Requirements |
| 52 | +- Reliability: robust error handling for DB and LLM failures. |
| 53 | +- Security: `.env` ignored; no secrets in logs; basic SQL cleaning. |
| 54 | +- Observability: structured logs. |
| 55 | +- Performance: appropriate for SQLite; minimal overhead. |
| 56 | +- Maintainability: typed Python, tests, organized modules. |
| 57 | + |
| 58 | +## 9) API Spec (v1) |
| 59 | +Base: `/api/v1` |
| 60 | + |
| 61 | +- `GET /health` → 200 `{ "status": "ok" }` |
| 62 | +- `GET /students` → 200 `{ "rows": [[...], ...] }`, or 200 `{ "rows": [] }` when the table is empty |
| 63 | +- `POST /sql` → 200 `{ "rows": [[...]] }` or `{ "rowcount": N }`, 400 on invalid SQL |
| 64 | +- `POST /nl2sql` → 200 `{ "sql": "SELECT ...;" }`, 502 on LLM error |
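
To illustrate the `rows` versus `rowcount` distinction for `POST /sql`, here is a hedged sketch using only the standard-library `sqlite3` module. The helper name `execute_sql` and the `(status, body)` return shape are assumptions made for illustration; the actual logic in `backend/app/services/db.py` may differ.

```python
import sqlite3


def execute_sql(conn, sql):
    """Run one SQL statement; return (status, body) mirroring the API spec."""
    try:
        cur = conn.execute(sql)
    except (sqlite3.Error, sqlite3.Warning) as exc:
        # A real route would presumably translate this into an HTTP 400 response.
        return 400, {"detail": str(exc)}
    if cur.description is not None:
        # Statements that produce a result set (e.g. SELECT) expose column metadata.
        return 200, {"rows": [list(row) for row in cur.fetchall()]}
    # Statements without a result set (INSERT, UPDATE, DELETE, DDL) report a rowcount.
    conn.commit()
    return 200, {"rowcount": cur.rowcount}
```

Checking `cursor.description` rather than parsing the SQL text keeps the branch decision with SQLite itself, which is one reasonable way to decide between the two response shapes.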
| 65 | + |
| 66 | +## 10) Data Model |
| 67 | +SQLite file: `school.db` |
| 68 | +- Table: `STUDENT(NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)` |
| 69 | + |
| 70 | +## 11) Configuration |
| 71 | +- `backend/.env.example` → copy to `backend/.env` |
| 72 | +- Vars: |
| 73 | + - `GROQ_API_KEY` (required for NL2SQL) |
| 74 | + - `GROQ_MODEL` (default `llama-3.1-70b-versatile`) |
| 75 | + - `SQLITE_PATH` (default `school.db`) |
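
The variables and defaults above could be resolved roughly as follows. This is a standard-library stand-in for illustration only: the project itself uses Pydantic Settings in `backend/app/core/config.py`, and the names `Settings`, `load_settings`, and `require_api_key` are hypothetical. Unlike the real configuration, this sketch does not read a `.env` file.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model: str
    sqlite_path: str


def load_settings(env=None):
    """Build settings from a mapping (defaults to os.environ), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        groq_api_key=env.get("GROQ_API_KEY", ""),
        groq_model=env.get("GROQ_MODEL", "llama-3.1-70b-versatile"),
        sqlite_path=env.get("SQLITE_PATH", "school.db"),
    )


def require_api_key(settings):
    """Return the API key, or fail with an actionable message if it is missing."""
    if not settings.groq_api_key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it to backend/.env before calling /api/v1/nl2sql"
        )
    return settings.groq_api_key
```

Passing the environment as a parameter makes the defaults easy to test without touching the real process environment.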
| 76 | + |
| 77 | +## 12) Dependencies |
| 78 | +- Declared in `backend/environment.yml` and `backend/pyproject.toml`. |
| 79 | +- Key: fastapi, uvicorn, groq, pydantic, pydantic-settings, python-dotenv, pytest, pytest-cov, httpx, ruff. |
| 80 | + |
| 81 | +## 13) Security & Privacy |
| 82 | +- Never commit secrets. |
| 83 | +- Clean LLM output into bare SQL (strip Markdown fences, ensure a trailing semicolon). |
| 84 | +- Future: schema allow-listing and stricter SQL validation. |
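
The cleaning step described above (strip Markdown, ensure a semicolon) might look something like the sketch below. The function name `clean_sql` and the exact rules are assumptions for illustration; the actual behavior of `backend/app/utils/sql_cleaner.py` is not shown in this document.

```python
import re

# Matches a fenced block, optionally tagged "sql", and captures its contents.
_FENCE = re.compile(r"```(?:sql)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def clean_sql(raw):
    """Extract a single SQL statement from raw LLM output."""
    text = raw.strip()
    match = _FENCE.search(text)
    if match:
        # Prefer the contents of the first fenced block and drop surrounding chatter.
        text = match.group(1)
    # Remove stray inline backticks and whitespace.
    text = text.strip().strip("`").strip()
    if not text:
        raise ValueError("LLM returned no SQL")
    # Normalize to exactly one trailing semicolon.
    return text.rstrip(";").rstrip() + ";"
```

This only normalizes formatting; it does not validate that the result is safe or even valid SQL, which is why stricter validation is listed as future work.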
| 85 | + |
| 86 | +## 14) Testing |
| 87 | +- Unit/integration tests in `backend/tests/`. |
| 88 | +- Temporary DB per test where needed; avoid mutating the real `school.db`. |
| 89 | +- Goal: keep coverage ≥ 75% (current ~79%). |
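
One way to get a temporary database per test is a small context manager like the one below. This is a standard-library sketch; the helper name `temporary_student_db` is hypothetical, and the real tests in `backend/tests/` may instead rely on pytest's built-in `tmp_path` fixture.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_student_db(rows=()):
    """Yield the path of a throwaway SQLite file containing a STUDENT table."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "test_school.db")
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE STUDENT ("
                "NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)"
            )
            conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", rows)
            conn.commit()
        finally:
            conn.close()
        # The directory (and the DB file in it) is removed when the block exits.
        yield path
```

A test would then point `SQLITE_PATH` (or the equivalent setting) at the yielded path, so the demo database is never touched.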
| 90 | + |
| 91 | +## 15) Observability |
| 92 | +- Logging set in `backend/app/core/logging.py`. |
| 93 | +- Future: request IDs and tracing hooks. |
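
As an illustration of what structured logs could look like, here is a minimal JSON formatter built on the standard `logging` module. The class and function names are assumptions; the contents of `backend/app/core/logging.py` are not shown in this document and may be set up differently.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level=logging.INFO):
    """Send JSON-formatted logs from the root logger to stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

A request ID, once added, would fit naturally as one more key in `payload`.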
| 94 | + |
| 95 | +## 16) Deployment |
| 96 | +- Local: `uvicorn app.main:app --reload --port 8000 --app-dir backend` |
| 97 | +- Future: containerization and CI deploys. |
| 98 | + |
| 99 | +## 17) Risks & Mitigations |
| 100 | +- LLM hallucination → cleaning and guardrails; schema awareness later. |
| 101 | +- Missing DB → seed script and clear errors. |
| 102 | +- API key missing → error with actionable message. |
| 103 | +- README divergence → maintain single source on main and use PRs. |
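
As a sketch of one possible guardrail against hallucinated or destructive SQL, generated statements could be restricted to a single read-only query before execution. The function below is a keyword-based heuristic, not a SQL parser, and is purely an assumption about what such a guard might look like; it is deliberately conservative and will reject some harmless queries (for example, one containing the word "delete" inside a string literal).

```python
import re

# Keywords that indicate a write, schema change, or database-level command.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|attach|detach|pragma|vacuum)\b",
    re.IGNORECASE,
)


def is_read_only_select(sql):
    """Heuristically check that sql is one SELECT (or WITH ... SELECT) statement."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement
    if not re.match(r"(?i)(select|with)\b", statement):
        return False
    return _FORBIDDEN.search(statement) is None
```

Such a check would apply to NL→SQL output only; the direct `/sql` endpoint is specified to accept statements that return a `rowcount`.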
| 104 | + |
| 105 | +## 18) Rollout Plan |
| 106 | +- Phase 1: Backend stable (current) |
| 107 | +- Phase 2: Frontend SPA |
| 108 | +- Phase 3: Security hardening, schema introspection, Docker/CI |
| 109 | + |
| 110 | +## 19) Timeline (example) |
| 111 | +- W1: Stabilize backend, DB seeding, CI |
| 112 | +- W2: Frontend prototype |
| 113 | +- W3: Improve guardrails and prompts |
| 114 | +- W4: Docker & release |
| 115 | + |
| 116 | +## 20) Acceptance Criteria |
| 117 | +- Endpoints behave as specified in the API spec (section 9) |
| 118 | +- Demo script returns rows and valid NL→SQL |
| 119 | +- Tests pass locally/CI; coverage ≥ 75% |
| 120 | +- README documents setup and usage |
| 121 | + |
| 122 | +--- |
| 123 | + |
| 124 | +# Step-by-Step Implementation Plan |
| 125 | + |
| 126 | +## A) Environment & Setup |
| 127 | +1. Create Conda env (first time only): |
| 128 | + ```bash |
| 129 | + conda env create -f backend/environment.yml |
| 130 | + # if env exists: |
| 131 | + conda env update -n text2sql-backend -f backend/environment.yml --prune |
| 132 | + conda activate text2sql-backend |
| 133 | + ``` |
| 134 | +2. Configure env vars: |
| 135 | + ```bash |
| 136 | + cp backend/.env.example backend/.env |
| 137 | + # Edit backend/.env to set GROQ_API_KEY and optionally GROQ_MODEL and SQLITE_PATH |
| 138 | + ``` |
| 139 | + |
| 140 | +## B) Database Initialization (demo) |
| 141 | +From repo root: |
| 142 | +```bash |
| 143 | +sqlite3 school.db <<'SQL' |
| 144 | +CREATE TABLE IF NOT EXISTS STUDENT ( |
| 145 | + NAME VARCHAR(25), |
| 146 | + CLASS VARCHAR(25), |
| 147 | + SECTION VARCHAR(25), |
| 148 | + MARKS INT |
| 149 | +); |
| 150 | + |
| 151 | +DELETE FROM STUDENT; |
| 152 | + |
| 153 | +INSERT INTO STUDENT (NAME, CLASS, SECTION, MARKS) VALUES |
| 154 | +('Alice','Data Science','A',85), |
| 155 | +('Bob','Data Science','B',78), |
| 156 | +('Charlie','AI','A',92), |
| 157 | +('Diana','AI','B',88); |
| 158 | +SQL |
| 159 | +``` |
| 160 | +If you want a different path: |
| 161 | +- Set `SQLITE_PATH=/absolute/path/to/school.db` in `backend/.env` |
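
Where the `sqlite3` command-line tool is not available, the same seeding can be done from Python. This sketch mirrors the SQL above using the standard library; the function name `seed_student_db` is hypothetical and is not the seed script mentioned elsewhere in this document.

```python
import sqlite3

DEMO_ROWS = [
    ("Alice", "Data Science", "A", 85),
    ("Bob", "Data Science", "B", 78),
    ("Charlie", "AI", "A", 92),
    ("Diana", "AI", "B", 88),
]


def seed_student_db(path="school.db"):
    """Create the STUDENT table if needed, reset it to the demo rows, return the row count."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS STUDENT ("
                "NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)"
            )
            conn.execute("DELETE FROM STUDENT")
            conn.executemany(
                "INSERT INTO STUDENT (NAME, CLASS, SECTION, MARKS) VALUES (?, ?, ?, ?)",
                DEMO_ROWS,
            )
        return conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
    finally:
        conn.close()
```

Like the shell version, this is idempotent: running it twice leaves exactly the four demo rows.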
| 162 | + |
| 163 | +## C) Run the API |
| 164 | +```bash |
| 165 | +uvicorn app.main:app --reload --port 8000 --app-dir backend |
| 166 | +# Docs: http://127.0.0.1:8000/docs |
| 167 | +``` |
| 168 | + |
| 169 | +## D) Sanity Test (curl) |
| 170 | +```bash |
| 171 | +curl http://127.0.0.1:8000/api/v1/health |
| 172 | +curl http://127.0.0.1:8000/api/v1/students |
| 173 | +curl -X POST http://127.0.0.1:8000/api/v1/sql \ |
| 174 | + -H 'Content-Type: application/json' \ |
| 175 | + -d '{"sql":"SELECT COUNT(*) FROM STUDENT;"}' |
| 176 | +``` |
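
The same `POST /sql` check can be scripted from Python without extra dependencies. This is a sketch using `urllib` from the standard library; the helper names are illustrative, and `post_sql` assumes the server from section C is running locally on port 8000.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/api/v1"


def build_sql_request(sql, base_url=BASE_URL):
    """Build the POST request for /sql with a JSON body."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_sql(sql, base_url=BASE_URL, timeout=5.0):
    """Send the request and decode the JSON response (requires a running server)."""
    request = build_sql_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `post_sql("SELECT COUNT(*) FROM STUDENT;")` against a seeded database should return a body with a `rows` key, per the API spec.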
| 177 | + |
| 178 | +## E) NL→SQL Demo |
| 179 | +```bash |
| 180 | +python backend/api_demo.py |
| 181 | +``` |
| 182 | +Requires `GROQ_API_KEY` to be set in `backend/.env`. |
| 183 | + |
| 184 | +## F) Running Tests |
| 185 | +```bash |
| 186 | +cd backend |
| 187 | +pytest -q --cov=app --cov-report=term-missing |
| 188 | +``` |
| 189 | +Expected: All tests pass, coverage ~79%. |
| 190 | + |
| 191 | +## G) Development Workflow |
| 192 | +1. Branch from `main`: `git switch -c feature/<name>` |
| 193 | +2. Make changes; keep commits focused. |
| 194 | +3. Run tests locally. |
| 195 | +4. Push branch and open PR. |
| 196 | +5. Address review, squash/rebase as appropriate. |
| 197 | + |
| 198 | +## H) Future Enhancements |
| 199 | +- Frontend SPA (React/Vue/Svelte) consuming `/api/v1` |
| 200 | +- AuthN/Z, rate limiting |
| 201 | +- Schema introspection and allow-listed SQL |
| 202 | +- Dockerfile + GitHub Actions CI |
| 203 | +- Prompt tuning and fallback strategies for NL2SQL |
| 204 | + |
| 205 | +--- |
| 206 | + |
| 207 | +# Appendix |
| 208 | + |
| 209 | +## Files & Paths |
| 210 | +- App: `backend/app/main.py` |
| 211 | +- Routes: `backend/app/api/v1/routes.py` |
| 212 | +- Services: `backend/app/services/` |
| 213 | +- Config: `backend/app/core/config.py` |
| 214 | +- Tests: `backend/tests/` |
| 215 | +- Demo: `backend/api_demo.py` |
| 216 | +- Environment: `backend/environment.yml` |
| 217 | + |
| 218 | +## Troubleshooting |
| 219 | +- "no such table: STUDENT": seed DB (see section B) or set `SQLITE_PATH`. |
| 220 | +- 502 from `/nl2sql`: ensure `GROQ_API_KEY` is set and network available. |
| 221 | +- Port in use: `lsof -ti:8000 | xargs -r kill -9` then restart. |