Commit 0fbef86

Rename DB to school.db; remove repo links; update docs; add seed script and CI; seed demo DB
1 parent 5527d1d commit 0fbef86

8 files changed: +344 -12 lines changed

.github/workflows/ci.yml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+name: CI
+
+on:
+  push:
+    branches: ["**"]
+  pull_request:
+    branches: ["**"]
+
+jobs:
+  tests:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e backend[dev]
+
+      - name: Run tests
+        env:
+          GROQ_API_KEY: dummy-ci-key
+          SQLITE_PATH: school.db
+        working-directory: backend
+        run: |
+          pytest -q --cov=app --cov-report=term-missing --cov-report=xml:coverage.xml
+
+      - name: Upload coverage report
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-xml
+          path: backend/coverage.xml

PRD.md

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
1+
# Product Requirements Document (PRD)
2+
3+
Project: Text-to-SQL Platform (FastAPI + Groq)
4+
Codebase root: `/Users/a/Documents/DataScience_World/LLM_project/TextToSQLapp`
5+
6+
## 1) Overview
7+
The Text-to-SQL platform converts natural language questions into SQL, executes the SQL against an SQLite database (`school.db`), and exposes results via a REST API.
8+
9+
- Backend: FastAPI
10+
- NL→SQL: Groq (OpenAI-compatible API)
11+
- DB: SQLite (demo), path configurable via `.env` or env var `SQLITE_PATH`
12+
13+
## 2) Goals
14+
- Reliable API endpoints for health, data access, direct SQL execution, and NL→SQL.
15+
- Reproducible local environment via Conda.
16+
- Clear docs and demo script.
17+
- Maintainable code with tests and useful coverage (≥75%).
18+
19+
## 3) Non-Goals
20+
- Full production-grade UI (future).
21+
- RBAC/SSO and multi-tenant auth (future).
22+
- Non-SQLite backends (future).
23+
24+
## 4) Users
25+
- Data analysts, educators, and developers needing quick NL→SQL.
26+
27+
## 5) Success Metrics
28+
- P0: Endpoints return correct shapes and sensible error messages.
29+
- P0: Local tests pass with coverage ≥ 75%.
30+
- P1: NL2SQL produces valid SQL for common demo questions.
31+
- P1: p95 NL2SQL latency < 3s (network and LLM dependent).
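
The p95 target can be checked from recorded request latencies. The following is a minimal sketch, assuming latencies are collected in seconds by some harness that is not part of this commit; `p95`, `within_budget`, and `LATENCY_BUDGET_S` are hypothetical names, and nearest-rank is only one of several percentile definitions.

```python
LATENCY_BUDGET_S = 3.0  # hypothetical constant mirroring the P1 latency target


def p95(samples: list[float]) -> float:
    """Return the 95th percentile of the samples (nearest-rank method)."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # 1-based nearest rank: ceil(0.95 * n), computed with integers to avoid float error
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(samples: list[float], budget: float = LATENCY_BUDGET_S) -> bool:
    """True when the p95 latency is strictly below the budget."""
    return p95(samples) < budget
```

With only a handful of samples the p95 is simply the slowest request, so a meaningful check needs a reasonably large sample.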
32+
33+
## 6) Architecture
34+
- App entry: `backend/app/main.py`
35+
- Routes: `backend/app/api/v1/routes.py`
36+
- Config: `backend/app/core/config.py` (Pydantic Settings)
37+
- Services: `backend/app/services/`
38+
- `db.py` (SQLite operations)
39+
- `nl2sql.py` (Groq calls)
40+
- Utils: `backend/app/utils/sql_cleaner.py`
41+
- Models/schemas: `backend/app/models/schemas.py`
42+
- Tests: `backend/tests/`
43+
- Demo: `backend/api_demo.py`
44+
45+
## 7) Functional Requirements
46+
- GET `/api/v1/health` → `{ "status": "ok" }`
47+
- GET `/api/v1/students` → returns rows from `STUDENT` table
48+
- POST `/api/v1/sql` with `{ "sql": "..." }` → returns `rows` or `rowcount`
49+
- POST `/api/v1/nl2sql` with `{ "question": "..." }` → returns `{ "sql": "..." }`
50+
51+
## 8) Non-Functional Requirements
52+
- Reliability: robust error handling for DB and LLM failures.
53+
- Security: `.env` ignored; no secrets in logs; basic SQL cleaning.
54+
- Observability: structured logs.
55+
- Performance: appropriate for SQLite; minimal overhead.
56+
- Maintainability: typed Python, tests, organized modules.
57+
58+
## 9) API Spec (v1)
59+
Base: `/api/v1`
60+
61+
- `GET /health` → 200 `{ "status": "ok" }`
62+
- `GET /students` → 200 `{ "rows": [[...], ...] }`, or 200 `{ "rows": [] }` when the table is empty
63+
- `POST /sql` → 200 `{ "rows": [[...]] }` or `{ "rowcount": N }`, 400 on invalid SQL
64+
- `POST /nl2sql` → 200 `{ "sql": "SELECT ...;" }`, 502 on LLM error
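
The two response shapes of `POST /sql` can be illustrated with the standard `sqlite3` module. This is a sketch under assumptions, not the code in `backend/app/services/db.py` (which this diff does not show): `run_sql`, `handle_sql`, and the `"detail"` error key are hypothetical.

```python
import sqlite3


def run_sql(conn: sqlite3.Connection, sql: str) -> dict:
    """Execute one statement; return rows for queries, rowcount otherwise."""
    cur = conn.execute(sql)
    if cur.description is not None:  # the statement produced a result set
        return {"rows": [list(row) for row in cur.fetchall()]}
    conn.commit()  # persist INSERT/UPDATE/DELETE
    return {"rowcount": cur.rowcount}


def handle_sql(conn: sqlite3.Connection, sql: str) -> tuple[int, dict]:
    """Map the outcome to an HTTP-style (status, body) pair."""
    try:
        return 200, run_sql(conn, sql)
    except sqlite3.Error as exc:
        # "detail" is an assumed key; the spec only says 400 on invalid SQL
        return 400, {"detail": str(exc)}
```

Checking `cursor.description` avoids parsing the SQL text to decide whether a statement is a query.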
65+
66+
## 10) Data Model
67+
SQLite file: `school.db`
68+
- Table: `STUDENT(NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)`
69+
70+
## 11) Configuration
71+
- `backend/.env.example` → copy to `backend/.env`
72+
- Vars:
73+
- `GROQ_API_KEY` (required for NL2SQL)
74+
- `GROQ_MODEL` (default `llama-3.1-70b-versatile`)
75+
- `SQLITE_PATH` (default `school.db`)
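
For illustration, the defaults above can be resolved from the environment with the standard library alone. This sketch is not the project's `config.py`, which uses Pydantic Settings; `DemoSettings`, `load_settings`, and `require_api_key` are hypothetical names, and the sketch does not read `.env` files.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DemoSettings:
    groq_api_key: Optional[str]
    groq_model: str
    sqlite_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> DemoSettings:
    """Read the three documented variables, falling back to the documented defaults."""
    return DemoSettings(
        groq_api_key=env.get("GROQ_API_KEY") or None,  # empty string counts as unset
        groq_model=env.get("GROQ_MODEL", "llama-3.1-70b-versatile"),
        sqlite_path=env.get("SQLITE_PATH", "school.db"),
    )


def require_api_key(settings: DemoSettings) -> str:
    """Fail with an actionable message when NL2SQL is used without a key."""
    if not settings.groq_api_key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to backend/.env to use /nl2sql")
    return settings.groq_api_key
```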
76+
77+
## 12) Dependencies
78+
- Declared in `backend/environment.yml` and `backend/pyproject.toml`.
79+
- Key: fastapi, uvicorn, groq, pydantic, pydantic-settings, python-dotenv, pytest, pytest-cov, httpx, ruff.
80+
81+
## 13) Security & Privacy
82+
- No secret commits.
83+
- Clean LLM outputs to SQL (strip markdown, ensure semicolon).
84+
- Future: schema allow-listing and stricter SQL validation.
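
The cleaning step (strip markdown, ensure semicolon) might look roughly like the sketch below. It is an assumption about behavior, not the contents of `backend/app/utils/sql_cleaner.py`; `clean_sql` is a hypothetical name, and the function does no validation of the SQL itself.

```python
FENCE = "`" * 3  # markdown code-fence marker


def clean_sql(raw: str) -> str:
    """Strip a surrounding markdown fence and ensure a trailing semicolon."""
    text = raw.strip()
    if text.startswith(FENCE):
        # drop the opening fence line, which may carry a language tag such as "sql"
        newline = text.find("\n")
        text = text[newline + 1:] if newline != -1 else text[len(FENCE):]
    if text.endswith(FENCE):
        text = text[: -len(FENCE)]
    text = text.strip()
    if text and not text.endswith(";"):
        text += ";"
    return text
```

Cleaning of this kind only normalizes formatting; it does not make LLM-generated SQL safe to run, which is why allow-listing is listed as future work.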
85+
86+
## 14) Testing
87+
- Unit/integration tests in `backend/tests/`.
88+
- Temporary DB per test where needed; avoid mutating real `school.db`.
89+
- Goal: keep coverage ≥ 75% (current ~79%).
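
A temporary database per test can be built with `tempfile` and `sqlite3`. The sketch below assumes nothing about the actual fixtures in `backend/tests/`; `temp_student_db` is a hypothetical helper, and the single seeded row is illustrative.

```python
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def temp_student_db() -> Iterator[Path]:
    """Yield the path of a throwaway SQLite file holding a STUDENT table."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "test.db"
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE STUDENT ("
                "NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)"
            )
            conn.execute("INSERT INTO STUDENT VALUES ('Alice', 'Data Science', 'A', 85)")
            conn.commit()
        finally:
            conn.close()
        yield path  # the directory and file are removed when the block exits
```

A test would point `SQLITE_PATH` at the yielded path so the real demo database is never opened.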
90+
91+
## 15) Observability
92+
- Logging set in `backend/app/core/logging.py`.
93+
- Future: request IDs and tracing hooks.
94+
95+
## 16) Deployment
96+
- Local: `uvicorn app.main:app --reload --port 8000 --app-dir backend`
97+
- Future: containerization and CI deploys.
98+
99+
## 17) Risks & Mitigations
100+
- LLM hallucination → cleaning and guardrails; schema awareness later.
101+
- Missing DB → seed script and clear errors.
102+
- API key missing → error with actionable message.
103+
- README divergence → maintain single source on main and use PRs.
104+
105+
## 18) Rollout Plan
106+
- Phase 1: Backend stable (current)
107+
- Phase 2: Frontend SPA
108+
- Phase 3: Security hardening, schema introspection, Docker/CI
109+
110+
## 19) Timeline (example)
111+
- W1: Stabilize backend, DB seeding, CI
112+
- W2: Frontend prototype
113+
- W3: Improve guardrails and prompts
114+
- W4: Docker & release
115+
116+
## 20) Acceptance Criteria
117+
- Endpoints behave as specified
118+
- Demo script returns rows and valid NL→SQL
119+
- Tests pass locally/CI; coverage ≥ 75%
120+
- README documents setup and usage
121+
122+
---
123+
124+
# Step-by-Step Implementation Plan
125+
126+
## A) Environment & Setup
127+
1. Create Conda env (first time only):
128+
```bash
129+
conda env create -f backend/environment.yml
130+
# if env exists:
131+
conda env update -n text2sql-backend -f backend/environment.yml --prune
132+
conda activate text2sql-backend
133+
```
134+
2. Configure env vars:
135+
```bash
136+
cp backend/.env.example backend/.env
137+
# Edit backend/.env to set GROQ_API_KEY and optionally GROQ_MODEL and SQLITE_PATH
138+
```
139+
140+
## B) Database Initialization (demo)
141+
From repo root:
142+
```bash
143+
sqlite3 school.db <<'SQL'
144+
CREATE TABLE IF NOT EXISTS STUDENT (
145+
NAME VARCHAR(25),
146+
CLASS VARCHAR(25),
147+
SECTION VARCHAR(25),
148+
MARKS INT
149+
);
150+
151+
DELETE FROM STUDENT;
152+
153+
INSERT INTO STUDENT (NAME, CLASS, SECTION, MARKS) VALUES
154+
('Alice','Data Science','A',85),
155+
('Bob','Data Science','B',78),
156+
('Charlie','AI','A',92),
157+
('Diana','AI','B',88);
158+
SQL
159+
```
160+
If you want a different path:
161+
- Set `SQLITE_PATH=/absolute/path/to/school.db` in `backend/.env`
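
The same seeding can be done from Python. This is a sketch of what such a script could look like, not the contents of `backend/scripts/seed_db.py` (the diff shown here does not include that file); `seed` and `ROWS` are hypothetical names.

```python
import sqlite3

ROWS = [
    ("Alice", "Data Science", "A", 85),
    ("Bob", "Data Science", "B", 78),
    ("Charlie", "AI", "A", 92),
    ("Diana", "AI", "B", 88),
]


def seed(db_path: str = "school.db") -> int:
    """Create or refresh the STUDENT table and return the resulting row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS STUDENT ("
                "NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)"
            )
            conn.execute("DELETE FROM STUDENT")
            conn.executemany(
                "INSERT INTO STUDENT (NAME, CLASS, SECTION, MARKS) VALUES (?, ?, ?, ?)",
                ROWS,
            )
        return conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
    finally:
        conn.close()
```

Because existing rows are deleted before inserting, running the function twice leaves the same four rows rather than duplicating them.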
162+
163+
## C) Run the API
164+
```bash
165+
uvicorn app.main:app --reload --port 8000 --app-dir backend
166+
# Docs: http://127.0.0.1:8000/docs
167+
```
168+
169+
## D) Sanity Test (curl)
170+
```bash
171+
curl http://127.0.0.1:8000/api/v1/health
172+
curl http://127.0.0.1:8000/api/v1/students
173+
curl -X POST http://127.0.0.1:8000/api/v1/sql \
174+
-H 'Content-Type: application/json' \
175+
-d '{"sql":"SELECT COUNT(*) FROM STUDENT;"}'
176+
```
177+
178+
## E) NL→SQL Demo
179+
```bash
180+
python backend/api_demo.py
181+
```
182+
Requires `GROQ_API_KEY` to be set in `backend/.env`.
183+
184+
## F) Running Tests
185+
```bash
186+
cd backend
187+
pytest -q --cov=app --cov-report=term-missing
188+
```
189+
Expected: All tests pass, coverage ~79%.
190+
191+
## G) Development Workflow
192+
1. Branch from `main`: `git switch -c feature/<name>`
193+
2. Make changes; keep commits focused.
194+
3. Run tests locally.
195+
4. Push branch and open PR.
196+
5. Address review, squash/rebase as appropriate.
197+
198+
## H) Future Enhancements
199+
- Frontend SPA (React/Vue/Svelte) consuming `/api/v1`
200+
- AuthN/Z, rate limiting
201+
- Schema introspection and allow-listed SQL
202+
- Dockerfile + GitHub Actions CI
203+
- Prompt tuning and fallback strategies for NL2SQL
204+
205+
---
206+
207+
# Appendix
208+
209+
## Files & Paths
210+
- App: `backend/app/main.py`
211+
- Routes: `backend/app/api/v1/routes.py`
212+
- Services: `backend/app/services/`
213+
- Config: `backend/app/core/config.py`
214+
- Tests: `backend/tests/`
215+
- Demo: `backend/api_demo.py`
216+
- Environment: `backend/environment.yml`
217+
218+
## Troubleshooting
219+
- "no such table: STUDENT": seed DB (see section B) or set `SQLITE_PATH`.
220+
- 502 from `/nl2sql`: ensure `GROQ_API_KEY` is set and network available.
221+
- Port in use: `lsof -ti:8000 | xargs -r kill -9` then restart.

README.md

Lines changed: 9 additions & 6 deletions
@@ -1,11 +1,9 @@
11
# Text-to-SQL Platform (FastAPI + Groq) 🚀
22

3-
This repository provides a modular backend that converts natural language into SQL and executes it on an SQLite database (`student.db`).
3+
This repository provides a modular backend that converts natural language into SQL and executes it on an SQLite database (`school.db`).
44

55
The backend is built with FastAPI and uses Groq’s OpenAI-compatible API to generate SQL from English questions. A separate modern UI will be added by the frontend team.
66

7-
[Repo: austinLorenzMccoy/sql-query-generator](https://github.com/austinLorenzMccoy/sql-query-generator)
8-
97
![build](https://img.shields.io/badge/build-passing-brightgreen)
108
![tests](https://img.shields.io/badge/tests-100%20pass-green)
119
![coverage](https://img.shields.io/badge/coverage-79%25-yellow)
@@ -66,6 +64,11 @@ cp backend/.env.example backend/.env
6664
# edit backend/.env and set GROQ_API_KEY and (optionally) GROQ_MODEL
6765
```
6866

67+
2.5) Seed the database (creates/refreshes `school.db`)
68+
```bash
69+
python backend/scripts/seed_db.py
70+
```
71+
6972
3) Run the API (from repo root)
7073
```bash
7174
uvicorn app.main:app --reload --port 8000 --app-dir backend
@@ -83,7 +86,7 @@ Here are some example questions you can ask:
8386

8487
## Database Schema
8588

86-
The database **student.db** has the following schema:
89+
The database **school.db** has the following schema:
8790

8891
| Column | Type | Description |
8992
|---------|---------|--------------------------------------|
@@ -112,7 +115,7 @@ The database **student.db** has the following schema:
112115
113116
│ SQL
114117
115-
SQLite: student.db
118+
SQLite: school.db
116119
```
117120

118121
## Demo ▶️
@@ -152,7 +155,7 @@ cd backend
152155
pytest -q --cov=app --cov-report=term-missing
153156
```
154157

155-
Tests use a temporary SQLite DB and do not touch your real `student.db`.
158+
Tests use a temporary SQLite DB and do not touch your real `school.db`.
156159

157160
## Roadmap 🗺️
158161

backend/.env.example

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Copy this file to .env and fill in the values
 APP_NAME=Text-to-SQL Backend
-SQLITE_PATH=../student.db
+SQLITE_PATH=../school.db
 
 # Groq settings
 GROQ_API_KEY="your_groq_api_key_here"

backend/README.md

Lines changed: 7 additions & 4 deletions
@@ -1,6 +1,6 @@
11
# Text-to-SQL Backend (FastAPI + Groq)
22

3-
A modular FastAPI backend that converts natural language to SQL for the `STUDENT` table in `student.db` and executes queries safely.
3+
A modular FastAPI backend that converts natural language to SQL for the `STUDENT` table in `school.db` and executes queries safely.
44

55
## Project Structure
66

@@ -40,7 +40,7 @@ backend/
4040

4141
- Python 3.9+
4242
- Conda (for environment management)
43-
- SQLite database at `TextToSQLapp/student.db` (already in repo)
43+
- SQLite database at `TextToSQLapp/school.db` (can be created/seeded)
4444

4545
## Quickstart (Conda + Uvicorn)
4646

@@ -59,6 +59,9 @@ conda activate text2sql-backend
5959
cp backend/.env.example backend/.env
6060
# then edit backend/.env and set GROQ_API_KEY
6161

62+
# seed demo database (creates/refreshes `school.db`)
63+
python backend/scripts/seed_db.py
64+
6265
# run API
6366
uvicorn app.main:app --reload --port 8000 --app-dir backend
6467
```
@@ -101,14 +104,14 @@ pytest
101104
pytest -q --cov=app --cov-report=term-missing --cov-report=xml:coverage.xml
102105
```
103106

104-
The test suite uses a temporary SQLite database and does not touch your real `student.db`.
107+
The test suite uses a temporary SQLite database and does not touch your real `school.db`.
105108

106109
## Configuration
107110

108111
Configuration is handled by `app/core/config.py` using Pydantic BaseSettings:
109112

110113
- `APP_NAME` (default: Text-to-SQL Backend)
111-
- `SQLITE_PATH` (default: student.db)
114+
- `SQLITE_PATH` (default: school.db)
112115
- `GROQ_API_KEY` (required for /nl2sql)
113116
- `GROQ_MODEL` (default: llama-3.1-70b-versatile)
114117

backend/app/core/config.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 
 class Settings(BaseSettings):
     app_name: str = Field("Text-to-SQL Backend", env="APP_NAME")
-    sqlite_path: str = Field("student.db", env="SQLITE_PATH")
+    sqlite_path: str = Field("school.db", env="SQLITE_PATH")
 
     # Groq (primary)
     groq_api_key: str | None = Field(default=None, env="GROQ_API_KEY")
