GitHub - LisaYllander92/data-platform-FoodHub: RESTful API built with FastAPI focusing on data integration and Pydantic validation.

A Recipe Search Platform For All Your Needs

Fast, easy, and surprisingly satisfying.
Craving something good?
FoodHub turns your leftovers into something worth eating. Save money, cut waste, and satisfy your cravings – zero planning required.

📋 Project Overview

FoodHub is a robust data platform and backend API designed to bridge the gap between raw ingredients and delicious meals. It features:

Smart Search: Search for recipes by entering up to 10 ingredients.
Fuzzy Matching: Handles minor typos and misspellings (e.g., "avocdo" -> "avocado") using RapidFuzz.
Ranking Logic: Recipe suggestions are ranked by the number of matching ingredients.
Cache-First Strategy: Checks the database before calling the Spoonacular API to minimize API costs.
Search Statistics: Tracks and visualizes the most popular ingredient searches using Matplotlib.
Frontend: A lightweight web interface for searching recipes and viewing search history and search statistics.
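As a rough illustration of how Smart Search, Fuzzy Matching, and Ranking Logic fit together, here is a minimal sketch. It is not the project's code: FoodHub uses RapidFuzz, while this example swaps in `difflib` from the standard library so it runs without extra dependencies. The names `KNOWN_INGREDIENTS`, `correct_ingredient`, and `rank_recipes` are hypothetical, and so is the sample data.

```python
from difflib import get_close_matches

MAX_INGREDIENTS = 10  # the search accepts up to 10 ingredients

# Hypothetical vocabulary; a real deployment would derive this from its own data.
KNOWN_INGREDIENTS = ["avocado", "chicken", "garlic", "rice", "tomato"]


def correct_ingredient(raw: str, cutoff: float = 0.8) -> str:
    """Map a possibly misspelled ingredient to the closest known one."""
    word = raw.strip().lower()
    matches = get_close_matches(word, KNOWN_INGREDIENTS, n=1, cutoff=cutoff)
    # Fall back to the normalized input when nothing is similar enough.
    return matches[0] if matches else word


def rank_recipes(query: list[str], recipes: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank recipes by how many of the queried ingredients they contain."""
    wanted = {correct_ingredient(item) for item in query[:MAX_INGREDIENTS]}
    scored = [(title, len(wanted & ingredients)) for title, ingredients in recipes.items()]
    # Drop recipes with no overlap, then sort by match count (ties broken by title).
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


recipes = {
    "Chicken rice bowl": {"chicken", "rice", "garlic"},
    "Guacamole": {"avocado", "tomato", "garlic"},
    "Plain rice": {"rice"},
}
print(rank_recipes(["avocdo", "Tomato", "chiken"], recipes))
# [('Guacamole', 2), ('Chicken rice bowl', 1)]
```

The typos "avocdo" and "chiken" are corrected before counting, so Guacamole matches two ingredients and ranks first.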

📂 Project Structure

The project is organized into a modular directory structure to ensure a clean separation of concerns between the API, data processing, and infrastructure.

.
├── app/                      # Core application (FastAPI)
│   ├── api/                  # API endpoints (routes)
│   ├── clients/              # External API clients (Spoonacular)
│   ├── consumer/             # Kafka Consumer (listens and saves data)
│   ├── producer/             # Kafka Producer (sends messages)
│   ├── repositories/         # Database operations
│   ├── schema/               # Pydantic models (internal & external)
│   ├── services/             # Business logic (search, filtering)
│   └── transformers/         # Data transformation logic
├── data/                     # Data cleaning and validation scripts
│   ├── cleaning_recipe.py    # Logic for cleaning recipe data
│   └── flagged_recipe.py     # Logic for handling invalid data
├── docs/                     # Architecture models and sprint logs
├── frontend/                 # Web interface (HTML, CSS, JS)
├── database.py               # Database connection pool (PostgreSQL)
├── main.py                   # FastAPI application entry point
├── docker-compose.yml        # Infrastructure and container orchestration
├── Dockerfile                # Backend build instructions
├── init.sql                  # Database initialization script
├── pyproject.toml            # Project metadata and dependencies (uv)
├── uv.lock                   # Dependency lock file
├── README.md                 # Project documentation and setup guide
└── REQUIREMENTS.md           # Detailed project requirements

🏗️ Architecture & Data Flow

The system is built as a modern data engineering pipeline within a Docker Compose environment, ensuring seamless communication between microservices, streaming components, and cloud storage:

1. User Interaction & Frontend

The journey begins at the Frontend (localhost:8000). Users can search for recipes by ingredients, view their search history, and access data insights through automated Matplotlib visualizations.

2. FastAPI: The Orchestrator

The backend acts as the system's brain, managing the flow of data:

Fuzzy Search: Uses RapidFuzz to handle typos, ensuring "chiken" still returns "chicken" recipes.
Database Connectivity: Uses psycopg to communicate with the Supabase (PostgreSQL) instance.
Smart Caching: FastAPI first checks the curated_recipes table. On a "cache hit," data is returned instantly to save API tokens. On a "miss," it fetches fresh data from the Spoonacular API.

3. The Streaming Pipeline (Kafka)

To ensure asynchronous processing and scalability, we utilize Apache Kafka:

Producer: When new data is fetched from Spoonacular, FastAPI acts as a Producer, pushing the results as events into the Kafka Cluster.
Consumer: A dedicated Kafka Consumer listens to the stream, reads the incoming data, and persists the raw payloads into the staging layer.
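To make the producer/consumer hand-off concrete without requiring a running broker, the sketch below replaces the Kafka topic with a standard-library `queue.Queue` and the staging layer with a plain list. Everything here is an assumption for illustration (function names, JSON-over-UTF-8 encoding, the sample recipe). The real services would use a Kafka client library and the staging_recipes table.

```python
import json
import queue

# In-memory stand-ins (assumptions): a queue instead of a Kafka topic,
# and a list instead of the staging_recipes table.
topic: queue.Queue[bytes] = queue.Queue()
staging_rows: list[dict] = []


def encode_event(recipe: dict) -> bytes:
    """Serialize a recipe as compact UTF-8 JSON, a common Kafka message format."""
    return json.dumps(recipe, separators=(",", ":")).encode("utf-8")


def decode_event(payload: bytes) -> dict:
    """Inverse of encode_event."""
    return json.loads(payload.decode("utf-8"))


def produce(recipe: dict) -> None:
    """Producer side: publish the fetched recipe as an event."""
    topic.put(encode_event(recipe))


def consume_all() -> int:
    """Consumer side: drain pending events and persist the raw payloads."""
    saved = 0
    while not topic.empty():
        staging_rows.append(decode_event(topic.get()))
        saved += 1
    return saved


produce({"title": "Egg fried rice", "readyInMinutes": 20})
print(consume_all())    # 1
print(staging_rows[0])  # {'title': 'Egg fried rice', 'readyInMinutes': 20}
```

Because the producer only enqueues and returns, the API request does not have to wait for the database write, which is the point of putting a stream between the two.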

4. Storage Layer (Supabase / PostgreSQL)

Data is persisted into three distinct functional layers within Supabase to separate concerns:

staging_recipes (Raw Data): Stores unprocessed JSON payloads from Kafka for historical auditing and backup.
curated_recipes (Validated Data): Holds cleaned, structured, and validated recipe data, optimized for frontend performance.
search_log (Analytics): Logs user search queries (ingredients) to provide the data source for search frequency statistics.
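The search-frequency statistics derived from search_log amount to counting ingredients across logged queries. A minimal sketch, assuming each log entry is a list of ingredient strings (the actual table layout is defined in init.sql and is not shown here); `top_ingredients` is a hypothetical helper name:

```python
from collections import Counter


def top_ingredients(search_log: list[list[str]], limit: int = 10) -> list[tuple[str, int]]:
    """Count how often each ingredient was searched and return the most frequent."""
    counts = Counter(
        ingredient.strip().lower()
        for query in search_log
        for ingredient in query
    )
    return counts.most_common(limit)


# Hypothetical log entries, one list per search.
log = [
    ["chicken", "rice"],
    ["rice", "tomato"],
    ["Rice"],
]
print(top_ingredients(log, limit=1))  # [('rice', 3)]
```

In the database the same result would typically come from a `GROUP BY` query with `ORDER BY count DESC LIMIT 10`. The resulting pairs are the kind of data a bar chart of popular searches can be drawn from.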

5. ETL Pipeline

Throughout the flow, data moves through a classic Extract → Transform → Load process:

Extract: Raw recipe data is fetched from the Spoonacular API via get_recipe_information(), returned as JSON with camelCase fields that may contain NaN values and HTML-tagged instructions.
Transform: Data is cleaned and normalized in recipe_transformers.py and ingredient_service.py:
- clean_numeric() replaces NaN/Inf with valid defaults
- camelCase fields are converted to snake_case (e.g. readyInMinutes → ready_in_minutes)
- HTML tags are stripped from instructions using a regular expression
- Ingredients are normalized to lowercase for fuzzy matching
Load: Data is persisted into two layers following a Medallion Architecture:
- staging_recipes — raw JSON payloads for auditing (via Kafka Consumer)
- curated_recipes — validated, structured data ready for the frontend
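The Transform steps listed above can be sketched roughly as below. Only the name `clean_numeric` comes from this README. Its signature, the other helper names, the regular expressions, and the sample payload are assumptions made for illustration and may differ from what recipe_transformers.py actually does.

```python
import math
import re

_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")
_HTML_TAG = re.compile(r"<[^>]+>")


def clean_numeric(value, default=0):
    """Replace None, NaN, and +/-Inf with a default; pass other values through."""
    if value is None:
        return default
    if isinstance(value, float) and not math.isfinite(value):
        return default
    return value


def camel_to_snake(name: str) -> str:
    """readyInMinutes -> ready_in_minutes."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def strip_html(text: str) -> str:
    """Remove tags and collapse the whitespace they leave behind."""
    return " ".join(_HTML_TAG.sub(" ", text).split())


def transform_recipe(raw: dict) -> dict:
    """Apply the cleaning steps to one raw recipe payload."""
    cleaned = {}
    for key, value in raw.items():
        new_key = camel_to_snake(key)
        if isinstance(value, float):
            value = clean_numeric(value)
        elif new_key == "instructions" and isinstance(value, str):
            value = strip_html(value)
        elif new_key == "ingredients":
            # Lowercase so fuzzy matching compares like with like.
            value = [item.strip().lower() for item in value]
        cleaned[new_key] = value
    return cleaned


raw = {
    "title": "Rice",
    "readyInMinutes": float("nan"),
    "instructions": "<ol><li>Boil rice.</li><li>Serve.</li></ol>",
    "ingredients": [" Rice ", "SALT"],
}
print(transform_recipe(raw))
```

A regular expression is adequate for stripping simple markup like this. Heavily nested or malformed HTML would be safer to handle with a proper parser.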

💻 Tech Stack

🚀 Getting Started

Prerequisites

Python 3.12
Docker Desktop
uv (Python package and project manager)
A Supabase account and project

Installation

Clone the repository

git clone https://github.com/LisaYllander92/LAB2_Data_platform_FoodHub.git
cd LAB2_Data_platform_FoodHub

Install dependencies

uv sync

Set up your .env file with your Supabase credentials:

DB_HOST=your-db-host
DB_PORT=6543
DB_NAME=postgres
DB_USER=postgres
DB_PASSWORD=your-supabase-password
SPOONACULAR_API_KEY=your-api-key
SPOONACULAR_USERNAME=your-spoonacular-username
SPOONACULAR_HASH=your-spoonacular-hash

Initialize the database schema in Supabase by running init.sql in the Supabase SQL editor.
Start all services

docker compose up --build

Access the app:

Frontend: http://localhost:8000
API docs (Swagger): http://localhost:8000/docs
Kafka UI: http://localhost:8080
Search statistics plot: http://localhost:8000/api/recipes/stats/plot
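For reference, the DB_* variables from the .env step could be combined into a PostgreSQL connection URI along these lines. This is a sketch, not the contents of database.py: `build_conninfo` is a hypothetical helper and the sample values are placeholders. A URI of this form is the kind of string that `psycopg.connect()` accepts.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote


def build_conninfo(env: Mapping[str, str] = os.environ) -> str:
    """Build a postgresql:// URI from the DB_* variables."""
    # Percent-encode credentials so characters like '@' or '/' do not break the URI.
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )


# Placeholder values for illustration only.
sample_env = {
    "DB_HOST": "db.example.com",
    "DB_PORT": "6543",
    "DB_NAME": "postgres",
    "DB_USER": "postgres",
    "DB_PASSWORD": "p@ss/word",
}
print(build_conninfo(sample_env))
# postgresql://postgres:p%40ss%2Fword@db.example.com:6543/postgres
```

Passing the mapping as a parameter keeps the function easy to test. A missing variable raises `KeyError` at startup instead of failing later on the first query.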

📊 API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/recipes/search` | Search recipes by ingredients |
| GET | `/api/recipes/detail/{title}` | Get full recipe details |
| GET | `/api/recipes/history` | View recently saved recipes |
| GET | `/api/recipes/popular-searches` | Top 10 most searched ingredients |
| GET | `/api/recipes/stats/plot` | Bar chart of popular searches |
| POST | `/api/recipes` | Send a recipe to Kafka |

👀 Behind the Scenes

Detailed documentation of our architectural design and agile development process.

📊 Data Modeling

Visualizing our database structure from concept to final implementation.

| Phase | View Model |
| --- | --- |
| Step 1: Conceptual | 🔗 View Conceptual Model |
| Step 2: Initial Logical | 🔗 View First Logical Model |
| Step 3: Final Implementation | 🔗 View Final Logical Model |

🔄 Agile Process & Logs

We followed an agile methodology, documenting every step through activity logs and retrospectives.

| Sprint | Activity Logs | Retrospectives |
| --- | --- | --- |
| Sprint 1 | 📄 View Log | 🔄 View Retro |
| Sprint 2 | 📄 View Log | 🔄 View Retro |
| Sprint 3 | 📄 View Log | 🔄 View Retro |
| Sprint 4 | 📄 View Log | 🔄 View Retro |

📚 Resources & Transparency

Tip

Sources & AI Usage: You can find our detailed documentation on tools, sources, and AI-assisted development here: FoodHub Sources PDF
