LisaYllander92/data-platform-FoodHub


FoodHub logo

A Recipe Search Platform For All Your Needs

Fast, easy, and surprisingly satisfying.
Craving something good?
FoodHub turns your leftovers into something worth eating. Save money, cut waste, and satisfy your cravings – zero planning required.


📋 Project Overview

FoodHub is a robust data platform and backend API designed to bridge the gap between raw ingredients and delicious meals. It features:

  • Smart Search: Search for recipes by entering up to 10 ingredients.
  • Fuzzy Matching: Handles minor typos and misspellings (e.g., "avocdo" -> "avocado") using RapidFuzz.
  • Ranking Logic: Recipe suggestions are ranked by the number of matching ingredients.
  • Cache-First Strategy: Checks the database before calling the Spoonacular API to minimize API costs.
  • Search Statistics: Tracks and visualizes the most popular ingredient searches using Matplotlib.
  • Frontend: A lightweight web interface for searching recipes and viewing search history and search statistics.
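
As a rough illustration of how the fuzzy matching and ranking features above could fit together, here is a minimal sketch. The function names, the sample vocabulary, the score cutoff of 80, and the choice to truncate (rather than reject) queries longer than 10 ingredients are all assumptions made for this example, not the project's actual code. It uses RapidFuzz's `process.extractOne` when the library is installed and falls back to the standard library's `difflib` otherwise.

```python
"""Sketch: typo-tolerant ingredient matching and match-count ranking."""
try:
    from rapidfuzz import fuzz, process

    def best_match(term, vocabulary, cutoff=80):
        """Return the closest known ingredient, or None if nothing is close enough."""
        hit = process.extractOne(term, vocabulary, scorer=fuzz.ratio, score_cutoff=cutoff)
        return hit[0] if hit else None

except ImportError:  # fallback so the sketch still runs without RapidFuzz
    import difflib

    def best_match(term, vocabulary, cutoff=80):
        """Same contract as above, using difflib's similarity ratio (0..1)."""
        hits = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff / 100)
        return hits[0] if hits else None

MAX_INGREDIENTS = 10  # the README states "up to 10 ingredients"

# Hypothetical sample data, for illustration only.
VOCABULARY = ["avocado", "chicken", "tomato", "rice"]
RECIPES = [
    {"title": "Chicken avocado bowl", "ingredients": ["chicken", "avocado", "rice"]},
    {"title": "Tomato chicken stew", "ingredients": ["tomato", "chicken"]},
    {"title": "Plain rice", "ingredients": ["rice"]},
]

def normalize_query(terms, vocabulary):
    """Lowercase, typo-correct, and de-duplicate the first 10 search terms."""
    cleaned = []
    for term in terms[:MAX_INGREDIENTS]:
        match = best_match(term.strip().lower(), vocabulary)
        if match and match not in cleaned:
            cleaned.append(match)
    return cleaned

def rank_recipes(recipes, ingredients):
    """Order recipe titles by how many of the requested ingredients they contain."""
    wanted = set(ingredients)
    scored = [(len(wanted & set(r["ingredients"])), r["title"]) for r in recipes]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [title for _, title in ranked]
```

Calling `rank_recipes(RECIPES, normalize_query(["Avocdo", "chiken"], VOCABULARY))` corrects both typos first and then puts the recipe containing both ingredients ahead of the one containing only chicken.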

📂 Project Structure

The project is organized into a modular directory structure to ensure a clean separation of concerns between the API, data processing, and infrastructure.

.
├── app/                      # Core application (FastAPI)
│   ├── api/                  # API endpoints (routes)
│   ├── clients/              # External API clients (Spoonacular)
│   ├── consumer/             # Kafka Consumer (listens and saves data)
│   ├── producer/             # Kafka Producer (sends messages)
│   ├── repositories/         # Database operations
│   ├── schema/               # Pydantic models (internal & external)
│   ├── services/             # Business logic (search, filtering)
│   └── transformers/         # Data transformation logic
├── data/                     # Data cleaning and validation scripts
│   ├── cleaning_recipe.py    # Logic for cleaning recipe data
│   └── flagged_recipe.py     # Logic for handling invalid data
├── docs/                     # Architecture models and sprint logs
├── frontend/                 # Web interface (HTML, CSS, JS)
├── database.py               # Database connection pool (PostgreSQL)
├── main.py                   # FastAPI application entry point
├── docker-compose.yml        # Infrastructure and container orchestration
├── Dockerfile                # Backend build instructions
├── init.sql                  # Database initialization script
├── pyproject.toml            # Project metadata and dependencies (uv)
├── uv.lock                   # Dependency lock file
├── README.md                 # Project documentation and setup guide
└── REQUIREMENTS.md           # Detailed project requirements

🏗️ Architecture & Data Flow

Foodhub Ecosystem

The system is built as a modern data engineering pipeline within a Docker Compose environment, ensuring seamless communication between microservices, streaming components, and cloud storage:

1. User Interaction & Frontend

The journey begins at the Frontend (localhost:8000). Users can search for recipes by ingredients, view their search history, and access data insights through automated Matplotlib visualizations.

2. FastAPI: The Orchestrator

The backend acts as the system's brain, managing the flow of data:

  • Fuzzy Search: Uses RapidFuzz to handle typos, ensuring "chiken" still returns "chicken" recipes.
  • Database Connectivity: Leverages psycopg for high-performance communication with the Supabase instance.
  • Smart Caching: FastAPI first checks the curated_recipes table. On a "cache hit," data is returned instantly to save API tokens. On a "miss," it fetches fresh data from the Spoonacular API.
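
The cache-first lookup described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: a plain dictionary stands in for the curated_recipes table, `fake_spoonacular` is a hypothetical stub for the real API client, and the result is written straight into the cache instead of passing through Kafka and the staging layer as the real system does.

```python
"""Sketch: check the local store first, call the external API only on a miss."""

def cache_key(ingredients):
    """Build an order- and case-insensitive key for a set of ingredients."""
    return tuple(sorted(i.strip().lower() for i in ingredients))

def search_cache_first(ingredients, cache, fetch_remote):
    """Return (recipes, source), where source is 'cache' or 'api'."""
    key = cache_key(ingredients)
    if key in cache:                 # cache hit: no API tokens spent
        return cache[key], "cache"
    recipes = fetch_remote(key)      # cache miss: fetch fresh data
    cache[key] = recipes             # remember it for the next search
    return recipes, "api"

# Hypothetical stub that records how often the "API" is called.
api_calls = []

def fake_spoonacular(key):
    api_calls.append(key)
    return [{"title": "Chicken rice"}]
```

Searching for `["chicken", "rice"]` and then for `["Rice", "Chicken"]` against the same dictionary triggers only one call to the stub, because both queries normalize to the same key.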

3. The Streaming Pipeline (Kafka)

To ensure asynchronous processing and scalability, we utilize Apache Kafka:

  • Producer: When new data is fetched from Spoonacular, FastAPI acts as a Producer, pushing the results as events into the Kafka Cluster.
  • Consumer: A dedicated Kafka Consumer listens to the stream, reads the incoming data, and persists the raw payloads into the staging layer.
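
To make the producer/consumer hand-off concrete, here is a small sketch that uses an in-memory `queue.Queue` as a stand-in for the Kafka topic and a list as a stand-in for the staging table. It does not use a Kafka client at all; the names `produce`, `consume_all`, `topic`, and `staging_rows` are hypothetical and exist only for this illustration. What it shows is the shape of the flow: events are serialized to JSON bytes on one side and deserialized and persisted on the other, with neither side calling the other directly.

```python
"""Sketch: decoupled producer and consumer, with a queue standing in for Kafka."""
import json
import queue

topic = queue.Queue()   # stand-in for a Kafka topic
staging_rows = []       # stand-in for the staging_recipes table

def produce(recipe: dict) -> None:
    """Serialize a fetched recipe and publish it as an event."""
    topic.put(json.dumps(recipe).encode("utf-8"))

def consume_all() -> int:
    """Drain pending events, store the raw payloads, and return how many were saved."""
    saved = 0
    while True:
        try:
            message = topic.get_nowait()
        except queue.Empty:
            return saved
        staging_rows.append(json.loads(message.decode("utf-8")))
        saved += 1
```

In the real pipeline the consumer runs continuously in its own container; draining the queue once, as done here, only keeps the example short.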

4. Storage Layer (Supabase / PostgreSQL)

Data is persisted into three distinct functional layers within Supabase to separate concerns:

  • staging_recipes (Raw Data): Stores unprocessed JSON payloads from Kafka for historical auditing and backup.
  • curated_recipes (Validated Data): Holds cleaned, structured, and validated recipe data, optimized for frontend performance.
  • search_log (Analytics): Logs user search queries (ingredients) to provide the data source for search frequency statistics.
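
As an illustration of how search_log can feed the popularity statistics, the sketch below counts ingredient occurrences with `collections.Counter`. The shape of the log (one list of ingredients per search) and the function name are assumptions; the actual table schema is defined in init.sql and may differ.

```python
"""Sketch: derive the most searched ingredients from logged searches."""
from collections import Counter

def popular_ingredients(search_log, limit=10):
    """Return (ingredient, count) pairs, most frequent first."""
    counts = Counter(
        ingredient.strip().lower()
        for row in search_log
        for ingredient in row
    )
    return counts.most_common(limit)

# Hypothetical log rows: each entry is the ingredient list of one search.
SAMPLE_LOG = [
    ["chicken", "rice"],
    ["Chicken", "tomato"],
    ["chicken"],
    ["rice"],
]
```

The resulting pairs map directly onto a bar chart, with ingredients on one axis and counts on the other.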

5. ETL Pipeline

Throughout the flow, data moves through a classic Extract → Transform → Load process:

  • Extract: Raw recipe data is fetched from the Spoonacular API via get_recipe_information(), returned as JSON with camelCase fields that may contain NaN values and HTML-tagged instructions.
  • Transform: Data is cleaned and normalized in recipe_transformers.py and ingredient_service.py:
    • clean_numeric() replaces NaN/Inf with valid defaults
    • camelCase fields are converted to snake_case (e.g. readyInMinutes → ready_in_minutes)
    • HTML tags are stripped from instructions using Regex
    • Ingredients are normalized to lowercase for fuzzy matching
  • Load: Data is persisted into two layers following a Medallion Architecture:
    • staging_recipes: raw JSON payloads for auditing (via Kafka Consumer)
    • curated_recipes: validated, structured data ready for the frontend
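
The Transform step above can be sketched with the standard library alone. The helpers below are hypothetical re-implementations written for this example: the README names `clean_numeric()`, but its real signature and defaults in recipe_transformers.py are not shown here, and `camel_to_snake`, `strip_html`, `transform_recipe`, and the sample payload are assumed names and data.

```python
"""Sketch: cleaning a raw recipe payload before loading it."""
import math
import re

def clean_numeric(value, default=0):
    """Replace NaN, Inf, and non-numeric values with a default."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default
    return default if math.isnan(number) or math.isinf(number) else number

def camel_to_snake(name):
    """readyInMinutes -> ready_in_minutes"""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def strip_html(text):
    """Remove HTML tags from an instructions string."""
    return re.sub(r"<[^>]+>", "", text or "").strip()

def transform_recipe(raw):
    """Apply all cleaning steps to one raw recipe dictionary."""
    recipe = {camel_to_snake(key): value for key, value in raw.items()}
    recipe["ready_in_minutes"] = clean_numeric(recipe.get("ready_in_minutes"))
    recipe["instructions"] = strip_html(recipe.get("instructions"))
    recipe["ingredients"] = [i.strip().lower() for i in recipe.get("ingredients", [])]
    return recipe

# Hypothetical raw payload with the problems the README describes.
RAW = {
    "title": "Avocado Toast",
    "readyInMinutes": float("nan"),
    "instructions": "<ol><li>Toast the bread.</li></ol>",
    "ingredients": ["Avocado", " Bread "],
}
```

A regex-based tag stripper is adequate for simple, well-formed markup like the sample above; heavily nested or malformed HTML would call for a real parser.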

💻 Tech Stack

Python FastAPI Apache Kafka PostgreSQL Supabase Docker Pandas RapidFuzz Matplotlib uv Git HTML5 CSS3 JavaScript

🚀 Getting Started

Prerequisites

Installation

  1. Clone the repository
git clone https://github.com/LisaYllander92/LAB2_Data_platform_FoodHub.git
cd LAB2_Data_platform_FoodHub
  2. Install dependencies
uv sync
  3. Set up your .env file with your Supabase and Spoonacular credentials:
DB_HOST=your-db-host
DB_PORT=6543
DB_NAME=postgres
DB_USER=postgres
DB_PASSWORD=your-supabase-password
SPOONACULAR_API_KEY=your-api-key
SPOONACULAR_USERNAME=your-spoonacular-username
SPOONACULAR_HASH=your-spoonacular-hash
  4. Initialize the database schema in Supabase by running init.sql in the Supabase SQL editor.
  5. Start all services
docker compose up --build
  6. Access the app: the frontend is served at localhost:8000.
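
For orientation, the database variables from the .env file could be turned into a PostgreSQL connection string along these lines. This is a sketch under assumptions: `build_conninfo` is a hypothetical helper, and how database.py actually reads its configuration is not shown in this README. The output uses the libpq keyword/value format, which psycopg accepts as a connection string.

```python
"""Sketch: build a PostgreSQL connection string from environment variables."""
import os

REQUIRED = ("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD")

def build_conninfo(env=os.environ):
    """Return a libpq-style conninfo string, failing early on missing settings."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return (
        f"host={env['DB_HOST']} port={env['DB_PORT']} dbname={env['DB_NAME']} "
        f"user={env['DB_USER']} password={env['DB_PASSWORD']}"
    )
```

Failing early with the list of missing names tends to be easier to debug than a connection error at first query. Note that a password containing spaces or quotes would need escaping in this format, which the sketch does not handle.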

📊 API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/recipes/search | Search recipes by ingredients |
| GET | /api/recipes/detail/{title} | Get full recipe details |
| GET | /api/recipes/history | View recently saved recipes |
| GET | /api/recipes/popular-searches | Top 10 most searched ingredients |
| GET | /api/recipes/stats/plot | Bar chart of popular searches |
| POST | /api/recipes | Send a recipe to Kafka |

👀 Behind the Scenes

Detailed documentation of our architectural design and agile development process.


📊 Data Modeling

Visualizing our database structure from concept to final implementation.

| Phase | View Model |
| --- | --- |
| Step 1: Conceptual | 🔗 View Conceptual Model |
| Step 2: Initial Logical | 🔗 View First Logical Model |
| Step 3: Final Implementation | 🔗 View Final Logical Model |

🔄 Agile Process & Logs

We followed an agile methodology, documenting every step through activity logs and retrospectives.

| Sprint | Activity Logs | Retrospectives |
| --- | --- | --- |
| Sprint 1 | 📄 View Log | 🔄 View Retro |
| Sprint 2 | 📄 View Log | 🔄 View Retro |
| Sprint 3 | 📄 View Log | 🔄 View Retro |
| Sprint 4 | 📄 View Log | 🔄 View Retro |

📚 Resources & Transparency

Tip

Sources & AI Usage: You can find our detailed documentation on tools, sources, and AI-assisted development here: FoodHub Sources PDF

About

RESTful API built with FastAPI focusing on data integration and Pydantic validation.
