📚 BookWorm — AI Book Assistant

A conversational AI chatbot that recommends books, classifies genres, finds similar reads, and answers questions about your favourite titles.

Hugging Face Space YouTube Demo


✨ Features

| Intent | Example |
| --- | --- |
| 📖 Recommendations | "Recommend a mystery novel" |
| 🎭 Vibe Search | "I want something dark and psychological" |
| 🔍 Similar Books | "Books similar to Pride and Prejudice" |
| 🤖 Genre Classification | "What genre is Dune?" |
| 📝 Summarize | "Summarize 1984" |
| ✍️ Author Lookup | "Who wrote The Road?" |
| Opinions | "Is Harry Potter worth reading?" |
| ℹ️ Book Details | "Tell me about Dune" |

🧠 How It Works

BookWorm is a pipeline of several components working together:

1. Intent Classification (Rule-Based)

User messages are classified into one of nine intents using regex patterns: RECOMMEND, SIMILAR, VIBE_SEARCH, CLASSIFY_BOOK, SUMMARIZE, AUTHOR, OPINION, DETAIL, or GREET.
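As a rough illustration of rule-based intent classification, here is a minimal sketch. The patterns, their order, and the `classify_intent` helper are hypothetical; the actual rules live in app.py and are not reproduced in this README. Returning `None` when nothing matches is also an assumption.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
# The real rules in app.py may differ in wording and priority.
INTENT_PATTERNS = [
    ("GREET", r"^\s*(hi|hello|hey)\b"),
    ("SIMILAR", r"\bsimilar to\b|\bbooks? like\b"),
    ("CLASSIFY_BOOK", r"\bwhat genre\b|\bgenre of\b"),
    ("SUMMARIZE", r"\bsummar(y|ize|ise)\b"),
    ("AUTHOR", r"\bwho wrote\b|\bauthor of\b"),
    ("OPINION", r"\bworth reading\b|\bshould i read\b"),
    ("DETAIL", r"\btell me about\b"),
    ("RECOMMEND", r"\brecommend\b|\bsuggest\b"),
    ("VIBE_SEARCH", r"\bi want something\b|\bin the mood for\b"),
]

# Compile once so each message is only matched, not re-parsed.
COMPILED = [(name, re.compile(pattern, re.IGNORECASE))
            for name, pattern in INTENT_PATTERNS]


def classify_intent(message):
    """Return the first intent whose pattern matches, or None (assumed fallback)."""
    for name, pattern in COMPILED:
        if pattern.search(message):
            return name
    return None
```

Ordering the list from specific to general keeps a broad pattern such as RECOMMEND from shadowing a narrower one such as SIMILAR.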

2. Title Extraction

A multi-pass title extractor pulls book names from natural language queries using:

  • Exact substring matching against all known titles
  • First-words matching (first 2–3 words of title)
  • Fuzzy matching via difflib after stripping intent phrases
  • FAISS semantic fallback for ambiguous cases
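The first three passes listed above could look roughly like the sketch below. The title list, the intent-phrase regex, the 0.75 cutoff, and the `extract_title` name are all illustrative assumptions, and the FAISS fallback is left out (the function simply returns `None` where it would run).

```python
import difflib
import re

# Hypothetical title list; the app would use every title in books.csv.
KNOWN_TITLES = ["Pride and Prejudice", "Dune", "The Road", "1984",
                "The Name of the Wind"]

# Hypothetical intent phrases to strip before fuzzy matching.
INTENT_PHRASES = re.compile(
    r"\b(books? similar to|tell me about|who wrote|summarize|"
    r"what genre is|recommend)\b",
    re.IGNORECASE,
)


def extract_title(query, titles=KNOWN_TITLES, cutoff=0.75):
    q = query.lower()

    # Pass 1: exact substring match, longest titles first.
    for title in sorted(titles, key=len, reverse=True):
        if title.lower() in q:
            return title

    # Pass 2: first 3 (then 2) words of a longer title appear in the query.
    for title in titles:
        words = title.lower().split()
        for n in (3, 2):
            if len(words) > n and " ".join(words[:n]) in q:
                return title

    # Pass 3: fuzzy match after stripping intent phrases and punctuation.
    stripped = INTENT_PHRASES.sub(" ", q)
    stripped = " ".join(re.sub(r"[^\w\s]", " ", stripped).split())
    by_lower = {t.lower(): t for t in titles}
    close = difflib.get_close_matches(stripped, list(by_lower), n=1, cutoff=cutoff)
    if close:
        return by_lower[close[0]]

    # Pass 4 (semantic fallback) is omitted from this sketch.
    return None
```

Running the cheap, precise passes first means the fuzzy matcher only sees queries that the exact rules could not resolve.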

3. Fine-Tuned DistilBERT (Genre Classifier)

A distilbert-base-uncased model fine-tuned on book descriptions to predict genres. Key training details:

  • Inverse-sqrt class weighting to handle genre imbalance
  • Custom WeightedTrainer built on HuggingFace's Trainer
  • 80/10/10 stratified train/val/test split
  • 4 epochs, learning rate 2e-5, max sequence length 256
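To make the inverse-sqrt weighting concrete, here is a small sketch of one way to derive per-class weights from label counts. The helper name and the normalization to a mean of 1 are assumptions; the notebook may scale the weights differently.

```python
import math
from collections import Counter


def inverse_sqrt_class_weights(labels):
    """Weight each class by 1/sqrt(count), then rescale so the mean weight is 1.

    The rescaling step is an assumption made for this sketch; it keeps the
    overall loss magnitude comparable to the unweighted case.
    """
    counts = Counter(labels)
    raw = {label: 1.0 / math.sqrt(count) for label, count in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {label: weight / mean for label, weight in raw.items()}


# Toy label distribution: rare genres get larger weights, but less
# aggressively than plain inverse-frequency weighting would give.
toy_labels = ["Fantasy"] * 100 + ["Mystery"] * 25 + ["Poetry"] * 4
weights = inverse_sqrt_class_weights(toy_labels)
```

A common pattern, which may or may not match the notebook's WeightedTrainer, is to pass such weights as the `weight` tensor of `torch.nn.CrossEntropyLoss` inside an overridden `Trainer.compute_loss`.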

4. Semantic Search via FAISS

Every book is embedded using all-MiniLM-L6-v2 (SentenceTransformers) and indexed in a FAISS flat inner-product index. This powers both vibe-based recommendation and similar-book search.
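A flat inner-product index is an exhaustive search: the query is scored against every stored vector by dot product and the top results are returned. The pure-Python sketch below shows that computation on toy 3-dimensional vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions). Normalizing the vectors first, so that the inner product equals cosine similarity, is an assumption about how the project prepares its embeddings.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(query, vectors, k=2):
    """Return the top-k (score, index) pairs by inner product on unit vectors."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(v))), i)
        for i, v in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "book embeddings": the first two point in similar directions.
toy_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
results = search([1.0, 0.0, 0.1], toy_vectors, k=2)
```

With faiss, the same search is typically expressed as `faiss.IndexFlatIP(dim)`, followed by `index.add(embeddings)` and `index.search(query, k)` on float32 arrays.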

5. Open Library Fallback

For books not in the local dataset, the chatbot queries the Open Library API to fetch author, year, and subject information.
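A fallback lookup of this kind might be sketched as follows, using Open Library's public search endpoint. Whether app.py uses this exact endpoint, these query parameters, or these response fields is an assumption, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://openlibrary.org/search.json"
# Assumed field selection; limits the response to what the chatbot needs.
FIELDS = "title,author_name,first_publish_year,subject"


def build_search_url(title, limit=1):
    """Build a title search URL for the Open Library search endpoint."""
    params = {"title": title, "limit": limit, "fields": FIELDS}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_first_result(payload):
    """Pull author, year, and a few subjects from the first search hit."""
    docs = payload.get("docs") or []
    if not docs:
        return None
    doc = docs[0]
    return {
        "title": doc.get("title"),
        "authors": doc.get("author_name", []),
        "year": doc.get("first_publish_year"),
        "subjects": doc.get("subject", [])[:5],
    }


def lookup_book(title, timeout=10):
    """Query Open Library over the network; returns None if nothing is found."""
    with urllib.request.urlopen(build_search_url(title), timeout=timeout) as resp:
        return parse_first_result(json.load(resp))
```

Keeping URL building and response parsing separate from the network call makes the parsing logic easy to check offline with a hand-written payload.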


🗂️ Dataset

BookWorm uses the Best Books Ever Dataset from Kaggle, which contains titles, authors, genres, ratings, page counts, descriptions, and publication dates.

Preprocessing steps include duplicate removal, page count normalization, genre list parsing, and filtering out books with missing descriptions.
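The preprocessing steps above can be sketched in plain Python as shown below. The column names (`title`, `author`, `genres`, `pages`, `description`), the stringified-list genre format, and the choice of title plus author as the duplicate key are assumptions for illustration; the notebook may use pandas and different rules.

```python
import ast
import re


def parse_genres(raw):
    """Turn a stringified list such as "['Fantasy', 'Classics']" into a list."""
    try:
        value = ast.literal_eval(raw) if isinstance(raw, str) else raw
    except (ValueError, SyntaxError):
        return []
    if isinstance(value, (list, tuple)):
        return [str(genre).strip() for genre in value]
    return []


def parse_pages(raw):
    """Extract the first integer from values like '412 pages'; None if absent."""
    match = re.search(r"\d+", str(raw or ""))
    return int(match.group()) if match else None


def preprocess(rows):
    """Drop rows without a description, remove duplicates, normalize fields."""
    seen = set()
    cleaned = []
    for row in rows:
        description = (row.get("description") or "").strip()
        if not description:
            continue  # filter out books with missing descriptions
        key = ((row.get("title") or "").strip().lower(),
               (row.get("author") or "").strip().lower())
        if key in seen:
            continue  # duplicate removal (assumed key: title + author)
        seen.add(key)
        cleaned.append({
            **row,
            "description": description,
            "genres": parse_genres(row.get("genres")),
            "pages": parse_pages(row.get("pages")),
        })
    return cleaned
```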


🛠️ Tech Stack

  • Model: distilbert-base-uncased (HuggingFace Transformers)
  • Embeddings: all-MiniLM-L6-v2 (SentenceTransformers)
  • Vector Search: FAISS (faiss-cpu)
  • UI: Gradio Blocks
  • External API: Open Library
  • Other: PyTorch, scikit-learn, pandas, NumPy

📁 Repository Structure

BookWorm/
├── BookWorm.ipynb        # Full training pipeline — data prep, model fine-tuning, index building
├── app.py                # Gradio app (loads pretrained artifacts and serves the chatbot)
├── requirements.txt      # Python dependencies
└── README.md

Note: Large artifacts are hosted on Hugging Face Spaces and are not included in this repository:

  • books.csv: preprocessed dataset
  • embeddings.npy: precomputed book embeddings used to build the FAISS index
  • genre-classifier-final/: fine-tuned DistilBERT weights and tokenizer

🚀 Run Locally

Option A — Use the pretrained model (recommended)

  1. Clone the repository

     git clone https://github.com/NaghamProgrammer/BookWorm.git
     cd BookWorm

  2. Install dependencies

     pip install -r requirements.txt

  3. Download the large artifacts from Hugging Face

     Go to the Hugging Face Space files and manually download:

       • books.csv
       • embeddings.npy
       • genre-classifier-final/ (the full directory)

     Place them all in the root of the project folder.

  4. Launch the app

     python app.py
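Before launching, it can help to confirm that the downloaded artifacts are where the app expects them. The following optional check is a hypothetical helper, not part of the repository; it only assumes the three artifact names listed above and that they sit in the project root.

```python
from pathlib import Path

# Artifact names taken from the list above; the root location is assumed.
REQUIRED_ARTIFACTS = ["books.csv", "embeddings.npy", "genre-classifier-final"]


def missing_artifacts(root="."):
    """Return the names of required artifacts not found under root."""
    root = Path(root)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing artifacts:", ", ".join(missing))
    else:
        print("All artifacts found; run: python app.py")
```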

Option B — Retrain from scratch

Run the cells in BookWorm.ipynb from top to bottom. This will:

  • Download the dataset from Kaggle via kagglehub
  • Preprocess and save books.csv
  • Fine-tune DistilBERT and save the model to genre-classifier-final/
  • Compute the book embeddings for the FAISS index and save them as embeddings.npy
  • Launch the Gradio app

⚠️ Retraining requires a GPU and takes approximately 20–30 minutes.


🌐 Live Demo

Try it instantly — no setup required:

👉 BookWorm on Hugging Face Spaces


📄 License

This project is released under the MIT License.


Built with ❤️ and a lot of books.
