A conversational AI chatbot that recommends books, classifies genres, finds similar reads, and answers questions about your favourite titles.
| Intent | Example |
|---|---|
| 📖 Recommendations | "Recommend a mystery novel" |
| 🎭 Vibe Search | "I want something dark and psychological" |
| 🔍 Similar Books | "Books similar to Pride and Prejudice" |
| 🤖 Genre Classification | "What genre is Dune?" |
| 📝 Summarize | "Summarize 1984" |
| ✍️ Author Lookup | "Who wrote The Road?" |
| ⭐ Opinions | "Is Harry Potter worth reading?" |
| ℹ️ Book Details | "Tell me about Dune" |
BookWorm is a pipeline of several components working together:
User messages are classified into one of 9 intents using regex patterns — RECOMMEND, SIMILAR, VIBE_SEARCH, CLASSIFY_BOOK, SUMMARIZE, AUTHOR, OPINION, DETAIL, or GREET.
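The patterns below give a rough idea of how such a regex router can look; they are illustrative stand-ins, not the exact expressions used in `app.py`:

```python
import re

# Illustrative patterns only -- the real expressions live in app.py.
INTENT_PATTERNS = [
    ("GREET",         re.compile(r"^\s*(hi|hello|hey)\b", re.I)),
    ("SIMILAR",       re.compile(r"\bsimilar to\b", re.I)),
    ("VIBE_SEARCH",   re.compile(r"\bsomething\b.+\b(dark|cozy|funny|uplifting)\b", re.I)),
    ("CLASSIFY_BOOK", re.compile(r"\bwhat genre\b", re.I)),
    ("SUMMARIZE",     re.compile(r"\bsummari[sz]e\b", re.I)),
    ("AUTHOR",        re.compile(r"\bwho wrote\b", re.I)),
    ("OPINION",       re.compile(r"\bworth reading\b", re.I)),
    ("RECOMMEND",     re.compile(r"\b(recommend|suggest)\b", re.I)),
]

def classify_intent(message: str) -> str:
    """Return the first matching intent; DETAIL is the fallback."""
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(message):
            return intent
    return "DETAIL"

print(classify_intent("Recommend a mystery novel"))  # RECOMMEND
```

Checking the narrower patterns first keeps a broad verb like "recommend" from shadowing the more specific intents.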
A multi-pass title extractor (sketched after this list) pulls book names from natural language queries using:
- Exact substring matching against all known titles
- First-words matching (first 2–3 words of title)
- Fuzzy matching via `difflib` after stripping intent phrases
- FAISS semantic fallback for ambiguous cases
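A minimal sketch of the first three passes, assuming a plain `titles` list from the dataset; the helper name and the intent-phrase list are hypothetical:

```python
import difflib
from typing import List, Optional

# Hypothetical intent phrases to strip before fuzzy matching.
INTENT_PHRASES = ("recommend", "books similar to", "summarize", "tell me about", "what genre is")

def extract_title(query: str, titles: List[str]) -> Optional[str]:
    q = query.lower()
    # Pass 1: exact substring match against every known title.
    for title in titles:
        if title.lower() in q:
            return title
    # Pass 2: match on the first 2-3 words of each title.
    for title in titles:
        words = title.lower().split()
        for n in (3, 2):
            if len(words) >= n and " ".join(words[:n]) in q:
                return title
    # Pass 3: fuzzy match via difflib after stripping intent phrases.
    stripped = q
    for phrase in INTENT_PHRASES:
        stripped = stripped.replace(phrase, "")
    close = difflib.get_close_matches(stripped.strip(), [t.lower() for t in titles], n=1, cutoff=0.6)
    if close:
        return next(t for t in titles if t.lower() == close[0])
    return None  # Pass 4 (not shown): fall back to FAISS semantic search.
```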
A `distilbert-base-uncased` model fine-tuned on book descriptions to predict genres. Key training details (the weighting scheme is sketched after this list):
- Inverse-sqrt class weighting to handle genre imbalance
- Custom `WeightedTrainer` built on HuggingFace's `Trainer`
- 80/10/10 stratified train/val/test split
- 4 epochs, learning rate `2e-5`, max sequence length 256
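A rough sketch of the idea; `inverse_sqrt_weights` is a hypothetical helper, and the notebook's `WeightedTrainer` may differ in detail:

```python
import numpy as np
import torch
from transformers import Trainer

def inverse_sqrt_weights(labels: np.ndarray) -> torch.Tensor:
    """Per-class weights ~ 1/sqrt(count): rarer genres get larger weights."""
    counts = np.bincount(labels)  # assumes integer-encoded genre labels
    return torch.tensor(1.0 / np.sqrt(counts), dtype=torch.float32)

class WeightedTrainer(Trainer):
    """Trainer subclass that applies class weights in the cross-entropy loss."""

    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = torch.nn.CrossEntropyLoss(weight=self.class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss
```

Constructing the trainer with `class_weights=inverse_sqrt_weights(train_labels)` (where `train_labels` is the integer-encoded label array) then penalises mistakes on rare genres more heavily.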
Every book is embedded using `all-MiniLM-L6-v2` (SentenceTransformers) and indexed in a FAISS flat inner-product index. This powers both vibe-based recommendation and similar-book search.
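In outline, the index is built and queried like this (a sketch; the two-item `descriptions` list stands in for the description column of `books.csv`):

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed every book description; L2-normalising makes inner product equal cosine similarity.
descriptions = ["A dark psychological thriller...", "A cozy small-town mystery..."]
embeddings = model.encode(descriptions, normalize_embeddings=True).astype(np.float32)

index = faiss.IndexFlatIP(embeddings.shape[1])  # flat inner-product index
index.add(embeddings)

# Vibe search: embed the free-text query the same way and take the top-k books.
query = model.encode(["something dark and psychological"], normalize_embeddings=True).astype(np.float32)
scores, ids = index.search(query, 2)
```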
For books not in the local dataset, the chatbot queries the Open Library API to fetch author, year, and subject information.
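A minimal version of such a lookup against Open Library's public search endpoint (`open_library_lookup` is a hypothetical helper name, not necessarily the one in `app.py`):

```python
import requests

def open_library_lookup(title: str):
    """Fetch author, first-publication year, and subjects for a title."""
    resp = requests.get(
        "https://openlibrary.org/search.json",
        params={"title": title, "limit": 1},
        timeout=10,
    )
    resp.raise_for_status()
    docs = resp.json().get("docs", [])
    if not docs:
        return None
    doc = docs[0]
    return {
        "author": ", ".join(doc.get("author_name", [])),
        "year": doc.get("first_publish_year"),
        "subjects": doc.get("subject", [])[:10],
    }

print(open_library_lookup("The Road"))
```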
Best Books Ever Dataset from Kaggle — containing titles, authors, genres, ratings, page counts, descriptions, and publication dates.
Preprocessing steps include duplicate removal, page count normalization, genre list parsing, and filtering out books with missing descriptions.
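Sketched in pandas, under the assumption that the raw export uses columns named `title`, `description`, `pages`, and `genres` (the actual column names and filename are defined in the notebook):

```python
import ast
import pandas as pd

df = pd.read_csv("best_books_ever.csv")  # hypothetical filename for the raw Kaggle export

df = df.drop_duplicates(subset="title")                    # remove duplicate titles
df = df[df["description"].notna()]                         # drop books with missing descriptions
df["pages"] = pd.to_numeric(df["pages"], errors="coerce")  # normalize page counts to numbers
df["genres"] = df["genres"].apply(ast.literal_eval)        # parse stringified genre lists
df.to_csv("books.csv", index=False)
```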
- Model: `distilbert-base-uncased` (HuggingFace Transformers)
- Embeddings: `all-MiniLM-L6-v2` (SentenceTransformers)
- Vector Search: FAISS (`faiss-cpu`)
- UI: Gradio Blocks
- External API: Open Library
- Other: PyTorch, scikit-learn, pandas, NumPy
BookWorm/
├── BookWorm.ipynb # Full training pipeline — data prep, model fine-tuning, index building
├── app.py # Gradio app (loads pretrained artifacts and serves the chatbot)
├── requirements.txt # Python dependencies
└── README.md
Note: Large artifacts are hosted on Hugging Face Spaces and are not included in this repository:
- `books.csv` — preprocessed dataset
- `embeddings.npy` — precomputed FAISS book embeddings
- `genre-classifier-final/` — fine-tuned DistilBERT weights and tokenizer
- Clone the repository

  ```bash
  git clone https://github.com/NaghamProgrammer/BookWorm.git
  cd BookWorm
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Download the large artifacts from Hugging Face

  Go to the Hugging Face Space files and manually download:

  - `books.csv`
  - `embeddings.npy`
  - `genre-classifier-final/` (the full directory)

  Place them all in the root of the project folder.

- Launch the app

  ```bash
  python app.py
  ```

To retrain everything from scratch instead, run the cells in `BookWorm.ipynb` from top to bottom. This will:
- Download the dataset from Kaggle via `kagglehub` (see the sketch below)
- Preprocess and save `books.csv`
- Fine-tune DistilBERT and save the model to `genre-classifier-final/`
- Build and save the FAISS index as `embeddings.npy`
- Launch the Gradio app
⚠️ Retraining requires a GPU and takes approximately 20–30 minutes.
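For reference, the `kagglehub` download step looks roughly like this; the dataset slug shown is illustrative, and the authoritative one is in the notebook:

```python
import kagglehub

# Downloads the dataset into a local cache and returns the path to its files.
# The slug below is illustrative; the notebook holds the authoritative one.
path = kagglehub.dataset_download("pooriamst/best-books-ever-dataset")
print("Dataset downloaded to:", path)
```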
Try it instantly — no setup required:
👉 BookWorm on Hugging Face Spaces
This project is released under the MIT License.
Built with ❤️ and a lot of books.