An intelligent AI-powered lyrics generator built with Retrieval-Augmented Generation (RAG) and TensorFlow Keras. The application combines semantic search with neural language models to generate contextually relevant lyrics based on user input.
- Retrieval-Augmented Generation (RAG): Uses TF-IDF vectorization and cosine similarity to retrieve contextually relevant lyric snippets
- Advanced Language Model: TensorFlow Keras-based next-word prediction with temperature sampling for creative variation
- Interactive Web UI: Built with Streamlit for easy, real-time lyrics generation
- Multi-Source Data Support: Loads lyrics from both CSV files and MongoDB databases
- Docker Support: Production-ready Docker containerization for easy deployment
- Temperature Control: Adjust creativity level of generated lyrics (0.0 = deterministic, 1.0+ = creative)
- Sequence Padding: Intelligent padding for variable-length input sequences
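The temperature control described above can be sketched in a few lines of NumPy. This is an illustrative stand-alone function, not the code in `main.py`: it rescales a model's output probabilities by temperature and samples an index, so low values behave almost like argmax and high values flatten the distribution.

```python
import numpy as np

def sample_with_temperature(probs, temperature=0.8, rng=None):
    """Sample a token index from a probability vector, rescaled by temperature.

    Low temperature sharpens the distribution (near-deterministic);
    high temperature flattens it (more creative).
    """
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    # Rescale log-probabilities by temperature, then renormalize via softmax.
    logits = np.log(probs + 1e-9) / max(temperature, 1e-6)
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

# Near-zero temperature collapses to the most likely word:
print(sample_with_temperature([0.1, 0.6, 0.3], temperature=0.01))  # almost always 1
```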
The project combines two key AI components:
1. Retrieval Module (RAG)
   - TF-IDF Vectorizer for text representation
   - Cosine similarity search for context retrieval
   - Returns the most semantically relevant lyric from the dataset
2. Generation Module
   - Keras Sequential model trained on lyric sequences
   - Word tokenization and padding
   - Temperature-based sampling for output diversity
   - Configurable sequence length (default: 100 tokens)
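The retrieval module can be sketched with scikit-learn as follows. This is a minimal example with three made-up lyric snippets; the actual app would load the fitted vectorizer from `models/tfidf_vectorizer.pkl` rather than fitting on the fly.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the real lyrics dataset.
lyrics = [
    "thank you next I'm so grateful for my ex",
    "no tears left to cry I'm pickin it up",
    "break free this is the part when I say I don't want ya",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(lyrics)  # one TF-IDF row per lyric

def retrieve(query: str) -> str:
    """Return the lyric most similar to the query under cosine similarity."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, matrix).ravel()
    return lyrics[scores.argmax()]

print(retrieve("I want to break free"))
```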
```
lyrics_generator/
├── main.py                    # Streamlit application (main entry point)
├── lyrics_generator.ipynb     # Jupyter notebook for model training & experimentation
├── requirements.txt           # Python dependencies (71 packages)
├── Dockerfile                 # Docker containerization
├── docker-compose.yml         # Docker Compose configuration
├── .env.example               # Environment variables template
├── ArianaGrande.csv           # Sample dataset (Ariana Grande lyrics)
├── README.md                  # This file
│
├── models/                    # Pre-trained model artifacts
│   ├── rag_lyrics_model.h5    # Trained Keras model weights
│   ├── tokenizer.pickle       # Word tokenizer for text preprocessing
│   └── tfidf_vectorizer.pkl   # Fitted TF-IDF vectorizer
│
├── myenv/                     # Virtual environment (local development)
│   ├── Scripts/               # Python executables
│   ├── Lib/                   # Installed packages
│   └── pyvenv.cfg             # Virtual env configuration
│
└── .github/                   # GitHub workflows & templates
```
| Component | Technology | Version |
|---|---|---|
| Backend | Python | 3.11+ |
| ML Framework | TensorFlow/Keras | 2.x |
| Web Framework | Streamlit | 1.28+ |
| Data Processing | Pandas | 2.3.3 |
| ML Utilities | scikit-learn | 1.3+ |
| Database Driver | PyMongo | 4.x |
| Containerization | Docker | Latest |
| Numerical Computing | NumPy, SciPy | Latest |
- Python: 3.11 or higher
- pip: Latest version
- Git: For cloning the repository
- Docker (optional): For containerized deployment
- MongoDB (optional): For database-backed lyrics storage
```
git clone https://github.com/Mayankvlog/lyrics_generator_generative_ai.git
cd lyrics_generator_generative_ai
```

Windows PowerShell:
```
python -m venv myenv
myenv\Scripts\Activate.ps1
```

macOS/Linux:
```
python3 -m venv myenv
source myenv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Local Development:
```
python -m streamlit run main.py
```

Access the app at: http://localhost:8501
```
# Build and start the service
docker-compose up -d

# View logs
docker-compose logs -f app

# Stop the service
docker-compose down
```

```
# Build the image
docker build -t lyrics-generator:latest .

# Run the container
docker run -p 8502:8502 \
  -e MONGODB_URI="your_mongodb_uri" \
  lyrics-generator:latest
```

Access the app at: http://localhost:8502
Create a `.env` file in the project root (use `.env.example` as a template):

```
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/?retryWrites=true&w=majority
VPS_HOST=your_host
VPS_USER=your_user
VPS_PASSWORD=your_password
```

The app supports data loading from:
1. CSV Files (Default)
   - Place CSV files in the project root
   - Ensure a `Lyric` column exists
   - Example: `ArianaGrande.csv`
2. MongoDB (Optional)
   - Configure the `MONGODB_URI` environment variable
   - Database: `food` (default)
   - Collection: `lyrics` (default)
   - Modify database/collection names in `main.py` if different
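The dual-source loading above can be sketched as a single helper: try MongoDB when `MONGODB_URI` is set, otherwise fall back to the bundled CSV. The database/collection names (`food`/`lyrics`) and the `Lyric` column come from the defaults listed above; the exact code in `main.py` may differ.

```python
import os
import pandas as pd

def load_lyrics(csv_path: str = "ArianaGrande.csv") -> list[str]:
    """Load lyric strings from MongoDB if configured, else from a CSV file."""
    uri = os.getenv("MONGODB_URI")
    if uri:
        from pymongo import MongoClient
        client = MongoClient(uri)
        docs = client["food"]["lyrics"].find({}, {"Lyric": 1})
        return [d["Lyric"] for d in docs if "Lyric" in d]
    # CSV fallback: keep only non-null values of the Lyric column.
    df = pd.read_csv(csv_path)
    return df["Lyric"].dropna().astype(str).tolist()
```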
Modify these in `main.py` if needed:

```python
max_sequence_length = 100  # Must match training sequence length
temperature = 0.8          # Adjust generation creativity (0.0-2.0)
num_words = 100            # Number of words to generate
```

The model was trained on Ariana Grande lyrics using:
- Input: Sequences of tokens (max 100 words)
- Output: Next-word predictions with softmax probabilities
- Loss Function: Categorical crossentropy
- Optimizer: Adam
- Training Data: Padded n-gram sequences built from the lyrics
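To make the input/output pairing concrete, here is a plain-Python sketch of how next-word training pairs can be built from lyric lines: every prefix of a token sequence is left-padded to a fixed length and predicts the word that follows it. The actual notebook (`lyrics_generator.ipynb`) uses the Keras `Tokenizer` and `pad_sequences`; this uses a plain dict for illustration.

```python
MAX_LEN = 100  # matches the configured max_sequence_length

def build_sequences(lines):
    # Assign each word an integer id (0 is reserved for padding).
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            vocab.setdefault(word, len(vocab) + 1)

    inputs, targets = [], []
    for line in lines:
        ids = [vocab[w] for w in line.lower().split()][:MAX_LEN]
        # Every prefix predicts the word that follows it.
        for i in range(1, len(ids)):
            prefix = ids[:i]
            # Left-pad to a fixed length, as Keras pad_sequences would.
            inputs.append([0] * (MAX_LEN - len(prefix)) + prefix)
            targets.append(ids[i])
    return vocab, inputs, targets

vocab, X, y = build_sequences(["one last time I need to be the one"])
```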
1. Preprocessing
   - User input → lowercase conversion
   - Text cleaning (remove special characters)
   - Tokenization using trained tokenizer
2. Retrieval
   - TF-IDF vectorization of user input
   - Cosine similarity search against dataset
   - Return most relevant lyric snippet
3. Generation Loop
   - Use retrieved lyric as seed
   - Iteratively predict next word (100 iterations)
   - Apply temperature sampling for diversity
   - Append predictions to generate full lyric
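The stages above can be sketched end to end. The real app calls the trained Keras model and the pickled tokenizer; here a toy bigram table stands in for the model so the loop itself is runnable, and the temperature step is left out for determinism.

```python
import re

def clean(text: str) -> str:
    # Stage 1: lowercase and strip special characters.
    return re.sub(r"[^a-z0-9\s']", "", text.lower())

def generate(seed: str, predict_next, num_words: int = 10) -> str:
    # Stage 3: iteratively predict the next word and append it.
    words = clean(seed).split()
    for _ in range(num_words):
        nxt = predict_next(words)
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

# Hypothetical stand-in for model.predict + temperature sampling:
bigrams = {"into": "you", "you": "into"}
def toy_predict(words):
    return bigrams.get(words[-1])

print(generate("Into you!", toy_predict, num_words=4))
```

In the real pipeline, `predict_next` would tokenize and pad `words` to `max_sequence_length`, run the model, and apply the temperature sampling step before mapping the predicted id back to a word.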
- Open http://localhost:8501
- Enter a prompt or theme (e.g., "love", "heartbreak", "dreams")
- Adjust generation parameters:
- Temperature: 0.5 (deterministic) to 2.0 (creative)
- Number of Words: 50-200 (length of output)
- Click "Generate Lyrics"
- View generated lyrics with retrieved context
- "love and heartbreak"
- "dancing in the moonlight"
- "wish you were here"
- "breaking free"
Key Packages:
- `streamlit` - Web UI framework
- `tensorflow` - Deep learning
- `keras` - High-level neural networks
- `pandas` - Data manipulation
- `numpy` - Numerical computing
- `scikit-learn` - ML utilities
- `pymongo` - MongoDB driver
- `h5py` - HDF5 file handling
- `joblib` - Model serialization

See `requirements.txt` for the complete list with versions.
- Model Inference: ~100-200ms per generation (GPU: ~50-100ms)
- TF-IDF Search: <10ms for dataset < 10,000 lyrics
- Memory Usage: ~500MB for model + data
- Optimization Tips:
  - Use GPU for faster inference: `CUDA_VISIBLE_DEVICES=0`
  - Cache models with `@st.cache_resource`
  - Batch process multiple generations
1. Check existing issues: GitHub Issues
2. Create a new issue, including:
   - Python version & OS
   - Error message & traceback
   - Steps to reproduce
   - Expected vs actual behavior
3. Discussions: Use GitHub Discussions for general questions
- Multi-artist support
- Custom model training interface
- Fine-tuning with user feedback
- Real-time lyrics quality scoring
- Export to music production software
- Advanced prompt engineering
- API endpoint for backend integration
- MLOps pipeline with CI/CD
- Model versioning & A/B testing
- Analytics & usage metrics
- Streamlit Documentation
- TensorFlow/Keras Guide
- Scikit-learn TF-IDF
- RAG Pattern
- Docker Best Practices
Made with ❤️ by Mayank Kumar | 2025
```
# Build the image
docker build -t lyrics-generator:latest .

# Run the container
docker run -p 8501:8501 lyrics-generator:latest
```

```
# Create .env file (copy from .env.example)
cp .env.example .env

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f app
```

- VPS IP: `167.71.235.91`
- URL: http://167.71.235.91:8501

1. SSH into your VPS:
   ```
   ssh root@167.71.235.91
   ```
2. Run the deployment script:
   ```
   curl -fsSL https://raw.githubusercontent.com/Mayankvlog/lyrics_generator_generative_ai/main/deploy-vps.sh | bash
   ```
   Or manually:
   ```
   apt-get update && apt-get install -y git curl
   curl -fsSL https://get.docker.com | sh
   git clone https://github.com/Mayankvlog/lyrics_generator_generative_ai.git
   cd lyrics_generator_generative_ai
   docker-compose up -d
   ```
3. Access the app:
   - Open your browser and go to: http://167.71.235.91:8501
The GitHub Actions workflow automatically:
- ✅ Tests code and dependencies
- ✅ Builds Docker image
- ✅ Pushes to Docker Hub
- ✅ Deploys to VPS via SSH