Skip to content

amogiska/simple_rag

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Simple RAG System with OpenAI

A simple Retrieval-Augmented Generation (RAG) system built from scratch using OpenAI's embedding and chat completion APIs.

Features

  • Vector Database: In-memory storage for text chunks and their embeddings
  • Semantic Search: Uses cosine similarity to find relevant information
  • OpenAI Integration: Uses OpenAI's text-embedding-3-small for embeddings and gpt-3.5-turbo for generation
  • Interactive Chat: Continuous prompting interface to ask multiple questions
  • Flexible Data Source: Works with any text file - just change the DATA_FILE variable

Setup

  1. Install dependencies:
pip install -r requirements.txt
  1. Set your OpenAI API key:
export OPENAI_API_KEY="your-api-key-here"
  1. Run the system:
python main.py

How it Works

  1. Indexing Phase:

    • Loads text data from the specified file (default: cat-facts.txt)
    • Creates embeddings for each line using OpenAI's embedding model
    • Stores text chunks and embeddings in an in-memory vector database
  2. Retrieval Phase:

    • Converts user query to embedding
    • Finds most similar text chunks using cosine similarity
    • Returns top 5 most relevant results with similarity scores
  3. Generation Phase:

    • Uses retrieved chunks as context
    • Generates response using OpenAI's chat completion API
    • Continues prompting for more questions until user exits

Customizing Your Data

To use your own data, simply:

  1. Create or replace the text file (each line will be treated as a separate chunk)
  2. Update the DATA_FILE variable in main.py:
    DATA_FILE = 'your-data-file.txt'

Example Usage

Building vector database...
Loaded 150 entries from cat-facts.txt
Added 150 chunks to database

Ask me anything about the loaded data! (Type 'quit' or 'exit' to stop)
====================================================================================================

Your question: How long do cats sleep?

Top 5 relevant results:
----------------------------------------------------------------------------------------------------
1. Cats spend 2/3 of every day sleeping... (similarity: 0.8432)
2. Cats sleep 16 to 18 hours per day... (similarity: 0.8201)
...
----------------------------------------------------------------------------------------------------

Answer:
Based on the context provided, cats spend about 2/3 of every day sleeping, which amounts to 16-18 hours per day...
====================================================================================================

Your question: quit
Goodbye!

Files

  • main.py: Complete RAG implementation
  • cat-facts.txt: Sample dataset (150 cat facts) - replace with your own data
  • requirements.txt: Python dependencies
  • README.md: This file

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages