This project evaluates language model responses using the RAGAS (Retrieval Augmented Generation Assessment) framework along with standard quantitative NLP metrics.
The evaluation process involves loading a dataset from a JSON file, computing the configured metrics, and saving the results to `ragas_scores.json` and `quant_scores.json`.
- Python 3.7+
- Git
- OpenAI API Key (for LLM-based metrics)
Start by cloning this repository to your local machine:

```bash
git clone https://github.com/Ritvik-G/AIISC_devkit.git
cd AIISC_devkit
```

Prior to installing the requirements, it is recommended to create a virtual environment to avoid any dependency clashes. In your terminal or command prompt, run the following command to create a virtual environment:

```bash
python -m venv myenv
```

Activate the virtual environment with the command appropriate for your OS:

```bash
source myenv/bin/activate   # On macOS/Linux
myenv\Scripts\activate      # On Windows
```

This project requires the following Python packages. Install them using pip:

```bash
pip install -r requirements.txt
```

- Prepare the dataset:
  - The dataset is expected to be in the `data.json` format. Modify or create your own `data.json` file with a similar structure; you can refer to the example `data.json` file in the repository (a sketch of the expected structure is shown below).
  - You can change the location of the dataset file by pasting the path to your data file into the `DATA_FILE` line in `config.py`.
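For illustration, a minimal `data.json` might look like the sketch below. The field names follow those mentioned in the troubleshooting section (`user_input`, `retrieved_contexts`, `response`, `reference`); the values are hypothetical, so check the repository's `data.json` for the authoritative structure.

```json
[
  {
    "user_input": "What is the capital of France?",
    "retrieved_contexts": [
      "Paris is the capital and most populous city of France."
    ],
    "response": "The capital of France is Paris.",
    "reference": "Paris is the capital of France."
  }
]
```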
NOTE - There are two different types of metrics available; here's how to set up each of them:
- RAGAS
  - Obtain your OpenAI API key: if you don't have one, you can get it from OpenAI.
  - Open the `config.py` file and replace the placeholder in `LLM_METRICS['OpenAI_API']` with your actual OpenAI API key.
  - Set your desired model in `LLM_METRICS['OpenAI_Model']`, for example, `"gpt-4"`.
  - Toggle each metric to either `True` or `False` depending on your needs.
  - Please install `ragas` using the pip command given in section 2.
```python
LLM_METRICS = {
    "OpenAI_API": "your-openai-api-key",
    "OpenAI_Model": "gpt-4",
    "LLMContextPrecisionWithoutReference": False,
    "LLMContextPrecisionWithReference": True,
    "LLMContextRecall": False,
    "ContextEntityRecall": True,
    "NoiseSensitivity": True,
    "ResponseRelevancy": False,
    "Faithfulness": True,
}
```
- Quantitative Metrics
  - Toggle each metric to either `True` or `False` depending on your needs (similar to `LLM_METRICS`). A sketch of what these metrics typically compute follows the configuration block below.
```python
METRICS = {
    "bleu": True,
    "rouge": True,
    "meteor": True,
    "roberta-nli": True,
}
```
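These metrics compare each `response` against its `reference`. The repository's implementation may differ, but as a rough illustration of what BLEU, ROUGE, and METEOR scoring looks like (using the Hugging Face `evaluate` package, which is an assumption on our part, not necessarily what `main.py` uses):

```python
import evaluate  # Hugging Face evaluation library (assumed; install with `pip install evaluate`)

# Hypothetical example pair: a model response and its reference answer.
predictions = ["The capital of France is Paris."]
references = ["Paris is the capital of France."]

# Load and compute a few of the toggled quantitative metrics.
bleu = evaluate.load("bleu").compute(predictions=predictions, references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)

print(bleu["bleu"], rouge["rougeL"], meteor["meteor"])
```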
After configuring your API key and the rest of `config.py`, you can run the `main.py` script to evaluate the LLM using the specified metrics. To start the evaluation, simply execute the `main.py` file:

```bash
python main.py
```

This will:

- Load the data from `data.json`.
- Load the LLM-based and non-LLM-based evaluation metrics as specified in `config.py`.
- Perform evaluation based on the active metrics.
- Save the results to new score files in the current directory.
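For intuition, the RAGAS half of such a pipeline can be wired up roughly as in the sketch below. This is a simplified illustration assuming a recent `ragas` release together with `langchain-openai`; the actual `main.py` may be organized quite differently.

```python
import json

from langchain_openai import ChatOpenAI
from ragas import EvaluationDataset, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import Faithfulness, LLMContextRecall

# Load the samples prepared in data.json
# (fields: user_input, retrieved_contexts, response, reference).
with open("data.json") as f:
    samples = json.load(f)
dataset = EvaluationDataset.from_list(samples)

# Wrap the OpenAI model configured in config.py as the evaluator LLM.
# The OpenAI API key is expected in the OPENAI_API_KEY environment variable here.
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))

# Run only the metrics toggled on in config.py (two shown here for brevity).
result = evaluate(
    dataset=dataset,
    metrics=[Faithfulness(), LLMContextRecall()],
    llm=evaluator_llm,
)
print(result)
```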
The results are structured as a list of metrics with their corresponding scores and are saved in two separate files:

- `ragas_scores.json`: all RAGAS-based scores.
- `quant_scores.json`: all quantitative scores.
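The exact contents depend on which metrics are enabled and on how `main.py` serializes them, but as a purely illustrative example (hypothetical metric names and scores), a score file might look like:

```json
[
  { "metric": "Faithfulness", "score": 0.87 },
  { "metric": "ContextEntityRecall", "score": 0.64 }
]
```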
```
llm-evaluation/
│
├── main.py             # Main evaluation script
├── config.py           # Configuration file for setting up LLM-based metrics
├── data.json           # Input dataset for evaluation
├── ragas_scores.json   # RAGAS framework outputs
└── quant_scores.json   # Quantitative metrics outputs
```
- OpenAI API key not working: Ensure that the OpenAI API key is correctly set in the `config.py` file. If you're using a proxy, verify the proxy settings.
- Data format issues: Ensure the `data.json` file follows the correct format with the necessary fields like `user_input`, `retrieved_contexts`, `response`, and `reference`.
- Missing dependencies: If there are missing dependencies, ensure you have correctly installed the packages using `pip`.
This project is licensed under the MIT License - see the LICENSE file for details.