This project evaluates language model responses using the RAGAS (Retrieval Augmented Generation Assessment) framework along with standard quantitative NLP metrics.
The evaluation process involves loading a dataset from a JSON file, computing the configured metrics, and saving the results to `ragas_scores.json` and `quant_scores.json`.
- Python 3.7+
- Git
- OpenAI API Key (for LLM-based metrics)
Start by cloning this repository to your local machine:

```bash
git clone https://github.com/Ritvik-G/AIISC_devkit.git
cd AIISC_devkit
```

Prior to installing the requirements, it is recommended to create a virtual environment to avoid any dependency clashes. In your terminal or command prompt, run the following command to create a virtual environment:

```bash
python -m venv myenv
```

Activate the virtual environment with the command appropriate for your OS:

```bash
source myenv/bin/activate   # On macOS/Linux
myenv\Scripts\activate      # On Windows
```

This project requires the following Python packages. Install them using pip:

```bash
pip install -r requirements.txt
```

- Prepare the dataset:
  - The dataset is expected to be in the `data.json` format. Modify or create your own `data.json` file with a similar structure; you can refer to the example `data.json` file in the repository (a sketch of the expected structure is shown below).
  - You can change the location of the dataset file by pasting the path to your data file into the `DATA_FILE` line in `config.py`.
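For illustration, a minimal `data.json` might look like the sketch below. The field names follow those mentioned in the troubleshooting section (`user_input`, `retrieved_contexts`, `response`, `reference`); the values are hypothetical, so check the repository's `data.json` for the authoritative structure.

```json
[
  {
    "user_input": "What is the capital of France?",
    "retrieved_contexts": [
      "Paris is the capital and most populous city of France."
    ],
    "response": "The capital of France is Paris.",
    "reference": "Paris is the capital of France."
  }
]
```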
NOTE - There are two different types of metrics available; here's how to set up each of them:
- RAGAS
  - Obtain your OpenAI API key: if you don't have one, you can get it from OpenAI.
  - Open the `config.py` file and replace the placeholder in `LLM_METRICS['OpenAI_API']` with your actual OpenAI API key.
  - Set your desired model in `LLM_METRICS['OpenAI_Model']`, for example, `"gpt-4"`.
  - Toggle each metric to either `True` or `False` depending on your needs.
  - Please install `ragas` using the pip command given in section 2.
```python
LLM_METRICS = {
    "OpenAI_API": "your-openai-api-key",
    "OpenAI_Model": "gpt-4",
    "LLMContextPrecisionWithoutReference": False,
    "LLMContextPrecisionWithReference": True,
    "LLMContextRecall": False,
    "ContextEntityRecall": True,
    "NoiseSensitivity": True,
    "ResponseRelevancy": False,
    "Faithfulness": True,
}
```
- Quantitative Metrics
  - Toggle each metric to either `True` or `False` depending on your needs (similar to `LLM_METRICS`). A sketch of what these metrics typically compute follows the configuration block below.
```python
METRICS = {
    "bleu": True,
    "rouge": True,
    "meteor": True,
    "roberta-nli": True,
}
```
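These metrics compare each `response` against its `reference`. The repository's implementation may differ, but as a rough illustration of what BLEU, ROUGE, and METEOR scoring looks like (using the Hugging Face `evaluate` package, which is an assumption on our part, not necessarily what `main.py` uses):

```python
import evaluate  # Hugging Face evaluation library (assumed; install with `pip install evaluate`)

# Hypothetical example pair: a model response and its reference answer.
predictions = ["The capital of France is Paris."]
references = ["Paris is the capital of France."]

# Load and compute a few of the toggled quantitative metrics.
bleu = evaluate.load("bleu").compute(predictions=predictions, references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)

print(bleu["bleu"], rouge["rougeL"], meteor["meteor"])
```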
After configuring your API key and the rest of `config.py`, you can run the `main.py` script to evaluate the LLM using the specified metrics. To start the evaluation, simply execute the `main.py` file:

```bash
python main.py
```

This will:

- Load the data from `data.json`.
- Load the LLM-based and non-LLM-based evaluation metrics as specified in `config.py`.
- Perform evaluation based on the active metrics.
- Save the results to new score files in the current directory.
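For intuition, the RAGAS half of such a pipeline can be wired up roughly as in the sketch below. This is a simplified illustration assuming a recent `ragas` release together with `langchain-openai`; the actual `main.py` may be organized quite differently.

```python
import json

from langchain_openai import ChatOpenAI
from ragas import EvaluationDataset, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import Faithfulness, LLMContextRecall

# Load the samples prepared in data.json
# (fields: user_input, retrieved_contexts, response, reference).
with open("data.json") as f:
    samples = json.load(f)
dataset = EvaluationDataset.from_list(samples)

# Wrap the OpenAI model configured in config.py as the evaluator LLM.
# The OpenAI API key is expected in the OPENAI_API_KEY environment variable here.
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))

# Run only the metrics toggled on in config.py (two shown here for brevity).
result = evaluate(
    dataset=dataset,
    metrics=[Faithfulness(), LLMContextRecall()],
    llm=evaluator_llm,
)
print(result)
```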
The results are structured as a list of metrics with their corresponding scores and are saved in two separate files:

- `ragas_scores.json`: all RAGAS-based scores.
- `quant_scores.json`: all quantitative scores.
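The exact contents depend on which metrics are enabled and on how `main.py` serializes them, but as a purely illustrative example (hypothetical metric names and scores), a score file might look like:

```json
[
  { "metric": "Faithfulness", "score": 0.87 },
  { "metric": "ContextEntityRecall", "score": 0.64 }
]
```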
```
llm-evaluation/
│
├── main.py             # Main evaluation script
├── config.py           # Configuration file for setting up LLM-based metrics
├── data.json           # Input dataset for evaluation
├── ragas_scores.json   # RAGAS framework outputs
└── quant_scores.json   # Quantitative metrics outputs
```
- OpenAI API key not working: Ensure that the OpenAI API key is correctly set in the `config.py` file. If you're using a proxy, verify the proxy settings.
- Data format issues: Ensure the `data.json` file follows the correct format with the necessary fields like `user_input`, `retrieved_contexts`, `response`, and `reference`.
- Missing dependencies: If there are missing dependencies, ensure you have correctly installed the packages using `pip`.
This project is licensed under the MIT License - see the LICENSE file for details.