This project evaluates and improves smallish language models on the task of extracting data from short text passages according to a schema that is provided at inference time. It will contain data preparation/validation code, evaluation code, and code for refinement methods such as few-shot prompting and possibly self-consistency.
For a summary of this project, please see the project's final report.
The API keys are supplied through environment variables.
On Windows:
set OPENAI_API_KEY=your_api_key_here
set ANTHROPIC_API_KEY=your_api_key_here
set GOOGLE_DEEPMIND_API_KEY=your_api_key_here
set DEEPINFRA_API_KEY=your_api_key_here
On macOS/Linux:
export OPENAI_API_KEY=your_api_key_here
export ANTHROPIC_API_KEY=your_api_key_here
export GOOGLE_DEEPMIND_API_KEY=your_api_key_here
export DEEPINFRA_API_KEY=your_api_key_here
In PyCharm, you can modify a script's run configuration to include the environment variables.
In VS Code, you can do something similar, for example through the env field of a launch configuration in .vscode/launch.json.
If your Gemini/Google-DeepMind API key is on the Free Tier, you should also set the GOOGLE_DEEPMIND_API_KEY_IS_FREE_TIER environment variable to True. This will slow things down but will avert job failures.
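As a quick sanity check before launching a job, something like the following sketch can report which of the variables above are unset and whether the free-tier flag is on. The helper names are hypothetical and not part of this repository, and two things are assumptions: that the flag is compared case-insensitively against "True", and that a key only matters if you actually use that provider.

```python
import os
from typing import Mapping

# Variable names taken from the setup instructions above.
API_KEY_NAMES = (
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_DEEPMIND_API_KEY",
    "DEEPINFRA_API_KEY",
)
FREE_TIER_FLAG = "GOOGLE_DEEPMIND_API_KEY_IS_FREE_TIER"


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list:
    """Return the names of API key variables that are unset or empty."""
    return [name for name in API_KEY_NAMES if not env.get(name)]


def is_free_tier(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if the free-tier flag is set to "True".

    Assumption: the comparison ignores case and surrounding whitespace;
    the project's own parsing may be stricter.
    """
    return env.get(FREE_TIER_FLAG, "").strip().lower() == "true"


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Unset API keys:", ", ".join(missing))
    print("Gemini free tier:", is_free_tier())
```

Both helpers accept any mapping, so they can be checked against a plain dict without touching the real environment.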
When comparing extractions, we ignore case-based mismatches.
We also ignore singular vs plural discrepancies.
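To illustrate the two rules above, here is a minimal sketch of a lenient comparison. It is not the project's evaluation code: the function names are hypothetical, and the plural handling is a naive assumption that only strips common English suffixes ("s", "es", "ies") from the end of the whole string, so irregular plurals such as "mice" are not covered and some unrelated words may compare equal.

```python
def _singular_candidates(text: str) -> set:
    """Return the text plus naive singular forms obtained by suffix stripping."""
    candidates = {text}
    if text.endswith("ies") and len(text) > 3:
        candidates.add(text[:-3] + "y")  # "cities" -> "city"
    if text.endswith("es") and len(text) > 2:
        candidates.add(text[:-2])  # "boxes" -> "box"
    if text.endswith("s") and len(text) > 1:
        candidates.add(text[:-1])  # "apples" -> "apple"
    return candidates


def extractions_match(expected: str, actual: str) -> bool:
    """Compare two extracted values, ignoring case and singular/plural suffixes."""
    a = expected.strip().lower()
    b = actual.strip().lower()
    # The values match if any candidate form of one equals any form of the other.
    return bool(_singular_candidates(a) & _singular_candidates(b))


if __name__ == "__main__":
    print(extractions_match("Apples", "apple"))
    print(extractions_match("cat", "dog"))
```

A library-based lemmatizer would handle irregular forms better; suffix stripping is used here only to keep the sketch dependency-free.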