TAJSchaaf/LatinNLPTools

Latin NLP Tools Comparison

This project evaluates the speed, accuracy, and usability of four prominent Natural Language Processing (NLP) tools for Latin texts:

Two samples are used for testing:

Project Goal

To provide a reproducible and comparative analysis of Latin NLP tools for tokenisation, lemmatisation, and POS-tagging, as well as processing speed.

Project Structure

  • data/: Sample Latin texts (raw and preprocessed)
  • notebooks/: Jupyter notebooks for experiments
  • scripts/: Python scripts for preprocessing and tool execution
  • results/: Accuracy/speed metrics and visualizations

Installation

  1. Clone the repo:
    git clone https://github.com/YOUR_USERNAME/latin-nlp-comparison.git
    cd latin-nlp-comparison
  2. Create and activate a virtual environment:
    python3 -m venv env
    source env/bin/activate
  3. Install dependencies:
    pip install -r requirements.txt

Metrics Evaluated

  • Accuracy
    • Tokenisation
    • Lemmatisation
    • POS
  • Speed: length of time to process data
  • Usability: Observational assessment of set-up complexity, packages required, interface, export options
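
As a rough illustration of how the accuracy and speed metrics above could be computed, here is a minimal sketch. It is not the project's actual evaluation code: the helper names `accuracy` and `timed` are hypothetical, the lemma lists are toy data, and `str.split` stands in for a real tool's tokeniser. It also assumes that predictions and gold labels are already aligned token for token, which may not hold when tools tokenise differently.

```python
import time
from typing import Callable, List, Sequence, Tuple


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Share of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token for token")
    if not gold:
        return 0.0
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


def timed(func: Callable[[str], List[str]], text: str) -> Tuple[List[str], float]:
    """Run func(text) once and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = func(text)
    return result, time.perf_counter() - start


# Toy data for illustration only, not taken from the project's samples.
gold_lemmas = ["arma", "vir", "que", "cano"]
pred_lemmas = ["arma", "vir", "que", "canis"]
print(accuracy(pred_lemmas, gold_lemmas))  # 0.75

# str.split is a placeholder for a real tool's tokeniser.
tokens, seconds = timed(str.split, "arma virumque cano")
print(tokens, f"{seconds:.6f}s")
```

The same `accuracy` function would apply to lemmas and POS tags alike, since both reduce to comparing one label per token. A single timed run is noisy, so averaging over several runs would give a steadier speed figure.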

Wiki

See the GitHub Wiki for documentation, tool setup guides, and detailed findings.

Acknowledgments

Supervised by Bernhard Bauer at the University of Graz.
