- fastText Language Identification
- Categories
- Lookup Files
- Incorrect/Missing German Articles
- Model Download Utilities
- Regenerating Rule Data
Download the lid.176.bin model for automatic language detection:
wget -P training_data https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin

- training_data/categories.json - Main category definitions with translations, proficiency levels, and metadata.
- training_data/diversity_dimension_drivers.json - Diversity dimension drivers (subcategory details).
- app/categories.py - Category logic, utilities, and helper functions.
Refer to the public documentation for human-readable explanations of the categories:
- English: https://www.witty.works/en/categories.html
- German: https://www.witty.works/de/kategorien.html
- French: https://www.witty.works/fr/categories.html
Generated by bin/sync_sqlite_data.py when refreshing rules:
- training_data/lookup.json - Rule trigger lookup map.
- training_data/lemma_plural_lookup.json - Lemma/plural normalization assistance.
If a German article is incorrect or missing, update ./training_data/de-DE/articles.csv based on:
https://www.verbformen.de/deklination/pronomen
Context checker model helpers:
- bin/download_from_huggingface.py - Fetch pre-converted SetFit CPU models.
- bin/convert_to_cpu.py - Convert raw SetFit models to CPU-only format.
- bin/test_cpu.py - Validate that a CPU model directory loads and shares memory correctly.
See database-seed.md for details on regenerating database/dump.sql and rebuilding lookup JSONs.
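After downloading the model or regenerating the rule data, it can help to confirm that the files this page mentions are actually in place. The following is a minimal sketch, not part of the repository: the paths are copied from this page (with lid.176.bin assumed to sit in training_data, as the wget command above places it), while `EXPECTED_FILES` and `missing_data_files` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Paths are taken from this page; the constant and function names
# are hypothetical and do not exist in the repository.
EXPECTED_FILES = (
    "training_data/lid.176.bin",
    "training_data/categories.json",
    "training_data/diversity_dimension_drivers.json",
    "training_data/lookup.json",
    "training_data/lemma_plural_lookup.json",
    "training_data/de-DE/articles.csv",
)


def missing_data_files(repo_root="."):
    """Return the expected paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected data files are present.")
```

Run from the repository root, the script lists whichever files still need to be downloaded or regenerated. It only checks for presence and does not validate the contents of any file.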
- Database Seed (dump.sql) - Rules database initialization
- Technical Notes - Implementation details
- Setup & Deployment - fastText model download instructions