Database seed: dump.sql


The application seeds its in-memory SQLite database from database/dump.sql by default. This file contains the bulk of the inclusive language rules used by the NLP API.

How it's used

  • On boot, when IMPORT_FROM_DUMP=true (default in Core settings), app/db.py loads database/dump.sql into an in-memory SQLite instance.

  • If IMPORT_FROM_DUMP=false, the app falls back to copying from database/db.sqlite3 instead.

  • dump.sql is a pruned SQLite dump containing only the tables the rule engine needs (e.g., rules_rule, rules_alternative, rules_falsepositive, rules_lemmatization, the language declension tables, and sources).

  • Inactive rules and alternatives are removed to keep the dataset lean.
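
The boot-time behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of app/db.py: the function name open_rules_db is hypothetical, and reading IMPORT_FROM_DUMP from an environment variable is an assumption (the real app reads it from its Core settings). It relies only on the standard-library sqlite3 module.

```python
import os
import sqlite3
from pathlib import Path


def open_rules_db(dump_path="database/dump.sql",
                  sqlite_path="database/db.sqlite3",
                  import_from_dump=None):
    """Return an in-memory SQLite connection seeded from the dump or the binary file.

    Hypothetical helper for illustration; the real logic lives in app/db.py.
    """
    if import_from_dump is None:
        # Assumption: the flag is exposed as an environment variable of the same name.
        import_from_dump = os.environ.get("IMPORT_FROM_DUMP", "true").lower() == "true"

    conn = sqlite3.connect(":memory:")
    if import_from_dump:
        # Replay the SQL statements from the text dump into the in-memory database.
        conn.executescript(Path(dump_path).read_text(encoding="utf-8"))
    else:
        # Copy the binary SQLite file into memory using the backup API.
        source = sqlite3.connect(sqlite_path)
        try:
            source.backup(conn)
        finally:
            source.close()
    return conn
```

Both branches end with the same result, an in-memory connection holding the rule tables, so the rest of the application does not need to know which seed was used.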

Source of truth for rules

  • The original rules SQLite database is maintained and edited in the Witty Works Rule Editor: https://github.com/witty-works/rule-editor
  • When rules change there, export or copy the updated SQLite file from the Rule Editor and use it as input for the sync step below.

Regenerating dump.sql

  • When you update the source SQLite database (for example, with new rules), regenerate the dump and supporting lookup files.
  • Use the helper script:
# Replace path/to/source.sqlite3 with your updated rules database
# (e.g., the SQLite file exported from the Rule Editor)

pdm run python -m bin.sync_sqlite_data -f path/to/source.sqlite3

Verifying changes

  • Start the API locally and check /health or run a small /v2.4/check request from the examples in API Endpoints.
  • In CI and containers, using dump.sql typically yields faster, more reproducible startups than bundling a binary SQLite file.
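
Before starting the API, a quick offline sanity check of a regenerated dump can catch obvious problems such as a missing table. The sketch below is an optional extra, not part of the project's tooling: check_dump is a hypothetical helper, and the expected table list contains only the table names mentioned in this document, so extend it as needed.

```python
import sqlite3
from pathlib import Path

# Table names taken from this document; the real dump may contain more.
EXPECTED_TABLES = {
    "rules_rule",
    "rules_alternative",
    "rules_falsepositive",
    "rules_lemmatization",
}


def check_dump(dump_sql, expected=EXPECTED_TABLES):
    """Load dump text into an in-memory database.

    Returns (missing_tables, row_counts) for the expected tables.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(dump_sql)
        tables = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        missing = sorted(set(expected) - tables)
        counts = {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in sorted(tables & set(expected))
        }
        return missing, counts
    finally:
        conn.close()


if __name__ == "__main__":
    path = Path("database/dump.sql")
    if path.exists():
        missing, counts = check_dump(path.read_text(encoding="utf-8"))
        print("missing tables:", missing)
        print("row counts:", counts)
```

An empty list of missing tables and non-zero row counts suggest the dump was regenerated sensibly; the API checks above remain the real end-to-end test.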

Where to look in code


See Also