Suppress benign camembert/roberta model type warning in ThEnTranslator #1215
+28
−20



What does this change
Suppresses the transformers warning "Using a model of type camembert to instantiate a model of type roberta" when loading the scb_1m_th-en_spm translation model in ThEnTranslator.__init__(). Includes a merge of the dev branch for Python 3.13 compatibility fixes.

What was wrong
The pre-trained scb_1m_th-en_spm model checkpoint contains a CamemBERT configuration, but fairseq loads it as a RoBERTa-compatible transformer. This triggers a model type mismatch warning from the transformers library during test execution.

How this fixes it
Wraps TransformerModel.from_pretrained() in a warnings.catch_warnings() context manager with a targeted filter for the specific warning pattern. The filter uses case-insensitive regex matching to handle potential message variations.

Merged the latest changes from the dev branch to incorporate Python 3.13 syntax error fixes in the tokenization modules (oskut.py, sefr_cut.py, wtsplit.py) and other compatibility improvements.
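
The warning suppression described above can be sketched roughly as follows. This is a minimal illustration and not the actual diff: `load_model_stub` is a hypothetical stand-in for `TransformerModel.from_pretrained()`, and the exact message text and regex are assumptions based on the warning quoted in this description.

```python
import warnings


def load_model_stub():
    """Hypothetical stand-in for TransformerModel.from_pretrained().

    It emits the benign model type warning plus an unrelated one, so the
    effect of the targeted filter can be observed without fairseq.
    """
    warnings.warn(
        "You are using a model of type camembert to instantiate "
        "a model of type roberta.",
        UserWarning,
    )
    warnings.warn("unrelated warning", UserWarning)
    return "model"


def load_quietly():
    # catch_warnings() restores the previous filter list on exit, so the
    # suppression applies only while the model is being loaded.
    with warnings.catch_warnings():
        # The message argument is a regex matched case-insensitively
        # against the start of the warning text, hence the leading ".*".
        warnings.filterwarnings(
            "ignore",
            message=r".*using a model of type camembert "
                    r"to instantiate a model of type roberta",
        )
        return load_model_stub()
```

Because the filter is scoped to the context manager and matches only this message, other warnings raised during loading still surface. Note that this approach only affects warnings issued through Python's `warnings` module.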