Suppress benign camembert/roberta model type warning in ThEnTranslator #1215

Copilot · 2026-01-19T14:48:24Z

What does this change

Suppresses the transformers warning "Using a model of type camembert to instantiate a model of type roberta" when loading the scb_1m_th-en_spm translation model in ThEnTranslator.__init__(). Includes merge of dev branch for Python 3.13 compatibility fixes.

What was wrong

The pre-trained scb_1m_th-en_spm model checkpoint contains a CamemBERT configuration but fairseq loads it as a RoBERTa-compatible transformer. This triggers a model type mismatch warning from the transformers library during test execution.

How this fixes it

Wraps TransformerModel.from_pretrained() in a warnings.catch_warnings() context manager with a targeted filter for the specific warning pattern. The filter uses case-insensitive regex matching to handle potential message variations.

with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        message="(?i).*using a model of type .* to instantiate a model of type.*",
    )
    self._model = TransformerModel.from_pretrained(...)
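The scoping behaviour of this pattern can be checked without downloading the model. The sketch below is only an illustration: `load_stub` and `load_quietly` are hypothetical names, not part of PyThaiNLP or fairseq, and the stub stands in for `TransformerModel.from_pretrained()`. It assumes the message reaches Python's `warnings` machinery; a message emitted through the `logging` module would not be affected by a warnings filter.

```python
import warnings

# Same pattern as in the filter above.
PATTERN = "(?i).*using a model of type .* to instantiate a model of type.*"

# Warning text as quoted in the linked issue.
MESSAGE = (
    "You are using a model of type camembert to instantiate a model of "
    "type roberta. This is not supported for all configurations of models "
    "and can yield errors."
)


def load_stub() -> str:
    """Hypothetical stand-in for TransformerModel.from_pretrained()."""
    warnings.warn(MESSAGE, UserWarning)
    warnings.warn("some unrelated warning", UserWarning)
    return "model"


def load_quietly() -> tuple:
    """Load under the targeted filter and report which warnings got through."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning by default ...
        warnings.simplefilter("always")
        # ... except the one matching the pattern. Filters added later are
        # checked first, so this "ignore" rule takes precedence.
        warnings.filterwarnings("ignore", message=PATTERN)
        model = load_stub()
    # On exit, catch_warnings() restores the previous filter list.
    return model, [str(w.message) for w in caught]
```

Calling `load_quietly()` returns the stub's value together with the messages that were not filtered, so unrelated warnings stay visible. Note that `warnings.filterwarnings()` already matches `message` case-insensitively, so the `(?i)` flag is harmless but redundant.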

Merged the latest changes from dev branch to incorporate Python 3.13 syntax error fixes in tokenization modules (oskut.py, sefr_cut.py, wtsplit.py) and other compatibility improvements.

Your checklist for this pull request

Passed code styles and structures
Passed code linting checks and unit test

Original prompt

This section details the original issue you should resolve

<issue_title>bug: translate: Using camembert model to instantiate roberta model</issue_title>
<issue_description>### Description

"You are using a model of type camembert to instantiate a model of type roberta. This is not supported for all configurations of models and can yield errors." warning issued during test_translate test.

Possibly from scb_1m_th-en_spm model.

Expected results

No warning

Current results

See https://github.com/PyThaiNLP/pythainlp/actions/runs/7861725110/job/21450401313#step:5:1449

Steps to reproduce

Run test_translate.py

PyThaiNLP version

5.0.1

Python version

3.8

Operating system and version

Linux and Mac

More info

No response

Possible solution

No response

Files

Relevant code blocks:

pythainlp/tests/test_translate.py

Lines 12 to 15 in c8b547e

class TestTranslatePackage(unittest.TestCase):
    def test_translate(self):
        # remove("scb_1m_th-en_spm")
        self.assertIsNone(download_model_all())

pythainlp/pythainlp/translate/en_th.py

Lines 37 to 42 in c8b547e

def download_model_all() -> None:
    """
    Download all translation models in advance
    """
    _download_install(_EN_TH_MODEL_NAME)
    _download_install(_TH_EN_MODEL_NAME)

pythainlp/pythainlp/translate/en_th.py

Lines 19 to 25 in c8b547e

_EN_TH_MODEL_NAME = "scb_1m_en-th_moses"
# SCB_1M-MT_OPUS+TBASE_en-th_moses-spm_130000-16000_v1.0.tar.gz
_EN_TH_FILE_NAME = "SCB_1M-MT_OPUS+TBASE_en-th_moses-spm_130000-16000_v1.0"

_TH_EN_MODEL_NAME = "scb_1m_th-en_spm"
# SCB_1M-MT_OPUS+TBASE_th-en_spm-spm_32000-joined_v1.0.tar.gz
_TH_EN_FILE_NAME = "SCB_1M-MT_OPUS+TBASE_th-en_spm-spm_32000-joined_v1.0"

</issue_description>

Comments on the Issue (you are @copilot in this section)
@wannaphong I have a computing problem, so I will mark this issue as a future plan.
Oh, wrong issue.

Fixes bug: translate: Using camembert model to instantiate roberta model #903


Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>

bact · 2026-01-19T23:56:58Z

@copilot There are changes in the dev branch that fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.

Copilot · 2026-01-19T23:59:25Z

> @copilot There are changes in the dev branch that fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.

Merged dev branch successfully in commit 3c56c3f. The Python 3.13 syntax fixes are now included, and all linting checks pass.

sonarqubecloud · 2026-01-20T00:00:10Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coveralls · 2026-01-20T00:07:42Z

coverage: 53.593% (-0.03%) from 53.624%
when pulling 3c56c3f on copilot/fix-camembert-roberta-issue
into 65e6c54 on dev.

Initial plan

bee58b5

Copilot AI assigned Copilot and wannaphong Jan 19, 2026

Copilot started work on behalf of wannaphong January 19, 2026 14:48

Copilot AI and others added 2 commits January 19, 2026 14:54

Add warning filter to suppress camembert/roberta model type warning

224b9b8

Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>

Make warning filter pattern case-insensitive

5a63099

Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix bug using camembert model to instantiate roberta model~~ Suppress benign camembert/roberta model type warning in ThEnTranslator Jan 19, 2026

Copilot AI requested a review from wannaphong January 19, 2026 14:57

Copilot finished work on behalf of wannaphong January 19, 2026 14:57

Copilot started work on behalf of bact January 19, 2026 23:57

Merge dev branch to include Python 3.13 fixes

3c56c3f

Copilot AI requested a review from bact January 20, 2026 00:00

Copilot finished work on behalf of bact January 20, 2026 00:00

bact marked this pull request as ready for review January 20, 2026 08:00


4 participants
