Contributor

Copilot AI commented Jan 19, 2026

What this changes

Suppresses the transformers warning "Using a model of type camembert to instantiate a model of type roberta" when loading the scb_1m_th-en_spm translation model in ThEnTranslator.__init__(). Also merges the dev branch to pick up Python 3.13 compatibility fixes.

What was wrong

The pre-trained scb_1m_th-en_spm model checkpoint contains a CamemBERT configuration but fairseq loads it as a RoBERTa-compatible transformer. This triggers a model type mismatch warning from the transformers library during test execution.

How this fixes it

Wraps TransformerModel.from_pretrained() in a warnings.catch_warnings() context manager with a targeted filter for the specific warning pattern. The filter uses case-insensitive regex matching to handle potential message variations.

with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        message="(?i).*using a model of type .* to instantiate a model of type.*",
    )
    self._model = TransformerModel.from_pretrained(...)
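The scoping behaviour of this filter can be checked in isolation. The following is a minimal sketch, not the PR's actual code: `fake_from_pretrained` and `load_quietly` are hypothetical stand-ins (the real call is `TransformerModel.from_pretrained(...)`), and the sketch assumes the message is raised through Python's `warnings` module. It reuses the pattern above and the message text from the issue report.

```python
import re
import warnings

# Same pattern as in the PR description.
PATTERN = "(?i).*using a model of type .* to instantiate a model of type.*"

# Message text copied from the issue report.
MESSAGE = (
    "You are using a model of type camembert to instantiate a model of "
    "type roberta. This is not supported for all configurations of models "
    "and can yield errors."
)


def fake_from_pretrained():
    """Hypothetical stand-in for TransformerModel.from_pretrained()."""
    warnings.warn(MESSAGE, UserWarning)
    warnings.warn("some unrelated warning", UserWarning)
    return "model"


def load_quietly():
    """Run the loader under the targeted filter.

    Returns the loaded object and the text of every warning that was
    NOT suppressed, so the effect of the filter can be inspected.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default
        # Added last, so it takes precedence over the "always" rule.
        warnings.filterwarnings("ignore", message=PATTERN)
        model = fake_from_pretrained()
    return model, [str(w.message) for w in caught]


model, seen = load_quietly()
print(seen)  # prints ['some unrelated warning']
```

Only the matching message is dropped; unrelated warnings still surface, and the previous filter state is restored when the `with` block exits. Two caveats: `warnings.filterwarnings` already matches `message` case-insensitively, so the `(?i)` prefix is harmless but redundant, and a `warnings` filter has no effect on messages emitted through the `logging` module.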

Merged the latest changes from dev branch to incorporate Python 3.13 syntax error fixes in tokenization modules (oskut.py, sefr_cut.py, wtsplit.py) and other compatibility improvements.

Your checklist for this pull request

  • Passed code styles and structures
  • Passed code linting checks and unit tests

Original prompt

This section details the original issue you should resolve

<issue_title>bug: translate: Using camembert model to instantiate roberta model</issue_title>
<issue_description>### Description

"You are using a model of type camembert to instantiate a model of type roberta. This is not supported for all configurations of models and can yield errors." warning issued during test_translate test.

Possibly from scb_1m_th-en_spm model.

Expected results

No warning

Current results

See https://github.com/PyThaiNLP/pythainlp/actions/runs/7861725110/job/21450401313#step:5:1449

Steps to reproduce

Run test_translate.py

PyThaiNLP version

5.0.1

Python version

3.8

Operating system and version

Linux and Mac

More info

No response

Possible solution

No response

Files

Relevant code blocks:

class TestTranslatePackage(unittest.TestCase):
    def test_translate(self):
        # remove("scb_1m_th-en_spm")
        self.assertIsNone(download_model_all())

def download_model_all() -> None:
    """
    Download all translation models in advance
    """
    _download_install(_EN_TH_MODEL_NAME)
    _download_install(_TH_EN_MODEL_NAME)

_EN_TH_MODEL_NAME = "scb_1m_en-th_moses"
# SCB_1M-MT_OPUS+TBASE_en-th_moses-spm_130000-16000_v1.0.tar.gz
_EN_TH_FILE_NAME = "SCB_1M-MT_OPUS+TBASE_en-th_moses-spm_130000-16000_v1.0"
_TH_EN_MODEL_NAME = "scb_1m_th-en_spm"
# SCB_1M-MT_OPUS+TBASE_th-en_spm-spm_32000-joined_v1.0.tar.gz
_TH_EN_FILE_NAME = "SCB_1M-MT_OPUS+TBASE_th-en_spm-spm_32000-joined_v1.0"
</issue_description>

Comments on the Issue (you are @copilot in this section)

@wannaphong I have a computing problem, so I will move this issue to a future plan.

Oh wrong issue</comment_new>



Copilot AI and others added 2 commits January 19, 2026 14:54
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix bug using camembert model to instantiate roberta model" to "Suppress benign camembert/roberta model type warning in ThEnTranslator" Jan 19, 2026
Copilot AI requested a review from wannaphong January 19, 2026 14:57
@bact
Member

bact commented Jan 19, 2026

@copilot There are changes in the dev branch to fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.

Contributor Author

Copilot AI commented Jan 19, 2026

@copilot There are changes in the dev branch to fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.

Merged dev branch successfully in commit 3c56c3f. The Python 3.13 syntax fixes are now included, and all linting checks pass.

Copilot AI requested a review from bact January 20, 2026 00:00
@coveralls

Coverage Status

coverage: 53.593% (-0.03%) from 53.624%
when pulling 3c56c3f on copilot/fix-camembert-roberta-issue
into 65e6c54 on dev.

@bact bact marked this pull request as ready for review January 20, 2026 08:00
Development

Successfully merging this pull request may close these issues.

bug: translate: Using camembert model to instantiate roberta model
