Copilot AI commented Jan 19, 2026

What does this change

Adds optional exclude_words parameter to all translation methods, allowing users to preserve specific words (proper nouns, technical terms, brand names) in their original form during translation.

What was wrong

The translation module had no mechanism to exclude specific words from translation. Users needed to preserve certain terms like brand names, technical jargon, or proper nouns but had to manually post-process translations.

How this fixes it

Core Implementation:

  • Added _prepare_text_with_exclusions() and _restore_excluded_words() helper functions in core.py
  • Uses unique placeholders (<<<PYTHAINLP_EXCLUDE_N>>>) to mark excluded words before translation
  • Implements regex-based word boundary matching for space-separated languages (English, French)
  • Falls back to direct replacement for non-space languages (Thai, Chinese)
  • Handles duplicates, overlapping words, and partial matches correctly
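The placeholder approach listed above can be sketched roughly as follows. This is a minimal illustration, not the code in core.py: the function names (without the leading underscore), the ASCII-only boundary rule, and the longest-first ordering are assumptions made for the sketch. Only the placeholder format is taken from this description. Accented Latin words such as French "café" fall into the direct-match branch here, which is a simplification.

```python
import re

# Placeholder format quoted from the PR description.
PLACEHOLDER = "<<<PYTHAINLP_EXCLUDE_{}>>>"
# ASCII word characters; used to build boundaries that still work when a
# Latin-script word sits directly next to Thai or Chinese characters.
_ASCII_WORD = "A-Za-z0-9_"


def prepare_text_with_exclusions(text, exclude_words):
    """Mask excluded words; return (masked_text, placeholder -> word)."""
    # Drop empty strings and duplicates, then try longer words first so
    # "PyThaiNLP" wins over "NLP" where the two overlap.
    words = sorted(
        dict.fromkeys(w for w in (exclude_words or []) if w),
        key=len,
        reverse=True,
    )
    if not words:
        return text, {}

    alternatives = []
    for word in words:
        escaped = re.escape(word)
        if word.isascii():
            # Boundary match: "NLP" must not match inside "NLPs".
            escaped = f"(?<![{_ASCII_WORD}]){escaped}(?![{_ASCII_WORD}])"
        # Non-ASCII words (Thai, Chinese) are matched directly.
        alternatives.append(escaped)
    pattern = re.compile("|".join(alternatives))

    mapping = {}
    placeholder_for = {}

    def mask(match):
        word = match.group(0)
        if word not in placeholder_for:
            placeholder = PLACEHOLDER.format(len(placeholder_for))
            placeholder_for[word] = placeholder
            mapping[placeholder] = word
        return placeholder_for[word]

    # A single pass means placeholders are never re-scanned for matches.
    return pattern.sub(mask, text), mapping


def restore_excluded_words(text, mapping):
    """Put the original words back in place of their placeholders."""
    for placeholder, word in mapping.items():
        text = text.replace(placeholder, word)
    return text
```

In this sketch the masked text would be passed to the translation model and the result handed to restore_excluded_words(). A model may alter or drop a placeholder, in which case that word cannot be restored, so callers should not assume every excluded word survives.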

API Changes:

  • Updated Translate.translate() with exclude_words: list[str] | None = None parameter
  • Updated all translator classes: EnThTranslator, ThEnTranslator, ThZhTranslator, ZhThTranslator, ThFrTranslator, Small100Translator

Testing:

  • Added a test suite in tests/extra/testx_translate_helpers.py with 12 unit tests covering edge cases such as duplicates, overlapping words, and partial matches
  • Updated existing tests in tests/extra/testx_translate.py to verify end-to-end functionality
  • All tests pass after merging Python 3.13 compatibility fixes from dev branch

Compatibility:

  • Merged latest changes from dev branch to incorporate Python 3.13 SyntaxError fixes
  • Verified all tests pass with the updated codebase

Example:

from pythainlp.translate import Translate

translator = Translate("th", "en")

# Preserve brand names and acronyms
result = translator.translate(
    "ฉันใช้ PyThaiNLP ในการทำ NLP",
    exclude_words=["PyThaiNLP", "NLP"]
)
# Output: "I use PyThaiNLP to do NLP"

Your checklist for this pull request

  • Passed code styles and structures
  • Passed code linting checks and unit tests
Original prompt

This section details the original issue you should resolve

<issue_title>Exclude words during translation with PyThaiNLP Translate</issue_title>
<issue_description>Hi all, was wondering if you guys have any method to exclude some words during the translation. </issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 5 commits January 19, 2026 14:53
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Copilot AI changed the title [WIP] Add functionality to exclude words during translation Add exclude_words parameter to translation methods Jan 19, 2026
Copilot AI requested a review from wannaphong January 19, 2026 15:04
@wannaphong

@copilot move test to tests/extra

@bact bact marked this pull request as ready for review January 19, 2026 15:07
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>

Copilot AI commented Jan 19, 2026

@copilot move test to tests/extra

Done. Moved to tests/extra/testx_translate_helpers.py in commit 76495b5.

@bact

bact commented Jan 19, 2026

@copilot There are changes in the dev branch to fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.


Copilot AI commented Jan 19, 2026

@copilot There are changes in the dev branch to fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.

Done. Merged dev branch in commit 1dd03cd. All 12 tests pass successfully after the merge.

Copilot AI requested a review from bact January 19, 2026 23:59
@coveralls

Coverage Status

coverage: 53.304% (-0.3%) from 53.624%
when pulling 1dd03cd on copilot/exclude-words-in-translation
into 65e6c54 on dev.

@bact bact added the enhancement enhance functionalities label Jan 20, 2026

Development

Successfully merging this pull request may close these issues.

Exclude words during translation with PyThaiNLP Translate

4 participants