Copilot AI commented Jan 19, 2026

What does this change

Standardizes SPDX copyright headers and improves docstring clarity for Thai G2P transliteration modules.

What was wrong

  • SPDX headers: Inconsistent copyright year formats (some files showed "2026" instead of "2016-2026"), trailing periods, and malformed entries with a redundant "Copyright" prefix
  • Copyright blocks: The perceptron tagger module had unformatted copyright lines outside its docstring
  • Docstrings: The G2P transliteration classes in three modules (ThaiG2P, Umt5ThaiG2P) had identical generic docstrings even though they use different model architectures (PyTorch, transformers, UMT5)

How this fixes it

SPDX header fixes (5 files):

  • pythainlp/tokenize/thai2fit.py, pythainlp/util/profanity.py: Changed "2026" → "2016-2026"
  • pythainlp/corpus/tnc.py: Removed the trailing period ("Project." → "Project")
  • pythainlp/tokenize/thaisumcut.py: Changed "Copyright 2020" → "2020" in the SPDX-FileCopyrightText field
  • All files now follow the standard tag order: SPDX-FileCopyrightText, SPDX-FileType, SPDX-License-Identifier
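The header rules above can be checked mechanically. Below is a minimal Python sketch, not part of this PR, assuming every header is exactly three "# "-prefixed comment lines at the top of the file in the order listed. The function name check_spdx_header, the sample header, and the patterns it accepts are illustrative assumptions.

```python
import re

# Assumed canonical order of the three SPDX comment lines at the top of a file.
EXPECTED_TAGS = (
    "SPDX-FileCopyrightText",
    "SPDX-FileType",
    "SPDX-License-Identifier",
)
# The copyright value must start with a year or a year range and carry no
# "Copyright" prefix, e.g. "2016-2026 PyThaiNLP Project".
YEAR_RE = re.compile(r"^\d{4}(-\d{4})? \S")


def check_spdx_header(source: str) -> list[str]:
    """Return human-readable problems found in the first three lines."""
    lines = source.splitlines()[:3]
    if len(lines) < 3:
        return ["header has fewer than three lines"]
    problems = []
    for line, tag in zip(lines, EXPECTED_TAGS):
        prefix = f"# {tag}: "
        if not line.startswith(prefix):
            # Wrong tag at this position, i.e. the order is off.
            problems.append(f"expected {tag}, found {line!r}")
            continue
        if tag == "SPDX-FileCopyrightText":
            value = line[len(prefix):]
            if not YEAR_RE.match(value):
                problems.append(f"value must start with a year: {value!r}")
            if value.endswith("."):
                problems.append(f"trailing period: {value!r}")
    return problems


# Hypothetical header in the standard order (FileType and license values assumed).
header = (
    "# SPDX-FileCopyrightText: 2016-2026 PyThaiNLP Project\n"
    "# SPDX-FileType: SOURCE\n"
    "# SPDX-License-Identifier: Apache-2.0\n"
)
print(check_spdx_header(header))  # → []
```

The sketch accepts a single year as well as a range, because at least one file touched here (thaisumcut.py) keeps a single year, 2020. Requiring "2016-2026" for project-owned files would need an additional rule.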

Copyright block fix (1 file):

  • pythainlp/tag/_tag_perceptron.py: Moved MIT license copyright statements into docstring
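A minimal sketch of how the perceptron tagger fix could be verified, assuming the stray copyright lines were "#" comments: parse the module and confirm that "Copyright" appears in the module docstring and in no comment other than the SPDX header. The helper notice_in_docstring and the "Example Author" sources are hypothetical; the sketch uses only the standard library ast and tokenize modules.

```python
import ast
import io
import tokenize


def notice_in_docstring(source: str, marker: str = "Copyright") -> bool:
    """True if marker is in the module docstring and in no non-SPDX comment."""
    docstring = ast.get_docstring(ast.parse(source)) or ""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Comments that mention the marker, ignoring SPDX header lines
    # (whose "SPDX-FileCopyrightText" tag also contains "Copyright").
    stray = [
        tok.string
        for tok in tokens
        if tok.type == tokenize.COMMENT
        and marker in tok.string
        and "SPDX-" not in tok.string
    ]
    return marker in docstring and not stray


# Hypothetical module sources before and after moving the notice.
before = (
    "# SPDX-FileCopyrightText: 2016-2026 PyThaiNLP Project\n"
    '"""Perceptron tagger."""\n'
    "# Copyright 2013 Example Author\n"
)
after = (
    "# SPDX-FileCopyrightText: 2016-2026 PyThaiNLP Project\n"
    '"""Perceptron tagger.\n\nCopyright 2013 Example Author\n"""\n'
)
print(notice_in_docstring(before))  # → False
print(notice_in_docstring(after))  # → True
```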

Docstring improvements (3 files):

  • pythainlp/transliterate/thaig2p.py: Now specifies "PyTorch-based model (v1)"
  • pythainlp/transliterate/thaig2p_v2.py: Now specifies "transformer-based model (v2)"
  • pythainlp/transliterate/umt5_thaig2p.py: Now specifies "UMT5 model"

Each docstring now clearly describes the underlying architecture and links to relevant documentation.
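Identical docstrings of this kind are easy to miss in review. As an illustration only, the following Python sketch parses module sources with the standard ast module and reports any class docstring shared by more than one class. The function duplicate_class_docstrings and the sample sources are hypothetical, and the placeholder docstring is not the text that was in the repository.

```python
import ast
from collections import defaultdict


def duplicate_class_docstrings(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each class docstring shared by 2+ classes to "module:Class" names."""
    seen: dict[str, list[str]] = defaultdict(list)
    for module, source in sources.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.ClassDef):
                doc = ast.get_docstring(node)
                if doc:
                    seen[doc].append(f"{module}:{node.name}")
    # Keep only docstrings that occur on more than one class.
    return {doc: names for doc, names in seen.items() if len(names) > 1}


# Hypothetical module sources; the docstring text is a placeholder.
sources = {
    "thaig2p": 'class ThaiG2P:\n    """Generic G2P docstring."""\n',
    "umt5_thaig2p": 'class Umt5ThaiG2P:\n    """Generic G2P docstring."""\n',
}
print(duplicate_class_docstrings(sources))
# → {'Generic G2P docstring.': ['thaig2p:ThaiG2P', 'umt5_thaig2p:Umt5ThaiG2P']}
```

Names are qualified by module, so classes that happen to share a name in different modules remain distinguishable in the report.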

Your checklist for this pull request

  • Passed code style and structure checks
  • Passed code linting checks and unit tests

Original prompt

Fix documentation inconsistencies across the repo, including code comments, docstrings, and formatting.

Fact-check claims about external libraries to confirm they are still valid.



bact added the documentation (improve documentation and test cases) label Jan 19, 2026
Copilot AI and others added 2 commits January 19, 2026 14:51
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix documentation inconsistencies and validate external library claims Fix documentation inconsistencies: SPDX headers and docstrings Jan 19, 2026
Copilot AI requested a review from bact January 19, 2026 14:58
@sonarqubecloud

bact marked this pull request as ready for review January 19, 2026 18:57
bact merged commit 809f271 into dev Jan 19, 2026
40 of 42 checks passed

Labels

documentation improve documentation and test cases

2 participants