
Fix halfwidth katakana voiced and semi-voiced sound marks in width calculations #89

Open
tats-u wants to merge 1 commit into unicode-rs:master from tats-u:fix-half-voiced


@tats-u tats-u commented May 26, 2026

Related: microsoft/terminal#18087

The halfwidth katakana voiced and semi-voiced sound marks, U+FF9E and U+FF9F, are the only Grapheme Extenders that belong to the Letter category (Lm). They are typical edge cases when you consider grapheme clusters.

They should be counted as width 1, not 0, because their East Asian Width (EAW) is H (halfwidth).
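The expected behavior can be illustrated with a small, self-contained sketch. It is not the code in this PR and not the `unicode-width` implementation: `sketch_char_width` and `sketch_str_width` are hypothetical helpers that cover only a handful of kana ranges and assume width 1 for everything else. The point is only the contrast between the halfwidth marks (one column each) and the combining marks U+3099 and U+309A (zero columns).

```rust
/// Hypothetical helper for illustration only; not part of unicode-width's API.
/// Covers a small subset of kana and assumes width 1 for everything else.
fn sketch_char_width(c: char) -> usize {
    match c {
        // Halfwidth voiced / semi-voiced sound marks: Lm and Grapheme_Extend,
        // but EAW = H, so each one occupies one column.
        '\u{FF9E}' | '\u{FF9F}' => 1,
        // Combining voiced / semi-voiced sound marks (Mn): zero columns.
        // This arm must come before the wide kana range below, which contains them.
        '\u{3099}' | '\u{309A}' => 0,
        // Remaining halfwidth CJK punctuation and katakana (EAW = H).
        '\u{FF61}'..='\u{FF9D}' => 1,
        // Hiragana and katakana blocks (EAW = W).
        '\u{3041}'..='\u{30FF}' => 2,
        // Assumption of this sketch: anything else is one column wide.
        _ => 1,
    }
}

/// Sum of per-character widths; no grapheme clustering is attempted here.
fn sketch_str_width(s: &str) -> usize {
    s.chars().map(sketch_char_width).sum()
}
```

With these helpers, the halfwidth string `"\u{FF8A}\u{FF9F}\u{FF78}\u{FF9E}"` (ﾊﾟｸﾞ) comes out as 4 columns, the same as the precomposed fullwidth パグ, which has two wide characters.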


English translation of the initial prompt (Copilot + GPT-5.4 high):

Add test_halfwidth_katakana:
パグ→4 characters


English translation of the follow-up prompt, sent after discovering that the added test fails (because GPT started flailing around trying to fix it):

The half-width Kana voiced mark test failed because they're Grapheme Extenders despite being Letters. The current spec sucks, so please fix it together


"pug dog" was hand-written. The rest of that comment was completed by Copilot line completion.



2 participants