
WIP: Implement wordSegmenter#76

Draft
kytta wants to merge 8 commits into cometkim:main from kytta:2-word-segmenter



@kytta kytta commented May 21, 2025

This PR tackles #25 and implements a word segmenter. This is still WIP.

@changeset-bot

changeset-bot bot commented May 21, 2025

⚠️ No Changeset found

Latest commit: f42217d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@kytta changed the title from "Implement wordSegmenter" to "WIP: Implement wordSegmenter" on May 21, 2025
@cometkim
Owner

Wow, this is amazing. Thank you very much.

Let me know if you need any help or want to discuss.

Author

kytta commented May 22, 2025

> Wow, this is amazing. Thank you very much.

No problem :) I got irritated that there is no good polyfill for word segmenting with up-to-date Unicode rules, and I stumbled upon your project via e18e and decided to contribute :)

> Let me know if you need any help or want to discuss.

Yeah so far I think I'm good. It's a bit confusing to implement because I try to base it off grapheme.js but also word.rs, and they are implemented completely differently 😂 It's all very new to me, but I think I'm getting the hang of it now. Will try to finish it this week.

@cometkim
Owner

#112 needs to be addressed before implementing other segmenters.
This PR seems quite old, but if you're still interested in implementing it, please let me know.

Author

kytta commented Mar 22, 2026

> This PR seems quite old, but if you're still interested in implementing it, please let me know.

Hey Hyeseong, sorry for the silence from my side. I'll be honest with you: I don't know if I'll be working on this PR any time soon.

Since the last time I touched the code, my own needs and circumstances have changed, and I no longer need a word segmenter for the project I initially needed it for. Because of this, I now have less motivation to finish this :) I might come back to it later when I get the time and inspiration to work with ICU again, but don't expect this to happen any time soon :/

If you, or anyone else, want to pick this up, be my guest 😅

@cometkim
Owner

Understandable. I was wondering if people still need the word segmenter, but it seems to receive less attention than grapheme.

I tried a different implementation on my end, but writing automated tests was a bit difficult because Intl.Segmenter behavior on Node.js is fragmented.

I think I can postpone the development of additional segmenters a little longer.

