Is there any interest in supporting alternative word-boundary rules as desired for (e.g.) text editing?
Currently, `split_word_bounds` follows the UAX #29 word boundary rules, which explicitly treat things like `3,456.789` and `example.com` as a single word (although, interestingly, `e.g.` has a boundary before the final period).
Text editing usually wants slightly different rules; e.g. `text.len` should be considered two words plus the intervening punctuation.
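To make this concrete, here is a rough sketch of what an editor-oriented splitter could look like. It is not part of unicode-segmentation: `editor_word_bounds` is a hypothetical name, and the rules are my own assumptions rather than anything from UAX #29. Alphanumerics and `_` form words, whitespace runs stay together, and every other punctuation character becomes its own segment, except that `.` or `,` between two ASCII digits stays inside the word so numbers like `3,456.789` remain whole. It also works on `char`s rather than grapheme clusters, so it is only an approximation.

```rust
// Hypothetical editor-style word splitter (NOT part of unicode-segmentation).
// Assumed rules: words are runs of alphanumerics or '_', whitespace runs are
// grouped, other punctuation is split one char at a time, except '.' or ','
// between two ASCII digits, which stays inside the number.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Class {
    Word,
    Space,
    Punct,
}

fn classify(chars: &[char], i: usize) -> Class {
    let c = chars[i];
    if c.is_alphanumeric() || c == '_' {
        Class::Word
    } else if c.is_whitespace() {
        Class::Space
    } else if (c == '.' || c == ',')
        && i > 0
        && i + 1 < chars.len()
        && chars[i - 1].is_ascii_digit()
        && chars[i + 1].is_ascii_digit()
    {
        // Numeric separator, e.g. the ',' and '.' in "3,456.789".
        Class::Word
    } else {
        Class::Punct
    }
}

pub fn editor_word_bounds(s: &str) -> Vec<&str> {
    // Byte offset of each char, so segments can be sliced out of `s`.
    let offsets: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    let chars: Vec<char> = s.chars().collect();
    let mut out = Vec::new();
    let mut start = 0; // char index where the current segment begins
    for i in 1..=chars.len() {
        // Break at end of input, on a class change, or before/after any
        // punctuation char (each punctuation char is its own segment).
        let boundary = i == chars.len() || {
            let (prev, cur) = (classify(&chars, i - 1), classify(&chars, i));
            prev != cur || cur == Class::Punct
        };
        if boundary {
            let end = offsets.get(i).copied().unwrap_or(s.len());
            out.push(&s[offsets[start]..end]);
            start = i;
        }
    }
    out
}
```

With these rules `text.len` yields `["text", ".", "len"]` and `example.com` yields `["example", ".", "com"]`, while `3,456.789` is still one segment.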
In some cases `CamelCase` and `snake_case` may also be considered multiple words. For example, KDE's Kate editor splits these into `Camel` + `Case` and `snake_` + `case` for word-wise keyboard navigation (Ctrl + Right etc.), but does not subdivide them for double-click selection.
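Sub-word splitting for navigation could be applied to a single word segment. Again this is only a hypothetical sketch (`subword_bounds` is not an existing API), loosely modelled on the Kate behaviour: a boundary falls after a run of `_` and between a lowercase and an uppercase letter. It makes no attempt at acronyms such as `HTTPServer`.

```rust
// Hypothetical sub-word splitter for word-mode cursor movement (NOT an
// existing unicode-segmentation API). Assumed rules:
// - break between a lowercase and an uppercase letter ("Camel|Case");
// - break after a run of '_' ("snake_|case").
pub fn subword_bounds(word: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0; // byte offset where the current sub-word begins
    let mut prev: Option<char> = None;
    for (i, c) in word.char_indices() {
        if let Some(p) = prev {
            let camel = p.is_lowercase() && c.is_uppercase();
            let snake = p == '_' && c != '_';
            if camel || snake {
                out.push(&word[start..i]);
                start = i;
            }
        }
        prev = Some(c);
    }
    // Emit the trailing sub-word, if any.
    if start < word.len() {
        out.push(&word[start..]);
    }
    out
}
```

Double-click selection would keep using the whole word, while Ctrl + Right would step through the pieces returned by `subword_bounds`.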