-
Notifications
You must be signed in to change notification settings - Fork 222
refactor: simplify encoding mode detection logic #201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The refactor introduced a regression: an empty string ( While technically valid—since all modes can represent an empty string— The most appropriate fix is to add an explicit check at the beginning: if ('' === $content) {
return Mode::BYTE();
}This restores the correct behavior and clarifies the intent—helping prevent future regressions. edit: PR has been updated with the correction. |
e68eaba to
6ee44e4
Compare
|
Very nice implementaiton, thanks! Though at the moment we still ahve failing unit tests in |
9fa3392 to
94c6c72
Compare
|
Please rebase against main to get the tests working. |
Refactor to simplify checks for numeric and alphanumeric content.
Fixes a regression introduced in the previous refactor where an empty string ('') was incorrectly encoded as Mode::ALPHANUMERIC() instead of Mode::BYTE().
While technically valid, Mode::BYTE() is a more appropriate choice—it reflects general-purpose encoding and avoids the limitations of the alphanumeric character set.
This change also improves clarity and helps prevent similar regressions in the future.
Similarly, it ensures correct behavior for Encoder::chooseMode('', 'SHIFT-JIS').
getAlphanumericCode(): - rename parameter from $code to $byte for clarity and consistency with docblock - simplify using null coalescing (also reduces lookups) isOnlyDoubleByteKanji(): - update docblock to note empty string returns true (which is important to consider in the caller)
94c6c72 to
c0915b5
Compare
|
Rebased. Please let me know what you would like to see improved (including commit messages) before considering this for merging. |
|
A faster version of private static function isOnlyAlphanumeric(string $content) : bool
{
$allowed = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:';
return strspn($content, $allowed) === strlen($content);
}It is much, much faster (hundreds or even thousands of times) and highly scalable, as it appears to run in close to O(1). Edit: committed — would’ve been a shame not to use it. |
Implemented a technique that takes advantage of strspn() to optimize isOnlyAlphanumeric(). This approach is much, much faster (hundreds or even thousands of times) and highly scalable, appearing to run in close to O(1).
450b63d to
a45a746
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #201 +/- ##
============================================
- Coverage 70.58% 70.51% -0.08%
+ Complexity 995 993 -2
============================================
Files 49 49
Lines 3182 3174 -8
============================================
- Hits 2246 2238 -8
Misses 936 936 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
src/Encoder/Encoder.php
Outdated
| */ | ||
| private static function isOnlyAlphanumeric(string $content) : bool | ||
| { | ||
| $allowed = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of assigning this variable on every call, would it make sense to have this as a private const on the class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It looks cleaner this way.
|
Aside: The previous “iterate and analyze byte‑by‑byte” approach would have made sense if we were implementing a “multiple modes” algorithm (see #111). However, we are still far from that. The algorithms are already known and can be generated in PHP, but they remain complex and not straightforward to put in place. |
Refactor
Encoder::chooseModeto simplify and optimize the checks for numeric and alphanumeric content.ctype_digit()can check the entire string at once, so we can check it upfront:This allows us to remove the logic related to
$hasNumeric, streamlining the rest of the function.