Adds UAX #44 identifier checking and NFC quick check support, along with a few helpers like `isAscii` and `unescapeUnicode`.
Pull Request Overview
This PR introduces comprehensive Unicode support to the qtil library, providing CodeQL predicates for Unicode property checking, UAX #44 identifier validation, and NFC normalization checking. The implementation includes raw Unicode data generation, string utilities for Unicode escape handling, and efficient APIs for common Unicode operations.
- Adds extensible predicates for Unicode properties (enumeration, boolean, and numeric)
- Implements UAX #44 identifier validation and NFC normalization quick checking
- Provides utilities for Unicode escape sequences and ASCII validation
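
As a rough illustration of two of the checks listed above, the following Python sketch relies on the standard library rather than qtil's CodeQL predicates. It only mirrors the concepts and says nothing about how `Unicode.qll` is implemented; the function names `is_ascii` and `is_nfc` are hypothetical.

```python
import unicodedata


def is_ascii(text: str) -> bool:
    """Return True when every code point in text is below U+0080."""
    return all(ord(ch) < 0x80 for ch in text)


def is_nfc(text: str) -> bool:
    """Return True when text is already in Normalization Form C."""
    # unicodedata.is_normalized is available in Python 3.8 and later.
    return unicodedata.is_normalized("NFC", text)
```

For example, a precomposed `"caf\u00e9"` passes `is_nfc`, while the decomposed `"cafe\u0301"` (base letter followed by a combining acute accent) does not.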
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/qtil/strings/Unicode.qll | Core Unicode module with extensible predicates and helper functions |
| src/qtil/Qtil.qll | Imports the new Unicode module |
| src/qlpack.yml | Adds data extension for generated Unicode data |
| scripts/generate_unicode.py | Python script to generate Unicode property data from Unicode standard files |
| test/qtil/strings/UnicodeTest.ql | Comprehensive test suite for Unicode functionality |
| test/qtil/strings/UnicodeTest.expected | Test expectations file |
    if '..' not in code_point_hex_pair:
        code_point_start = code_point_end = int(code_point_hex_pair, 16)
    else:
        # handle ranges like '00A0..00A7'
        code_point_hex_start, code_point_hex_end = code_point_hex_pair.split('..')
        code_point_start, code_point_end = int(code_point_hex_start, 16), int(code_point_hex_end, 16)
The comment and code that handle ranges are duplicated on lines 128-130 and 157-159. Consider extracting this logic into a helper function to reduce duplication.
Suggested change:

    - if '..' not in code_point_hex_pair:
    -     code_point_start = code_point_end = int(code_point_hex_pair, 16)
    - else:
    -     # handle ranges like '00A0..00A7'
    -     code_point_hex_start, code_point_hex_end = code_point_hex_pair.split('..')
    -     code_point_start, code_point_end = int(code_point_hex_start, 16), int(code_point_hex_end, 16)
    + code_point_start, code_point_end = parse_code_point_range(code_point_hex_pair)
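
The suggestion names a `parse_code_point_range` helper but does not show its body. A minimal sketch of what it might look like, assuming it simply wraps the duplicated branch and returns an inclusive `(start, end)` pair, is:

```python
from typing import Tuple


def parse_code_point_range(code_point_hex_pair: str) -> Tuple[int, int]:
    """Parse '00A0' or '00A0..00A7' into an inclusive (start, end) pair.

    Hypothetical helper: the name comes from the review suggestion,
    the body is an assumption based on the duplicated code.
    """
    if '..' not in code_point_hex_pair:
        # A single code point is a range of length one.
        code_point = int(code_point_hex_pair, 16)
        return code_point, code_point
    # handle ranges like '00A0..00A7'
    code_point_hex_start, code_point_hex_end = code_point_hex_pair.split('..')
    return int(code_point_hex_start, 16), int(code_point_hex_end, 16)
```

Both call sites in `generate_unicode.py` could then unpack the result directly, as in the suggested replacement line.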
The CodeQL Coding Standards project is implementing MISRA rules that refer to Unicode Standard concepts such as UAX #44 compliant identifiers and NFC normalization checks.
These concepts are specific to neither MISRA nor C, and thus deserve a home in qtil.
This pull request introduces the Unicode module, including helpers such as `isAscii` and `unescapeUnicode`, and imports it in `Qtil.qll`.
These features are pretty advanced, I'm not sure they're worth adding to the README.md.