fix: support Unicode method names in parser and compiler #12
Merged
Add test cases for non-ASCII method name parsing:
- Korean characters (안녕하세요)
- Mixed ASCII and Unicode (비_영어_함수명___테스트1!)
- Japanese characters (こんにちは)
- Class methods with Unicode names
All tests currently fail due to the \w regex pattern limitation.
Related to #11
Replace \w regex pattern with [\p{L}\p{N}_] to support non-ASCII
characters (Korean, Japanese, etc.) in method names.
Changes:
- Add IDENTIFIER_CHAR and METHOD_NAME_PATTERN constants
- Update parser.rb to detect Unicode method definitions
- Update compiler.rb to strip type annotations from Unicode methods
Fixes #11
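A minimal sketch of the pattern change described above. In Ruby, `\w` matches only `[a-zA-Z0-9_]`, so a `def` line with a Korean or Japanese name is not detected. The constant names `IDENTIFIER_CHAR` and `METHOD_NAME_PATTERN` come from the commit message, but their definitions here are assumptions, as are `DEF_LINE`, `LEGACY_DEF_LINE`, and the helper `method_name`, which are hypothetical names used only for illustration.

```ruby
# Assumed definitions; the real constants in the PR may differ.
IDENTIFIER_CHAR     = '[\p{L}\p{N}_]'                  # any Unicode letter, number, or underscore
METHOD_NAME_PATTERN = "#{IDENTIFIER_CHAR}+[?!=]?"      # optional Ruby-style suffix

# Hypothetical detection regexes: the old ASCII-only form and the Unicode-aware form.
LEGACY_DEF_LINE = /^\s*def\s+(?:self\.)?(\w+[?!=]?)/
DEF_LINE        = /^\s*def\s+(?:self\.)?(#{METHOD_NAME_PATTERN})/

# Returns the method name from a `def` line, or nil if the line is not a definition.
def method_name(line, pattern = DEF_LINE)
  match = pattern.match(line)
  match && match[1]
end
```

With these definitions, `method_name("def 안녕하세요")` yields the full Korean name, while the same call with `LEGACY_DEF_LINE` yields `nil`.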
Add parse_conditional method to BodyParser that properly parses if/unless/elsif/else blocks into IR::Conditional nodes. This enables the type inference system to collect all possible return values from conditional branches and unify them into union types.
The fix handles:
- Simple if/else blocks
- elsif chains (parsed as nested if)
- unless statements
- Nested conditionals at correct depth
Fixes #13
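The parsing strategy can be sketched roughly as follows. This is not the PR's `BodyParser#parse_conditional`. It is a simplified, line-based illustration with a hypothetical `Conditional` struct standing in for `IR::Conditional`, and it assumes the body contains no other `end`-terminated constructs (loops, `do` blocks, `case`).

```ruby
# Hypothetical IR node; the project's IR::Conditional may carry more fields.
Conditional = Struct.new(:kind, :condition, :then_body, :else_body)

# Parses the conditional whose header is lines[index].
# Returns [node, index_of_the_line_after_its_end].
def parse_conditional(lines, index)
  keyword, condition = lines[index].strip.split(/\s+/, 2)
  node = Conditional.new(keyword == "unless" ? :unless : :if, condition, [], [])
  target = node.then_body
  index += 1

  while index < lines.length
    line = lines[index].strip
    case line
    when /\A(if|unless)\b/
      # Nested conditional: recurse so its own `end` is consumed at the right depth.
      child, index = parse_conditional(lines, index)
      target << child
      next
    when /\Aelsif\b/
      # An elsif chain becomes a nested if in the else branch;
      # the nested call consumes the shared `end`.
      child, index = parse_conditional(lines, index)
      node.else_body << child
      return [node, index]
    when "else"
      target = node.else_body
    when "end"
      return [node, index + 1]
    else
      target << line # plain statement, kept as raw text in this sketch
    end
    index += 1
  end
  [node, index]
end
```

Representing `elsif` as a nested node keeps the IR to a single two-branch shape, so a later pass only has to walk `then_body` and `else_body` to reach every branch.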
Modified collect_returns_recursive to return a termination flag.
When a return statement is encountered, subsequent code in the same
block is now correctly identified as unreachable and excluded from
type inference.
This ensures that methods like:
    def test
      return false
      if condition
        "string"
      end
    end
are inferred as returning `bool` instead of `bool | String`.
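One way to picture the termination flag is the sketch below. It is not the project's `collect_returns_recursive`. The node types `ReturnStmt`, `Expr`, and `IfStmt` and the helpers `collect_returns`, `implicit_types`, and `infer_return_type` are hypothetical, types are plain strings, and the implicit `nil` of a missing else branch is ignored for brevity.

```ruby
# Hypothetical, simplified IR for illustration only.
ReturnStmt = Struct.new(:type)                  # explicit `return <value>`
Expr       = Struct.new(:type)                  # any other expression statement
IfStmt     = Struct.new(:then_body, :else_body) # bodies are arrays of statements

# Appends explicit return types to `types`.
# Returns true when every path through `statements` ends in a return.
def collect_returns(statements, types)
  statements.each do |stmt|
    case stmt
    when ReturnStmt
      types << stmt.type
      return true # everything after this statement is unreachable
    when IfStmt
      then_done = collect_returns(stmt.then_body, types)
      else_done = collect_returns(stmt.else_body, types)
      return true if then_done && else_done
    end
  end
  false
end

# Types of the value a body yields when it falls through to its last statement.
def implicit_types(body)
  last = body.last
  case last
  when Expr   then [last.type]
  when IfStmt then implicit_types(last.then_body) + implicit_types(last.else_body)
  else []
  end
end

def infer_return_type(body)
  types = []
  terminated = collect_returns(body, types)
  # The implicit value counts only if control can actually reach the end.
  types.concat(implicit_types(body)) unless terminated
  types.uniq.join(" | ")
end
```

For the method in the commit message, the body is a `ReturnStmt` of type `bool` followed by an `IfStmt`. The flag comes back true at the first statement, so the trailing conditional never contributes `String`.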
Summary
- Replace `\w` regex pattern with `[\p{L}\p{N}_]` to match Unicode letters and numbers
Changes
- Add `IDENTIFIER_CHAR` and `METHOD_NAME_PATTERN` constants, update method detection regex
Test plan
Fixes #11