Conversation

@AbhishekRai456
Member

Adds an initial regex tokenizer with support for:

  • Literals, operators, grouping and anchors (^, $)
  • Character classes with ranges and shorthands (\d, \w, \s and negations)
  • Quantifiers {m,n}, {m,}, {m}
  • Implicit concatenation insertion
  • Error reporting with position tracking

This is a draft for review and future integration with postfix conversion and NFA construction.
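As an illustration of the implicit concatenation step listed above, here is a minimal sketch. The `Kind` and `Tok` types and the `insert_concat` function are hypothetical names chosen for this example, not the tokenizer's actual types, and anchors, character classes and `{m,n}` quantifiers are left out to keep it short. The idea is that an explicit concatenation token goes between any token that can end an operand and any token that can start one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical token kinds for illustration only; the PR's real token type may differ.
enum class Kind { Literal, Star, Plus, Question, Alt, LParen, RParen, Concat };

struct Tok {
    Kind kind;
    char ch;  // meaningful only for Literal tokens
};

// True if a token can end an operand, so a following operand must be concatenated.
static bool ends_operand(Kind k) {
    return k == Kind::Literal || k == Kind::RParen || k == Kind::Star ||
           k == Kind::Plus || k == Kind::Question;
}

// True if a token can start a new operand.
static bool starts_operand(Kind k) {
    return k == Kind::Literal || k == Kind::LParen;
}

// Returns a copy of the token stream with explicit Concat tokens inserted.
std::vector<Tok> insert_concat(const std::vector<Tok>& in) {
    std::vector<Tok> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i > 0 && ends_operand(in[i - 1].kind) && starts_operand(in[i].kind))
            out.push_back({Kind::Concat, '\0'});
        out.push_back(in[i]);
    }
    return out;
}
```

For the pattern a(b|c)*d this sketch inserts a concatenation token after a and after *, which is the form a later postfix conversion would need.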

Revert accidental formatting changes

Revert accidental formatting changes in exact module

Final fixes

l

auto read_int = [&]() -> int {
skip_spaces();
int val = 0;
Contributor

Use unsigned int here: a quantifier bound can never be negative.
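One way the suggestion could look, as a sketch only: `read_uint` below is a hypothetical free-function stand-in for the `read_int` lambda, taking the input string and position explicitly because the surrounding tokenizer state is not shown in this hunk. It also guards against overflow, which is an addition of this example rather than something the review asked for.

```cpp
#include <cctype>
#include <cstddef>
#include <limits>
#include <optional>
#include <string>

// Hypothetical stand-alone variant of read_int. Reads decimal digits starting at
// s[pos] and advances pos past the digits it consumes. Returns std::nullopt if no
// digit is present or if the value would not fit in unsigned int.
std::optional<unsigned int> read_uint(const std::string& s, std::size_t& pos) {
    const unsigned int max = std::numeric_limits<unsigned int>::max();
    unsigned int val = 0;
    bool found = false;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        const unsigned int digit = static_cast<unsigned int>(s[pos] - '0');
        if (val > (max - digit) / 10) return std::nullopt;  // val * 10 + digit would overflow
        val = val * 10 + digit;
        found = true;
        ++pos;
    }
    if (!found) return std::nullopt;
    return val;
}
```

Returning std::optional keeps "no number here" separate from the value 0, so the caller can decide whether a missing number is an error.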

Comment on lines +396 to +399
if (!found)
    PzError::report_error(PzError::PzErrorType::PZ_INVALID_INPUT,
                          "Expected number in quantifier at position " +
                              std::to_string(t.pos));
Contributor

A missing lower bound should not be an error: {,9} implicitly means {0,9}.

return t;
}

t.max = read_int();
Contributor

This should not throw an error when the upper bound is omitted. For example, {1,} means 1 or more.
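A minimal sketch of quantifier parsing that follows both review points on missing bounds ({,9} treated as {0,9}, and {1,} treated as unbounded above). The `Quantifier` struct and `parse_quantifier` function are hypothetical names for this example, error handling is reduced to returning std::nullopt instead of calling `PzError::report_error`, and whitespace skipping and overflow checks are omitted for brevity.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type; max == std::nullopt means "no upper bound".
struct Quantifier {
    unsigned int min = 0;
    std::optional<unsigned int> max;
};

// Parses a quantifier starting at s[pos] == '{' and advances pos past the
// closing '}'. Returns std::nullopt on malformed input.
std::optional<Quantifier> parse_quantifier(const std::string& s, std::size_t& pos) {
    // Reads a run of digits; yields std::nullopt when no digit is present.
    auto read_opt = [&]() -> std::optional<unsigned int> {
        unsigned int val = 0;
        bool found = false;
        while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
            val = val * 10 + static_cast<unsigned int>(s[pos] - '0');
            found = true;
            ++pos;
        }
        return found ? std::optional<unsigned int>(val) : std::nullopt;
    };

    if (pos >= s.size() || s[pos] != '{') return std::nullopt;
    ++pos;

    Quantifier q;
    const std::optional<unsigned int> lo = read_opt();

    if (pos < s.size() && s[pos] == '}') {  // form {m}
        if (!lo) return std::nullopt;       // "{}" has no count at all
        q.min = *lo;
        q.max = *lo;
        ++pos;
        return q;
    }

    if (pos >= s.size() || s[pos] != ',') return std::nullopt;
    ++pos;

    q.min = lo.value_or(0);  // {,n} is read as {0,n}
    q.max = read_opt();      // {m,} leaves max empty, meaning unbounded

    if (pos >= s.size() || s[pos] != '}') return std::nullopt;
    ++pos;

    if (q.max && *q.max < q.min) return std::nullopt;  // reject {5,2}
    return q;
}
```

Note that this sketch also accepts {,} as "zero or more"; whether that form should be allowed is a design choice the review does not address.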

Comment on lines +160 to +161
const char MIN_CHAR = '\0'; // ascii index 0
const char MAX_CHAR = '\x7F'; // ascii index 127
Contributor

Use the pz_types standard types here, and do the same for the other types.

2 participants