Use the regex library for extended syntax and flags #500

@slevithan

Love the project! 😊

What do you think about adding the regex package to offer extended JS regex syntax (atomic groups, possessive quantifiers, subroutines, etc.)? A couple of options:

  • Use it by default. This might be a good approach since regex uses a strict superset of JS regex syntax, and all of its extended syntax is an error in native JS regexes.
    • Potentially add an option to enable free spacing mode (flag x), which is on by default in regex but could be disabled by default in RexReplace to avoid changing existing behavior.
  • Add an option (or potentially reuse the existing -E/--engine) to switch from native JS syntax to regex.
    • In this case, presumably you'd leave all of regex's implicit flags enabled.
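The point that regex's extended syntax is an error in native JS regexes can be checked directly. The following is a minimal sketch, not RexReplace code: `isNativeSyntaxError` is a hypothetical helper name, and the patterns are tested under flag u because regex always uses flag u or v.

```javascript
// Hypothetical helper (not part of RexReplace or regex): returns true when the
// native RegExp constructor rejects the pattern with a SyntaxError.
function isNativeSyntaxError(pattern, flags = 'u') {
  try {
    new RegExp(pattern, flags);
    return false;
  } catch (err) {
    if (err instanceof SyntaxError) return true;
    throw err; // anything else is unexpected
  }
}

// Samples of the extended syntax mentioned above, written as plain strings.
const extendedSamples = {
  atomicGroup: '(?>a+)b',
  possessiveQuantifier: 'a++b',
  subroutine: '(?<n>a)\\g<n>',
};

// Map each sample name to whether native JS rejects it.
const results = Object.fromEntries(
  Object.entries(extendedSamples).map(([name, pattern]) => [
    name,
    isNativeSyntaxError(pattern),
  ])
);

console.log(results);
console.log(isNativeSyntaxError('a+b')); // ordinary syntax is still accepted
```

Because each of these samples is rejected natively, accepting them through regex should not change the meaning of any pattern that currently compiles in Unicode-aware mode.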

Note: RexReplace would need to call regex with dynamic input rather than using it with backticks as a template tag, so the call would look like this: regex({raw: [<pattern-string>]}) or regex(…)({raw: [<pattern-string>]}). Something like the following options might work best if regex were used by default:

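import {regex} from 'regex';

// `pattern` is the pattern string supplied by the user
regex({
  flags: 'gim',
  subclass: true,
  disable: {
    n: true, // turn off implicit flag n (named capture only)
    x: true, // turn off implicit flag x (free spacing)
  },
})({raw: [pattern]});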
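For anyone unfamiliar with why the {raw: […]} shape works: a template tag receives an array-like object whose raw property holds the uninterpreted source text, so passing an object with that property by hand imitates a backtick call. Here is a small sketch using only built-in JS. `inspectTag` is a hypothetical tag written for illustration, and the example makes no claim about regex's internals.

```javascript
// Hypothetical tag for illustration: it just reports what it was given.
function inspectTag(strings, ...substitutions) {
  return {raw: Array.from(strings.raw), substitutions};
}

// Called with backticks: the engine builds the `strings` object.
const viaBackticks = inspectTag`\d+\b`;

// Called as a plain function with dynamic input: build the same shape by hand.
const pattern = '\\d+\\b'; // stands in for a pattern read from the command line
const viaCall = inspectTag({raw: [pattern]});

console.log(viaBackticks.raw[0] === viaCall.raw[0]); // true

// The built-in String.raw tag accepts the same hand-built shape.
console.log(String.raw({raw: [pattern]}) === pattern); // true
```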

Edit: If you also wanted to disable flag v's changes to escaping rules within character classes (to avoid breaking changes), you could add the options disable: {v: true} and unicodeSetsPlugin: null. regex's documentation has more details about all of its options, but essentially, disable: {v: true} means to use flag u even in environments that support v natively, and unicodeSetsPlugin: null tells regex not to apply flag v's escaping rules when using flag u.

Note that regex always uses flag u or v, so using the library would implicitly set RexReplace's -u/--unicode option. But I think always implicitly using Unicode-aware mode might be a good breaking change anyway, since Unicode-unaware mode can silently introduce many Unicode-related bugs, doesn't get the benefit of strict errors for weird legacy syntax, and doesn't support \u{…}, \p{…}, or \P{…}. (Flag u has been supported since ES6/ES2015.)
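To make the Unicode-unaware pitfalls concrete, here is a short sketch with native JS regexes only (no regex library involved). `compiles` is a hypothetical helper name; the sample strings are my own.

```javascript
const smiley = '\u{1F60A}'; // 😊, stored as two UTF-16 code units

// Hypothetical helper: does the pattern compile with the given flags?
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch {
    return false;
  }
}

const checks = {
  // Without flag u, `.` matches one code unit, so it cannot span the emoji.
  dotLegacy: /^.$/.test(smiley), // false
  dotUnicode: /^.$/u.test(smiley), // true

  // Without flag u, \u{…} is not a code point escape.
  codePointLegacy: /\u{1F60A}/.test(smiley), // false
  codePointUnicode: /\u{1F60A}/u.test(smiley), // true

  // Without flag u, \p{…} is not a Unicode property escape.
  propertyLegacy: /\p{L}/.test('\u00E9'), // false ('é')
  propertyUnicode: /\p{L}/u.test('\u00E9'), // true

  // Legacy mode silently accepts a meaningless escape; flag u rejects it.
  strayEscapeLegacy: compiles('\\a', ''), // true
  strayEscapeUnicode: compiles('\\a', 'u'), // false
};

console.log(checks);
```

None of the legacy cases throws. Each one just silently fails to match, which is the kind of bug that always-on Unicode-aware mode would surface.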
