Add ADR for regex engine use in C++#61
Conversation
pgeier
left a comment
There was a problem hiding this comment.
Had a look. Nothing much to mention. The important is really that a migration is not straightforward as every used regex has to be tested at runtime.
Otherwise I'm much in favour to provide just a set of general functions like match and let projects use/wrap PCRE2 directly for special use-cases and optimizations (e.g. JIT Compilation)
There was a problem hiding this comment.
Very nice overview.
A few high-level comments:
Would be great to have an issue in ECKIT that links to this PR. Such that
The current implementation will be marked as deprecated and removed in EcKit 3.0
is actually known there. :-D
It would be great to write a few sentences about the fact that the Regex.h actually is the intended RAII wrapper, such that most C++ code in the ECMWF stack does not have to do the RAII wrapping themselves. (Personal discussion with @Ozaq )
The manual wrapping is only for projects that do not depend on eckit.
Oh, and because of the numbering, it should be merged after #60
simondsmart
left a comment
There was a problem hiding this comment.
The decision quoted here says to implement an API compatible version of Regex.
There are some arguments to Regex - notably shell and extended. Are these arguments meaningful, and supportable with the new pattern. Do we have a feel for where/how often these arguments are used through the code base, or what might break?
My understanding is that one reason for not changing std::regex to something better is that to extract the performance improvements would require an API change. Is the API of PCRE2 appropriate to be wrapped cleanly by eckit::Regex?
|
|
||
| ### EcKit Migration Path | ||
|
|
||
| EcKit will provide a new API-compatible implementation for `Regex.h`. Since *PCRE2's* Perl-compatible syntax differs from POSIX ERE, every caller must verify that its patterns still match correctly. |
There was a problem hiding this comment.
This is not something that can be straightforwardly done. Regexes are supplied by, and used by, users - not developers - in many of the cases.
Can we see a documentary summary of the differences?
Good catch, I did not take the flags into account. However a cursory check revealed no callsite where
So std::regex is essentially "unfixable" because there are few design mistakes, as acknowledged by the standards committee as well as implementation issues. A central issue is that std::regex is locale aware and not unicode aware. This means matching behavior can change on setting a different locale. If I understand it correctly performance improvements are also hampered by some original choices and since the ABI may not be broken the implementation is essentially set in stone. This 2004 report lists 65 issues with std::regex. Then implementers shipped slow implementations, gcc predominantly, and now given the outlook that none of the mess can really be fixed no efforts are made to address performance issues that could be fixed. For us the general flow of |
Description
Contributor Declaration
By opening this pull request, I affirm the following: