Add regexp raw pattern API#1061
Conversation
keegancsmith
left a comment
There was a problem hiding this comment.
I'm a bit unsure about this, so I haven't deeply reviewed the regexp.go code change. As such I am requesting changes. If you are keen on this direction then re request review and I will review properly.
internal/syntaxutil is a part of the stdlib's regexp/syntax code with a commit reverted that gave us a performance problem. See README.md from internal/syntaxutil
We shouldn't just be modifying it. If we do we need to use a pattern which makes it clear how we changed it. EG put the new functions in a different file + include comments in the original file indicating changes. Finally you also need to update the README.
Why do you need to make this change out of interest. In the original thread about this issue I noted we had OOMs in Sourcegraph due to not using this function. Did you find even after using this function we still had OOMs and needed to "unsimplify".
How did ya test this. Did you have a reproduction before and after? In particular I wonder if our issue is infact that Simplify does a simplification we don't like / we don't think makes sense. Is the issue then not in simplify? EG we have other bits of code which take syntax.Regexp and construct matchtrees. I worry those bad patterns affect that code path as well.
| } | ||
|
|
||
| // RegexpString returns the marshaled raw regexp pattern for q. | ||
| func (q *Regexp) RegexpString() string { |
There was a problem hiding this comment.
This change makes sense no matter what. We want a way to get at this. I do think in Sourcegraph though we want to be able to directly use syntaxutil.RegexpString. We had a need to marshal the string for observability, which is outside of the need of marshalling a query.Regexp
There was a problem hiding this comment.
Are you sure you don't want to somehow export syntaxutil.RegexpString instead? I can't remember if we need to also use it outside of query.Regexp.String(), eg observability around regexes.
There was a problem hiding this comment.
I considered that, but I’d rather keep this PR narrower and avoid exporting the internal syntaxutil package unless really necessary. For what we're trying to fix here this should be enough.
eac2b70 to
37db178
Compare
|
@keegancsmith Thanks for the review. I scaled this back to the minimal Zoekt change. |
keegancsmith
left a comment
There was a problem hiding this comment.
LGTM. Feel free to land. FYI we have ./dev/zoekt/update in our monorepo. You can do
./dev/zoekt/update jan/regexp-string-api
to try out this change before landing it (or use go work).
| } | ||
|
|
||
| // RegexpString returns the marshaled raw regexp pattern for q. | ||
| func (q *Regexp) RegexpString() string { |
There was a problem hiding this comment.
Are you sure you don't want to somehow export syntaxutil.RegexpString instead? I can't remember if we need to also use it outside of query.Regexp.String(), eg observability around regexes.
Adds query.Regexp.RegexpString() as the public API for getting the marshaled raw regexp pattern. Existing query.Regexp string, proto and gob serialization paths now route through that method.
Some downstream callers need to marshal or recompile query regexps and were reaching into
q.Regexp.String(), which is the underlying syntax tree debug string rather than an official query API.This adds
query.Regexp.RegexpString()as the public API for getting the marshaled raw regexp pattern. Existingquery.Regexpstring and gob serialization paths now route through that method.query.Regexp.String()remains query/debug formatting and still includes wrappers likeregex:"...".