Skip to content

Add regexp raw pattern API#1061

Merged
janhartman merged 2 commits into
mainfrom
jan/regexp-string-api
May 15, 2026
Merged

Add regexp raw pattern API#1061
janhartman merged 2 commits into
mainfrom
jan/regexp-string-api

Conversation

@janhartman
Copy link
Copy Markdown
Contributor

@janhartman janhartman commented May 14, 2026

Some downstream callers need to marshal or recompile query regexps and were reaching into q.Regexp.String(), which is the underlying syntax tree debug string rather than an official query API.

This adds query.Regexp.RegexpString() as the public API for getting the marshaled raw regexp pattern. Existing query.Regexp string and gob serialization paths now route through that method.

query.Regexp.String() remains query/debug formatting and still includes wrappers like regex:"...".

Copy link
Copy Markdown
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unsure about this, so I haven't deeply reviewed the regexp.go code change. As such I am requesting changes. If you are keen on this direction then re request review and I will review properly.

internal/syntaxutil is a part of the stdlib's regexp/syntax code with a commit reverted that gave us a performance problem. See README.md from internal/syntaxutil

We shouldn't just be modifying it. If we do we need to use a pattern which makes it clear how we changed it. EG put the new functions in a different file + include comments in the original file indicating changes. Finally you also need to update the README.

Why do you need to make this change out of interest. In the original thread about this issue I noted we had OOMs in Sourcegraph due to not using this function. Did you find even after using this function we still had OOMs and needed to "unsimplify".

How did ya test this. Did you have a reproduction before and after? In particular I wonder if our issue is infact that Simplify does a simplification we don't like / we don't think makes sense. Is the issue then not in simplify? EG we have other bits of code which take syntax.Regexp and construct matchtrees. I worry those bad patterns affect that code path as well.

Comment thread internal/syntaxutil/regexp_test.go Outdated
Comment thread query/query.go
}

// RegexpString returns the marshaled raw regexp pattern for q.
func (q *Regexp) RegexpString() string {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense no matter what. We want a way to get at this. I do think in Sourcegraph though we want to be able to directly use syntaxutil.RegexpString. We had a need to marshal the string for observability, which is outside of the need of marshalling a query.Regexp

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you don't want to somehow export syntaxutil.RegexpString instead? I can't remember if we need to also use it outside of query.Regexp.String(), eg observability around regexes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but I’d rather keep this PR narrower and avoid exporting the internal syntaxutil package unless really necessary. For what we're trying to fix here this should be enough.

Comment thread query/regexp_test.go Outdated
Comment thread query/regexp_test.go Outdated
Comment thread query/regexp_test.go Outdated
Comment thread query/regexp_test.go Outdated
@janhartman janhartman force-pushed the jan/regexp-string-api branch from eac2b70 to 37db178 Compare May 14, 2026 13:47
@janhartman
Copy link
Copy Markdown
Contributor Author

@keegancsmith Thanks for the review. I scaled this back to the minimal Zoekt change.

@janhartman janhartman requested a review from keegancsmith May 14, 2026 13:49
Copy link
Copy Markdown
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Feel free to land. FYI we have ./dev/zoekt/update in our monorepo. You can do

./dev/zoekt/update jan/regexp-string-api

to try out this change before landing it (or use go work).

Comment thread query/query.go
}

// RegexpString returns the marshaled raw regexp pattern for q.
func (q *Regexp) RegexpString() string {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you don't want to somehow export syntaxutil.RegexpString instead? I can't remember if we need to also use it outside of query.Regexp.String(), eg observability around regexes.

@janhartman janhartman merged commit 5f844d4 into main May 15, 2026
7 checks passed
@janhartman janhartman deleted the jan/regexp-string-api branch May 15, 2026 08:56
Fanch- pushed a commit to leboncoin/zoekt that referenced this pull request May 18, 2026
Adds query.Regexp.RegexpString() as the public API for getting the marshaled raw regexp pattern. Existing query.Regexp string, proto and gob serialization paths now route through that method.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants