Go: Make version parsing robust in the face of custom Go builds#21817
Go: Make version parsing robust in the face of custom Go builds#21817jketema wants to merge 1 commit intogithub:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the Go extractor toolchain’s go version output parsing to better handle custom Go builds whose version strings include non-semver suffixes, aligning behavior with upstream Go version parsing expectations.
Changes:
- Strip custom build suffixes from the parsed Go version (e.g.
go1.26.3-X:...→go1.26.3). - Extend unit tests to cover custom-suffix versions and release-candidate versions.
Show a summary per file
| File | Description |
|---|---|
| go/extractor/toolchain/toolchain.go | Trims custom build suffixes from the parsed version string before downstream semver parsing. |
| go/extractor/toolchain/toolchain_test.go | Adds test cases for suffix and rc version formats. |
Copilot's findings
Comments suppressed due to low confidence (1)
go/extractor/toolchain/toolchain.go:134
var goVersion = ...is a redundant declaration+initialization in Go; prefer the short declaration (goVersion := ...) here for idiomatic style and to avoid implying a wider scope than needed.
var goVersion = strings.Fields(lastLine)[2]
- Files reviewed: 2/2 changed files
- Comments generated: 2
| var goVersion = strings.Fields(lastLine)[2] | ||
|
|
||
| // Drop custom build suffixes. | ||
| goVersion, _, _ = strings.Cut(goVersion, "-") |
| "go version go1.18.9 linux/amd64": "go1.18.9", | ||
| "go version go1.18.9 linux/amd64": "go1.18.9", | ||
| "go version go1.26.3-X:nodwarf5 linux/amd64": "go1.26.3", | ||
| "go version go1.26.3rc1 linux/amd64": "go1.26.3rc1", |
CI on this PR passed. Is there anything else I should run? |
|
I don't think so. |
| var goVersion = strings.Fields(lastLine)[2] | ||
|
|
||
| // Drop custom build suffixes. | ||
| goVersion, _, _ = strings.Cut(goVersion, "-") |
There was a problem hiding this comment.
I am not sure about this approach. It probably solves the immediate issue, but why not use the existing machinery we have in semver.go? I.e. wrap this in SemVer using NewSemVer. That will parse the semver (and should account for some Go special cases). You can then extract only the components you care about (to ignore the prerelease identifier and e.g. the build info if we don't care about those).
That said, we might want to think about the semver ordering rules and how we should treat prerelease identifiers or build info, rather than just discarding it.
There was a problem hiding this comment.
I'm not sure how you can make your first proposal work without creating code that flat out conflicts with the current -rc handling. Specifically
codeql/go/extractor/util/semver.go
Line 84 in 8cc6d78
There was a problem hiding this comment.
That part of the implementation is hacky for sure and probably breaks some theoretical inputs in any case, but my reading of it is that:
- If we have a version string that doesn't include
rcanywhere, then it is treated as a semver in the normal way. I.e. the pre-release identifier is kept as is. This should cover most of the potential cases we discussed in the thread. - Only if the version string contains
rcsomewhere, then this logic kicks in to try and convert it to-rc[..].
If we think that the potential breakage in the rc case is a problem, then we should probably address that in the NewSemVer implementation since the same version strings are likely to end up in there at some point as well.
There was a problem hiding this comment.
Thanks. It's not clear what we gain by all this though, without yet another major rewrite. Assuming Copilot above is correct, then the case we will be missing here is go version devel go1.22-abc123def456 linux/amd64, but we don't handle that anyway, as we have hardcoded strings.Fields(lastLine)[2].
cf. https://github.com/golang/go/blob/afcf04cb6401b439ce9bdcd18448c512f5bfda77/src/go/version/version.go#L20