Skip to content

feat: Implement ecosystem enumeration in Go#5140

Open
michaelkedar wants to merge 7 commits intogoogle:masterfrom
michaelkedar:🏜️🏞️

Hidden character warning

The head ref may contain hidden characters: "\ud83c\udfdc\ufe0f\ud83c\udfde\ufe0f"
Open

feat: Implement ecosystem enumeration in Go#5140
michaelkedar wants to merge 7 commits intogoogle:masterfrom
michaelkedar:🏜️🏞️

Conversation

@michaelkedar
Copy link
Copy Markdown
Member

Translated all the ecosystem queries for version enumeration into go

  • Bioconductor seems to be broken (the newer releases 500 on for the requests). We have 0 Bioconductor vulns, so I just disabled enumeration for them (but left the code in for reference)
  • Removed the Alpine enumeration, because querying the aports repo is quite heavy
  • opam is still Implemented and disabled, as it is in python
  • Most of the ecosystems are uninteresting - they're mostly just parsing versions out of JSON (though please look at them!). Debian does the most work in its thing
  • Added go-vcr so the tests don't depend on network requests. We'd want to have something regenerate these once in a while/test against live to make sure nothing starts breaking.


func (e packagistEcosystem) GetVersions(_ string) ([]string, error) {
panic("not yet implemented")
func packagistAPIURL(pkg string) string {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that reminds me, I think ideally we should (as a follow up PR) have this logic mirror the linter to account for alternative repos i.e. Drupal

(assuming I've correctly understood that this is the right place to be doing that)

Copy link
Copy Markdown
Contributor

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good!


// GetVersions enumerates GHC versions.
// Different components of GHC are part of the same software release, so we ignore the package name.
func (e ghcEcosystem) GetVersions(_ string) ([]string, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably explain here that GHC ecosystem is just essentially one package.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the schema, GHC is

The Haskell compiler ecosystem. The name field is the name of a component of the GHC compiler ecosystem (e.g., compiler, GHCI, RTS).

So it's not just one package, but each of the packages are released together with the new compiler release, which I guess is what the comment was already saying (which was transplanted from Python)

"testing"
)

func TestGHC_GetVersions(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these tests should have t.Parallel(), since they are all in the ecosystem package, we don't want to test sequentially.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests all call setupHTTPClientForTest which messes with the global http client to use the go-vcr client.

We can't run them in parallel because they'd all interfere with each other.

I don't really want to have to pass in a http client to GetVersions

Copy link
Copy Markdown
Contributor

@another-rex another-rex Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know how long this test is taking right now? Another option is storing the httpClient (or just a httpClientOverride) in each ecosystem struct.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the latest Cloud Build run on this PR it took 21.617s, which I think is tolerable, but if you don't like it I can try adding a per-struct client

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some factoring to add a Provider to all the ecosystem implementations to allow the http client to be set separately. Running the tests in parallel now took 11.195s.
PTAL

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm that still feels slow no? Especially with everything cassetted it should be almost instant...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk - it does have to load megabytes of test data

}

// getVersionsDepsDev enumerates versions for a package using the deps.dev API.
func getVersionsDepsDev(e Ecosystem, depsDevSystem string, pkg string) ([]string, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be using the depsdev package for grpc requests?

Also replace depsDevSystem with the depsdev enum maybe

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the REST API lets us use go-vcr for tests, which I think I'd prefer.

proto_major: 1
proto_minor: 1
content_length: 215
body: "\uFEFF\x3C\x3F\x78\x6D\x6C\x20\x76\x65\x72\x73\x69\x6F\x6E\x3D\"\x31\x2E\x30\"\x20\x65\x6E\x63\x6F\x64\x69\x6E\x67\x3D\"\x75\x74\x66\x2D\x38\"\x3F\x3E\x3C\x45\x72\x72\x6F\x72\x3E\x3C\x43\x6F\x64\x65\x3E\x42\x6C\x6F\x62\x4E\x6F\x74\x46\x6F\x75\x6E\x64\x3C\x2F\x43\x6F\x64\x65\x3E\x3C\x4D\x65\x73\x73\x61\x67\x65\x3E\x54\x68\x65\x20\x73\x70\x65\x63\x69\x66\x69\x65\x64\x20\x62\x6C\x6F\x62\x20\x64\x6F\x65\x73\x20\x6E\x6F\x74\x20\x65\x78\x69\x73\x74\x2E\n\x52\x65\x71\x75\x65\x73\x74\x49\x64\x3A\x36\x31\x38\x62\x33\x30\x36\x35\x2D\x39\x30\x31\x65\x2D\x30\x30\x33\x66\x2D\x31\x32\x39\x38\x2D\x62\x64\x37\x35\x35\x34\x30\x30\x30\x30\x30\x30\n\x54\x69\x6D\x65\x3A\x32\x30\x32\x36\x2D\x30\x33\x2D\x32\x37\x54\x30\x33\x3A\x31\x39\x3A\x32\x30\x2E\x34\x30\x38\x31\x38\x36\x38\x5A\x3C\x2F\x4D\x65\x73\x73\x61\x67\x65\x3E\x3C\x2F\x45\x72\x72\x6F\x72\x3E"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm bit odd this is completely binary encoded when most of these are ascii characters

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hallucinated conversion:

<?xml version="1.0" encoding="utf-8"?>
<Error><Code>BlobNotFound</Code><Message>The specified blob does not exist.
RequestId:618b3065-901e-003f-1298-bd7554000000
Time:2026-03-27T03:19:20.4081868Z</Message></Error>

"testing"
)

func TestNuGet_GetVersions(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this tests both paging logic (page.Items and page.[No]Items)? The cassette is so big it seems impossible to see.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I'm not actually sure - these test cases were copied from Python

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3Mb file :O

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh 😅
The python one is a similar size so that's extra nice

Co-authored-by: Rex P <106129829+another-rex@users.noreply.github.com>
another-rex
another-rex previously approved these changes Mar 31, 2026
"testing"
)

func TestGHC_GetVersions(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm that still feels slow no? Especially with everything cassetted it should be almost instant...


// Get returns an ecosystem for the given ecosystem name.
// If the ecosystem is not found, it returns nil, false.
func (p *Provider) Get(ecosystem string) (Ecosystem, bool) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this feels very... circular.

You use the provider to get the ecosystem which uses the provider to call fetch.

Few ideas, not sure about these:

  • The functions in the ecosystems map seems very ugly
    • Maybe we can use an interface for setting the providers on construction? (this might be too much work for not really much gain)
  • An option for defaulting to using the default http client would be to change the type of Provider from a pointer to just the struct Provider (For both in these signatures and each ecosystem field). And in the fetchJSON function we can do a check to see if client pointer exists, if not use the default http client.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this feels very... circular.

You use the provider to get the ecosystem which uses the provider to call fetch.

It's more like the Provider is the environment/context that the ecosystems exist in, so the ecosystems retain a handle on the environment that produced them to make requests within that environment.

The functions in the ecosystems map seems very ugly

Agreed, but setting up the interface adds more boilerplate that's not worth it IMO.
I would love it if I could have a generic function to do it, but go generics won't let you set fields on a generic struct.

An option for defaulting to using the default http client...

I feel like having the DefaultProvider is sufficient for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants