Skip to content

Conversation

@abraverm
Copy link

Bit mask for tracking which branches, this allows supporting repositories with more than 64 branches.

Assisted-by: Cursor

Bit mask for tracking which branches, this allows supporting repositories with
more than 64 branches.

Assisted-by: Cursor
Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I need to take a far deeper look into this. But this is just a quick overview I did now.

To be honest I'm surprised this change was as straightforward as it was. I'm guessing the tests made this relatively easy to change and be confident in the changes?

Something I thought about while reviewing this but I think is the wrong decision is to instead use roaring bitmaps. The overhead of that for the normal case would outweigh the benefit of using it I believe. However, for the case of a lot of branches its likely useful given I imagine you will have a lot of dense vs sparse sets which it likely would be good at encoding.

// a file appears in. This allows supporting repositories with more than 64 branches.

// newBranchMask allocates a branch mask that can hold at least numBranches bits.
func newBranchMask(numBranches int) []byte {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a tiny type like type bitmask []byte would make code in general more obvious? Might be more readable as well to make the functions below methods on bitmask


// firstSetBit returns the index of the first set bit in the mask,
// or -1 if no bits are set.
func firstSetBit(mask []byte) int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using []byte if you use []uint8 you can then use the maths/bits package for speeding up operations like this.

toc.branchMasks.start(w)
for _, m := range b.branchMasks {
w.U64(m)
toc.branchMasks.addItem(w, m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look into the details again, but using a compound section isn't really that efficient here since every branchMasks will be the same size. I would rather you serialize something upfront about the size of the branchmasks then unmarshal it into one big []byte that you can divy up into memory.

return []byte{}
}

result := make([]byte, minLen)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite concerned that this (and other functions) allocate. In particular I am worried the impact this has on garbage collection / perf of allocation in general.

I think in places that are part of the query path, ideally you allocate something that looks like []byte instead of [][]byte and then functions like this can write into it somehow. Some sort of nice abstraction over this is likely needed. I can help suggest more, but this is just a cursory look I am taking.

@abraverm
Copy link
Author

abraverm commented Jan 14, 2026

Disclaimer: Cursor is responsible for this code. I had my requirements, and use cases for context and I have guided it to the result I wanted as a user. I have manually tested the code and it works as I wished. The quality of the code didn't look bad for me, so before doing a more deligent development I have decided to post it here first.

In my use case, there are repositories with many branches and tags (a PR that I would submit after this one) and I need everything - think of repositories with the size of the linux repository.

I'm comfortable with Go, so I can tailor fit this PR into something that you would be happy with.

So my familiarity with Zoekt internals is low at this point. Also, are there any other tests that you would recommend to run or add? From your comments, it sounds like pefromance might be an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants