-
Notifications
You must be signed in to change notification settings - Fork 173
Unlimited branches #1002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Unlimited branches #1002
Conversation
Bit mask for tracking which branches, this allows supporting repositories with more than 64 branches. Assisted-by: Cursor
keegancsmith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I need to take a far deeper look into this. But this is just a quick overview I did now.
To be honest I'm surprised this change was as straightforward as it was. I'm guessing the tests made this relatively easy to change and be confident in the changes?
Something I thought about while reviewing this but I think is the wrong decision is to instead use roaring bitmaps. The overhead of that for the normal case would outweigh the benefit of using it I believe. However, for the case of a lot of branches its likely useful given I imagine you will have a lot of dense vs sparse sets which it likely would be good at encoding.
| // a file appears in. This allows supporting repositories with more than 64 branches. | ||
|
|
||
| // newBranchMask allocates a branch mask that can hold at least numBranches bits. | ||
| func newBranchMask(numBranches int) []byte { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a tiny type like type bitmask []byte would make code in general more obvious? Might be more readable as well to make the functions below methods on bitmask
|
|
||
| // firstSetBit returns the index of the first set bit in the mask, | ||
| // or -1 if no bits are set. | ||
| func firstSetBit(mask []byte) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of using []byte if you use []uint8 you can then use the maths/bits package for speeding up operations like this.
| toc.branchMasks.start(w) | ||
| for _, m := range b.branchMasks { | ||
| w.U64(m) | ||
| toc.branchMasks.addItem(w, m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to look into the details again, but using a compound section isn't really that efficient here since every branchMasks will be the same size. I would rather you serialize something upfront about the size of the branchmasks then unmarshal it into one big []byte that you can divy up into memory.
| return []byte{} | ||
| } | ||
|
|
||
| result := make([]byte, minLen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite concerned that this (and other functions) allocate. In particular I am worried the impact this has on garbage collection / perf of allocation in general.
I think in places that are part of the query path, ideally you allocate something that looks like []byte instead of [][]byte and then functions like this can write into it somehow. Some sort of nice abstraction over this is likely needed. I can help suggest more, but this is just a cursory look I am taking.
|
Disclaimer: Cursor is responsible for this code. I had my requirements, and use cases for context and I have guided it to the result I wanted as a user. I have manually tested the code and it works as I wished. The quality of the code didn't look bad for me, so before doing a more deligent development I have decided to post it here first. In my use case, there are repositories with many branches and tags (a PR that I would submit after this one) and I need everything - think of repositories with the size of the linux repository. I'm comfortable with Go, so I can tailor fit this PR into something that you would be happy with. So my familiarity with Zoekt internals is low at this point. Also, are there any other tests that you would recommend to run or add? From your comments, it sounds like pefromance might be an issue. |
Bit mask for tracking which branches, this allows supporting repositories with more than 64 branches.
Assisted-by: Cursor