Reimplemented xlocate as xbps-locate #585
friedelschoen wants to merge 12 commits into void-linux:master
|
how does this affect repodata size? why not make it part of xbps-query? this also means x(bps-)locate loses the power of pcre and delta-updating the index |
|
_BSD_SOURCE was fixed in 48c9879, rebase |
|
Making some calculations: about 60 bytes per file path plus some overhead, so let's say 100 bytes per file. 100 x 50 x 13'000 ≈ 60 MB uncompressed. Maybe using an extra file like
You're right about losing the power of PCRE then; maybe a third-party library? |
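The estimate above can be reproduced as a quick shell calculation. This is only a sketch: reading the three factors as bytes per file, files per package and number of packages is an assumption about what the comment meant, and the variable names are made up for illustration.

```sh
# Rough repodata size estimate. The meaning assigned to each factor is an
# assumption; the comment only gives the bare product 100 x 50 x 13'000.
bytes_per_file=100   # ~60 bytes of path plus some overhead
files_per_pkg=50     # assumed average number of files per package
packages=13000       # assumed number of packages

total=$((bytes_per_file * files_per_pkg * packages))
echo "$total bytes, roughly $((total / 1000000)) MB uncompressed"
```

The exact product is 65,000,000 bytes, which is in the same range as the rough 60 MB figure.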
so at the very minimum 235 MB assuming single-byte ASCII characters only and no plist overhead |
|
Okay! I wasn't aware it would be that much overhead to include it directly into |
|
It's worth noting that the existing xlocate index is large enough to already be in git, where it still takes ages to download if you don't already have a clone and can't take advantage of the delta updating that git provides. |
|
After some research, making a plist with all the files in xlocate.git:
% cat ../make-plist.sh
# Run from inside a checkout of xlocate.git: every file there is named after
# a package and lists that package's paths in its first column.
printf '<plist>\n'
printf '\t<dict>\n'
for pkg in *; do
	printf '\t\t<key>%s</key>\n' "$pkg"
	printf '\t\t<array>\n'
	# Read line by line so that paths are never glob-expanded by the shell.
	awk '{print $1}' "$pkg" | while IFS= read -r file; do
		printf '\t\t\t<string>%s</string>\n' "$file"
	done
	printf '\t\t</array>\n'
done
printf '\t</dict>\n'
printf '</plist>\n'
% sh ../make-plist.sh | zstd -f9o ../files.zstd
/*stdin*\ : 5.21% ( 238 MiB => 12.4 MiB, ../files.zstd)
% find * -print -exec cat {} \; | zstd -f9o ../files.zstd
/*stdin*\ : 6.58% ( 197 MiB => 13.0 MiB, ../files.zstd)
13 MiB is still a lot to just include into
Downloading gcc-fortran, which is about 13 MB, takes 5.3 s; cloning xlocate.git takes about 11 s. Updating the git clone afterwards is certainly faster, but how often is that needed if file lists don't really change with every version? I cannot tell how accurate this comparison is or how linearly it behaves on slower networks. Please correct me if I'm wrong. |
8a8c61a to e65aef4
That's the reason git is used: it provides the mechanism to download only the new parts of the index ("delta-updating"), keeping existing files as they are
e65aef4 to dc02b0e
|
I've now re-implemented xlocate inside xbps-query (-o and --ownedhash) for better integration. From there, you can still search by file/link, but also by hash! Every file hash is included into
Also, can someone with a binary repo make an index file with |
196d5ff to 4ac9897
I've implemented a new xbps tool, xbps-locate! xbps-rindex collects data into index.plist inside *-repodata, but also collects files into files.plist. xbps-locate will fetch the files.plist from the repo pool and search for the desired file. I cannot test
Also added to TODO: the clean-up in xbps-rindex doesn't clean files.plist yet.
I've also changed repo_open_* in lib/repo.c so that the archive iterator doesn't just assume that the files are in order (they are still written in order for compatibility) but checks the actual filename.
On my computer, I had to manually disable _BSD_SOURCE and _SVID_SOURCE, so there is a commit for that. I don't know if it's only needed on my computer (Void Linux x86_64-musl).
Thanks for looking into my code!
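For readers following along, the kind of lookup described for xbps-locate can be sketched in plain shell. This is not the code from this PR: it assumes files.plist has the simple key/array/string layout produced by make-plist.sh earlier in the thread, and locate_file, the package names and the paths are all hypothetical.

```sh
# locate_file PATTERN PLIST
# Print "package: path" for every <string> entry whose path matches the
# extended regular expression PATTERN. Assumes one <key> line per package
# followed by its <string> lines, as in the make-plist.sh output.
# Note: awk -v interprets backslash escapes in PATTERN.
locate_file() {
	awk -v pat="$1" '
		/<key>/ {
			pkg = $0
			sub(/^.*<key>/, "", pkg); sub(/<\/key>.*$/, "", pkg)
			next
		}
		/<string>/ {
			path = $0
			sub(/^.*<string>/, "", path); sub(/<\/string>.*$/, "", path)
			if (path ~ pat) print pkg ": " path
		}
	' "$2"
}

# Hypothetical sample index, only to show the call.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<plist>
	<dict>
		<key>foo-1.0_1</key>
		<array>
			<string>/usr/bin/foo</string>
		</array>
		<key>bar-2.1_3</key>
		<array>
			<string>/usr/bin/bar</string>
			<string>/usr/lib/libbar.so</string>
		</array>
	</dict>
</plist>
EOF
locate_file 'bin/bar$' "$sample"
rm -f "$sample"
```

A line-oriented scan like this only supports awk's extended regular expressions, not PCRE, which is the trade-off raised earlier in the conversation.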