Avoid copying on disk cache hits if supported by the file system#28369
Avoid copying on disk cache hits if supported by the file system#28369fmeum wants to merge 3 commits intobazelbuild:masterfrom
Conversation
c226814 to
9e2c0ea
Compare
There was a problem hiding this comment.
Code Review
The pull request introduces the MaybePathBacked interface, which allows OutputStream implementations to indicate if they are backed by a file system Path. This enables an important optimization in DiskCacheClient to use FileSystemUtils.copyFile for direct file-to-file copies, potentially avoiding an in-memory buffer and improving efficiency. The changes are well-structured and utilize modern Java features like pattern matching for instanceof.
One area for improvement is ensuring consistent progress reporting when the optimized file copy path is taken, as the current implementation might bypass the DownloadProgressReporter for these operations.
2d6504e to
f2c6746
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces a significant performance optimization for disk cache hits by leveraging file system-level copy operations, avoiding redundant data transfer. This is enabled by a new MaybePathBacked interface to identify path-backed streams. A key part of this change is the intentional removal of digest verification for disk cache entries, which trades verification for speed. The implementation is generally clean, but I've identified a high-severity issue where the optimization inadvertently breaks download progress reporting. My review includes a detailed suggestion on how to fix this while keeping the performance benefits.
|
I tested this PR rebased on to the 9.0.0 tag using https://github.com/john-flanagan/size_test and it looks great! That test repo declares 50 iOS unit tests targets that all depend on the same framework containing a 2GB binary resource file. Without this change hitting the disk cache results in a full file copy into each of the 50 targets and consumes ~100GB of disk space. 9.0.0With this change hitting the disk cached is both much faster (25.8s vs 88.5s) and does not consume significant additional disk space. Dev Build |
|
@bazel-io fork 9.1.0 |
|
I will try to add a test for this that runs on macOS. Both Linux and Windows require special filesystem setups that would be difficult to prepare in CI. |
| * An interface to mark {@link java.io.OutputStream} and {@link java.io.InputStream}s that may be | ||
| * known to write to an associated {@link Path}. | ||
| */ | ||
| public interface MaybePathBacked { |
There was a problem hiding this comment.
Can we drop the maybe? It seems that in practice all implementations are path-backed.
There was a problem hiding this comment.
ReportingOutputStream is only conditionally path backed (if the wrapped stream is), but we can't represent that on the type level. That's why I went with Maybe, but I can still drop the prefix if you think it's unnecessary.
f2c6746 to
c8774c6
Compare
RELNOTES: Bazel no longer verifies the digests of disk cache entries upon a cache hit. # Conflicts: # src/main/java/com/google/devtools/build/lib/remote/CombinedCacheClientFactory.java # src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java # src/main/java/com/google/devtools/build/lib/remote/disk/DiskCacheClient.java # src/test/java/com/google/devtools/build/lib/remote/disk/DiskCacheClientTest.java # src/tools/remote/src/main/java/com/google/devtools/build/remote/worker/OnDiskBlobStoreCache.java
|
@tjgq I resolved the conflicts |
c8774c6 to
a245347
Compare
a245347 to
03a16d7
Compare
…elbuild#28369) RELNOTES: Bazel no longer verifies the digests of disk cache entries upon a cache hit. This honors the description but not the previous behavior of the `--remote_verify_downloads` flag, which in fact controlled digest verification for both remote and disk caches. In the process, `LazyFileOutputStream#flush` had to be changed to not open the stream if not open yet, otherwise it would override the existing contents. Tested manually by running a fully cached build of `//src:bazel` and then verifying that `du -kh bazel-bin/src/bazel` prints `0B` on macOS. Closes bazelbuild#28369. PiperOrigin-RevId: 881397852 Change-Id: Icf3535ff31e314605c6003bd48602f6e00317022
…tem (htt… (#28940) …ps://github.com//pull/28369) RELNOTES: Bazel no longer verifies the digests of disk cache entries upon a cache hit. This honors the description but not the previous behavior of the `--remote_verify_downloads` flag, which in fact controlled digest verification for both remote and disk caches. In the process, `LazyFileOutputStream#flush` had to be changed to not open the stream if not open yet, otherwise it would override the existing contents. Tested manually by running a fully cached build of `//src:bazel` and then verifying that `du -kh bazel-bin/src/bazel` prints `0B` on macOS. Closes #28369. PiperOrigin-RevId: 881397852 Change-Id: Icf3535ff31e314605c6003bd48602f6e00317022 <!-- Thank you for contributing to Bazel! Please read the contribution guidelines: https://bazel.build/contribute.html --> ### Description <!-- Please provide a brief summary of the changes in this PR. --> ### Motivation <!-- Why is this change important? Does it fix a specific bug or add a new feature? If this PR fixes an existing issue, please link it here (e.g. "Fixes #1234"). --> ### Build API Changes <!-- Does this PR affect the Build API? (e.g. Starlark API, providers, command-line flags, native rules) If yes, please answer the following: 1. Has this been discussed in a design doc or issue? (Please link it) 2. Is the change backward compatible? 3. If it's a breaking change, what is the migration plan? --> No ### Checklist - [ ] I have added tests for the new use cases (if any). - [ ] I have updated the documentation (if applicable). ### Release Notes <!-- If this is a new feature, please add 'RELNOTES[NEW]: <description>' here. If this is a breaking change, please add 'RELNOTES[INC]: <reason>' here. If this change should be mentioned in release notes, please add 'RELNOTES: <reason>' here. --> RELNOTES: None Commit 7666e2a Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
RELNOTES: Bazel no longer verifies the digests of disk cache entries upon a cache hit. This honors the description but not the previous behavior of the
--remote_verify_downloadsflag, which in fact controlled digest verification for both remote and disk caches.In the process,
LazyFileOutputStream#flushhad to be changed to not open the stream if not open yet, otherwise it would override the existing contents.Tested manually by running a fully cached build of
//src:bazeland then verifying thatdu -kh bazel-bin/src/bazelprints0Bon macOS.