Skip to content

Avoid copying on disk cache hits if supported by the file system#28369

Closed
fmeum wants to merge 3 commits intobazelbuild:masterfrom
fmeum:avoid-disk-cache-copy
Closed

Avoid copying on disk cache hits if supported by the file system#28369
fmeum wants to merge 3 commits intobazelbuild:masterfrom
fmeum:avoid-disk-cache-copy

Conversation

@fmeum
Copy link
Copy Markdown
Collaborator

@fmeum fmeum commented Jan 21, 2026

RELNOTES: Bazel no longer verifies the digests of disk cache entries upon a cache hit. This honors the description but not the previous behavior of the --remote_verify_downloads flag, which in fact controlled digest verification for both remote and disk caches.

In the process, LazyFileOutputStream#flush had to be changed to not open the stream if not open yet, otherwise it would override the existing contents.

Tested manually by running a fully cached build of //src:bazel and then verifying that du -kh bazel-bin/src/bazel prints 0B on macOS.

@fmeum fmeum requested a review from a team as a code owner January 21, 2026 18:18
@github-actions github-actions Bot added team-Remote-Exec Issues and PRs for the Execution (Remote) team awaiting-review PR is awaiting review from an assigned reviewer labels Jan 21, 2026
@fmeum fmeum force-pushed the avoid-disk-cache-copy branch from c226814 to 9e2c0ea Compare January 21, 2026 18:20
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces the MaybePathBacked interface, which allows OutputStream implementations to indicate if they are backed by a file system Path. This enables an important optimization in DiskCacheClient to use FileSystemUtils.copyFile for direct file-to-file copies, potentially avoiding an in-memory buffer and improving efficiency. The changes are well-structured and utilize modern Java features like pattern matching for instanceof.

One area for improvement is ensuring consistent progress reporting when the optimized file copy path is taken, as the current implementation might bypass the DownloadProgressReporter for these operations.

@fmeum fmeum marked this pull request as draft January 21, 2026 18:24
@fmeum fmeum force-pushed the avoid-disk-cache-copy branch 2 times, most recently from 2d6504e to f2c6746 Compare January 22, 2026 12:22
@fmeum fmeum marked this pull request as ready for review January 22, 2026 12:22
@fmeum fmeum requested a review from tjgq January 22, 2026 12:22
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant performance optimization for disk cache hits by leveraging file system-level copy operations, avoiding redundant data transfer. This is enabled by a new MaybePathBacked interface to identify path-backed streams. A key part of this change is the intentional removal of digest verification for disk cache entries, which trades verification for speed. The implementation is generally clean, but I've identified a high-severity issue where the optimization inadvertently breaks download progress reporting. My review includes a detailed suggestion on how to fix this while keeping the performance benefits.

@john-flanagan
Copy link
Copy Markdown

john-flanagan commented Jan 22, 2026

I tested this PR rebased on to the 9.0.0 tag using https://github.com/john-flanagan/size_test and it looks great!

That test repo declares 50 iOS unit tests targets that all depend on the same framework containing a 2GB binary resource file.

Without this change hitting the disk cache results in a full file copy into each of the 50 targets and consumes ~100GB of disk space.

9.0.0

$ bazel clean --expunge
$ rm -rf ~/bazel_disk_cache_test
$ df -h
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s1s1   926Gi    11Gi   549Gi     3%    453k  4.3G    0%   /

$ bazel build //...
INFO: Elapsed time: 128.151s, Critical Path: 54.48s

$ df -h
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s1s1   926Gi    11Gi   546Gi     3%    453k  4.3G    0%   /

$ bazel clean
$ df -h
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s1s1   926Gi    11Gi   547Gi     3%    453k  4.3G    0%   /

$ bazel build //...
INFO: Elapsed time: 88.520s, Critical Path: 28.19s

$ df -h
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s1s1   926Gi    11Gi   449Gi     3%    453k  4.3G    0%   /

With this change hitting the disk cached is both much faster (25.8s vs 88.5s) and does not consume significant additional disk space.

Dev Build

$ bazel clean --expunge
$ rm -rf ~/bazel_disk_cache_test
$ df -h
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s1s1   926Gi    11Gi   549Gi     3%    453k  4.3G    0%   /

$ ../bazel/bazel-bin/src/bazel-dev build //...
INFO: Elapsed time: 121.612s, Critical Path: 48.25s

$ df -h
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s1s1   926Gi    11Gi   546Gi     3%    453k  4.3G    0%   /

$ ../bazel/bazel-bin/src/bazel-dev clean
$ df -h
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s1s1   926Gi    11Gi   546Gi     3%    453k  4.3G    0%   /

$ ../bazel/bazel-bin/src/bazel-dev build //...
INFO: Elapsed time: 25.801s, Critical Path: 8.84s

$ df -h
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s1s1   926Gi    11Gi   546Gi     3%    453k  4.3G    0%   /

@fmeum
Copy link
Copy Markdown
Collaborator Author

fmeum commented Jan 22, 2026

@bazel-io fork 9.1.0

@fmeum
Copy link
Copy Markdown
Collaborator Author

fmeum commented Jan 22, 2026

I will try to add a test for this that runs on macOS. Both Linux and Windows require special filesystem setups that would be difficult to prepare in CI.

* An interface to mark {@link java.io.OutputStream} and {@link java.io.InputStream}s that may be
* known to write to an associated {@link Path}.
*/
public interface MaybePathBacked {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop the maybe? It seems that in practice all implementations are path-backed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReportingOutputStream is only conditionally path backed (if the wrapped stream is), but we can't represent that on the type level. That's why I went with Maybe, but I can still drop the prefix if you think it's unnecessary.

Comment thread src/main/java/com/google/devtools/build/lib/remote/common/MaybePathBacked.java Outdated
@fmeum fmeum force-pushed the avoid-disk-cache-copy branch from f2c6746 to c8774c6 Compare January 24, 2026 10:36
@fmeum fmeum requested a review from tjgq January 24, 2026 10:39
@tjgq tjgq added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Mar 5, 2026
fmeum added 2 commits March 8, 2026 10:57
RELNOTES: Bazel no longer verifies the digests of disk cache entries upon a cache hit.

# Conflicts:
#	src/main/java/com/google/devtools/build/lib/remote/CombinedCacheClientFactory.java
#	src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java
#	src/main/java/com/google/devtools/build/lib/remote/disk/DiskCacheClient.java
#	src/test/java/com/google/devtools/build/lib/remote/disk/DiskCacheClientTest.java
#	src/tools/remote/src/main/java/com/google/devtools/build/remote/worker/OnDiskBlobStoreCache.java
@fmeum
Copy link
Copy Markdown
Collaborator Author

fmeum commented Mar 8, 2026

@tjgq I resolved the conflicts

@fmeum fmeum force-pushed the avoid-disk-cache-copy branch from c8774c6 to a245347 Compare March 8, 2026 10:00
@fmeum fmeum force-pushed the avoid-disk-cache-copy branch from a245347 to 03a16d7 Compare March 8, 2026 10:05
@github-actions github-actions Bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Mar 10, 2026
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Mar 10, 2026
…elbuild#28369)

RELNOTES: Bazel no longer verifies the digests of disk cache entries upon a cache hit. This honors the description but not the previous behavior of the `--remote_verify_downloads` flag, which in fact controlled digest verification for both remote and disk caches.

In the process, `LazyFileOutputStream#flush` had to be changed to not open the stream if not open yet, otherwise it would override the existing contents.

Tested manually by running a fully cached build of `//src:bazel` and then verifying that `du -kh bazel-bin/src/bazel` prints `0B` on macOS.

Closes bazelbuild#28369.

PiperOrigin-RevId: 881397852
Change-Id: Icf3535ff31e314605c6003bd48602f6e00317022
github-merge-queue Bot pushed a commit that referenced this pull request Mar 13, 2026
…tem (htt… (#28940)

…ps://github.com//pull/28369)

RELNOTES: Bazel no longer verifies the digests of disk cache entries
upon a cache hit. This honors the description but not the previous
behavior of the `--remote_verify_downloads` flag, which in fact
controlled digest verification for both remote and disk caches.

In the process, `LazyFileOutputStream#flush` had to be changed to not
open the stream if not open yet, otherwise it would override the
existing contents.

Tested manually by running a fully cached build of `//src:bazel` and
then verifying that `du -kh bazel-bin/src/bazel` prints `0B` on macOS.

Closes #28369.

PiperOrigin-RevId: 881397852
Change-Id: Icf3535ff31e314605c6003bd48602f6e00317022

<!--
Thank you for contributing to Bazel!
Please read the contribution guidelines:
https://bazel.build/contribute.html
-->

### Description
<!--
Please provide a brief summary of the changes in this PR.
-->

### Motivation
<!--
Why is this change important? Does it fix a specific bug or add a new
feature?
If this PR fixes an existing issue, please link it here (e.g. "Fixes
#1234").
-->

### Build API Changes
<!--
Does this PR affect the Build API? (e.g. Starlark API, providers,
command-line flags, native rules)
If yes, please answer the following:
1. Has this been discussed in a design doc or issue? (Please link it)
2. Is the change backward compatible?
3. If it's a breaking change, what is the migration plan?
-->

No

### Checklist

- [ ] I have added tests for the new use cases (if any).
- [ ] I have updated the documentation (if applicable).

### Release Notes

<!--
If this is a new feature, please add 'RELNOTES[NEW]: <description>'
here.
If this is a breaking change, please add 'RELNOTES[INC]: <reason>' here.
If this change should be mentioned in release notes, please add
'RELNOTES: <reason>' here.
-->

RELNOTES: None

Commit
7666e2a

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

team-Remote-Exec Issues and PRs for the Execution (Remote) team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants