Revised AzureBlobStore deletions. #1451
Conversation
Force-pushed from e8fef4f to fd48c02.
afabiani left a comment
Potential concerns are quite minor:

- Log injection / spam: all log messages are parameterized via String.formatted, so untrusted text is not concatenated into log format specifiers (see the sketch after this list). Unless user-controlled data can become part of the prefix or container name (unlikely in GeoWebCache's Azure store config), log injection isn't a realistic threat.
- DoS via huge delete requests: massive delete operations can always cause heavy load, but that was already true of the previous implementation. The patch doesn't open a new path for an attacker to cause more damage than they could before (assuming the same APIs let them trigger a large truncate).
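For illustration, the parameterized style in question looks roughly like this (the logger and variable names are hypothetical, not the PR's actual code); the point is that untrusted values travel as format arguments rather than becoming part of the template:

```java
// Untrusted values (deletedCount, prefix) are passed as arguments to a fixed template,
// so they cannot smuggle format specifiers into the log message itself.
log.fine("Deleted %d blobs under prefix '%s'".formatted(deletedCount, prefix));

// ...as opposed to building the format string itself out of untrusted text,
// where a prefix containing "%s" would corrupt or break the call:
log.fine(String.format("Deleted " + deletedCount + " blobs under prefix " + prefix));
```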
```java
                .map(n -> container.getBlobClient(n).getBlobUrl())
                .collect(Collectors.toList());

        return batch.deleteBlobs(blobsUrls, DeleteSnapshotsOptionType.INCLUDE).stream()
```
PrefixBulkDelete may exceed the batch limit. Nothing here enforces itemNames.size() <= PAGE_SIZE. If Azure’s page size is greater than 256, you will send a batch with more than 256 deletes, which violates the batch constraints that motivated PAGE_SIZE = 256.
Recommendation (important): either set the page size explicitly:

```java
ListBlobsOptions options = new ListBlobsOptions()
        .setPrefix(prefix)
        .setDetails(new BlobListDetails().setRetrieveDeletedBlobs(true))
        .setMaxResultsPerPage(DeleteManager.PAGE_SIZE);
```

Or, more robustly, partition the items inside the loop:

```java
List<String> items = ...
for (int i = 0; i < items.size(); i += PAGE_SIZE) {
    count += deleteItems(container, batch,
            items.subList(i, Math.min(i + PAGE_SIZE, items.size())));
}
```
Being a private method that does what it says, I don't know if we need to repeat the check here.
@axl8713 the new streaming approach seems to work; however, it's a bit convoluted... you start from a stream, turn it into an iterator, batch it, and create a stream of lists that is consumed later. Seems pretty complicated.
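For reference, the shape being described is roughly the following. This is an illustrative sketch only, assuming Guava's Iterators.partition is available; listBlobNames() and deleteBatch() are hypothetical helpers, and it is not the PR's actual code:

```java
import com.google.common.collect.Iterators;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Illustrative only: listBlobNames() and deleteBatch() are hypothetical helpers.
long deleteAllBatched(int pageSize) {
    Stream<String> names = listBlobNames();                   // start from a stream
    Iterator<List<String>> batches =
            Iterators.partition(names.iterator(), pageSize);  // turn it into an iterator, batch it
    Stream<List<String>> batchStream = StreamSupport.stream(  // create a stream of lists...
            Spliterators.spliteratorUnknownSize(batches, Spliterator.ORDERED), false);
    return batchStream.mapToLong(batch -> deleteBatch(batch)).sum(); // ...consumed later
}
```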
Force-pushed from f1021c9 to c15b7e4.
Thanks, @aaime. I was overthinking it. Could you please check my new version to see if it is simpler?
aaime left a comment
LGTM
The backport to 1.27.x failed. To backport manually, run these commands in your terminal:

```sh
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.27.x 1.27.x
# Navigate to the new working tree
cd .worktrees/backport-1.27.x
# Create a new branch
git switch --create backport-1451-to-1.27.x
# Cherry-pick the merged commits of this pull request and resolve the conflicts
git cherry-pick 7201edafdd2b82b17ae3d0657a1a6aebb1d040d7 690258c87df9c9f223e39cee8af639325a12c31a c15b7e4cd8f52349d86337455b9d5d78b9cb78ee c91d5877ecfeda9d890fc5f60ddb823ad5ae4e82
# Push it to GitHub
git push --set-upstream origin backport-1451-to-1.27.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.27.x
```

Then, create a pull request where the base branch is 1.27.x and the compare branch is backport-1451-to-1.27.x.
These changes improve the handling of bulk deletions of cached tiles in Azure Blob storage.
Previously, truncating a TileRange involved blindly issuing one deletion request for every possible tile in the range, potentially flooding the blob store with thousands of unnecessary requests for tiles that did not exist.
With these changes, GWC first checks which tiles actually exist in the blob store within the specified range, and then issues a delete request only for those tiles.
Moreover, to avoid making one API call per deletion, the Azure BlobBatchClient is now used to batch delete operations, reducing overhead and simplifying the code.
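As a rough sketch of that flow (assumed names such as containerClient and deleteByPrefix; not the PR's exact code), listing pages capped at 256 entries map naturally onto 256-entry delete batches:

```java
import com.azure.core.http.rest.PagedResponse;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.batch.BlobBatchClient;
import com.azure.storage.blob.batch.BlobBatchClientBuilder;
import com.azure.storage.blob.models.BlobItem;
import com.azure.storage.blob.models.DeleteSnapshotsOptionType;
import com.azure.storage.blob.models.ListBlobsOptions;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: list only the blobs that actually exist under the prefix,
// then delete each 256-entry page in a single batched request.
static long deleteByPrefix(BlobContainerClient containerClient, String prefix) {
    ListBlobsOptions options =
            new ListBlobsOptions().setPrefix(prefix).setMaxResultsPerPage(256);
    BlobBatchClient batch = new BlobBatchClientBuilder(containerClient).buildClient();
    long deleted = 0;
    for (PagedResponse<BlobItem> page : containerClient.listBlobs(options, null).iterableByPage()) {
        List<String> urls = page.getValue().stream()
                .map(item -> containerClient.getBlobClient(item.getName()).getBlobUrl())
                .collect(Collectors.toList());
        if (urls.isEmpty()) continue;
        // One batched request deletes up to a full page of blobs at once
        deleted += batch.deleteBlobs(urls, DeleteSnapshotsOptionType.INCLUDE).stream().count();
    }
    return deleted;
}
```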
PS: Azure Batch Delete is documented as supporting up to 256 operations per batch, although this limit is not entirely clear or consistently referenced across all sources. As a precaution, the page size has therefore been reduced from 1000 to 256 to ensure full compatibility with the platform's batching constraints.