Skip to content

Disable s3 multipart download option#12413

Open
janvanmansum wants to merge 8 commits into
IQSS:developfrom
janvanmansum:disable-s3-multipart-download-option
Open

Disable s3 multipart download option#12413
janvanmansum wants to merge 8 commits into
IQSS:developfrom
janvanmansum:disable-s3-multipart-download-option

Conversation

@janvanmansum
Copy link
Copy Markdown
Contributor

@janvanmansum janvanmansum commented May 29, 2026

Cover Message for Pull Request

What this PR does / why we need it:
This PR addresses a 412 Precondition Failed error encountered when downloading certain large objects from S3-compatible storage (specifically Ceph) that were originally uploaded using multipart upload (e.g., via the boto3 library with default 8MiB part sizes).

The issue arises because the AWS SDK for Java, when multipart support is enabled, automatically attempts a parallel multipart download if it detects an ETag with a part-count suffix (e.g., -5). During this process, the client sends an If-Match header containing the ETag of an individual part to ensure consistency. However, some servers like Ceph expect the ETag for the entire object, leading to the 412 error.

To resolve this, the S3AccessIO class has been refactored to use two distinct S3 clients:

  1. A Write Client: Has multipart enabled to support efficient uploads (fixing issues with large file uploads).
  2. A Read Client: Has multipart disabled by default. This forces the server to handle the reassembly of parts and prevents the client from sending part-specific If-Match headers that trigger the 412 error.

Additionally, a new configuration option disable-multipart-download-for-indirect-download has been introduced to control this behavior.

Which issue(s) this PR closes:
N/a

Special notes for your reviewer:
The fix involves spliting the S3 client into s3ReadClient and s3WriteClient. The s3ReadClient is configured with .multipartEnabled(false) if the new configuration setting is set to true. By default s3ReadClient and s3WriteClient refer to the same client, which has multipart enabled.

Suggestions on how to test this:

  1. Reproduction: Attempt to download a file from a Ceph-backed S3 store that was uploaded in multiple parts (verify the ETag has a suffix like -5). Confirm it fails with 412 Precondition Failed without this PR. (JM) Set dataverse.files.<id>.download-redirect=false so that Dataverse is involved in the download.
  2. Verification: Apply the PR and verify the download succeeds.
  3. Regression: Verify that regular file uploads and downloads (non-multipart) still work as expected.
  4. Configuration: Test setting dataverse.files.<id>.disable-multipart-download-for-indirect-download to false and verify the 412 error returns (if testing against a problematic server).

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No.

Is there a release notes update needed for this change?:
https://github.com/IQSS/dataverse/pull/12413/changes#diff-827dccc047f09653dade26fe2dce6be6d20ae6a739448e2b9f2890849b64d2fd

Additional documentation:
No

@janvanmansum janvanmansum marked this pull request as ready for review May 29, 2026 09:57
@qqmyers qqmyers added the GDCC: DANS related to GDCC work for DANS label May 29, 2026
@qqmyers qqmyers added this to the 6.12 milestone May 29, 2026
@pdurbin pdurbin moved this to Ready for Triage in IQSS Dataverse Project May 29, 2026
Clarified the new configuration setting for S3 drivers to specify its effect on multipart downloads and added context regarding `download-redirect` behavior.
Copy link
Copy Markdown
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic change here - to provide the option to disable use of multipart transfer for download - is necessary for DANS to use the code with Ceph, (with download-redirect false). The implementation is straight forward - the code is refactored to have separate read/write client variables, but these still point to one client unless the new setting is true, in which case download uses a separate aws s3 client with multipart disabled.

The other general changes (noted in other comments) all see reasonable as well, though I think #12109 should make the same release.

The two requested changes are just to add Ceph in the list of known compatible instances and (minor) to clean up some of the formatting/reformatting that happened.

For QA, I'll note that DANS is using this already so I think the key thing is to regression test - not sure the core team needs to configure Ceph to test the actual fix.

@@ -0,0 +1,5 @@
A new configuration setting has been introduced for S3 compatible storage drivers that addresses an incompatibility between the AWS S3 library used in Dataverse and certain S3 implementations such as the Ceph Object Gateway:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvanmansum - I made some edits - feel free to revert. I wanted to make it clear this new setting is mainly to address a problem, not something you'd want to set true otherwise. I also guessed that you're talking about the Ceph Object Gateway.

Beyond that, I might suggest adding Ceph to the list at https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/installation/config.rst#reported-working-s3-compatible-storage with a note about this setting.

private static final ConcurrentHashMap<String, S3AsyncClient> driverDownloadClientMap = new ConcurrentHashMap<>();
private static final ConcurrentHashMap<String, S3AsyncClient> driverUploadClientMap = new ConcurrentHashMap<>();
private static final ConcurrentHashMap<String, S3Presigner> driverPresignerMap = new ConcurrentHashMap<>();
private static final ConcurrentHashMap<String, AwsCredentialsProvider> driverCredentialsProviderMap = new ConcurrentHashMap<>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that this PR has some general improvements ~unrelated to the issue, such as the use of ConcurrentHashMaps.

if (super.getInputStream() == null) {
throw new IOException("Cannot get InputStream for S3 Object" + key);
catch (InterruptedException e) {
Thread.currentThread().interrupt();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resetting the interrupt is a recommended practice, not related to the issue.

return null;
} finally {
s3Presigner.close();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another general fix. FWIW: Not closing the presigners is one of the fixes in #12109 which has been sitting. We probably want the other fixes there too.

if (driverTMMap.containsKey(driverId)) {
return driverTMMap.get(driverId);
} else {
return driverTMMap.computeIfAbsent(driverId, id -> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another general improvement

Thread.currentThread().interrupt();
throw new IOException("S3AccessIO: Failed to get aux objects for listing.", e);
}
catch (ExecutionException e) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor - this PR seems to be inconsistent in putting the catch on the line with the closing } or the next line and changes whether else is on a new line or not. It would be good to have things consistent (I think we generally have catch and else on the same line as the } )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

GDCC: DANS related to GDCC work for DANS

Projects

Status: Ready for Triage

Development

Successfully merging this pull request may close these issues.

3 participants