Disable s3 multipart download option by janvanmansum · Pull Request #12413 · IQSS/dataverse

janvanmansum · 2026-05-29T09:21:23Z

Cover Message for Pull Request

What this PR does / why we need it:
This PR addresses a 412 Precondition Failed error encountered when downloading certain large objects from S3-compatible storage (specifically Ceph) that were originally uploaded using multipart upload (e.g., via the boto3 library with default 8MiB part sizes).

The issue arises because the AWS SDK for Java, when multipart support is enabled, automatically attempts a parallel multipart download if it detects an ETag with a part-count suffix (e.g., -5). During this process, the client sends an If-Match header containing the ETag of an individual part to ensure consistency. However, some servers like Ceph expect the ETag for the entire object, leading to the 412 error.
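The part-count suffix mentioned above can be checked mechanically when selecting test objects. The following is a minimal sketch, not code from this PR: the class `MultipartEtag` and its methods are hypothetical, and it assumes the common S3 convention in which a multipart object's ETag is 32 hex digits followed by `-<part count>`, optionally wrapped in double quotes.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Dataverse or the AWS SDK. */
final class MultipartEtag {
    // Assumed shape: 32 hex digits, a dash, then the part count (1 to 5 digits).
    private static final Pattern MULTIPART =
            Pattern.compile("^\"?[0-9a-fA-F]{32}-(\\d{1,5})\"?$");

    private MultipartEtag() {}

    /** Returns the part count encoded in the ETag, or empty if there is no suffix. */
    static OptionalInt partCount(String etag) {
        if (etag == null) {
            return OptionalInt.empty();
        }
        Matcher m = MULTIPART.matcher(etag.trim());
        return m.matches()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    /** Number of parts when sizeBytes is split into chunks of partSizeBytes (ceiling division). */
    static long expectedParts(long sizeBytes, long partSizeBytes) {
        if (sizeBytes <= 0 || partSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (sizeBytes + partSizeBytes - 1) / partSizeBytes;
    }
}
```

Under these assumptions, a 40 MiB file uploaded in 8 MiB parts is split into five parts, so its ETag would be expected to end in `-5`.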

To resolve this, the S3AccessIO class has been refactored to use two distinct S3 clients:

A Write Client: Has multipart enabled to support efficient uploads (fixing issues with large file uploads).
A Read Client: Has multipart disabled when the new configuration option is set to true. This forces the server to handle the reassembly of parts and prevents the client from sending part-specific If-Match headers that trigger the 412 error.

Additionally, a new configuration option disable-multipart-download-for-indirect-download has been introduced to control this behavior.

Which issue(s) this PR closes:
N/a

Special notes for your reviewer:
The fix involves splitting the S3 client into s3ReadClient and s3WriteClient. The s3ReadClient is configured with .multipartEnabled(false) if the new configuration setting is set to true. By default, s3ReadClient and s3WriteClient refer to the same client, which has multipart enabled.
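The client selection described above can be modeled without the AWS SDK. This is an illustrative sketch only: `ClientPair`, `create`, and the `Function<Boolean, C>` factory are hypothetical stand-ins, with the factory playing the role that the S3 client builder and `.multipartEnabled(...)` play in the PR. It also assumes the new setting defaults to false.

```java
import java.util.function.Function;

/** Hypothetical model of the read/write split; C stands in for the real S3 client type. */
final class ClientPair<C> {
    final C readClient;
    final C writeClient;

    private ClientPair(C readClient, C writeClient) {
        this.readClient = readClient;
        this.writeClient = writeClient;
    }

    /**
     * @param disableMultipartDownload value of the new setting (assumed default: false)
     * @param factory builds a client; the Boolean argument says whether multipart is enabled
     */
    static <C> ClientPair<C> create(boolean disableMultipartDownload,
                                    Function<Boolean, C> factory) {
        // Uploads always use a multipart-enabled client.
        C write = factory.apply(true);
        // Downloads get their own multipart-disabled client only when the setting is true;
        // otherwise both references point at the same client instance.
        C read = disableMultipartDownload ? factory.apply(false) : write;
        return new ClientPair<>(read, write);
    }
}
```

Sharing one instance in the default case keeps existing behavior unchanged, so only installations that opt in pay for a second client.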

Suggestions on how to test this:

Reproduction: Attempt to download a file from a Ceph-backed S3 store that was uploaded in multiple parts (verify the ETag has a suffix like -5). Confirm it fails with 412 Precondition Failed without this PR. (JM) Set dataverse.files.<id>.download-redirect=false so that Dataverse is involved in the download.
Verification: Apply the PR and verify the download succeeds.
Regression: Verify that regular file uploads and downloads (non-multipart) still work as expected.
Configuration: Test setting dataverse.files.<id>.disable-multipart-download-for-indirect-download to false and verify the 412 error returns (if testing against a problematic server).

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No.

Is there a release notes update needed for this change?:
https://github.com/IQSS/dataverse/pull/12413/changes#diff-827dccc047f09653dade26fe2dce6be6d20ae6a739448e2b9f2890849b64d2fd

Additional documentation:
No


qqmyers

The basic change here, providing the option to disable use of multipart transfer for download, is necessary for DANS to use the code with Ceph (with download-redirect false). The implementation is straightforward: the code is refactored to have separate read/write client variables, but these still point to one client unless the new setting is true, in which case download uses a separate AWS S3 client with multipart disabled.

The other general changes (noted in other comments) all seem reasonable as well, though I think #12109 should make the same release.

The two requested changes are just to add Ceph in the list of known compatible instances and (minor) to clean up some of the formatting/reformatting that happened.

For QA, I'll note that DANS is using this already so I think the key thing is to regression test - not sure the core team needs to configure Ceph to test the actual fix.

qqmyers · 2026-05-29T20:04:11Z

@@ -0,0 +1,5 @@
+A new configuration setting has been introduced for S3 compatible storage drivers that addresses an incompatibility between the AWS S3 library used in Dataverse and certain S3 implementations such as the Ceph Object Gateway:


@janvanmansum - I made some edits - feel free to revert. I wanted to make it clear this new setting is mainly to address a problem, not something you'd want to set true otherwise. I also guessed that you're talking about the Ceph Object Gateway.

Beyond that, I might suggest adding Ceph to the list at https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/installation/config.rst#reported-working-s3-compatible-storage with a note about this setting.

qqmyers · 2026-05-29T20:05:22Z

+    private static final ConcurrentHashMap<String, S3AsyncClient> driverDownloadClientMap = new ConcurrentHashMap<>();
+    private static final ConcurrentHashMap<String, S3AsyncClient> driverUploadClientMap = new ConcurrentHashMap<>();
+    private static final ConcurrentHashMap<String, S3Presigner> driverPresignerMap = new ConcurrentHashMap<>();
+    private static final ConcurrentHashMap<String, AwsCredentialsProvider> driverCredentialsProviderMap = new ConcurrentHashMap<>();


Just noting that this PR has some general improvements that are largely unrelated to the issue, such as the use of ConcurrentHashMaps.

qqmyers · 2026-05-29T20:06:23Z

-        if (super.getInputStream() == null) {
-            throw new IOException("Cannot get InputStream for S3 Object" + key);
+            catch (InterruptedException e) {
+                Thread.currentThread().interrupt();


Resetting the interrupt is a recommended practice, not related to the issue.
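As a minimal sketch of the practice mentioned here (the helper `AsyncCalls.await` is hypothetical and not taken from the PR), a blocking wait on an asynchronous result can restore the interrupt flag before translating the exception, so that callers further up the stack can still observe the interruption:

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

/** Hypothetical helper illustrating interrupt restoration; not code from the PR. */
final class AsyncCalls {
    private AsyncCalls() {}

    /** Waits for the future and converts failures into IOException. */
    static <T> T await(CompletableFuture<T> future, String what) throws IOException {
        try {
            return future.get();
        } catch (InterruptedException e) {
            // The interrupt flag is cleared when InterruptedException is thrown;
            // set it again so the interruption is not silently lost.
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for " + what, e);
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the underlying failure as the cause.
            throw new IOException("Failed: " + what, e.getCause());
        }
    }
}
```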

qqmyers · 2026-05-29T20:10:01Z

                return null;
-            } finally {
-                s3Presigner.close();
            }


Another general fix. FWIW: Not closing the presigners is one of the fixes in #12109 which has been sitting. We probably want the other fixes there too.

qqmyers · 2026-05-29T20:11:09Z

-        if (driverTMMap.containsKey(driverId)) {
-            return driverTMMap.get(driverId);
-        } else {
+        return driverTMMap.computeIfAbsent(driverId, id -> {


Another general improvement
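The pattern in this hunk replaces a check-then-act lookup with an atomic one. A minimal sketch of the idea follows; `PerDriverCache` and its factory argument are hypothetical names, not the PR's code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-driver cache illustrating computeIfAbsent; not code from the PR. */
final class PerDriverCache<V> {
    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> factory;

    PerDriverCache(Function<String, V> factory) {
        this.factory = factory;
    }

    /** Returns the cached value for driverId, creating it at most once per key. */
    V get(String driverId) {
        // containsKey() followed by get()/put() leaves a window in which two threads
        // can both miss and both create a value; computeIfAbsent closes that window.
        return cache.computeIfAbsent(driverId, factory);
    }
}
```

With `ConcurrentHashMap`, the mapping function is applied at most once per absent key, and it should be short and must not modify the same map.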

qqmyers · 2026-05-29T20:13:19Z

+            Thread.currentThread().interrupt();
+            throw new IOException("S3AccessIO: Failed to get aux objects for listing.", e);
+        }
+        catch (ExecutionException e) {


Minor - this PR seems to be inconsistent in putting the catch on the line with the closing } or the next line and changes whether else is on a new line or not. It would be good to have things consistent (I think we generally have catch and else on the same line as the } )

janvanmansum added 7 commits May 29, 2026 10:45

DD-2271 separate clients for upload and download

5718ae6

Added extension mime type definition for css to git rid of warnings in log

afb1766

DD-2283 No Accept/Cancel buttons when downloading dataset in Archival ZIP

859285f

wip

3b347a4

reformat

ed4f4e1

docs

a1420c0

Release note

05a25e7

janvanmansum marked this pull request as ready for review May 29, 2026 09:57

qqmyers added this to IQSS Dataverse Project May 29, 2026

qqmyers added the GDCC: DANS related to GDCC work for DANS label May 29, 2026

qqmyers added this to the 6.12 milestone May 29, 2026

pdurbin moved this to Ready for Triage in IQSS Dataverse Project May 29, 2026

Update release notes for S3 multipart download setting

a786378

Clarified the new configuration setting for S3 drivers to specify its effect on multipart downloads and added context regarding `download-redirect` behavior.

qqmyers requested changes May 29, 2026

Disable s3 multipart download option #12413
janvanmansum wants to merge 8 commits into IQSS:develop from janvanmansum:disable-s3-multipart-download-option

3 participants
