Merged
35 commits
39f63a5
code changes for AWS SDK v2
qqmyers Mar 20, 2025
05b124a
pom updates for v2
qqmyers Mar 20, 2025
8ca9fab
new managed executor
qqmyers Mar 20, 2025
f073d08
test updates
qqmyers Mar 20, 2025
14c3070
update zipdownloader s3 to sdk v2
qqmyers Mar 21, 2025
6a9b390
cut/paste error
qqmyers Mar 21, 2025
57604db
try exclude
qqmyers Mar 21, 2025
1955873
switch to app scope
qqmyers Mar 21, 2025
b8bfcc5
more exclusions
qqmyers Mar 21, 2025
d7a8c45
changes to xml use of <rights></rights>
qqmyers Mar 21, 2025
9fe1518
update localstack, change tempdir calc
qqmyers Mar 21, 2025
6997b09
update localstack in docker-compose
qqmyers Mar 21, 2025
3e9156f
Fix bugs
qqmyers Mar 21, 2025
c83bb42
Merge remote-tracking branch 'IQSS/develop' into AWSv2
qqmyers Apr 5, 2025
2deb1e4
release note
qqmyers Apr 5, 2025
d4d1ef7
Merge remote-tracking branch 'IQSS/develop' into AWSv2
qqmyers Apr 10, 2025
ca0fc42
fix presigners to use endpoint override
qqmyers Apr 12, 2025
ce0ff7f
same change for download presigner
qqmyers Apr 13, 2025
d763015
Fix non-direct upload with endpoints that don't support the sha header
qqmyers Apr 14, 2025
3de4010
add path style to presigner, cache it
qqmyers Apr 14, 2025
dacda1e
simplify, add stack trace
qqmyers Apr 14, 2025
faabd63
add check on bucket creation
qqmyers Apr 14, 2025
a65c7da
fix path style access in client
qqmyers Apr 15, 2025
605228b
use tm for bag file
qqmyers Apr 15, 2025
cf55d5f
cleanup, remove payload-signing option
qqmyers Apr 15, 2025
5c655e2
abort unused stream
qqmyers Apr 15, 2025
77c14fa
Merge remote-tracking branch 'IQSS/develop' into AWSv2
qqmyers May 30, 2025
0165035
Merge branch 'develop' into AWSv2 #11360
pdurbin Jun 23, 2025
f21603b
improve release note #11360
pdurbin Jun 23, 2025
0e4e680
add note about the executor resource
qqmyers Jun 23, 2025
a985f2d
add note about XML serialization changes
qqmyers Jun 23, 2025
ab38373
Merge remote-tracking branch 'IQSS/develop' into AWSv2
qqmyers Jun 23, 2025
fb9a881
Merge branch 'AWSv2' of https://github.com/GlobalDataverseCommunityCo…
qqmyers Jun 23, 2025
47ead36
Merge remote-tracking branch 'IQSS/develop' into AWSv2
qqmyers Jun 25, 2025
151029c
Merge remote-tracking branch 'IQSS/develop' into AWSv2
qqmyers Jun 26, 2025
13 changes: 13 additions & 0 deletions doc/release-notes/11360-AwsSdkV2.x.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
## Upgrade to AWS SDK v2 (for S3), v1 EOL in December 2025

To support S3 storage, Dataverse uses the AWS SDK. We have upgraded to v2 of this SDK because v1 reaches End Of Life (EOL) in [December 2025](https://aws.amazon.com/fr/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/).

As part of the upgrade, the payload-signing setting for S3 stores (`dataverse.files.<id>.payload-signing`) has been removed because it is no longer necessary. With the updated SDK, a payload signature will automatically be sent when required (and not sent when not required).

Dataverse developers should note that LocalStack is used to test S3 and that older versions appear to be incompatible. The development environment has been upgraded from LocalStack v2.3.2 to v4.2.0, which appears to work well.

See also #11073 and #11360.

### Settings Removed

- `dataverse.files.<id>.payload-signing`
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/api/changelog.rst
@@ -9,9 +9,11 @@ This API changelog is experimental and we would love feedback on its usefulness.

v6.7
----

- An undocumented :doc:`search` parameter called "show_my_data" has been removed. It was never exercised by tests and is believed to be unused. API users should use the :ref:`api-mydata` API instead.
- The /api/datasets/{id}/curationStatus API now includes a JSON object with curation label, createtime, and assigner rather than a string 'label'. It also supports a new boolean includeHistory parameter (default false) that returns a JSON array of statuses.
- The /api/datasets/{id}/listCurationStates API now includes new "Status Set Time" and "Status Set By" columns listing the time the current status was applied and by whom. It also supports the boolean includeHistory parameter.
- Due to updates in libraries used by Dataverse, XML serialization may have changed slightly with respect to whether self-closing tags are used for empty elements. This primarily affects XML-based metadata exports. The XML structure of the export itself has not changed, so this is only an incompatibility if you are not using an XML parser.

v6.6
----
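
The changelog entry about self-closing tags can be illustrated with a minimal sketch that assumes only the JDK's built-in DOM parser. The class name `SelfClosingTagDemo` and the element names `record` and `title` are hypothetical; `<rights>` is borrowed from the commit message "changes to xml use of <rights></rights>". It suggests why a consumer that parses the export should see no difference, while a consumer that compares raw text would.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class SelfClosingTagDemo {

    /** Parses an XML string into a DOM document, wrapping checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
    }

    /** Returns true when both serializations produce equal root element trees. */
    static boolean sameStructure(String xmlA, String xmlB) {
        return parse(xmlA).getDocumentElement()
                .isEqualNode(parse(xmlB).getDocumentElement());
    }

    public static void main(String[] args) {
        // Hypothetical export fragments that differ only in how the empty element is written.
        String expanded = "<record><title>Demo</title><rights></rights></record>";
        String selfClosing = "<record><title>Demo</title><rights/></record>";

        System.out.println("Equal as parsed XML: " + sameStructure(expanded, selfClosing));
        System.out.println("Equal as raw text:   " + expanded.equals(selfClosing));
    }
}
```

A client that diffs or pattern-matches the raw export text is the case the changelog entry warns about; a client that parses the XML first should be unaffected.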
7 changes: 3 additions & 4 deletions doc/sphinx-guides/source/installation/config.rst
@@ -1321,7 +1321,6 @@ List of S3 Storage Options
dataverse.files.<id>.profile <?> Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
dataverse.files.<id>.proxy-url <?> URL of a proxy protecting the S3 store. Optional. (none)
dataverse.files.<id>.path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false``
dataverse.files.<id>.payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
dataverse.files.<id>.chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
dataverse.files.<id>.connection-pool-size <?> The maximum number of open connections to the S3 server ``256``
dataverse.files.<id>.disable-tagging ``true``/``false`` Do not place the ``temp`` tag when redirecting the upload to the S3 server. ``false``
@@ -1370,12 +1369,12 @@ Reported Working S3-Compatible Storage
possibly slow) https://play.minio.io:9000 service.

`StorJ Object Store <https://www.storj.io>`_
StorJ is a distributed object store that can be configured with an S3 gateway. Per the S3 Storage instructions above, you'll first set up the StorJ S3 store by defining the id, type, and label. After following the general installation, set the following configurations to use a StorJ object store: ``dataverse.files.<id>.payload-signing=true`` and ``dataverse.files.<id>.chunked-encoding=false``. For step-by-step instructions see https://docs.storj.io/dcs/how-tos/dataverse-integration-guide/
StorJ is a distributed object store that can be configured with an S3 gateway. Per the S3 Storage instructions above, you'll first set up the StorJ S3 store by defining the id, type, and label. After following the general installation, set the following configuration to use a StorJ object store: ``dataverse.files.<id>.chunked-encoding=false``. For step-by-step instructions see https://docs.storj.io/dcs/how-tos/dataverse-integration-guide/

Note that for direct uploads and downloads, Dataverse redirects to the proxy-url but presigns the urls based on the ``dataverse.files.<id>.custom-endpoint-url``. Also, note that if you choose to enable ``dataverse.files.<id>.download-redirect`` the S3 URLs expire after 60 minutes by default. You can change that minute value to reflect a timeout value that’s more appropriate by using ``dataverse.files.<id>.url-expiration-minutes``.

`Surf Object Store v2019-10-30 <https://www.surf.nl/en>`_
Set ``dataverse.files.<id>.payload-signing=true``, ``dataverse.files.<id>.chunked-encoding=false`` and ``dataverse.files.<id>.path-style-request=true`` to use Surf Object
Set ``dataverse.files.<id>.chunked-encoding=false`` and ``dataverse.files.<id>.path-style-request=true`` to use Surf Object
Store. You will need the Swift client (documented at <http://doc.swift.surfsara.nl/en/latest/Pages/Clients/s3cred.html>) to create the access key and secret key for the S3 interface.

Note that the ``dataverse.files.<id>.proxy-url`` setting can be used in installations where the object store is proxied, but it should be considered an advanced option that will require significant expertise to properly configure.
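
The effect of ``dataverse.files.<id>.url-expiration-minutes`` described above can be sketched as follows. This is an illustration only, not Dataverse's implementation: the class `UrlExpirationDemo`, its method names, and the choice to fall back to the default on missing or invalid values are all assumptions. Only the 60-minute default comes from the text.

```java
import java.time.Duration;
import java.time.Instant;

final class UrlExpirationDemo {

    /** Default lifetime stated in the guide: 60 minutes. */
    static final int DEFAULT_EXPIRATION_MINUTES = 60;

    /**
     * Converts a configured value (as a string, possibly null) into a Duration.
     * Falling back to the default for blank, non-numeric, or non-positive values
     * is an assumption made for this sketch.
     */
    static Duration expirationFor(String configuredMinutes) {
        if (configuredMinutes == null || configuredMinutes.isBlank()) {
            return Duration.ofMinutes(DEFAULT_EXPIRATION_MINUTES);
        }
        try {
            int minutes = Integer.parseInt(configuredMinutes.trim());
            return minutes > 0
                    ? Duration.ofMinutes(minutes)
                    : Duration.ofMinutes(DEFAULT_EXPIRATION_MINUTES);
        } catch (NumberFormatException e) {
            return Duration.ofMinutes(DEFAULT_EXPIRATION_MINUTES);
        }
    }

    /** Computes the instant at which a URL issued at issuedAt would stop working. */
    static Instant expiresAt(Instant issuedAt, String configuredMinutes) {
        return issuedAt.plus(expirationFor(configuredMinutes));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println("Default expiry:    " + expiresAt(now, null));
        System.out.println("Configured expiry: " + expiresAt(now, "120"));
    }
}
```

A shorter lifetime limits how long a leaked redirect URL remains usable, while a longer one helps with slow transfers of large files.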
@@ -2265,7 +2264,7 @@ The S3 Archiver defines one custom setting, a required :S3ArchiverConfig. It can

The credentials for your S3 account can be stored in a profile in a standard credentials file (e.g. ~/.aws/credentials) referenced via the "profile" key in the :S3ArchiverConfig setting (which defaults to the default entry), or can be supplied via MicroProfile settings as described for S3 stores (dataverse.s3archiver.access-key and dataverse.s3archiver.secret-key).

The :S3ArchiverConfig setting is a JSON object that must include an "s3_bucket_name" and may include additional S3-related parameters as described for S3 Stores, including "profile", "connection-pool-size","custom-endpoint-url", "custom-endpoint-region", "path-style-access", "payload-signing", and "chunked-encoding".
The :S3ArchiverConfig setting is a JSON object that must include an "s3_bucket_name" and may include additional S3-related parameters as described for S3 Stores, including "profile", "connection-pool-size","custom-endpoint-url", "custom-endpoint-region", "path-style-access", and "chunked-encoding".

\:S3ArchiverConfig - minimally includes the name of the bucket to use. For example:

2 changes: 1 addition & 1 deletion docker-compose-dev.yml
@@ -209,7 +209,7 @@ services:
dev_localstack:
container_name: "dev_localstack"
hostname: "localstack"
image: localstack/localstack:2.3.2
image: localstack/localstack:4.2.0
restart: on-failure
ports:
- "127.0.0.1:4566:4566"
7 changes: 4 additions & 3 deletions modules/dataverse-parent/pom.xml
@@ -32,12 +32,13 @@
<scope>import</scope>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-bom</artifactId>
<groupId>software.amazon.awssdk</groupId>
<artifactId>bom</artifactId>
<version>${aws.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>

<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>libraries-bom</artifactId>
@@ -151,7 +152,7 @@
<payara.version>6.2025.3</payara.version>
<postgresql.version>42.7.7</postgresql.version>
<solr.version>9.8.0</solr.version>
<aws.version>1.12.748</aws.version>
<aws.version>2.31.3</aws.version>
<google.library.version>26.30.0</google.library.version>

<!-- Basic libs, logging -->
33 changes: 29 additions & 4 deletions pom.xml
@@ -77,6 +77,18 @@
<groupId>org.apache.geronimo.specs</groupId>
<artifactId>geronimo-javamail_1.4_spec</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.geronimo.specs</groupId>
<artifactId>geronimo-stax-api_1.0_spec</artifactId>
</exclusion>
<exclusion>
<groupId>org.codehaus.woodstox</groupId>
<artifactId>wstx-asl</artifactId>
</exclusion>
<exclusion>
<groupId>org.codehaus.woodstox</groupId>
<artifactId>woodstox-core-asl</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Dependency for Apache Abdera and Apache Tika. Tika needs newer version. -->
@@ -167,10 +179,24 @@
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<groupId>software.amazon.awssdk</groupId>
<artifactId>s3</artifactId>
<!-- no version here as managed by BOM above! -->
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>s3-transfer-manager</artifactId>
<!-- no version here as managed by BOM above! -->
</dependency>
<!--dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>apache-client</artifactId-->
<!-- no version here as managed by BOM above! -->
<!--/dependency-->
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>netty-nio-client</artifactId>
<!-- no version here as managed by BOM above! -->
</dependency>
<dependency>
@@ -181,7 +207,6 @@
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.9.1</version>
<scope>compile</scope>
</dependency>
<!-- Should be refactored and moved to transitive section above once on Java EE 8 (makes WAR smaller) -->
5 changes: 3 additions & 2 deletions scripts/zipdownload/pom.xml
@@ -23,8 +23,9 @@
<artifactId>postgresql</artifactId>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<groupId>software.amazon.awssdk</groupId>
<artifactId>s3</artifactId>
<!-- no version here as managed by BOM above! -->
</dependency>
</dependencies>
<build>
@@ -20,12 +20,14 @@

package edu.harvard.iq.dataverse.custom.service.util;

import com.amazonaws.SdkClientException;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.S3Exception;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
@@ -38,9 +40,9 @@
*
* @author Leonid Andreev
*/
public class DirectAccessUtil implements java.io.Serializable {
public class DirectAccessUtil implements java.io.Serializable {

private AmazonS3 s3 = null;
private S3Client s3 = null;

public InputStream openDirectAccess(String storageLocation) {
InputStream inputStream = null;
@@ -57,31 +59,17 @@ public InputStream openDirectAccess(String storageLocation) {
String bucket = storageLocation.substring(0, storageLocation.indexOf('/'));
String key = storageLocation.substring(storageLocation.indexOf('/') + 1);

//System.out.println("bucket: "+bucket);
//System.out.println("key: "+key);

/* commented-out code below is for looking up S3 metatadata
properties, such as size, etc. prior to making an access call:
ObjectMetadata objectMetadata = null;
long fileSize = 0L;
try {
objectMetadata = s3.getObjectMetadata(bucket, key);
fileSize = objectMetadata.getContentLength();
//System.out.println("byte size: "+objectMetadata.getContentLength());
} catch (SdkClientException sce) {
System.err.println("Cannot get S3 object metadata " + key + " from bucket " + bucket);
}*/

try {
inputStream = s3.getObject(new GetObjectRequest(bucket, key)).getObjectContent();
} catch (SdkClientException sce) {
ResponseInputStream<GetObjectResponse> s3Object = s3.getObject(GetObjectRequest.builder()
.bucket(bucket)
.key(key)
.build());
inputStream = s3Object;
} catch (S3Exception se) {
System.err.println("Cannot get S3 object " + key + " from bucket " + bucket);
}

} else if (storageLocation.startsWith("file://")) {
// This could be a static method; since no reusable client/maintainable
// state is required

storageLocation = storageLocation.substring(7);

try {
@@ -98,14 +86,13 @@ public InputStream openDirectAccess(String storageLocation) {
private void createOrReuseAwsClient() {
if (this.s3 == null) {
try {
AmazonS3ClientBuilder s3CB = AmazonS3ClientBuilder.standard();
s3CB.setCredentials(new ProfileCredentialsProvider("default"));
this.s3 = s3CB.build();

this.s3 = S3Client.builder()
.region(Region.US_EAST_1) // You may want to make this configurable
.credentialsProvider(ProfileCredentialsProvider.create("default"))
.build();
} catch (Exception e) {
System.err.println("cannot instantiate an S3 client");
System.err.println("Cannot instantiate an S3 client: " + e.getMessage());
}
}
}

}
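
The bucket/key split in `openDirectAccess` relies on the storage location containing a `/`. A minimal sketch of that step with explicit validation is shown here; the class `StorageLocationDemo`, the method `splitBucketAndKey`, and the sample location are hypothetical and not part of the PR. The sketch assumes the input is the `bucket/key` portion that the code above passes to `substring`.

```java
final class StorageLocationDemo {

    /**
     * Splits "bucket/key" at the first slash, mirroring the substring calls in
     * openDirectAccess. Unlike the original, it rejects inputs without a usable
     * bucket or key instead of failing inside substring.
     */
    static String[] splitBucketAndKey(String location) {
        if (location == null) {
            throw new IllegalArgumentException("Storage location must not be null");
        }
        int slash = location.indexOf('/');
        if (slash <= 0 || slash == location.length() - 1) {
            throw new IllegalArgumentException(
                    "Expected <bucket>/<key> but got: " + location);
        }
        String bucket = location.substring(0, slash);
        String key = location.substring(slash + 1);
        return new String[] { bucket, key };
    }

    public static void main(String[] args) {
        // Hypothetical location; the key itself may contain further slashes.
        String[] parts = splitBucketAndKey("mybucket/10.5072/FK2/ABC/file1");
        System.out.println("bucket: " + parts[0]);
        System.out.println("key:    " + parts[1]);
    }
}
```

Only the first slash separates the bucket from the key, so any later slashes stay part of the key.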
20 changes: 0 additions & 20 deletions src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java
@@ -1687,26 +1687,6 @@ public String getRsyncScriptFilename() {
return rsyncScriptFilename;
}

@Deprecated
public void requestDirectUploadUrl() {

S3AccessIO<?> s3io = FileUtil.getS3AccessForDirectUpload(dataset);
if (s3io == null) {
FacesContext.getCurrentInstance().addMessage(uploadComponentId, new FacesMessage(FacesMessage.SEVERITY_ERROR, BundleUtil.getStringFromBundle("dataset.file.uploadWarning"), "Direct upload not supported for this dataset"));
}
String url = null;
String storageIdentifier = null;
try {
url = s3io.generateTemporaryS3UploadUrl();
storageIdentifier = FileUtil.getStorageIdentifierFromLocation(s3io.getStorageLocation());
} catch (IOException io) {
logger.warning(io.getMessage());
FacesContext.getCurrentInstance().addMessage(uploadComponentId, new FacesMessage(FacesMessage.SEVERITY_ERROR, BundleUtil.getStringFromBundle("dataset.file.uploadWarning"), "Issue in connecting to S3 store for direct upload"));
}

PrimeFaces.current().executeScript("uploadFileDirectly('" + url + "','" + storageIdentifier + "')");
}

public void requestDirectUploadUrls() {

Map<String, String> paramMap = FacesContext.getCurrentInstance().getExternalContext().getRequestParameterMap();