Skip to content

Conversation

@jtuglu1
Copy link
Contributor

@jtuglu1 jtuglu1 commented Jan 8, 2026

Closes #16903.

Description

Upgrade to AWS SDK v2.40.0. Looks like dependencies hdfs-storage + ranger have transitive dependencies on V1. Those will need to be excluded or upgraded as well. Core modules/extensions required for runtime are not reliant on V1.

Migrated to strictly synchronous APIs (to maintain backwards-compatibility with V1). A follow-up PR should be able to make use of async APIs where suitable. S3TransferManager is the only case where an async client may be used.

Hiccups

  1. KinesisAggregatorV2 not available in Maven – Maven via JitPack: amazon-kinesis-aggregator-2.0.2 has v1 code, amazon-kinesis-deaggregator-2.0.2 unavailable awslabs/kinesis-aggregation#120. Doesn't look like anything uses aggregation=true, so added a warning message. Seems like [FLINK-32097][Connectors/Kinesis] Implement support for Kinesis deaggregation flink-connector-aws#188 also ran into similar issues, but implemented their own unmerged approach. Per the warning on the https://github.com/awslabs/kinesis-aggregation repo, looks like data loss can occur anyways when using this mode: https://github.com/awslabs/kinesis-aggregation/blob/master/potential_data_loss.md so don't think that's really a desired feature anyways.

Release note

Upgrade to AWS SDK v2.40.0


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 jtuglu1 changed the title Upgrade to AWS SDK V2 WIP: Upgrade to AWS SDK V2 Jan 8, 2026
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@jtuglu1 jtuglu1 force-pushed the upgrade-sdk-v2 branch 4 times, most recently from 12c1550 to 0bde0ea Compare January 8, 2026 17:24
@jtuglu1 jtuglu1 force-pushed the upgrade-sdk-v2 branch 2 times, most recently from 91ee288 to 84db439 Compare January 8, 2026 18:20
pom.xml Outdated
<graaljs.version>22.3.5</graaljs.version>
<mockito.version>5.14.2</mockito.version>
<aws.sdk.version>1.12.784</aws.sdk.version>
<aws.sdk.v2.version>2.28.28</aws.sdk.v2.version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we upgrade to the latest aws v2 sdk version ?

Copy link
Contributor Author

@jtuglu1 jtuglu1 Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I will bump it to 2.40 minor release. Wanted to minimize chances of upstream bugs by using a stable version for testing.

@jtuglu1 jtuglu1 force-pushed the upgrade-sdk-v2 branch 2 times, most recently from 3e2092e to e2afbe4 Compare January 12, 2026 20:09
@jtuglu1 jtuglu1 closed this Jan 12, 2026
@jtuglu1 jtuglu1 reopened this Jan 12, 2026
@jtuglu1 jtuglu1 changed the title WIP: Upgrade to AWS SDK V2 Upgrade to AWS SDK V2 Jan 13, 2026
public class AWSCredentialsUtils
{
public static AWSCredentialsProviderChain defaultAWSCredentialsProviderChain(final AWSCredentialsConfig config)
public static AwsCredentialsProvider defaultAWSCredentialsProviderChain(final AWSCredentialsConfig config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there is also a AwsCredentialsProviderChain.of(...) method, it might look cleaner.

@Fly-Style
Copy link
Contributor

Very good job @jtuglu1! I have no objections, PR looks good from my perspective!

@jtuglu1 jtuglu1 marked this pull request as ready for review January 14, 2026 03:46
@jtuglu1
Copy link
Contributor Author

jtuglu1 commented Jan 14, 2026

👍 I'd still like to add 1-2 ITs that fully cover the newer S3 changes, but overall think it's basically there.

@cryptoe
Copy link
Contributor

cryptoe commented Jan 14, 2026

Due to the criticality of the change, I think we should get 2 approvals for this PR.
cc @jtuglu1

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a partial review.
Wanted to ask the following questions:
Did we test this patch on any cluster with the following things with we have s3 as the deep storage.

  • Kinesis Ingestion.
  • MSQ with Durable storage.
  • MSQ select with durable storage as the destination.
  • Segment getting nuked out.
  • Task logs being viewable and getting purged out on S3.
  • Segment ingestion.

Also have you tried this patch in any of the production clusters ?


@Override
public void refresh()
private void refresh()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who calls the refresh method now ?

* Parse region from a Kinesis endpoint URL.
* Expected format: https://kinesis.{region}.amazonaws.com
*/
private static Region parseRegionFromEndpoint(String endpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add tests for this function.

this.maxListingLength = maxListingLength;
this.maxRetries = maxRetries;

prepareNextRequest();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels a bit weird that in the constructor we are calling these methods.
Should we call these in hasNext() ?

return true;
}

// Recognize s3sync.rb directory placeholders by MD5/ETag value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please share relevant docs for these placeholders ?

@jtuglu1
Copy link
Contributor Author

jtuglu1 commented Jan 14, 2026

Hi @cryptoe. Yes, I'm testing this on our production clusters running Kafka + S3. We don't have production clusters using Kinesis. I've only been able to run tests locally. I wonder if you folks might have some Kinesis clusters to test the new logic out on?

@cryptoe
Copy link
Contributor

cryptoe commented Jan 20, 2026

Hi @cryptoe. Yes, I'm testing this on our production clusters running Kafka + S3. We don't have production clusters using Kinesis. I've only been able to run tests locally.
Thanks for this. Do keep us posted about the findings.

I wonder if you folks might have some Kinesis clusters to test the new logic out on?
We also don't have active kinesis clusters but let me get back to you on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AWS SDK 1.x EOL - Migrate AWS SDK for Java from 1.x to 2.x

3 participants