-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Upgrade to AWS SDK V2 #18891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Upgrade to AWS SDK V2 #18891
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
12c1550 to
0bde0ea
Compare
91ee288 to
84db439
Compare
84db439 to
bd6cd75
Compare
...ns-core/s3-extensions/src/test/java/org/apache/druid/storage/s3/S3DataSegmentPullerTest.java
Fixed
Show fixed
Hide fixed
...ns-core/s3-extensions/src/test/java/org/apache/druid/storage/s3/S3DataSegmentPullerTest.java
Fixed
Show fixed
Hide fixed
...ns-core/s3-extensions/src/test/java/org/apache/druid/storage/s3/S3DataSegmentPullerTest.java
Fixed
Show fixed
Hide fixed
...sions-core/s3-extensions/src/test/java/org/apache/druid/data/input/s3/S3InputSourceTest.java
Fixed
Show fixed
Hide fixed
...sions-core/s3-extensions/src/test/java/org/apache/druid/data/input/s3/S3InputSourceTest.java
Fixed
Show fixed
Hide fixed
...-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/S3StorageConnector.java
Fixed
Show fixed
Hide fixed
...-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/S3StorageConnector.java
Fixed
Show fixed
Hide fixed
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3DataSegmentMover.java
Fixed
Show fixed
Hide fixed
...exing-service/src/test/java/org/apache/druid/indexing/kinesis/KinesisRecordSupplierTest.java
Fixed
Show fixed
Hide fixed
...exing-service/src/test/java/org/apache/druid/indexing/kinesis/KinesisRecordSupplierTest.java
Fixed
Show fixed
Hide fixed
pom.xml
Outdated
| <graaljs.version>22.3.5</graaljs.version> | ||
| <mockito.version>5.14.2</mockito.version> | ||
| <aws.sdk.version>1.12.784</aws.sdk.version> | ||
| <aws.sdk.v2.version>2.28.28</aws.sdk.v2.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we upgrade to the latest aws v2 sdk version ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I will bump it to 2.40 minor release. Wanted to minimize chances of upstream bugs by using a stable version for testing.
3e2092e to
e2afbe4
Compare
| public class AWSCredentialsUtils | ||
| { | ||
| public static AWSCredentialsProviderChain defaultAWSCredentialsProviderChain(final AWSCredentialsConfig config) | ||
| public static AwsCredentialsProvider defaultAWSCredentialsProviderChain(final AWSCredentialsConfig config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: there is also a AwsCredentialsProviderChain.of(...) method, it might look cleaner.
cloud/aws-common/src/test/java/org/apache/druid/common/aws/AWSClientUtilTest.java
Show resolved
Hide resolved
...ons-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3ServerSideEncryption.java
Show resolved
Hide resolved
|
Very good job @jtuglu1! I have no objections, PR looks good from my perspective! |
|
👍 I'd still like to add 1-2 ITs that fully cover the newer S3 changes, but overall think it's basically there. |
e2afbe4 to
009f34e
Compare
|
Due to the criticality of the change, I think we should get 2 approvals for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a partial review.
Wanted to ask the following questions:
Did we test this patch on any cluster with the following things with we have s3 as the deep storage.
- Kinesis Ingestion.
- MSQ with Durable storage.
- MSQ select with durable storage as the destination.
- Segment getting nuked out.
- Task logs being viewable and getting purged out on S3.
- Segment ingestion.
Also have you tried this patch in any of the production clusters ?
|
|
||
| @Override | ||
| public void refresh() | ||
| private void refresh() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who calls the refresh method now ?
| * Parse region from a Kinesis endpoint URL. | ||
| * Expected format: https://kinesis.{region}.amazonaws.com | ||
| */ | ||
| private static Region parseRegionFromEndpoint(String endpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add tests for this function.
| this.maxListingLength = maxListingLength; | ||
| this.maxRetries = maxRetries; | ||
|
|
||
| prepareNextRequest(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels a bit weird that in the constructor we are calling these methods.
Should we call these in hasNext() ?
| return true; | ||
| } | ||
|
|
||
| // Recognize s3sync.rb directory placeholders by MD5/ETag value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please share relevant docs for these placeholders ?
|
Hi @cryptoe. Yes, I'm testing this on our production clusters running Kafka + S3. We don't have production clusters using Kinesis. I've only been able to run tests locally. I wonder if you folks might have some Kinesis clusters to test the new logic out on? |
|
Closes #16903.
Description
Upgrade to AWS SDK v2.40.0. Looks like dependencies hdfs-storage + ranger have transitive dependencies on V1. Those will need to be excluded or upgraded as well. Core modules/extensions required for runtime are not reliant on V1.
Migrated to strictly synchronous APIs (to maintain backwards-compatibility with V1). A follow-up PR should be able to make use of async APIs where suitable. S3TransferManager is the only case where an async client may be used.
Hiccups
KinesisAggregatorV2not available in Maven – Maven via JitPack: amazon-kinesis-aggregator-2.0.2 has v1 code, amazon-kinesis-deaggregator-2.0.2 unavailable awslabs/kinesis-aggregation#120. Doesn't look like anything usesaggregation=true, so added a warning message. Seems like [FLINK-32097][Connectors/Kinesis] Implement support for Kinesis deaggregation flink-connector-aws#188 also ran into similar issues, but implemented their own unmerged approach. Per the warning on the https://github.com/awslabs/kinesis-aggregation repo, looks like data loss can occur anyways when using this mode: https://github.com/awslabs/kinesis-aggregation/blob/master/potential_data_loss.md so don't think that's really a desired feature anyways.Release note
Upgrade to AWS SDK v2.40.0
This PR has: