[CELEBORN-2262] Prepare S3 directory only once and cache s3 client for MultiPartUploader#3604
[CELEBORN-2262] Prepare S3 directory only once and cache s3 client for MultiPartUploader#3604eolivelli wants to merge 4 commits intoapache:mainfrom
Conversation
… MultiPartUploader (apache#4) - Create only one S3 client for all MultiPartUploaders - Create S3 worker directory only once and not for every slot - Because on S3 AWS creating connections is slow (due to credentials handshaking and TLS handshaking) - Because "mkdirs" in S3 AWS is very slow (and it needs several S3 calls) No No Manual testing on real S3 AWS
|
Thanks for your contribution @eolivelli, I will take a look at this PR before tomorrow and add comments if needed. |
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/storage/StorageManager.scala
Outdated
Show resolved
Hide resolved
...loader/multipart-uploader-s3/src/main/java/org/apache/celeborn/S3MultipartUploadHandler.java
Outdated
Show resolved
Hide resolved
...loader/multipart-uploader-s3/src/main/java/org/apache/celeborn/S3MultipartUploadHandler.java
Show resolved
Hide resolved
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/storage/StorageManager.scala
Outdated
Show resolved
Hide resolved
...in/java/org/apache/celeborn/server/common/service/mpu/MultipartUploadHandlerSharedState.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
@eolivelli, thanks for improvement of S3MultipartUploadHandler. The changes generally look good. Could you provide the testing report of the performance with this improvement?
I added some flamegraphs in the PR description. |
What changes were proposed in this pull request?
Why are the changes needed?
Sample CPU flamegraph about the need of Connection pooling:

Sample CPU flamegraph about the need of pooling the client due to AssumeRoleWithWebIdentity

Does this PR resolve a correctness bug?
No
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manual testing.
There is one end-to-end integration test with S3 that exercise this code