Cancel stream on abort or timeout to prevent CRT HTTP client connection pool exhaustion#6919
Cancel stream on abort or timeout to prevent CRT HTTP client connection pool exhaustion#6919
Conversation
…nection from pool
55f6b1d to
af112fb
Compare
| // Cancels the stream on failure so the pool does not reuse it. | ||
| requestFuture.whenComplete((r, t) -> { | ||
| if (t != null) { | ||
| streamFuture.thenAccept(HttpStreamBase::cancel); |
There was a problem hiding this comment.
stream.cancel is not idempotent unfortunately :(. We have a helpful method here, but we probably should move it somewhere so that it can be used here
There was a problem hiding this comment.
While we are at it, can we add a checkstyle to prevent the use of HttpStreamBase.cancel directly?
There was a problem hiding this comment.
Thanks for the comments 👍
Updated code to use ResponseHandlerHelper. I tried creating a separate StreamHandler to pass it in between Executor and ResponseHandler, but that was causing a lot of changes in constructors and existing test cases, so reused the ResponseHandlerHelper in ResponseHandlers.
Also added checkstyle and tested with the previous fix, and it detected the checkstyle issue:
[INFO] --- checkstyle:3.1.2:check (checkstyle) @ aws-crt-client ---
[INFO] Starting audit...
[ERROR] /v2_java/http-clients/aws-crt-client/src/main/java/software/amazon/awssdk/http/crt/internal/CrtAsyncRequestExecutor.java:82:
Line matches the illegal pattern 'Don't call HttpStreamBase.cancel() directly. Use ResponseHandlerHelper.closeConnection() or the response handler's closeConnection() method, which is idempotent and pairs
cancel() with close() correctly.'. [NoCrtStreamCancel]
Audit done
494bbca to
df131c1
Compare
| long finalAcquireStartTime = acquireStartTime; | ||
|
|
||
| streamFuture.whenComplete((streamBase, throwable) -> { | ||
| if (shouldPublishMetrics) { |
There was a problem hiding this comment.
It seems like we can invoke onAcquireStream here?
|
|
||
| try (SdkHttpClient client = AwsCrtHttpClient.builder().maxConcurrency(3).build()) { | ||
| stubUnresponsiveServer(); | ||
| executeAndAbort(client, uri, 3); |
There was a problem hiding this comment.
Just curious, why do we need to send 3 requests?
There was a problem hiding this comment.
We use 3 to match maxConcurrency(3) and verify all 3 dead connections are evicted and the pool can accept 3 new requests.
5efcffe to
603b69d
Compare
| * {@code cancel()} because per {@link software.amazon.awssdk.crt.http.HttpStreamBase#cancel()} | ||
| * javadoc: "if the stream is already completing for other reasons, this call will have no effect." | ||
| */ | ||
| public void onResponseComplete() { |
There was a problem hiding this comment.
Had to add the streamCompleted check because without it, H2ErrorTest crashes with a SIGSEGV on the CRT side. The pool-exhaustion fix adds a second closeConnection() call through requestFuture.whenComplete. Combined with the existing call in onFailedResponseComplete, this results in cancel() being called after close() has already freed the native stream,
causing the crash. Setting streamCompleted = true before onFailedResponseComplete runs ensures closeConnection() skips cancel() when CRT has already completed the stream,
honoring the HttpStreamBase.cancel() javadoc contract: "if the stream is already completing for other reasons, this call will have no effect."
603b69d to
6a13a35
Compare
Motivation and Context
When API call timeout fires, the SDK does not abort the in-flight CRT HTTP request, causing connections to remain permanently leased and eventually exhausting the connection pool.
cancel()support in aws-crt-java#979Modifications
CrtRequestExecutor(sync) andCrtAsyncRequestExecutor(async), register arequestFuture.whenCompletehook that callsHttpStreamBase.cancel()when the futurecompletes exceptionally.
Testing
Screenshots (if appropriate)
Types of changes
License