Checks

- I have updated to the latest minor and patch version of Strands
- I have checked the documentation and this is not expected behavior
- I have searched existing issues and there are no duplicates of my issue
Strands Version
1.38.0 (latest release as of 2026-04-30).
Verified that BedrockModel.stream is unchanged on main HEAD (identical to 1.38.0 and 1.35.0 for this method).
Python Version
3.13
Operating System
Linux
Installation Method
pip
Steps to Reproduce
The bug is in how BedrockModel.stream manages the lifecycle of the background task it creates internally. Any outer cancellation of the generator exposes it. Three independent scenarios reproduce the same symptom:
Scenario 1 — Outer timeout via asyncio.wait_for
A standard Python idiom for bounding the time to the first streaming event:
```python
import asyncio

from strands.models.bedrock import BedrockModel


async def main() -> None:
    model = BedrockModel()  # uses default model id
    gen = model.stream(
        messages=[{"role": "user", "content": [{"text": "hello"}]}]
    )
    ait = gen.__aiter__()
    try:
        # Unrealistically short deadline to make the test deterministic:
        # any outer timeout shorter than the actual first-chunk latency reproduces.
        await asyncio.wait_for(ait.__anext__(), timeout=0.001)
    except asyncio.TimeoutError:
        pass  # caller handles the timeout

    # The async generator is now abandoned. The internal asyncio.Task
    # that stream() created via asyncio.create_task(asyncio.to_thread(...))
    # is still running. When boto3's converse_stream() eventually
    # finishes or hits its read_timeout, the exception (or successful
    # result) lands on that orphan task and is never consumed.
    await asyncio.sleep(65)  # wait long enough for boto3 read_timeout


asyncio.run(main())
```
Scenario 2 — Consumer break + generator GC
Equally standard in web servers / UI code that wants only the first few events:
```python
async for event in gen:
    # Stop early; the generator is GC'd before sending completion.
    break
```
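The consequence can be shown with a self-contained stand-in (stdlib only; `fake_stream` is hypothetical and mirrors the task-plus-yield-loop shape described under Actual Behavior): closing the generator after a `break` never reaches the trailing `await task`.

```python
import asyncio

inner: list[asyncio.Task] = []  # lets the demo inspect the generator's task afterwards


async def fake_stream():
    # Hypothetical generator with the shape described in this report: a
    # background task, a yield loop, and an `await task` only after the loop.
    task = asyncio.create_task(asyncio.sleep(30))  # stands in for to_thread(self._stream, ...)
    inner.append(task)
    for i in range(5):
        yield i
    await task  # skipped entirely when the consumer breaks out


async def main() -> bool:
    gen = fake_stream()
    async for _event in gen:
        break  # consumer stops after the first event
    await gen.aclose()  # what the GC's async-generator finalizer does implicitly, later
    return inner[-1].done()


print(asyncio.run(main()))  # → False: the background task is still pending
```

`aclose()` throws GeneratorExit at the suspended `yield`, so the function exits without ever awaiting the inner task.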
Scenario 3 — Client disconnect in a streaming HTTP endpoint
When a FastAPI/Starlette handler streams events from agent.stream_async(...) and the client disconnects mid-stream, the underlying coroutine is cancelled. The same orphan task remains inside BedrockModel.stream.
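The disconnect path can be simulated without a web framework (all names here are hypothetical; cancelling the consumer task plays the role of the disconnected client):

```python
import asyncio

orphans: list[asyncio.Task] = []  # lets the demo inspect the inner task afterwards


async def fake_stream():
    # Hypothetical generator mirroring the reported shape: a background task
    # plus a loop whose await is the cancellation point.
    task = asyncio.create_task(asyncio.sleep(30))  # stands in for to_thread(self._stream, ...)
    orphans.append(task)
    while True:
        await asyncio.sleep(0.01)  # stands in for `await queue.get()`
        yield "event"


async def handler():
    async for _event in fake_stream():
        pass  # would write each event to the HTTP response


async def main() -> bool:
    h = asyncio.create_task(handler())
    await asyncio.sleep(0.05)
    h.cancel()  # what the framework does to the handler coroutine on disconnect
    try:
        await h
    except asyncio.CancelledError:
        pass
    still_running = not orphans[-1].done()
    orphans[-1].cancel()  # demo cleanup; the SDK cannot cancel a boto3 thread this way
    return still_running


print(asyncio.run(main()))  # → True: the inner task outlived the cancelled handler
```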
In all three scenarios, if the background boto3 call eventually raises (e.g. botocore.exceptions.ReadTimeoutError after read_timeout, or a ClientError), asyncio's default handler emits:
```
Task exception was never retrieved
future: <Task finished name='Task-N'
  coro=<to_thread() done, defined at .../asyncio/threads.py:12>
  exception=ReadTimeoutError("Read timeout on endpoint URL:
  'https://bedrock-runtime.<region>.amazonaws.com/model/<model_id>/converse-stream'")>
```
No changes to SDK source are made. The reproducer imports BedrockModel as-is from strands.models.bedrock.
Expected Behavior
When the async generator returned by BedrockModel.stream is abandoned, cancelled, or closed before the normal completion path is reached, the SDK should consume the eventual result or exception of the background task it created internally. No orphan-task warning should surface in application logs.
Cancelling the underlying boto3 worker thread is not expected — that is a known Python asyncio.to_thread limitation. The ask is cleanup of the task object, not interruption of the thread.
Actual Behavior
BedrockModel.stream in strands/models/bedrock.py (verified against tag v1.38.0 and main HEAD) creates an independent task and only awaits it on the successful path:
```python
# strands/models/bedrock.py (v1.38.0, unchanged on main)
async def stream(self, messages, ...):
    def callback(event: StreamEvent | None = None) -> None:
        loop.call_soon_threadsafe(queue.put_nowait, event)

    loop = asyncio.get_event_loop()
    queue: asyncio.Queue[StreamEvent | None] = asyncio.Queue()
    ...
    thread = asyncio.to_thread(self._stream, callback, messages, ...)
    task = asyncio.create_task(thread)  # independent Task

    while True:
        event = await queue.get()  # cancellation point
        if event is None:
            break
        yield event

    await task  # only reached on the happy path
```
There is no `try`/`except`/`finally` wrapping the loop. When the outer consumer leaves the generator via `CancelledError`, `GeneratorExit`, or any exception:

1. The async generator terminates without executing `await task`.
2. `asyncio.to_thread` cannot propagate cancellation to the worker thread, so `self.client.converse_stream(**request)` keeps running until it completes or hits boto3's `read_timeout`.
3. When the thread raises, the exception lands on `task`. With no awaiter, asyncio schedules a `__del__`-time warning via its default exception handler.
This is observable by the user as the "Task exception was never retrieved" message and, in debug mode, a full traceback.
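The warning mechanism itself is reproducible in isolation with the stdlib (the failing coroutine stands in for the boto3 thread; a custom exception handler captures what would otherwise go to stderr):

```python
import asyncio
import gc


async def main() -> list[str]:
    messages: list[str] = []
    loop = asyncio.get_running_loop()
    # Capture what asyncio's default handler would print to stderr.
    loop.set_exception_handler(lambda _loop, ctx: messages.append(ctx["message"]))

    async def failing():
        raise RuntimeError("stand-in for botocore ReadTimeoutError")

    task = asyncio.create_task(failing())
    await asyncio.sleep(0)  # the task finishes; nothing ever awaits it
    del task                # drop the last reference, as an abandoned generator does
    gc.collect()            # Task.__del__ reports the unretrieved exception
    return messages


print(asyncio.run(main()))  # → ['Task exception was never retrieved']
```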
Additional Context
Why this matters beyond a single codebase:
- asyncio.wait_for is the idiomatic way to enforce a deadline on any awaitable in Python, including "time to first event" on a streaming LLM call. Any Strands user who wraps stream() with wait_for hits this.
- Client disconnects on streaming HTTP endpoints are common. FastAPI/Starlette propagate disconnects as CancelledError on the handler coroutine — same code path.
- Async generators are normally safe to abandon. Python's async for ...: break pattern is common and expected to work without leaking resources. SDK consumers do not expect it to cause stderr noise hours later.
- Agent.cancel() (introduced in #1772, "feat: add CancellationToken for graceful agent execution cancellation") does not address this layer. It sets a threading.Event checked inside process_stream's chunk loop. Before the first chunk arrives, or when the caller is not using Agent at all (using BedrockModel directly), the signal has no effect on the background task.
- Python's asyncio.to_thread cannot cancel the worker thread; this is a language constraint, not a Strands bug. The fix is to consume the task's exception, not to interrupt the thread.
Why Agent.cancel() alone is not a sufficient answer
- The SDK supports direct use of BedrockModel.stream outside an Agent context (custom execution loops, testing, multi-provider wrappers). Those users have no Agent.cancel() to call.
- Even inside an Agent, the cancellation checkpoint in process_stream requires that at least one chunk has been received. A slow first token leaves the checkpoint unreachable.
- Agent.cancel() is a turn-level signal. Callers who want to cancel only the current model call (e.g. to switch to another provider) do not want to invalidate the whole turn.
Possible Solution
Three options, in increasing scope. The first is a minimal fix that addresses the reported symptom without changing any public behavior; the others are design-level improvements that could come later.
Option A — Retrieve the exception on any non-happy path (~10 line diff)
Pros: minimal diff; behavior identical on the happy path; silences the orphan warning across all three scenarios above.
Cons: the boto3 thread continues until read_timeout; no hook for callers that want to await it.
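A minimal stdlib illustration of the idea (not the actual diff, which would wrap stream()'s queue loop in try/finally; here a done-callback plays the retrieving role, and the failing coroutine stands in for the boto3 thread):

```python
import asyncio
import gc


def _observe(task: asyncio.Task) -> None:
    # Retrieving the exception marks it as observed, so asyncio's
    # "Task exception was never retrieved" warning never fires.
    if not task.cancelled():
        task.exception()


async def main() -> list[str]:
    warnings: list[str] = []
    loop = asyncio.get_running_loop()
    loop.set_exception_handler(lambda _loop, ctx: warnings.append(ctx["message"]))

    async def failing():
        raise RuntimeError("stand-in for botocore ReadTimeoutError")

    task = asyncio.create_task(failing())
    task.add_done_callback(_observe)  # the essence of Option A's cleanup
    await asyncio.sleep(0)            # task finishes; nobody awaits it
    del task                          # abandon it, as the generator teardown would
    await asyncio.sleep(0)            # let _observe run and consume the exception
    gc.collect()
    return warnings


print(asyncio.run(main()))  # → [] (no orphan-task warning)
```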
Option B — Drain the task in a detached coroutine
Same try/except shape as Option A, but schedule a background coroutine that awaits task so any side effects (logging, observability hooks inside _stream) run to completion.
Pros: uniform teardown across all cancellation paths.
Cons: introduces a detached task without a handle callers can await.
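A sketch of the shape under discussion (schedule_drain and _drain are hypothetical names, and the slow failing coroutine stands in for the boto3 thread running until read_timeout):

```python
import asyncio


async def _drain(task: asyncio.Task) -> None:
    try:
        await task       # consumes the result or exception
    except BaseException:
        pass             # already abandoned; nothing upstream to notify


def schedule_drain(task: asyncio.Task) -> None:
    if task.done():
        if not task.cancelled():
            task.exception()  # finished already: just mark it observed
    else:
        asyncio.ensure_future(_drain(task))  # detached: no handle for callers


async def main() -> bool:
    async def slow_failure():
        await asyncio.sleep(0.01)  # stands in for boto3 running to read_timeout
        raise RuntimeError("stand-in for ReadTimeoutError")

    task = asyncio.create_task(slow_failure())
    schedule_drain(task)       # would be called from the generator's teardown path
    await asyncio.sleep(0.05)  # the drainer consumes the exception meanwhile
    return task.done()


print(asyncio.run(main()))  # → True, with no orphan-task warning emitted
```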
Option C — First-class cancellation primitive
Add an optional cancellation argument to BedrockModel.stream (and eventually other providers) that integrates with the existing process_stream checkpoint pattern. This would offer parity with the TypeScript SDK's cancelSignal option on Agent.invoke and would make Agent.cancel() a natural composition on top.
Pros: enables declarative deadlines, aligns Python/TS APIs, composes well with Agent.cancel().
Cons: larger surface; benefits from a cancellation RFC covering other providers.
Recommendation
Option A alone solves the user-visible symptom reported here and is safe to ship independently of any larger cancellation redesign. Options B and C can follow as separate discussions once there is consensus on the teardown contract for all Model implementations.
Happy to open a PR for Option A if maintainers agree on the direction.
Related Issues
- feat: add CancellationToken for graceful agent execution cancellation #1772 (introduced Agent.cancel(); addresses a different layer)
- [FEATURE] Make agent.cancel() a noop when agent is idle #2156 (cancellation semantics at the Agent level)
- [BUG] cancel_node in BeforeNodeCallEvent raises RuntimeError that kills the entire graph on resume (graph-level cancellation)

None of these cover the BedrockModel.stream internal task cleanup described above.