This guide explains the fundamental concepts behind durable execution and how the SDK works. You'll understand:
- The difference between `aws-durable-execution-sdk-python` and `aws-durable-execution-sdk-python-testing`
- How checkpoints and replay enable reliable workflows
- Why your function code runs multiple times but side effects happen once
- The development workflow from writing to testing to deployment
The durable execution ecosystem has two separate packages:
The execution SDK, `aws-durable-execution-sdk-python`, is the core SDK that runs in your Lambda functions. It provides:
- `DurableContext` - The main interface for durable operations
- Operations - Steps, waits, callbacks, parallel, map, child contexts
- Decorators - `@durable_execution`, `@durable_step`, etc.
- Configuration - `StepConfig`, `CallbackConfig`, retry strategies
- Serialization - How data is saved in checkpoints
Install it in your Lambda deployment package:
```bash
pip install aws-durable-execution-sdk-python
```

The testing SDK, `aws-durable-execution-sdk-python-testing`, is a separate package for testing your durable functions. It provides:
- `DurableFunctionTestRunner` - Run functions locally without AWS
- `DurableFunctionCloudTestRunner` - Test deployed Lambda functions
- Pytest integration - Fixtures and markers for writing tests
- Result inspection - Examine execution state and operation results
Install it in your development environment only:
```bash
pip install aws-durable-execution-sdk-python-testing
```

Key distinction: The execution SDK runs in production Lambda. The testing SDK runs on your laptop or in CI/CD. They're separate concerns.
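The execution SDK's feature list mentions serialization: step results are saved in checkpoints and handed back on replay. It is therefore safest for a step to return data that survives a serialize/deserialize round trip unchanged. The sketch below uses the standard library's `json` module as a stand-in for a checkpoint serializer. The SDK's own serializer may accept a different set of types, and the helper name `survives_json_roundtrip` is hypothetical.

```python
import json
from datetime import datetime
from typing import Any


def survives_json_roundtrip(value: Any) -> bool:
    """Return True if value comes back unchanged after a JSON round trip.

    json is only a stand-in for a checkpoint serializer here; the SDK's
    own serializer may support a different set of types.
    """
    try:
        return json.loads(json.dumps(value)) == value
    except (TypeError, ValueError):
        return False  # json cannot encode this type at all


print(survives_json_roundtrip({"id": "order-1", "total": 42}))  # True
print(survives_json_roundtrip((1, 2)))  # False: a tuple comes back as a list
print(survives_json_roundtrip(datetime(2024, 1, 1)))  # False: not JSON-encodable
```

A value that fails a check like this could behave differently on the first run than on a replay, which is exactly the kind of divergence durable functions need to avoid.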
Let's trace through a simple workflow to understand the execution model:
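```python
from aws_durable_execution_sdk_python import DurableContext, durable_execution
from aws_durable_execution_sdk_python.config import Duration

# fetch_data and process_data are @durable_step functions (definitions not shown)


@durable_execution
def handler(event: dict, context: DurableContext) -> dict:
    # Step 1: Call external API
    data = context.step(fetch_data(event["id"]))
    # Step 2: Wait 30 seconds
    context.wait(Duration.from_seconds(30))
    # Step 3: Process the data
    result = context.step(process_data(data))
    return result
```

First invocation (t=0s):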
- Lambda invokes your function
- `fetch_data` executes and calls an external API
- The result is checkpointed to AWS
- `context.wait(Duration.from_seconds(30))` is reached
- The function returns, and Lambda can recycle the environment
Second invocation (t=30s):
- Lambda invokes your function again
- Function code runs from the beginning
- `fetch_data` returns the checkpointed result instantly (no API call)
- `context.wait(Duration.from_seconds(30))` is already complete, so execution continues
- `process_data` executes for the first time
- The result is checkpointed
- Function returns the final result
Key insights:
- Your function code runs twice, but `fetch_data` only calls the API once
- The wait doesn't block Lambda: your environment can be recycled
- You write linear code that looks synchronous
- The SDK handles all the complexity of state management
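To make the replay model concrete, here is a toy simulation in plain Python. `ToyContext`, `Suspend`, `workflow`, and `invoke` are hypothetical names, not SDK APIs, and the real SDK stores checkpoints in AWS rather than in a dict. This toy identifies each operation by the order in which it is called, returns saved step results on replay, and ends the invocation early while a wait is still pending.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


class Suspend(Exception):
    """Raised to end the current invocation while a wait is still pending."""


@dataclass
class ToyContext:
    """Hypothetical, simplified stand-in for DurableContext (not SDK code)."""

    checkpoints: Dict[int, Any]  # operation index -> checkpointed value
    now: float  # "current time" of this invocation, in seconds
    next_op: int = 0  # index of the next operation in call order

    def _op_id(self) -> int:
        op_id = self.next_op
        self.next_op += 1
        return op_id

    def step(self, fn: Callable[[], Any]) -> Any:
        op_id = self._op_id()
        if op_id in self.checkpoints:
            return self.checkpoints[op_id]  # replay: reuse the saved result
        result = fn()  # first execution: perform the side effect
        self.checkpoints[op_id] = result  # checkpoint before moving on
        return result

    def wait(self, seconds: float) -> None:
        op_id = self._op_id()
        # The first time the wait is reached, record when it should finish.
        resume_at = self.checkpoints.setdefault(op_id, self.now + seconds)
        if self.now < resume_at:
            raise Suspend()  # end this invocation; a later one resumes here


calls = {"api": 0, "runs": 0}


def fetch_data() -> dict:
    calls["api"] += 1  # stands in for the external API call
    return {"value": 21}


def workflow(ctx: ToyContext) -> int:
    calls["runs"] += 1  # counts every time the function body executes
    data = ctx.step(fetch_data)
    ctx.wait(30)
    return ctx.step(lambda: data["value"] * 2)


def invoke(checkpoints: Dict[int, Any], now: float) -> Optional[int]:
    try:
        return workflow(ToyContext(checkpoints, now))
    except Suspend:
        return None  # execution is still pending


checkpoints: Dict[int, Any] = {}
first = invoke(checkpoints, now=0)  # runs fetch_data, then suspends at the wait
second = invoke(checkpoints, now=30)  # replays fetch_data from the checkpoint
print(first, second, calls)  # None 42 {'api': 1, 'runs': 2}
```

The function body runs twice, but `fetch_data` runs once, mirroring the two invocations traced earlier. Because this toy matches checkpoints to operations by call order, it also shows why the sequence of durable operations has to be the same on every replay.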
```mermaid
flowchart LR
    subgraph dev["Development (Local)"]
        direction LR
        A["1. Write Function<br/>aws-durable-execution-sdk-python"]
        B["2. Write Tests<br/>aws-durable-execution-sdk-python-testing"]
        C["3. Run Tests<br/>pytest"]
    end
    subgraph prod["Production (AWS)"]
        direction LR
        D["4. Deploy<br/>SAM/CDK/Terraform"]
        E["5. Test in Cloud<br/>pytest --runner-mode=cloud"]
    end
    A --> B --> C --> D --> E
    style dev fill:#e3f2fd
    style prod fill:#fff3e0
```
Here's how you build and test durable functions:
Install the execution SDK and write your Lambda handler:
```bash
pip install aws-durable-execution-sdk-python
```

```python
from aws_durable_execution_sdk_python import (
    DurableContext,
    StepContext,
    durable_execution,
    durable_step,
)


@durable_step
def my_step(step_context: StepContext, data: str) -> str:
    # Your business logic goes here (placeholder)
    result = f"processed {data}"
    return result


@durable_execution
def handler(event: dict, context: DurableContext) -> str:
    result = context.step(my_step(event["data"]))
    return result
```

Install the testing SDK and write tests:
```bash
pip install aws-durable-execution-sdk-python-testing
```

```python
import pytest
from aws_durable_execution_sdk_python.execution import InvocationStatus

from my_function import handler


@pytest.mark.durable_execution(handler=handler, lambda_function_name="my_function")
def test_my_function(durable_runner):
    with durable_runner:
        result = durable_runner.run(input={"data": "test"}, timeout=10)

    assert result.status == InvocationStatus.SUCCEEDED
```

Run tests without AWS credentials:

```bash
pytest test_my_function.py
```

Package your function with the execution SDK (not the testing SDK) and deploy it using your preferred tool (SAM, CDK, Terraform, etc.).
Run the same tests against your deployed function:
```bash
export AWS_REGION=us-west-2
# Single quotes stop the shell from expanding $LATEST as a variable
export QUALIFIED_FUNCTION_NAME='MyFunction:$LATEST'
export LAMBDA_FUNCTION_TEST_NAME="my_function"
pytest --runner-mode=cloud test_my_function.py
```

Ready to build your first durable function? Here's a minimal example:
```python
from aws_durable_execution_sdk_python import (
    DurableContext,
    durable_execution,
    durable_step,
    StepContext,
)


@durable_step
def greet_user(step_context: StepContext, name: str) -> str:
    """Generate a greeting."""
    return f"Hello {name}!"


@durable_execution
def handler(event: dict, context: DurableContext) -> str:
    """Simple durable function."""
    name = event.get("name", "World")
    greeting = context.step(greet_user(name))
    return greeting
```

Deploy this to Lambda and you have a durable function. The `greet_user` step is checkpointed automatically.
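Notice that `greet_user` is declared with a `step_context` parameter but called as `greet_user(name)`. The examples imply that calling a `@durable_step` function binds its arguments and returns an operation for `context.step` to run later, with the step context supplied at that point. The toy decorator below models that calling convention. `toy_step`, `toy_context_step`, and `ToyStepContext` are hypothetical stand-ins, and the SDK's real decorator may differ in its details.

```python
import functools
from typing import Any, Callable


class ToyStepContext:
    """Hypothetical stand-in for the SDK's StepContext."""

    def __init__(self, name: str) -> None:
        self.name = name


def toy_step(func: Callable[..., Any]) -> Callable[..., Callable[[ToyStepContext], Any]]:
    """Hypothetical decorator mimicking the @durable_step calling convention."""

    @functools.wraps(func)
    def bind(*args: Any, **kwargs: Any) -> Callable[[ToyStepContext], Any]:
        # Calling the decorated function does not run its body. It binds the
        # arguments and returns a callable that still needs a step context.
        def run(step_context: ToyStepContext) -> Any:
            return func(step_context, *args, **kwargs)

        run.__name__ = func.__name__  # keep a readable name for the operation
        return run

    return bind


def toy_context_step(operation: Callable[[ToyStepContext], Any]) -> Any:
    # The context creates the step context when it decides to execute the step.
    return operation(ToyStepContext(operation.__name__))


@toy_step
def greet_user(step_context: ToyStepContext, name: str) -> str:
    return f"Hello {name}!"


pending = greet_user("Ada")  # nothing has executed yet
print(toy_context_step(pending))  # Hello Ada!
```

Deferring execution this way is what lets the context decide, on each replay, whether to run the step body or return a checkpointed result instead.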
If you need to customize the boto3 Lambda client used for durable execution operations (for example, to configure custom endpoints, retry settings, or credentials), you can pass a boto3_client parameter to the decorator. The client must be a boto3 Lambda client:
```python
import boto3
from botocore.config import Config
from aws_durable_execution_sdk_python import durable_execution, DurableContext

# Create a custom boto3 Lambda client with specific configuration
custom_lambda_client = boto3.client(
    'lambda',
    config=Config(
        retries={'max_attempts': 5, 'mode': 'adaptive'},
        connect_timeout=10,
        read_timeout=60,
    )
)


@durable_execution(boto3_client=custom_lambda_client)
def handler(event: dict, context: DurableContext) -> dict:
    # Your durable function logic
    return {"status": "success"}
```

The custom Lambda client is used for all checkpoint and state management operations. If you don't provide a `boto3_client`, the SDK initializes a default Lambda client from your environment.
Now that you've built your first durable function, explore the core features:
Learn the operations:
- Steps - Execute code with retry strategies and checkpointing
- Wait operations - Pause execution for seconds, minutes, or hours
- Callbacks - Wait for external systems to respond
- Child contexts - Organize complex workflows
- Parallel operations - Run multiple operations concurrently
- Map operations - Process collections in parallel
Dive deeper:
- Error handling - Handle failures and implement retry strategies
- Testing patterns - Write effective tests for your workflows
- Best practices - Avoid common pitfalls
- Documentation index - Browse all guides and examples
- Architecture diagrams - Class diagrams and concurrency flows
- Logger integration - Replay-safe structured logging
- Examples directory - More working examples
See the LICENSE file for our project's licensing.