| 1 | +# External Storage Sample |
| 2 | + |
| 3 | +This sample demonstrates how to offload large workflow payloads to Amazon S3-compatible |
| 4 | +object storage using the Temporal Python SDK's built-in `ExternalStorage` system, |
| 5 | +combined with a gzip `PayloadCodec` so the payloads stored inline in Temporal and in |
| 6 | +S3 are both compressed. |
| 7 | + |
| 8 | +**Scenario:** A fulfillment center processes batches of shipping orders. The workflow |
| 9 | +receives a small request (a batch ID and order count), then internally calls a |
| 10 | +`fetch_orders` activity that returns the full list of orders with customer records, |
| 11 | +line items, and handling notes. That list — several hundred kilobytes even after |
| 12 | +compression — is passed to a second `process_orders` activity. Finally the workflow |
| 13 | +returns a small `BatchSummary` with totals. |
| 14 | + |
| 15 | +Each payload is first compressed by `CompressionCodec`. The SDK then checks the |
| 16 | +compressed size against the default 256 KiB threshold; payloads still above it are |
| 17 | +stored in S3 and replaced inline with compact claim-check references. The workflow's |
| 18 | +own input (`OrderBatchRequest`) and result (`BatchSummary`) compress to a few hundred |
| 19 | +bytes and remain inline. |
| 20 | + |
| 21 | +A mock S3 service (`s3.py`) is included so you can run the sample locally without |
| 22 | +an AWS account or Docker. A `codec_server.py` is included to decompress payloads |
| 23 | +on demand for the Temporal Web UI. |
| 24 | + |
| 25 | +## Prerequisites |
| 26 | + |
| 27 | +* [uv](https://docs.astral.sh/uv/) |
| 28 | +* [Temporal CLI](https://docs.temporal.io/cli#install) with a local dev server running: |
| 29 | + ``` |
| 30 | + temporal server start-dev |
| 31 | + ``` |
| 32 | + |
| 33 | +## 1. Sync dependencies |
| 34 | + |
| 35 | +```bash |
| 36 | +uv sync --group external-storage |
| 37 | +``` |
| 38 | + |
| 39 | +## 2. Start the mock S3 service |
| 40 | + |
| 41 | +In a dedicated terminal: |
| 42 | + |
| 43 | +```bash |
| 44 | +uv run external_storage/s3.py |
| 45 | +``` |
| 46 | + |
| 47 | +This starts a local S3-compatible server on port 5000 and creates the `temporal-payloads` |
| 48 | +bucket. Leave it running for the duration of the sample. |
| 49 | + |
| 50 | +## 3. Run the worker |
| 51 | + |
| 52 | +In a second terminal: |
| 53 | + |
| 54 | +```bash |
| 55 | +uv run external_storage/worker.py |
| 56 | +``` |
| 57 | + |
| 58 | +## 4. Run the starter |
| 59 | + |
| 60 | +In a third terminal: |
| 61 | + |
| 62 | +```bash |
| 63 | +uv run external_storage/starter.py |
| 64 | +``` |
| 65 | + |
| 66 | +Example output: |
| 67 | + |
| 68 | +``` |
| 69 | +Starting workflow external-storage-20260501-120000 (batch_id=BATCH-20260501-120000, order_count=200) |
| 70 | + |
| 71 | +Batch BATCH-20260501-120000: 200 orders processed |
| 72 | + Total shipping cost: $28,512.40 |
| 73 | + Total weight: 19,684.2 kg |
| 74 | + Avg delivery: 4.4 days |
| 75 | +``` |
| 76 | + |
| 77 | +## 5. (Optional) Run the codec server |
| 78 | + |
| 79 | +Workflow payloads are gzip-compressed, and the large ones are additionally offloaded |
| 80 | +to S3 and replaced inline with external storage references. The codec server serves |
| 81 | +both transformations on demand for the Temporal Web UI. Run it in a fourth terminal: |
| 82 | + |
| 83 | +```bash |
| 84 | +uv run external_storage/codec_server.py |
| 85 | +``` |
| 86 | + |
| 87 | +In the Temporal Web UI (http://localhost:8233), open Settings → Data Encoder and set |
| 88 | +the Remote Codec Endpoint to `http://localhost:8081`. Reload the workflow page: inline |
| 89 | +compressed payloads are then displayed as readable JSON, and externally stored |
| 90 | +payloads can be downloaded to view their actual content from S3. |
| 91 | + |
| 92 | +The Web UI sends the namespace as the `X-Namespace` header on each request, so |
| 93 | +multi-namespace setups can dispatch by reading that header. |
| 94 | + |
| 95 | +| Endpoint | Behavior | |
| 96 | +| --- | --- | |
| 97 | +| `POST /encode` | Compress the payload, then offload to S3 if it exceeds the threshold. | |
| 98 | +| `POST /decode` | Retrieve any external storage references from S3, then decompress. Pass `?preserveStorageRefs=true` to leave references as-is. | |
| 99 | +| `POST /download` | All inputs must be storage references. Retrieves them from S3 and decompresses. | |
| 100 | + |
| 101 | +## 6. Inspect the workflow |
| 102 | + |
| 103 | +Run `temporal workflow show` to see how payloads are stored: |
| 104 | + |
| 105 | +```bash |
| 106 | +temporal workflow show --workflow-id external-storage-<timestamp> |
| 107 | +``` |
| 108 | + |
| 109 | +The workflow's input (`OrderBatchRequest`) and result (`BatchSummary`) are gzip-encoded |
| 110 | +and stored inline in Temporal — small enough to compress to a few hundred bytes. The |
| 111 | +two activity payloads carrying the order list — the output of `fetch_orders` and the |
| 112 | +input to `process_orders` — exceed 256 KiB even after compression, so they appear as |
| 113 | +external storage references, confirming the SDK offloaded them to S3. |
| 114 | + |
| 115 | +## How it works |
| 116 | + |
| 117 | +The `DataConverter` is configured with both a `payload_codec` and an `external_storage`: |
| 118 | + |
| 119 | +```python |
| 120 | +driver = S3StorageDriver( |
| 121 | + client=new_aioboto3_client(s3_client), |
| 122 | + bucket=S3_BUCKET, |
| 123 | +) |
| 124 | +data_converter = dataclasses.replace( |
| 125 | + temporalio.converter.default(), |
| 126 | + payload_codec=CompressionCodec(), |
| 127 | + external_storage=ExternalStorage(drivers=[driver]), |
| 128 | +) |
| 129 | +``` |
| 130 | + |
| 131 | +On the encode path the SDK: |
| 132 | + |
| 133 | +1. Serializes the Python value to a `Payload`. |
| 134 | +2. Runs `CompressionCodec.encode` to gzip the payload bytes. |
| 135 | +3. Checks the compressed size against `payload_size_threshold` (default: 256 KiB). |
| 136 | +4. If still above the threshold, stores the compressed bytes in S3 via |
| 137 | + `S3StorageDriver` and replaces the inline payload with a claim-check reference. |
| 138 | + |
| 139 | +On the decode path the SDK reverses these steps, transparently retrieving from S3 and |
| 140 | +decompressing as needed. |
| 141 | + |
| 142 | +Both the worker and the starter must use the **same** `DataConverter` configuration |
| 143 | +(codec **and** storage) so each side can read what the other wrote. |
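
The encode and decode paths described under "How it works" can be sketched with the standard library alone. This is an illustrative approximation, not the SDK's implementation: `fake_bucket`, `encode`, `decode`, and the dict-shaped reference are hypothetical stand-ins for the S3 bucket, the SDK's internal steps, and the claim-check reference. JSON is assumed as the serialization format purely for the sketch.

```python
import gzip
import json
import uuid

# Mirrors the 256 KiB default threshold described in the README.
THRESHOLD_BYTES = 256 * 1024

# Hypothetical in-memory stand-in for the S3 bucket.
fake_bucket: dict[str, bytes] = {}


def encode(value: object) -> dict:
    """Serialize, gzip, then offload only if the compressed bytes exceed the threshold."""
    raw = json.dumps(value).encode("utf-8")
    compressed = gzip.compress(raw)
    if len(compressed) <= THRESHOLD_BYTES:
        # Small enough after compression: keep the bytes inline.
        return {"kind": "inline", "data": compressed}
    # Still too large: store the compressed bytes and keep only a reference.
    key = f"payloads/{uuid.uuid4()}"
    fake_bucket[key] = compressed
    return {"kind": "reference", "key": key}


def decode(stored: dict) -> object:
    """Reverse the steps: resolve the reference if present, then decompress."""
    if stored["kind"] == "inline":
        compressed = stored["data"]
    else:
        compressed = fake_bucket[stored["key"]]
    return json.loads(gzip.decompress(compressed))
```

Note that the size check runs on the compressed bytes, so a highly compressible payload can stay inline even when its serialized form is well above 256 KiB. Decoding only works if the reader has the same compression step and access to the same store, which is the reason the worker and starter must share one configuration.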