temporalio · lennessyy · May 28, 2026
@@ -0,0 +1,125 @@
+---
+id: autoscaling
+title: Serverless Worker autoscaling
+sidebar_label: Autoscaling
+description:
+  How Temporal autoscales Serverless Workers on each compute provider, including the scaling signals, algorithm behavior,
+  and tuning parameters.
+slug: /encyclopedia/workers/serverless-workers/autoscaling
+toc_max_heading_level: 4
+keywords:
+  - serverless
+  - workers
+  - autoscaling
+  - lambda
+  - cloud run
+  - worker controller instance
+tags:
+  - Workers
+  - Concepts
+  - Serverless
+---
+
+:::tip SUPPORT, STABILITY, and DEPENDENCY INFO
+
+Serverless Workers are in [Pre-release](/evaluate/development-production-features/release-stages#pre-release) and available to select Temporal Cloud customers.
+To request access during Pre-release, create a [support ticket](/cloud/support#support-ticket) or contact your account team.
+APIs are experimental and may be subject to backwards-incompatible changes.
+[Sign up for updates](https://temporal.io/pages/serverless-workers-updates) to be notified when Serverless Workers reach Public Preview.
+
+:::
+
+The [Worker Controller Instance (WCI)](/serverless-workers#worker-controller-instance) autoscales Serverless Workers
+using two signals: sync match failure and Task Queue backlog. The autoscaling algorithm differs by compute provider
+because of differences in cold start latency, invocation duration limits, and provider APIs.
+
+## Scaling signals
+
+Both compute providers use the same two signals to drive scaling decisions.
+
+### Sync match failure {#sync-match-failure}
+
+When a Task is submitted, the [Matching Service](/temporal-service/temporal-server#matching-service) attempts to route
+it directly to an available Worker. This direct handoff is a sync match. If no Worker is available, the sync match
+fails, and the Matching Service pushes a signal to the WCI. Because the Matching Service pushes each failure as it
+happens, the WCI can react immediately instead of waiting for the next polling interval.
+
+### Task Queue backlog {#task-queue-backlog}
+
+The WCI monitors Task Queue metadata to determine whether pending Tasks exist without enough Workers to process them. If
+there are Tasks on the queue and not enough Workers, the WCI scales up.
+
+## AWS Lambda {#aws-lambda}
+
+The Lambda algorithm is event-driven and reactive. Sync match failure is the primary control signal, and Task Queue
+backlog helps determine how many Workers to invoke.
+
+When the WCI needs more capacity, it calls the Lambda `Invoke` API (authorized by the `lambda:InvokeFunction`
+permission) to start new Workers. Each call is a discrete action ("invoke N more functions"), not a target state. The
+WCI does not manage a fleet of instances.
+
+### Scale-out
+
+On sync match failure, the WCI invokes new Lambda functions. Because a Lambda cold start typically takes from under a
+second to a few seconds, purely reactive control does not let a meaningful backlog build up while new Workers start.
+The WCI can scale from zero with low latency.
+
+### Scale-in
+
+Scale-in is automatic. Each Lambda invocation runs until the Worker has finished processing available Tasks or
+approaches the 15-minute execution time limit, then shuts down. There is no drain logic or stabilization window. The WCI
+does not need to actively remove capacity.
+
+### Instance model
+
+Each invocation is independent. The Worker starts, creates a fresh client connection, processes multiple Tasks until near
+the execution time limit, and then shuts down gracefully. There is no shared state across invocations.
+
+## GCP Cloud Run {#cloud-run}
+
+The Cloud Run algorithm is a hybrid rate-plus-backlog controller, extended with a latency-first fast path that reacts
+to sync match failures.
+
+Unlike Lambda, the WCI outputs a target state ("there should be _c_ instances") rather than discrete invocations. The
+WCI adjusts Cloud Run's instance count through the Cloud Run admin API.
+
+### Scale-out
+
+The algorithm uses four layers to determine the desired instance count:
+
+1. **Feedforward base capacity.** The WCI estimates the required fleet size as the Task arrival rate divided by the per-instance throughput at the target utilization. Feedforward sizing is critical because a Cloud Run cold start takes approximately 10-30 seconds. If the WCI waited for backlog to reveal under-provisioning, new capacity would still be 10-30 seconds away.
+2. **Backlog-drain correction.** If a backlog exists, the WCI adds instances to drain it within the target queue wait time.
+3. **Warm-reserve headroom.** The WCI maintains extra capacity above the feedforward estimate to absorb sync match failures without triggering cold starts.
+4. **Sync match fast-path.** On any sync match failure, the WCI immediately re-evaluates and scales out if the current fleet is undersized. This event-triggered path bypasses the regular control interval.
+
+The final desired count is the larger of the regular control-interval calculation and the event-triggered fast-path
+calculation, clamped to the configured minimum and maximum instance counts, and quantized to the scaling granularity.
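+
+One way to read the layers as arithmetic is sketched below. This is an illustration, not the WCI's implementation. Every
+symbol is a hypothetical name: `\lambda` is the Task arrival rate, `\mu` the per-instance throughput, `\rho` the
+utilization target, `B` the backlog size, `W` the queue wait target, `c_reserve` the warm-reserve headroom, `c_event` the
+count produced by the sync match fast-path, and `g` the scaling granularity. The sketch also assumes that the first three
+layers add up to one periodic estimate, which the source does not state.
+
+```latex
+% Illustrative sketch only: symbol names and the additive combination are assumptions.
+c_{\mathrm{ff}} = \left\lceil \frac{\lambda}{\rho\,\mu} \right\rceil
+\qquad
+c_{\mathrm{drain}} = \left\lceil \frac{B}{\mu\,W} \right\rceil
+\qquad
+c_{\mathrm{periodic}} = c_{\mathrm{ff}} + c_{\mathrm{drain}} + c_{\mathrm{reserve}}
+
+% Take the larger estimate, clamp it to [c_min, c_max], then round up to a multiple of g.
+c^{*} = g \left\lceil
+  \frac{\min\bigl(c_{\max},\ \max\bigl(c_{\min},\ \max(c_{\mathrm{periodic}},\ c_{\mathrm{event}})\bigr)\bigr)}{g}
+\right\rceil
+```
+
+For example, with hypothetical values of 50 Tasks per second arriving, 10 Tasks per second per instance, and a 0.75
+utilization target, the feedforward term is ceil(50 / 7.5) = 7 instances. A backlog of 120 Tasks with a 4-second wait
+target adds ceil(120 / 40) = 3, and a reserve of 1 brings the periodic estimate to 11.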
+
+### Scale-in
+
+Scale-in is conservative to avoid oscillation:
+
+- **Scale-down stabilization window.** After a scale-down decision, the WCI waits (default 300 seconds) before removing instances. If load increases during this window, the scale-down is canceled.
+- **Hold after scale-out.** After scaling out, the WCI holds the new capacity for a minimum period before considering scale-in.
+- **Drain logic.** When removing instances, the WCI drains them over a configurable horizon, allowing in-flight Tasks to complete before the instance is terminated.
+
+### Minimum instances
+
+Setting the minimum instance count (`c_min`) to 1 or higher keeps at least one instance warm at all times. With constant
+traffic, this behaves like an always-on Worker with elastic scale-up and scale-down. Setting `c_min = 0` enables full
+scale-to-zero, but the first Task after an idle period incurs a cold start.
+
+### Tuning parameters
+
+The following parameters control Cloud Run autoscaling behavior. These are starting points for latency-first operation.
+
+| Parameter                | Starting value                   | Description                                                                        |
+| ------------------------ | -------------------------------- | ---------------------------------------------------------------------------------- |
+| Control interval         | 15s                              | How often the WCI re-evaluates the desired instance count.                         |
+| Utilization target       | 0.70-0.80                        | Target per-instance utilization for feedforward sizing.                            |
+| Queue wait target        | 3-5s                             | Target time a Task should wait in the queue before being picked up.                |
+| Drain horizon            | 30-60s                           | How long the WCI allows for in-flight Tasks to complete when removing an instance. |
+| Event cooldown           | max(5s, 0.25 x scale-up latency) | Minimum time between event-triggered scale-out evaluations.                        |
+| Scale-down stabilization | 300s                             | How long the WCI waits after a scale-down decision before removing instances.      |
+| Hold after scale-out     | max(60s, 2 x scale-up latency)   | Minimum time to hold new capacity before considering scale-in.                     |
+| Min instances            | >= 1 for latency-first           | Minimum instance count. Set to 0 for full scale-to-zero.                           |
+| Scaling granularity      | 1                                | Minimum step size for scaling changes.                                             |
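+
+The two `max(...)` starting values depend on the scale-up latency. As a worked illustration, the block below evaluates
+them for two hypothetical latencies, 20 seconds and 40 seconds. The symbol `L` and both sample values are assumptions
+chosen for the arithmetic, not measured figures.
+
+```latex
+% Illustrative arithmetic only: L is a hypothetical scale-up latency in seconds.
+L = 20:\quad t_{\mathrm{cooldown}} = \max(5,\ 0.25 \times 20) = \max(5,\ 5) = 5\ \mathrm{s},
+\qquad t_{\mathrm{hold}} = \max(60,\ 2 \times 20) = \max(60,\ 40) = 60\ \mathrm{s}
+
+L = 40:\quad t_{\mathrm{cooldown}} = \max(5,\ 0.25 \times 40) = \max(5,\ 10) = 10\ \mathrm{s},
+\qquad t_{\mathrm{hold}} = \max(60,\ 2 \times 40) = \max(60,\ 80) = 80\ \mathrm{s}
+```
+
+In this reading, the fixed floors (5 seconds and 60 seconds) apply at a latency of 20 seconds, and the latency-based
+terms take over once the latency exceeds 20 seconds for the cooldown and 30 seconds for the hold.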
@@ -10,6 +10,7 @@ keywords:
   - serverless
   - workers
   - lambda
+  - cloud run
   - compute provider
 tags:
   - Workers
@@ -68,10 +69,9 @@ The Worker Controller Instance (WCI) is a system Workflow that scales Serverless
 One WCI Workflow runs per Worker Deployment Version that has a compute provider configured. The WCI runs in the same
 Namespace as your Worker Deployment.
 
-The WCI responds to two triggers: [sync match failures](#sync-match-failure) and
-[Task Queue backlog](#task-queue-backlog). When either trigger fires, the WCI produces a scaling action, such as
-invoking the configured compute provider (for example, calling AWS Lambda's `InvokeFunction` API) to start new Workers.
-For details on how scaling works, see [Autoscaling](#autoscaling).
+The WCI responds to two triggers: sync match failures and Task Queue backlog. When either trigger fires, the WCI
+produces a scaling action, such as invoking the configured compute provider to start new Workers. For details on how
+scaling works, see [Autoscaling](#autoscaling).
 
 You can list WCI Workflows in your Namespace:
 
@@ -115,28 +115,15 @@ reuse or shared state across invocations.
 
 ## Autoscaling {#autoscaling}
 
-The [WCI](#worker-controller-instance) automatically scales Serverless Workers based on Task Queue signals. When Tasks
-arrive and no Worker is available, the WCI invokes new Workers. When the Tasks are done, Workers exit and scale to zero.
-
-The WCI uses two signals to decide when to invoke new Workers:
-
-### Sync match failure {#sync-match-failure}
-
-When a Task is submitted, the [Matching Service](/temporal-service/temporal-server#matching-service) attempts to route
-it directly to an available Worker. If no Worker is available, the sync match fails, and the Matching Service pushes a
-signal to the WCI. The WCI then invokes a new Worker. This is the primary scaling path. Because the Matching Service
-pushes match failures to the WCI as they happen rather than the WCI polling on a timer, latency stays low and scaling is
-responsive.
-
-### Task Queue backlog {#task-queue-backlog}
-
-The WCI monitors Task Queue metadata to determine whether pending Tasks exist without enough Workers to process them. If
-there are Tasks on the queue and not enough Workers, the WCI invokes additional Workers.
+The [WCI](#worker-controller-instance) automatically scales Serverless Workers based on Task Queue signals. The
+autoscaling algorithm differs by compute provider because of differences in cold start latency, invocation duration
+limits, and provider APIs. For details on how autoscaling works on each platform, see
+[Serverless Worker autoscaling](/encyclopedia/workers/serverless-workers/autoscaling).
 
 ## Scaling with long-lived Workers {#scaling-with-long-lived-workers}
 
 Serverless Workers can share a Task Queue with long-lived Workers. Because Serverless Workers are only invoked on
-[sync match failure](#sync-match-failure), Serverless Workers only pick up Tasks that no long-lived Worker was available
+sync match failure, Serverless Workers only pick up Tasks that no long-lived Worker was available
 to handle. In practice, the Serverless Workers act as spillover capacity for the long-lived fleet.
 
 :::caution
@@ -259,6 +246,7 @@ provider because the Worker process manages its own lifecycle.
 
 ### Supported providers
 
-| Provider   | Description                                                                   |
-| ---------- | ----------------------------------------------------------------------------- |
-| AWS Lambda | Temporal assumes an IAM role in your AWS account to invoke a Lambda function. |
+| Provider      | Description                                                                   |
+| ------------- | ----------------------------------------------------------------------------- |
+| AWS Lambda    | Temporal assumes an IAM role in your AWS account to invoke a Lambda function. |
+| GCP Cloud Run | Temporal manages Cloud Run instance scaling through the Cloud Run admin API.  |
@@ -1534,7 +1534,15 @@ module.exports = {
             'encyclopedia/workers/sticky-execution',
             'encyclopedia/workers/worker-shutdown',
             'encyclopedia/workers/worker-versioning',
-            'encyclopedia/workers/serverless-workers',
+            {
+              type: 'category',
+              label: 'Serverless Workers',
+              collapsed: true,
+              link: { type: 'doc', id: 'encyclopedia/workers/serverless-workers/index' },
+              items: [
+                'encyclopedia/workers/serverless-workers/autoscaling',
+              ],
+            },
           ],
         },
         {