68 changes: 61 additions & 7 deletions docs/best-practices/cloud-access-control.mdx
Original file line number Diff line number Diff line change
@@ -22,6 +22,13 @@ Temporal Cloud supports two secure authentication methods for Workers:

Both options help secure communication between workers and Temporal Cloud. Choosing the right method and managing it properly is key to maintaining security and minimizing downtime.

Use this page to define your operating model for machine access to Temporal Cloud. For setup steps and product-specific
mechanics, see [Manage API keys](/cloud/api-keys) and [Manage service accounts](/cloud/service-accounts).

Related guidance:
- [Namespace best practices](/best-practices/managing-namespace)
- [Multi-tenant application patterns](/production-deployment/multi-tenant-patterns)

The high-level end-to-end rotation process is:

1. **Generate new credentials**: Create new certificates or API keys in Temporal Cloud before the current ones expire
@@ -45,17 +52,64 @@ In the case that you are using multiple certificates signed by the same CA, and

One convention is to give certificates a common name that matches the namespace. If you do this when using the same CA for dev and prod, then you can leverage Certificate Filters to prevent access to production environments. This is described in detail under the [authorization section](https://docs.temporal.io/cloud/certificates#control-authorization) of the documentation.

## Best practices:
#### 1. Establish clear guidelines on authentication methods: Teams should standardize on either [mTLS certificates](https://docs.temporal.io/cloud/certificates) or [API keys](https://docs.temporal.io/cloud/api-keys) for the following operations:
## Best practices

### Establish clear guidelines on authentication methods

Teams should standardize on either [mTLS certificates](https://docs.temporal.io/cloud/certificates) or
[API keys](https://docs.temporal.io/cloud/api-keys) for the following operations:
- Connect Temporal clients to Temporal Cloud (e.g. Worker processes)
- Automation (e.g. Temporal Cloud [Operations API](https://docs.temporal.io/ops), [Terraform provider](https://docs.temporal.io/cloud/terraform-provider), [Temporal CLI](https://docs.temporal.io/cli/setup-cli))

By default, it is recommended for teams to use API keys and [service accounts](https://docs.temporal.io/cloud/service-accounts) for both operations because API keys are easier to manage and rotate for most teams. In addition, you can control account-level and namespace-level roles for service accounts.
By default, teams should use API keys with [service accounts](/cloud/service-accounts) for both operations. API keys
are generally easier to set up and rotate than mTLS certificates, and service accounts let you assign account-level and
namespace-level roles.

If your organization requires mutual authentication and stronger cryptographic guarantees, use
[mTLS certificates](/cloud/certificates) to authenticate Temporal clients to Temporal Cloud and use API keys for
automation, because the Temporal Cloud [Operations API](/ops) and [Terraform provider](/cloud/terraform-provider) only
support API key authentication. Unlike API keys tied to users or service accounts, mTLS certificate authentication is
not tied to Temporal Cloud RBAC identities. Namespace access is based on CA trust, with optional
[Certificate Filters](/cloud/certificates#manage-certificate-filters) to narrow access by Common Name.
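
The selection rule above can be sketched as a small decision helper. This is only an illustration of the guidance, not part of any Temporal SDK: the `Operation` enum, the `choose_auth_method` function, and the returned strings are hypothetical names introduced here.

```python
from enum import Enum


class Operation(Enum):
    """Hypothetical labels for the two kinds of operations listed above."""
    CLIENT_CONNECTION = "client_connection"  # Temporal clients such as Worker processes
    AUTOMATION = "automation"                # Operations API, Terraform provider


def choose_auth_method(operation: Operation, requires_mutual_auth: bool) -> str:
    """Return "api_key" or "mtls" following the guidance in this section."""
    # The Operations API and Terraform provider only support API keys,
    # so automation uses API keys regardless of organizational policy.
    if operation is Operation.AUTOMATION:
        return "api_key"
    # Client connections default to API keys; organizations that require
    # mutual authentication use mTLS certificates instead.
    return "mtls" if requires_mutual_auth else "api_key"
```

For example, `choose_auth_method(Operation.CLIENT_CONNECTION, requires_mutual_auth=True)` selects mTLS for Workers, while automation still resolves to API keys.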

### Default operating model for service accounts and API keys

For most organizations, use the following defaults:

- Create one Service Account per service or worker deployment, not one shared Service Account for an entire team
- Use account-level Service Accounts only when a service genuinely needs cross-Namespace or account-wide access
- Prefer Namespace-scoped Service Accounts when a service should only access one Namespace
- Grant Service Accounts namespace-level access only to the specific Namespaces they need

This approach gives you cleaner ownership, easier rotation, and better auditability than sharing a single machine
identity across multiple services.
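
As a rough sketch of these defaults, the helper below plans one Service Account per service and picks its scope from the Namespaces the service needs. The `ServiceAccountPlan` type, the `plan_service_account` function, the `-sa` naming suffix, and the example Namespace names are all assumptions for illustration; none of them is a Temporal Cloud API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAccountPlan:
    name: str                    # hypothetical naming convention: "<service>-sa"
    scope: str                   # "namespace" or "account"
    namespaces: tuple[str, ...]  # the only Namespaces this identity should access


def plan_service_account(
    service: str,
    namespaces: list[str],
    needs_account_wide_access: bool = False,
) -> ServiceAccountPlan:
    """Plan one Service Account for a single service or worker deployment."""
    unique = tuple(sorted(set(namespaces)))
    if not unique and not needs_account_wide_access:
        raise ValueError("a service needs at least one Namespace or account-wide access")
    # Account-level scope only for cross-Namespace or account-wide access;
    # otherwise prefer a Namespace-scoped Service Account.
    if needs_account_wide_access or len(unique) > 1:
        scope = "account"
    else:
        scope = "namespace"
    return ServiceAccountPlan(name=f"{service}-sa", scope=scope, namespaces=unique)
```

Calling `plan_service_account("billing-worker", ["billing-prod"])` yields a Namespace-scoped plan, while a service that lists two Namespaces falls back to an account-level identity with access limited to those two.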

### Use access boundaries that match your Namespace boundaries

The way you partition Namespaces should usually match the way you partition machine identities.

- If multiple services share a Namespace, you may still want one Service Account per service so that each deployment can
Contributor

Suggested change
- If multiple services share a Namespace, you may still want one Service Account per service so that each deployment can
- If multiple services share a Namespace, you may still want one Service Account per service so that each deployment can rotate credentials independently.

Contributor Author

These are just line wrapping artifacts, shouldn't affect anything

rotate credentials independently.
Contributor

Suggested change
rotate credentials independently.

- If you split workloads into separate Namespaces for security, capacity, or team ownership reasons, those Namespaces
Contributor

Suggested change
- If you split workloads into separate Namespaces for security, capacity, or team ownership reasons, those Namespaces
- If you split workloads into separate Namespaces for security, capacity, or team ownership reasons, those Namespaces should usually have separate Service Accounts and API keys as well.

should usually have separate Service Accounts and API keys as well.
Contributor

Suggested change
should usually have separate Service Accounts and API keys as well.

- If you use Namespace-per-tenant isolation, expect your credential model and RBAC model to become correspondingly more
Contributor

Suggested change
- If you use Namespace-per-tenant isolation, expect your credential model and RBAC model to become correspondingly more
- If you use Namespace-per-tenant isolation, expect your credential model and RBAC model to become correspondingly more granular.

granular.
Contributor

Suggested change
granular.


For more on topology tradeoffs, see [Namespace best practices](/best-practices/managing-namespace) and
[Multi-tenant application patterns](/production-deployment/multi-tenant-patterns).

### Rotate credentials without downtime

Use the following sequence when rotating credentials:

If your organization requires mutual authentication and stronger cryptographic guarantees, then it is encouraged for your teams to use mTLS certificates to authenticate Temporal clients to Temporal Cloud and use API keys for automation (because Temporal Cloud [Operations API](https://docs.temporal.io/ops) and [Terraform provider](https://docs.temporal.io/cloud/terraform-provider) only supports API key for authentication)
1. Create the replacement credential before the existing one expires.
2. For API keys, create the new valid key while the old key still works, then roll your Workers and clients to use the new key.
3. For client certificates, stage the new certificate before removing the old one when your deployment process supports that transition.
4. Validate connectivity and normal Workflow execution using the new credential.
5. Remove the old credential only after all clients and Workers have switched.
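
The ordering in the steps above can be sketched with an in-memory simulation. The `CredentialStore` class and `rotate` function are hypothetical stand-ins: a real rotation would create and delete keys or certificates in Temporal Cloud and redeploy Workers, and the validation step would check connectivity and Workflow execution rather than set membership.

```python
class CredentialStore:
    """In-memory stand-in for the set of credentials the server accepts."""

    def __init__(self, initial: str) -> None:
        self._valid = {initial}

    def create(self, credential: str) -> None:
        self._valid.add(credential)

    def revoke(self, credential: str) -> None:
        self._valid.discard(credential)

    def accepts(self, credential: str) -> bool:
        return credential in self._valid


def rotate(store: CredentialStore, clients: dict, old: str, new: str) -> None:
    """Rotate `old` to `new`; `clients` maps client name to the credential in use."""
    # Step 1: create the replacement while the old credential still works.
    store.create(new)
    # Steps 2-3: roll each Worker or client over to the new credential.
    for name in list(clients):
        clients[name] = new
        # Step 4: validate that the new credential is accepted.
        if not store.accepts(clients[name]):
            raise RuntimeError(f"{name} cannot authenticate with the new credential")
    # Step 5: remove the old credential only after every client has switched.
    if any(credential == old for credential in clients.values()):
        raise RuntimeError("some clients still use the old credential")
    store.revoke(old)
```

Because the old credential stays valid until the final step, a client that has not yet rolled over keeps working throughout the rotation.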

#### 2. Use Certificate Filters to restrict access when using shared CAs (e.g., `dev` vs `prod`):
### Use Certificate Filters to restrict access when using shared CAs (e.g., `dev` vs `prod`)

Certificate Filters add a validation step based on the client certificate presented during client authentication. One convention, though not a requirement, is to give certificates a common name that matches the namespace.

If you follow this convention while using the same CA for dev and prod environments, you can use Certificate Filters to prevent dev certificates from accessing production.
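
A conceptual sketch of why this works with a shared CA follows. It is not Temporal Cloud's implementation: the `certificate_allowed` helper, the exact-match rule, and the `payments-dev` / `payments-prod` names are assumptions for illustration. The filter fields and matching options that Temporal Cloud actually supports are described in the certificates documentation.

```python
def certificate_allowed(presented_cn: str, allowed_cn: str) -> bool:
    """Hypothetical exact-match Common Name check for one Namespace."""
    return presented_cn == allowed_cn


# Both certificates chain to the same CA, so CA trust alone cannot tell them apart.
DEV_CERT_CN = "payments-dev"    # hypothetical Common Name matching the dev Namespace
PROD_CERT_CN = "payments-prod"  # hypothetical Common Name matching the prod Namespace

# The prod Namespace only admits certificates whose Common Name matches it.
prod_filter_cn = "payments-prod"

dev_can_reach_prod = certificate_allowed(DEV_CERT_CN, prod_filter_cn)
prod_can_reach_prod = certificate_allowed(PROD_CERT_CN, prod_filter_cn)
```

Under these assumptions the dev certificate is rejected by the prod Namespace even though its CA is trusted there.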
6 changes: 6 additions & 0 deletions docs/cloud/get-started/api-keys.mdx
@@ -57,6 +57,9 @@ The authentication process follows this pathway:
unexpected or unauthorized activity.
- **Use a Key Management System (KMS)**: Employ a Key Management System to minimize the risk of key leaks.

For guidance on which identities should own API keys, when to use Namespace-scoped Service Accounts, and how to align
API keys with your Namespace topology, see [Managing Temporal Cloud access control](/best-practices/cloud-access-control).

### API key use cases

API keys are used for the following scenarios:
@@ -223,6 +226,9 @@ Temporal API keys automatically expire based on the specified expiration time. F
1. Switch clients to load the new key and start using it.
1. Delete the old key after it is no longer in use.

For a broader machine-identity rotation strategy across API keys and Service Accounts, see
[Managing Temporal Cloud access control](/best-practices/cloud-access-control).

## Manage API keys for Service Accounts {#serviceaccount-api-keys}

Global Administrators and Account Owners can manage and generate API keys for _all_ Service Accounts in their account.
5 changes: 5 additions & 0 deletions docs/cloud/get-started/service-accounts.mdx
@@ -33,6 +33,11 @@ With the addition of Service Accounts, Temporal Cloud now supports 2 identity ty
Service Accounts use API Keys as the authentication mechanism to connect to Temporal Cloud.
Use Service Accounts to represent a non-human identity when authenticating to Temporal Cloud, whether for operations automation or for Workflow Execution and management through the Temporal SDKs and the Temporal CLI.

For guidance on how to structure Service Accounts across services, Namespaces, and teams, see
[Managing Temporal Cloud access control](/best-practices/cloud-access-control). A common default is one Service Account
per service or worker deployment, with Namespace-scoped Service Accounts preferred when a service only needs access to a
single Namespace.

:::tip

Namespace Admins can now manage and create [Namespace-scoped Service Accounts](/cloud/service-accounts#scoped), regardless of their Account Role.
11 changes: 11 additions & 0 deletions docs/cloud/metrics/index.mdx
@@ -33,6 +33,17 @@ When used together, Cloud and SDK metrics measure the health and performance of

Cloud metrics for all Namespaces in your account are available from the [OpenMetrics endpoint](/cloud/metrics/openmetrics), a Prometheus-compatible scrapable endpoint at `metrics.temporal.io`.

Use the following rule of thumb when deciding which signal to rely on:
Contributor Author

@dustin-temporal please review


| Question | Primary signal |
Contributor

This reads like it's for an LLM to reference, but if that's what we're going for then I'm good with it.

|---|---|
| Is Temporal Cloud accepting and serving work normally? | Cloud metrics |
| Are Tasks backing up in a Task Queue? | Cloud metrics plus SDK Schedule-To-Start metrics |
| Are my Workers saturated, under-provisioned, or misconfigured? | SDK metrics |
| Is my application logic, downstream dependency, or Activity behavior unhealthy? | SDK metrics and traces |

For a Worker-focused view of how to combine these signals, see [Monitor worker health](/cloud/worker-health).

- [OpenMetrics overview](/cloud/metrics/openmetrics) - Getting started and key concepts
- [Metrics integrations](/cloud/metrics/openmetrics/metrics-integrations) - Datadog, Grafana Cloud, New Relic, ClickStack, and more
- [API reference](/cloud/metrics/openmetrics/api-reference) - Endpoint specification and advanced configuration
8 changes: 8 additions & 0 deletions docs/cloud/metrics/openmetrics/metrics-integrations.mdx
@@ -34,6 +34,14 @@ This document is for basic configuration only. For advanced concepts such as lab

Datadog provides a serverless integration with the OpenMetrics endpoint. This integration scrapes metrics, stores them in Datadog, and provides a default dashboard with some built-in monitors. See the [integration page](https://docs.datadoghq.com/integrations/temporal-cloud-openmetrics/) for more details.

For Datadog users, treat this integration as the Cloud-side half of your observability setup:
Contributor Author

@dustin-temporal please review


- Use OpenMetrics in Datadog to monitor Temporal Cloud behavior such as Task Queue backlog, poll success, and rate limiting.
- Collect [SDK metrics](/cloud/metrics/sdk-metrics-setup) from your Workers separately to monitor saturation, Schedule-To-Start latency, slot availability, and sticky cache behavior.

If you only ingest Cloud metrics, you will miss many worker-side bottlenecks. For recommended Worker monitors, see
[Monitor worker health](/cloud/worker-health).

### Grafana Cloud

Grafana provides a serverless integration with the OpenMetrics endpoint for Grafana Cloud. This integration scrapes metrics, stores them in Grafana Cloud, and provides a default dashboard
4 changes: 4 additions & 0 deletions docs/cloud/worker-health.mdx
@@ -39,6 +39,10 @@ This page is a guide to monitoring a Temporal Worker fleet and covers the follow
- [How to detect misconfigured Workers](#detect-misconfigured-workers)
- [How to configure Sticky cache](#configure-sticky-cache)

This page assumes you are monitoring both Worker-side SDK metrics and Cloud-side metrics. Use SDK metrics to understand
Contributor Author

@dustin-temporal please review

what your Workers are doing, and Cloud metrics to understand what Temporal Cloud is seeing at the Task Queue and service
level. For an overview of how these signals fit together, see [Temporal Cloud metrics](/cloud/metrics).

## Minimal Observations {#minimal-observations}

Configure and understand these alerts first to gain insight into your application's health and behavior.