Merged
Changes from all commits
28 commits
44d8a13
feat(data-drains): add GCS, Azure Blob, BigQuery, Snowflake, and Data…
waleedlatif1 May 11, 2026
4ddb6cc
fix(data-drains): address PR review comments
waleedlatif1 May 11, 2026
adbdb6b
fix(data-drains): extract sleepUntilAborted, honor abort across all d…
waleedlatif1 May 11, 2026
62b4aa4
fix(data-drains): widen BigQuery projectId max and dedupe parseServic…
waleedlatif1 May 11, 2026
253099e
fix(data-drains): tighten GCS bucket contract and expose Azure endpoi…
waleedlatif1 May 11, 2026
8677429
improvement(data-drains): extract normalizePrefix and buildObjectKey …
waleedlatif1 May 11, 2026
dc26c56
fix(data-drains): retry BigQuery network errors; tighten Azure accoun…
waleedlatif1 May 11, 2026
85b70ce
improvement(data-drains): consolidate parseRetryAfter; add Datadog ND…
waleedlatif1 May 11, 2026
8c87fcb
fix(data-drains): correct Datadog size guard and Snowflake VARIANT limit
waleedlatif1 May 11, 2026
c5dddf2
improvement(data-drains): consolidate backoffWithJitter into shared u…
waleedlatif1 May 11, 2026
d356ef8
fix(data-drains): align destinations with live provider specs
waleedlatif1 May 11, 2026
e2b5b22
fix(data-drains): address PR review on snowflake poll + shared NDJSON…
waleedlatif1 May 11, 2026
ef850a6
fix(data-drains): per-attempt fetch timeouts in gcs/bigquery, snowfla…
waleedlatif1 May 11, 2026
243e7fe
fix(data-drains): bigquery probe timeout + jittered retries, align Sn…
waleedlatif1 May 11, 2026
520c2f8
fix(data-drains): mirror webhook signingSecret min length in form gate
waleedlatif1 May 11, 2026
5b81d32
fix(data-drains): validate JSON client-side for Snowflake before binding
waleedlatif1 May 11, 2026
4d52c86
fix(data-drains): cross-cutting audit pass against live provider docs
waleedlatif1 May 11, 2026
abae299
fix(data-drains): force ddsource/service overrides on Datadog entries
waleedlatif1 May 11, 2026
a6c8afa
fix(data-drains): preserve row-distinguishing index when BigQuery ins…
waleedlatif1 May 11, 2026
45f93b8
fix(data-drains): refresh GCS token per retry, tighten Azure key regex
waleedlatif1 May 11, 2026
d77a43e
fix(data-drains): address bugbot review of 6336948f6
waleedlatif1 May 11, 2026
835dc0f
fix(data-drains): allow org-account Snowflake identifier with region …
waleedlatif1 May 11, 2026
9f1808a
fix(data-drains): drain retryable response bodies in datadog/gcs loops
waleedlatif1 May 11, 2026
a000c6e
fix(data-drains): drain snowflake poll bodies on 202 and retryable st…
waleedlatif1 May 11, 2026
c8a08b8
fix(data-drains): consume success bodies; check Snowflake sqlState on…
waleedlatif1 May 12, 2026
633d83a
fix(data-drains): drain datadog and bigquery probe success bodies
waleedlatif1 May 12, 2026
3804d01
chore(data-drains): regenerate enum migration as 0206 after staging r…
waleedlatif1 May 12, 2026
58a35bf
fix(data-drains): cap snowflake poll retries; tighten datadog tags mi…
waleedlatif1 May 12, 2026
60 changes: 58 additions & 2 deletions apps/docs/content/docs/en/enterprise/data-drains.mdx
@@ -1,11 +1,11 @@
---
title: Data Drains
description: Continuously export workflow logs, audit logs, and Mothership data to your own S3 bucket or HTTPS endpoint on a schedule
description: Continuously export workflow logs, audit logs, and Mothership data to your own object store, data warehouse, observability platform, or HTTPS endpoint on a schedule
---

import { FAQ } from '@/components/ui/faq'

Data Drains let organization owners and admins on Enterprise plans continuously export Sim data to a destination they control — a customer-owned S3 bucket or an HTTPS webhook. A drain runs on a schedule, picks up only new rows since its last successful run, and writes them as NDJSON to the destination. Viewing drain configuration and run history is restricted to owners and admins as well, since destinations expose internal bucket names and webhook URLs.
Data Drains let organization owners and admins on Enterprise plans continuously export Sim data to a destination they control — a customer-owned S3 bucket, Google Cloud Storage bucket, Azure Blob container, BigQuery table, Snowflake table, Datadog logs intake, or an HTTPS webhook. A drain runs on a schedule, picks up only new rows since its last successful run, and writes them to the destination. Viewing drain configuration and run history is restricted to owners and admins as well, since destinations expose internal bucket names, table identifiers, and webhook URLs.

Drains are independent of [Data Retention](/enterprise/data-retention) but designed to compose with it — see [Pairing with Data Retention](#pairing-with-data-retention) below.

@@ -67,6 +67,62 @@ Object keys are deterministic:

Objects are written with `AES256` server-side encryption.

### Google Cloud Storage

Writes one NDJSON object per delivered chunk to your GCS bucket.

- **Bucket** — the bucket name. Must already exist; Sim does not create buckets.
- **Prefix** *(optional)* — folder path inside the bucket. Trailing slash optional.
- **Service account JSON key** — paste the full JSON key for a service account with `storage.objects.create` (and `storage.objects.delete` if you want "Test connection" to clean up its probe). Sim authenticates via OAuth2 service-account JWT and uploads through the GCS JSON API.

Object names follow the same `{prefix}/{source}/{drainId}/{yyyy}/{mm}/{dd}/{runId}-{seq}.ndjson` layout as S3. Object metadata mirrors the S3 destination's `sim-*` keys via `x-goog-meta-*` headers.
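
For orientation, a minimal sketch of how the object name and upload might be assembled, assuming an OAuth2 access token has already been obtained from the service-account JWT exchange; the helper names and the metadata value are placeholders, not Sim's actual code.

```ts
// A sketch only: helper names and the metadata value are illustrative, not Sim's internals.
interface GcsChunkContext {
  prefix?: string
  source: string
  drainId: string
  runId: string
  seq: number
  at: Date
}

// Derives the deterministic object name described above (UTC date components).
function buildGcsObjectName(ctx: GcsChunkContext): string {
  const pad = (n: number) => String(n).padStart(2, '0')
  const prefix = ctx.prefix ? `${ctx.prefix.replace(/\/+$/, '')}/` : ''
  const yyyy = ctx.at.getUTCFullYear()
  const mm = pad(ctx.at.getUTCMonth() + 1)
  const dd = pad(ctx.at.getUTCDate())
  return `${prefix}${ctx.source}/${ctx.drainId}/${yyyy}/${mm}/${dd}/${ctx.runId}-${ctx.seq}.ndjson`
}

// Simple media upload through the GCS JSON API; `token` is an OAuth2 access token
// minted from the service-account JWT flow.
async function uploadChunk(bucket: string, name: string, ndjson: string, token: string) {
  const url =
    `https://storage.googleapis.com/upload/storage/v1/b/${encodeURIComponent(bucket)}` +
    `/o?uploadType=media&name=${encodeURIComponent(name)}`
  const res = await fetch(url, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/x-ndjson',
      // Per the description above, `sim-*` metadata rides along as `x-goog-meta-*` headers.
      'x-goog-meta-sim-drain-id': 'drain-id-placeholder',
    },
    body: ndjson,
  })
  if (!res.ok) throw new Error(`GCS upload failed: ${res.status}`)
}
```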

### Azure Blob Storage

Writes one NDJSON block blob per delivered chunk to your container.

- **Account name** — your storage account name (3–24 lowercase letters and digits).
- **Container** — must already exist; Sim does not create containers.
- **Prefix** *(optional)* — folder path inside the container.
- **Account key** — a storage account access key with write access to the container.

Blob names follow the same `{prefix}/{source}/{drainId}/{yyyy}/{mm}/{dd}/{runId}-{seq}.ndjson` layout. The `sim-*` metadata is exposed as Azure blob metadata (collapsed to lowercase per Azure's identifier rules).

For sovereign clouds, set **Endpoint suffix** to `blob.core.usgovcloudapi.net` (US Gov), `blob.core.chinacloudapi.cn` (China), or `blob.core.cloudapi.de` (Germany).
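
To make the endpoint suffix concrete, here is a small sketch of how the account name, suffix, container, and blob name could compose into the request URL; the helper is illustrative, and the values in the usage comment are placeholders.

```ts
// Illustrative only; not Sim's implementation.
function buildBlobUrl(opts: {
  accountName: string
  container: string
  blobName: string
  endpointSuffix?: string // defaults to the public Azure cloud
}): string {
  const suffix = opts.endpointSuffix ?? 'blob.core.windows.net'
  // Encode each path segment but keep the `/` separators intact.
  const path = opts.blobName.split('/').map(encodeURIComponent).join('/')
  return `https://${opts.accountName}.${suffix}/${opts.container}/${path}`
}

// buildBlobUrl({
//   accountName: 'acme',
//   container: 'sim-drains',
//   blobName: 'workflow_logs/abc/2026/05/11/run1-0.ndjson',
//   endpointSuffix: 'blob.core.usgovcloudapi.net',
// })
// => 'https://acme.blob.core.usgovcloudapi.net/sim-drains/workflow_logs/abc/2026/05/11/run1-0.ndjson'
```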

### Google BigQuery

Streams each row into a target table via the `tabledata.insertAll` API, with per-row insertId dedup.

- **Project ID** — your GCP project (supports domain-scoped IDs like `example.com:my-project`).
- **Dataset ID / Table ID** — must already exist; Sim does not create tables. The table schema must accommodate the source's row shape (one column per top-level field, or a single `JSON`/`STRING` column with the rest as `ignoreUnknownValues`).
- **Service account JSON key** — needs `roles/bigquery.dataEditor` (insert) and `roles/bigquery.metadataViewer` (for the `tables.get` probe used by "Test connection"). Sim authenticates via OAuth2 service-account JWT.

Each row is sent with an `insertId` of `{drainId}-{runId}-{sequence}-{index}`. BigQuery dedupes inserts with the same `insertId` for ~60 seconds, so retries inside that window won't duplicate. If a chunk reports partial failure (`insertErrors`), the run fails with the offending row indices and an outer-driver retry may duplicate rows that already succeeded — the dispatcher's bounded retry minimizes this risk. Enforced per-request limits: 10 MB body, 50,000 rows.
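
As a sketch of the insertId scheme, one request body could be shaped like the snippet below; the field names follow BigQuery's `tabledata.insertAll` API, but whether Sim sets `ignoreUnknownValues` and how it maps row fields is an assumption.

```ts
interface DrainRow {
  [key: string]: unknown
}

// Sketch of an insertAll request body; not the exact payload Sim sends.
function buildInsertAllBody(rows: DrainRow[], drainId: string, runId: string, sequence: number) {
  return {
    // Assumption: tolerate extra top-level fields instead of failing the insert.
    ignoreUnknownValues: true,
    rows: rows.map((row, index) => ({
      // Reusing the same insertId on a retry within ~60s lets BigQuery dedupe the row.
      insertId: `${drainId}-${runId}-${sequence}-${index}`,
      json: row,
    })),
  }
}
```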

### Snowflake

Inserts each row into a target VARIANT column via the Snowflake SQL API v2 with key-pair JWT auth.

- **Account** — the Snowflake account identifier. Preferred form is `<orgname>-<acctname>` (no dots). Legacy `<locator>.<region>.<cloud>` is also accepted.
- **User / Warehouse / Database / Schema / Table** — must already exist. The user needs `INSERT` privilege on the table and `USAGE` on the warehouse, database, and schema.
- **Column** *(optional)* — target VARIANT column name. Defaults to `DATA` (matches Snowflake's unquoted identifier folding).
- **Role** *(optional)* — Snowflake role to assume.
- **Private key (PEM)** — PKCS8-encoded RSA private key. Register the matching public key on the Snowflake user via `ALTER USER ... SET RSA_PUBLIC_KEY = '...'`.

Each chunk becomes a single `INSERT INTO "DB"."SCHEMA"."TABLE" ("col") VALUES (PARSE_JSON(?)), ...` with one TEXT binding per row. Identifiers are quoted to preserve case. The destination handles Snowflake's async 202-then-poll pattern transparently. Per-row JSON payloads are capped at 16 MB to match Snowflake's VARIANT limit.
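
For illustration, a chunk's statement and bindings for the SQL API v2 could be assembled like this; the statement shape mirrors the description above, while the helper name, quoting, and binding layout are assumptions rather than Sim's actual code.

```ts
// Sketch only; assumes identifiers were validated upstream.
function buildSnowflakeInsert(
  db: string,
  schema: string,
  table: string,
  column: string,
  rows: unknown[]
) {
  // Quote identifiers (doubling embedded quotes) to preserve case.
  const q = (id: string) => `"${id.replace(/"/g, '""')}"`
  const values = rows.map(() => '(PARSE_JSON(?))').join(', ')
  const statement = `INSERT INTO ${q(db)}.${q(schema)}.${q(table)} (${q(column)}) VALUES ${values}`

  // SQL API v2 bindings use 1-indexed string keys; each row binds as TEXT.
  const bindings: Record<string, { type: 'TEXT'; value: string }> = {}
  rows.forEach((row, i) => {
    bindings[String(i + 1)] = { type: 'TEXT', value: JSON.stringify(row) }
  })

  return { statement, bindings }
}
```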

### Datadog Logs

POSTs each row as a log entry to Datadog's v2 logs intake.

- **Site** — your Datadog site: `us1`, `us3`, `us5`, `eu1`, `ap1`, `ap2`, or `gov`.
- **Service** *(optional)* — value for the reserved `service` field. Defaults to `sim`.
- **Tags** *(optional)* — comma-separated `ddtags` appended to every entry alongside auto-injected `sim_drain_id:`, `sim_run_id:`, and `sim_source:` tags.
- **API key** — a Datadog API key (not an Application key) with logs-write permission.

Top-level row fields are auto-indexed as Datadog log attributes. The reserved fields `ddsource`, `service`, `ddtags`, and `message` are always set by Sim and override anything in the row. Payloads above 1 KB are gzip-compressed. Enforced limits match Datadog's intake: 5 MB per request (post-compression), 1000 entries per request, 1 MB per entry.
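
A minimal sketch of how a single log entry could be shaped before batching, compression, and the size checks above; the `ddsource` value and the `message` contents are assumptions, and the entry shape otherwise follows the reserved-field behavior described here.

```ts
// Sketch only; Sim's actual field values may differ.
function buildDatadogEntry(
  row: Record<string, unknown>,
  opts: { service?: string; tags?: string; drainId: string; runId: string; source: string }
) {
  const autoTags = [
    `sim_drain_id:${opts.drainId}`,
    `sim_run_id:${opts.runId}`,
    `sim_source:${opts.source}`,
  ]
  const ddtags = [opts.tags, ...autoTags].filter(Boolean).join(',')

  return {
    ...row, // top-level row fields become indexed log attributes
    // Reserved fields are always set by the drain and win over row values.
    ddsource: 'sim', // assumed value
    service: opts.service ?? 'sim',
    ddtags,
    message: JSON.stringify(row), // assumed message content
  }
}
```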

### HTTPS Webhook

POSTs each chunk as NDJSON to your endpoint.
23 changes: 23 additions & 0 deletions apps/sim/components/icons.tsx
@@ -6849,3 +6849,26 @@ export function HexIcon(props: SVGProps<SVGSVGElement>) {
</svg>
)
}

export function BigQueryIcon(props: SVGProps<SVGSVGElement>) {
return (
<svg viewBox='0 0 24 24' xmlns='http://www.w3.org/2000/svg' {...props}>
<path
fill='#4386FA'
d='M12 2.5a9.5 9.5 0 1 0 5.81 17.02l3.4 3.4a1 1 0 0 0 1.41-1.42l-3.4-3.4A9.5 9.5 0 0 0 12 2.5Zm0 2a7.5 7.5 0 1 1 0 15 7.5 7.5 0 0 1 0-15Z'
/>
<path fill='#4386FA' d='M8 11h1.6v4H8v-4Zm3 -2h1.6v6H11V9Zm3 1.5h1.6V15H14v-3.5Z' />
</svg>
)
}

export function SnowflakeIcon(props: SVGProps<SVGSVGElement>) {
return (
<svg viewBox='0 0 24 24' xmlns='http://www.w3.org/2000/svg' {...props}>
<path
fill='#29B5E8'
d='M12 2a1 1 0 0 1 1 1v3.59l2.3-2.3a1 1 0 1 1 1.4 1.42L13 9.41V12h2.6l3.7-3.7a1 1 0 0 1 1.4 1.4L18.42 12H22a1 1 0 1 1 0 2h-3.59l2.3 2.3a1 1 0 0 1-1.4 1.4L15.58 14H13v2.59l3.7 3.7a1 1 0 1 1-1.4 1.4L13 19.42V23a1 1 0 1 1-2 0v-3.58l-2.3 2.3a1 1 0 1 1-1.4-1.4l3.7-3.71V14H8.4l-3.7 3.7a1 1 0 0 1-1.4-1.4L5.58 14H2a1 1 0 0 1 0-2h3.59l-2.3-2.3a1 1 0 0 1 1.4-1.4L8.42 12H11V9.41L7.3 5.71a1 1 0 1 1 1.4-1.42l2.3 2.3V3a1 1 0 0 1 1-1Z'
/>
</svg>
)
}
32 changes: 29 additions & 3 deletions apps/sim/ee/data-drains/components/data-drains-settings.tsx
@@ -30,7 +30,14 @@
TableRow,
toast,
} from '@/components/emcn'
import { S3Icon } from '@/components/icons'
import {
AzureIcon,
BigQueryIcon,
DatadogIcon,
GoogleIcon,
S3Icon,
SnowflakeIcon,
} from '@/components/icons'
import { Input as BaseInput } from '@/components/ui'
import type { CreateDataDrainBody, DataDrain, DataDrainRun } from '@/lib/api/contracts/data-drains'
import { useSession } from '@/lib/auth/auth-client'
@@ -62,6 +69,11 @@ const SOURCE_LABELS: Record<(typeof SOURCE_TYPES)[number], string> = {

const DESTINATION_LABELS: Record<(typeof DESTINATION_TYPES)[number], string> = {
s3: 'Amazon S3',
gcs: 'Google Cloud Storage',
azure_blob: 'Azure Blob Storage',
datadog: 'Datadog',
bigquery: 'Google BigQuery',
snowflake: 'Snowflake',
webhook: 'HTTPS webhook',
}

@@ -73,8 +85,22 @@ const CADENCE_LABELS: Record<(typeof CADENCE_TYPES)[number], string> = {
const SOURCE_OPTIONS = SOURCE_TYPES.map((t) => ({ value: t, label: SOURCE_LABELS[t] }))
const CADENCE_OPTIONS = CADENCE_TYPES.map((t) => ({ value: t, label: CADENCE_LABELS[t] }))
function getDestinationIcon(type: (typeof DESTINATION_TYPES)[number]) {
if (type !== 's3') return null
return <S3Icon className='size-[14px] flex-shrink-0 text-[#1B660F]' />
switch (type) {
case 's3':
return <S3Icon className='size-[14px] flex-shrink-0 text-[#1B660F]' />
case 'gcs':
return <GoogleIcon className='size-[14px] flex-shrink-0' />
case 'azure_blob':
return <AzureIcon className='size-[14px] flex-shrink-0' />
case 'datadog':
return <DatadogIcon className='size-[14px] flex-shrink-0' />
case 'bigquery':
return <BigQueryIcon className='size-[14px] flex-shrink-0' />
case 'snowflake':
return <SnowflakeIcon className='size-[14px] flex-shrink-0' />
default:
return null
}
}

const DESTINATION_OPTIONS = DESTINATION_TYPES.map((t) => ({