docs/integrations/data-ingestion/etl-tools/estuary/clickpipes.md
@@ -1,8 +1,8 @@
---
sidebar_label: 'Connect with ClickPipes'
slug: /integrations/estuary/clickpipes
description: 'Set up an integration between Estuary and ClickHouse via ClickPipes'
title: 'Ingest Estuary Data via ClickPipes'
doc_type: 'guide'
integration:
- support_level: 'partner'
@@ -13,13 +13,15 @@

import PartnerBadge from '@theme/badges/PartnerBadge';

# Ingest data from Estuary with ClickPipes

<PartnerBadge/>

Estuary can connect with ClickHouse via the Kafka ClickPipe.

With this integration, you don't need to maintain your own Kafka ecosystem. Instead, Estuary exposes new data as Kafka messages, and you configure a Kafka ClickPipe with Estuary's broker and schema registry details to consume those messages.

See also [Estuary's direct ClickHouse integration](/integrations/estuary/native).

## Setup guide {#setup-guide}

@@ -39,7 +41,7 @@

2. Click **+ New Materialization**.

3. Select the **ClickHouse Kafka API** connector.

4. Fill out details in the Materialization, Endpoint, and Source Collections sections:

@@ -89,7 +91,7 @@

4. Choose to create a new table or load data into a matching existing table.

5. Map source fields to table columns, confirming each column's name, type, and whether it's nullable.

6. In the final **Details and settings** section, you can select permissions for your dedicated database user.
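
After the ClickPipe creates or updates its target table, you may want to confirm that the column names, types, and nullability match what you mapped in step 5. Here is a minimal check, where `<database>` and `<table>` are placeholders for your ClickPipe's destination:

```sql
-- <database> and <table> are placeholders for the ClickPipe's destination table.
-- Lists each column with its type; nullable columns appear as Nullable(<type>).
DESCRIBE TABLE <database>.<table>;
```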

@@ -101,10 +103,8 @@

## Additional resources {#additional-resources}

For more on setting up a ClickPipe integration with Estuary, see Estuary's documentation:

* Reference Estuary's [ClickHouse materialization docs](https://docs.estuary.dev/reference/Connectors/materialization-connectors/Dekaf/clickhouse/) for the ClickPipes integration.

* Estuary exposes data as Kafka messages using **Dekaf**. To learn more, see Estuary's [Dekaf guide](https://docs.estuary.dev/guides/dekaf_reading_collections_from_kafka/).

* To see a list of sources that you can stream into ClickHouse with Estuary, check out [Estuary's capture connectors](https://docs.estuary.dev/reference/Connectors/capture-connectors/).
44 changes: 44 additions & 0 deletions docs/integrations/data-ingestion/etl-tools/estuary/index.md
@@ -0,0 +1,44 @@
---
sidebar_label: 'Estuary'
slug: /integrations/estuary
description: 'Stream SaaS, database, and other sources into ClickHouse with an Estuary integration'
title: 'Connect Estuary with ClickHouse'
doc_type: 'guide'
integration:
- support_level: 'partner'
- category: 'data_ingestion'
- website: 'https://estuary.dev'
keywords: ['estuary', 'data ingestion', 'etl', 'pipeline', 'data integration']
---

import PartnerBadge from '@theme/badges/PartnerBadge';

# Connect Estuary with ClickHouse

<PartnerBadge/>

[Estuary](https://estuary.dev/) is a right-time data platform that flexibly combines real-time and batch data in simple-to-set-up ETL pipelines. With enterprise-grade security and deployment options, Estuary unlocks durable data flows from SaaS, database, and streaming sources to a variety of destinations, including ClickHouse.

Estuary provides two main ways to integrate with ClickHouse:
* [Directly connect to your ClickHouse database](/integrations/estuary/native).
* [Connect via Kafka ClickPipes](/integrations/estuary/clickpipes).

In both cases, Estuary handles data capture and movement. You don't need to maintain your own Kafka ecosystem or other infrastructure.

## When to choose each integration {#choose-integration-type}

Estuary's [direct ClickHouse materialization](/integrations/estuary/native) is recommended for most use cases. It's designed specifically for ClickHouse's native protocol and supports both self-hosted deployments and ClickHouse Cloud instances.

Opt for the [ClickPipe integration](/integrations/estuary/clickpipes) instead if you specifically want to manage your pipelines via ClickPipes. With this option, ClickHouse consumes Estuary data as Kafka messages through a Kafka ClickPipe.

## Additional resources {#additional-resources}

For more on setting up an integration with Estuary, see Estuary's documentation:

* [Explore Estuary's capabilities](https://docs.estuary.dev/).

* See reference documentation for Estuary's [direct ClickHouse materialization connector](https://docs.estuary.dev/reference/Connectors/materialization-connectors/ClickHouse/).

* See reference documentation for Estuary's [Kafka ClickPipe integration](https://docs.estuary.dev/reference/Connectors/materialization-connectors/Dekaf/clickhouse/).

* To see a list of sources that you can stream into ClickHouse with Estuary, check out [Estuary's capture connectors](https://docs.estuary.dev/reference/Connectors/capture-connectors/).
docs/integrations/data-ingestion/etl-tools/estuary/native-protocol.md
@@ -0,0 +1,126 @@
---
sidebar_label: 'Direct Materialization Connector'
slug: /integrations/estuary/native
description: 'Integrate between Estuary and ClickHouse with a connector using the native protocol'
title: 'Direct Materialization from Estuary to ClickHouse'
doc_type: 'guide'
integration:
- support_level: 'partner'
- category: 'data_ingestion'
- website: 'https://estuary.dev'
keywords: ['estuary', 'data ingestion', 'etl', 'pipeline', 'data integration']
---

import PartnerBadge from '@theme/badges/PartnerBadge';

# Estuary to ClickHouse direct materialization

<PartnerBadge/>

Estuary provides a direct materialization connector for ClickHouse that uses ClickHouse's [native protocol](/interfaces/tcp) and [native format](/interfaces/formats/Native).

This allows Estuary to:
* Materialize data to both self-hosted and ClickHouse Cloud instances
* Automatically handle tasks like table creation and schema evolution
* Support soft or hard deletes
* Use `ReplacingMergeTree` for standard merge updates or `MergeTree` for delta updates
* Provide exactly-once delivery

See also [Estuary's Kafka ClickPipe integration](/integrations/estuary/clickpipes) for a ClickPipe workflow.

## Setup guide {#setup-guide}

**Prerequisites**

* An [Estuary account](https://dashboard.estuary.dev/register)
* One or more [**captures**](https://docs.estuary.dev/concepts/captures/) in Estuary that pull data from your desired sources
* A ClickHouse instance, either self-hosted or in ClickHouse Cloud
* A ClickHouse database user with credentials
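
If you don't yet have a dedicated user for Estuary, the following is a minimal sketch of creating one along with the target database. It assumes you're connected as an administrator. `<database>`, `<user>`, and `<password>` are placeholders, and `sha256_password` is just one of several authentication methods ClickHouse supports.

```sql
-- Placeholders: replace <database>, <user>, and <password> with your own values.
CREATE DATABASE IF NOT EXISTS <database>;

-- A dedicated user that Estuary will authenticate as.
CREATE USER IF NOT EXISTS <user> IDENTIFIED WITH sha256_password BY '<password>';
```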

<VerticalStepper headerLevel="h3">

### Configure ClickHouse for integration {#1-configure-clickhouse}

To set up Estuary's ClickHouse connector, you need to gather connection details from your ClickHouse instance and configure user permissions.

1. Copy your database's host endpoint.

For the port, use **9440** if TLS is enabled or **9000** if TLS is disabled.

Together, the host and port form the **address** you provide to Estuary.

2. Grant permissions to the database user that Estuary will access.

To create and manage tables for you automatically, Estuary needs privileges such as `CREATE`, `SELECT`, and `INSERT` on your target database, plus read access to several system tables for metadata discovery and partition management.

You can grant all required permissions by running these SQL commands, replacing `<database>` and `<user>` with your own information:

```sql
-- Target database access: CREATE TABLE, DROP TABLE, SELECT, INSERT, TRUNCATE, etc.
GRANT ALL ON <database>.* TO <user>;

-- System table access for metadata discovery and partition management.
-- These are NOT covered by the database grant above.
GRANT SELECT ON system.columns TO <user>;
GRANT SELECT ON system.parts TO <user>;
GRANT SELECT ON system.tables TO <user>;
```

3. Optionally restrict user system access to only the target database.

You can do so with row policies. For example:

```sql
CREATE ROW POLICY estuary_columns ON system.columns FOR SELECT USING database = '<database>' TO <user>;
CREATE ROW POLICY estuary_parts ON system.parts FOR SELECT USING database = '<database>' TO <user>;
CREATE ROW POLICY estuary_tables ON system.tables FOR SELECT USING database = '<database>' TO <user>;
```
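
Per ClickHouse's row policy documentation, once a table has any row policy, users who aren't covered by a policy on that table see no rows from it. If other users still need to read these system tables, a sketch like the following adds permissive policies for everyone except the Estuary user. The `estuary_*_others` policy names are hypothetical.

```sql
-- Let all users other than <user> keep seeing every row in these system tables.
CREATE ROW POLICY estuary_columns_others ON system.columns FOR SELECT USING 1 TO ALL EXCEPT <user>;
CREATE ROW POLICY estuary_parts_others ON system.parts FOR SELECT USING 1 TO ALL EXCEPT <user>;
CREATE ROW POLICY estuary_tables_others ON system.tables FOR SELECT USING 1 TO ALL EXCEPT <user>;
```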

You can then move to Estuary to finish setup.
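
Before switching to Estuary, you can optionally review what the Estuary user can access. Here is a quick check, run as an administrator with the same `<user>` placeholder:

```sql
-- Show the privileges granted to the Estuary user.
SHOW GRANTS FOR <user>;

-- Show any row policies defined on one of the system tables, if you created them in step 3.
SHOW ROW POLICIES ON system.tables;
```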

### Create an Estuary materialization {#2-create-an-estuary-materialization}

1. In Estuary's dashboard, go to the [Destinations](https://dashboard.estuary.dev/materializations) page.

2. Click **+ New Materialization**.

3. Select the **ClickHouse** connector.

4. Fill out the **Materialization Details** section.

* Provide a unique name for your materialization
* Choose a data plane (cloud provider and region)

5. Fill out **Endpoint Config** details so Estuary can connect to your ClickHouse instance.

* **Address:** the host and port of your instance
* **Database:** target database name
* **Authentication:** username and password for the database user

You can also configure optional settings, such as whether to use hard deletes and which SSL mode to use.
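
To rule out credential or network problems before publishing, you can connect to the same address as the Estuary user with any ClickHouse client and run a simple query. This sketch only confirms that the login works and reports the session's current database and server version:

```sql
-- Run while connected as <user>, using the address gathered in the first section.
SELECT currentUser(), currentDatabase(), version();
```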

### Configure source collections {#3-configure-source-collections}

In the **Source Collections** section, choose which collections you'd like to materialize into ClickHouse.

1. Link an existing **capture** or add individual data collections to materialize to ClickHouse.

2. If needed, select a collection from the list to configure it further. Customization options include:

* Choose a different table name for the collection
* Select merge behavior for the collection (whether to use delta updates mode)
* Customize field selection behavior to control which fields are materialized

3. Once you're happy with how data will be materialized to ClickHouse, click **Next** and **Save and Publish**.

Estuary then backfills existing data from the selected collections into ClickHouse and streams new updates as they occur.
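
Once the backfill starts, you can see which tables Estuary has created in your target database. Based on the connector behavior described above, you would expect standard collections to use `ReplacingMergeTree` and delta-update collections to use `MergeTree`. Here is a sketch, using the same `<database>` placeholder:

```sql
-- List tables in the target database with their engine and row count.
-- total_rows can be NULL when an engine can't report it cheaply.
SELECT name, engine, total_rows
FROM system.tables
WHERE database = '<database>'
ORDER BY name;
```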

</VerticalStepper>
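
Standard (non-delta) materializations use `ReplacingMergeTree`, which removes older versions of a row only during background merges. As a result, a plain query can briefly return more than one version of the same row. Here is a minimal sketch of reading deduplicated results, where `<table>` is a placeholder for one of the materialized tables:

```sql
-- FINAL applies ReplacingMergeTree deduplication at query time,
-- returning one row per sorting key.
SELECT count() FROM <database>.<table> FINAL;
```

`FINAL` adds query-time work, so you may prefer to reserve it for reads where exact deduplication matters.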

## Additional resources {#additional-resources}

For more on setting up a ClickHouse connector with Estuary, see Estuary's documentation:

* Reference Estuary's [ClickHouse materialization docs](https://docs.estuary.dev/reference/Connectors/materialization-connectors/ClickHouse/).

* Besides the UI-based workflow in these instructions, you can also set up Estuary pipelines from the command line with `flowctl`. See Estuary's [guides on `flowctl`](https://docs.estuary.dev/guides/flowctl/ci-cd/) for more on working with Estuary programmatically.
16 changes: 15 additions & 1 deletion sidebars.js
@@ -1149,7 +1149,21 @@ const sidebars = {
],
},
'integrations/data-ingestion/etl-tools/dlt-and-clickhouse',
{
type: 'category',
label: 'Estuary',
className: 'top-nav-item',
collapsed: true,
collapsible: true,
link: {
type: 'doc',
id: 'integrations/data-ingestion/etl-tools/estuary/index',
},
items: [
'integrations/data-ingestion/etl-tools/estuary/native-protocol',
'integrations/data-ingestion/etl-tools/estuary/clickpipes',
],
},
{
type: 'category',
label: 'Fivetran',