2 changes: 1 addition & 1 deletion docs/best-practices/minimize_optimize_joins.md
@@ -19,7 +19,7 @@ In general, denormalize when:
- Tables change infrequently or when batch refreshes are acceptable.
- Relationships aren't many-to-many or not excessively high in cardinality.
- Only a limited subset of the columns will be queried, i.e. certain columns can be excluded from denormalization.
-- You have the capability to shift processing out of ClickHouse into upstream systems like Flink, where real-time enrichment or flattening can be managed.
+- You have the capability to shift processing out of ClickHouse into upstream systems like [Flink](/integrations/data-ingestion/apache-flink/flink-connector.md), where real-time enrichment or flattening can be managed.

Not all data needs to be denormalized — focus on the attributes that are frequently queried. Also consider [materialized views](/best-practices/use-materialized-views) to incrementally compute aggregates instead of duplicating entire sub-tables. When schema updates are rare and latency is critical, denormalization offers the best performance trade-off.

4 changes: 2 additions & 2 deletions docs/data-modeling/denormalization.md
@@ -37,7 +37,7 @@ In general, we would recommend denormalizing in the following cases:

All information doesn't need to be denormalized - just the key information that needs to be frequently accessed.

-The denormalization work can be handled in either ClickHouse or upstream e.g. using Apache Flink.
+The denormalization work can be handled in either ClickHouse or upstream e.g. using [Apache Flink](/integrations/data-ingestion/apache-flink/flink-connector.md).

## Avoid denormalization on frequently updated data {#avoid-denormalization-on-frequently-updated-data}

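To make the upstream option concrete, here is a minimal sketch of flattening with Flink's DataStream API before rows reach ClickHouse. The source data, CSV shape, and dimension lookup are hypothetical placeholders; in a real job the flattened stream would feed the ClickHouse sink documented in this PR.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DenormalizeUpstream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical order events; a real job would read from Kafka or similar.
        DataStream<String> orders = env.fromElements("1001,42", "1002,7"); // orderId,customerId

        // Flatten before insertion: attach the customer attributes here so that
        // ClickHouse stores one wide row and never performs the join at query time.
        DataStream<String> flattened = orders.map(line -> {
            String[] fields = line.split(",");
            return line + "," + lookupCustomerName(fields[1]);
        });

        flattened.print(); // stand-in for the ClickHouse sink
        env.execute("denormalize-upstream");
    }

    // Stand-in dimension lookup; in practice use broadcast state or async I/O.
    private static String lookupCustomerName(String customerId) {
        return "customer-" + customerId;
    }
}
```

Doing the enrichment in Flink keeps ClickHouse inserts append-only, which is exactly the trade-off the surrounding text recommends.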
@@ -371,4 +371,4 @@ Users have several options for orchestrating this in ClickHouse, assuming a peri

### Streaming {#streaming}

-You may alternatively wish to perform this outside of ClickHouse, prior to insertion, using streaming technologies such as [Apache Flink](https://flink.apache.org/). Alternatively, incremental [materialized views](/guides/developer/cascading-materialized-views) can be used to perform this process as data is inserted.
+You may alternatively wish to perform this outside of ClickHouse, prior to insertion, using streaming technologies such as [Apache Flink](/integrations/data-ingestion/apache-flink/flink-connector.md). Alternatively, incremental [materialized views](/guides/developer/cascading-materialized-views) can be used to perform this process as data is inserted.
437 changes: 437 additions & 0 deletions docs/integrations/data-ingestion/apache-flink/flink-connector.md

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions docs/integrations/data-ingestion/data-ingestion-index.md
@@ -15,16 +15,16 @@
|------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Airbyte](/integrations/airbyte) | An open-source data integration platform. It allows the creation of ELT data pipelines and is shipped with more than 140 out-of-the-box connectors. |
| [Apache Spark](/integrations/apache-spark) | A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters |
-| [Apache Flink](https://github.com/ClickHouse/flink-connector-clickhouse) | Real-time data ingestion and processing into ClickHouse through Flink's DataStream API with support for batch writes |
+| [Apache Flink](/integrations/apache-flink) | Real-time data ingestion and processing into ClickHouse through Flink's DataStream API with support for batch writes |
| [Amazon Glue](/integrations/glue) | A fully managed, serverless data integration service provided by Amazon Web Services (AWS) simplifying the process of discovering, preparing, and transforming data for analytics, machine learning, and application development. |
| [Artie](/integrations/artie) | A fully managed real-time data streaming platform that replicates production data into ClickHouse, unlocking customer-facing analytics, operational workflows, and Agentic AI in production. |
| [Azure Synapse](/integrations/azure-synapse) | A fully managed, cloud-based analytics service provided by Microsoft Azure, combining big data and data warehousing to simplify data integration, transformation, and analytics at scale using SQL, Apache Spark, and data pipelines. |
-| [Azure Data Factory](/integrations/azure-data-factory) | A cloud-based data integration service that enables you to create, schedule, and orchestrate data workflows at scale. |
+| [Azure Data Factory](/integrations/azure-data-factory) | A cloud-based data integration service that enables you to create, schedule, and orchestrate data workflows at scale. |
| [Apache Beam](/integrations/apache-beam) | An open-source, unified programming model that enables developers to define and execute both batch and stream (continuous) data processing pipelines. |
-| [BladePipe](/integrations/bladepipe) | A real-time end-to-end data integration tool with sub-second latency, boosting seamless data flow across platforms. |
+| [BladePipe](/integrations/bladepipe) | A real-time end-to-end data integration tool with sub-second latency, boosting seamless data flow across platforms. |
| [dbt](/integrations/dbt) | Enables analytics engineers to transform data in their warehouses by simply writing select statements. |
| [dlt](/integrations/data-ingestion/etl-tools/dlt-and-clickhouse) | An open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets. |
-| [Estuary](/integrations/estuary) | A right-time data platform that enables millisecond-latency ETL pipelines with flexible deployment options. |
+| [Estuary](/integrations/estuary) | A right-time data platform that enables millisecond-latency ETL pipelines with flexible deployment options. |

Check notice on line 27 in docs/integrations/data-ingestion/data-ingestion-index.md (GitHub Actions / vale, rule ClickHouse.Uppercase): Instead of uppercase for 'ETL', use lowercase or backticks (`) if possible. Otherwise, ask a Technical Writer to add this word or acronym to the rule's exception list.
| [Fivetran](/integrations/fivetran) | An automated data movement platform moving data out of, into and across your cloud data platforms. |
| [NiFi](/integrations/nifi) | An open-source workflow management software designed to automate data flow between software systems. |
| [Vector](/integrations/vector) | A high-performance observability data pipeline that puts organizations in control of their observability data. |
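For the Apache Flink row above, a rough sketch of wiring a job toward the new connector. Every ClickHouse-side name here (the `ClickHouseSink` class and its builder options) is an assumption inferred from this PR's dictionary entries such as `AsyncSinkBase` and `ElementConverter`; the authoritative API is the new flink-connector.md page.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ClickHouseIngestSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Trivial stand-in stream; a real pipeline would read from Kafka, files, etc.
        var events = env.fromElements("event-1", "event-2");

        // Assumed wiring, kept as comments because these class and builder option
        // names are inferred from this PR's word list, not from the documented API:
        //
        //   ClickHouseSink<String> sink = ClickHouseSink.<String>builder()
        //       .withClientConfig(clientConfigProperties)   // ClientConfigProperties appears in the PR
        //       .withMaxBatchSize(10_000)                   // "support for batch writes" per the table
        //       .withElementConverter(myElementConverter)   // ElementConverter appears in the PR
        //       .build();
        //   events.sinkTo(sink);                            // AsyncSinkBase-style sinks use sinkTo

        events.print(); // placeholder sink so the sketch runs as-is
        env.execute("clickhouse-ingest-sketch");
    }
}
```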
44 changes: 43 additions & 1 deletion scripts/aspell-dict-file.txt
@@ -1294,4 +1294,46 @@ groupby
skipna
--docs/cloud/guides/infrastructure/01_deployment_options/byoc/03_onboarding/05_azure_private_preview.md--
SignalR
-provisioner
+provisioner
+--docs/integrations/data-ingestion/apache-flink/flink-connector.md--
+AsyncSinkBase
+BigDecimal
+BigInteger
+ClientConfigProperties
+DataStream
+DataWriter
+ElementConverter
+Flink's
+Kryo
+LocalDate
+LocalDateTime
+ZonedDateTime
+actualBytesPerBatch
+actualRecordsPerBatch
+actualTimeInBuffer
+numBytesSend
+numOfDroppedBatches
+numOfDroppedRecords
+numRecordSend
+numRequestSubmitted
+TaskManager
+totalBatchRetries
+triggeredByMaxBatchSizeCounter
+triggeredByMaxBatchSizeInBytesCounter
+triggeredByMaxTimeInBufferMSCounter
+writeArray
+writeBoolean
+writeDate
+writeDateTime
+writeDecimal
+writeFailureLatencyHistogram
+writeFixedString
+writeFloat
+writeInt
+writeIntUUID
+writeJSON
+writeLatencyHistogram
+writeMap
+writeString
+writeTuple
+writeUInt
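Read together, these entries outline the connector's vocabulary: an `AsyncSinkBase`-derived sink, an `ElementConverter` that turns records into rows, a `DataWriter` with per-type write methods, plus batch and latency metrics. As a hedged illustration only, a converter shaped like that vocabulary might look as follows; the method names come from the word list above, but the real interfaces and signatures are defined in flink-connector.md.

```java
import java.time.LocalDateTime;

// Hypothetical shapes inferred from the dictionary entries above; the real
// ElementConverter/DataWriter interfaces are defined by the connector itself.
public class EventConverterSketch {

    // Stand-in record type so the sketch is self-contained.
    public record Event(String name, int count, LocalDateTime ts) {}

    // Stand-in for the connector's DataWriter; writeString/writeInt/writeDateTime
    // all appear in the word list, but these signatures are assumptions.
    public interface RowWriter {
        void writeString(String value);
        void writeInt(int value);
        void writeDateTime(LocalDateTime value);
    }

    // What an ElementConverter-style hook could do: map one record to one row.
    public static void convert(Event event, RowWriter writer) {
        writer.writeString(event.name());
        writer.writeInt(event.count());
        writer.writeDateTime(event.ts());
    }
}
```

Per-type writers like these let a sink serialize rows without generic reflection on the hot path.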
1 change: 1 addition & 0 deletions scripts/aspell-ignore/en/aspell-dict.txt
@@ -3913,6 +3913,7 @@ unnest
unoptimized
unparsed
unpooled
+unprocessable
unrealiable
unreplicated
unresolvable
1 change: 1 addition & 0 deletions sidebars.js
@@ -1029,6 +1029,7 @@ const sidebars = {
'integrations/data-ingestion/apache-spark/spark-jdbc',
],
},
+'integrations/data-ingestion/apache-flink/flink-connector',
'integrations/data-ingestion/aws-glue/index',
{
type: 'category',
2 changes: 1 addition & 1 deletion static/integrations-fallback.json

Large diffs are not rendered by default.
