GlassFlow is an open-source streaming ETL tool and stream processor that simplifies creating and managing data pipelines from multiple sources into ClickHouse. It provides a web interface for building and managing real-time pipelines, with built-in support for deduplication and temporal joins.
GlassFlow handles late-arriving events, ensures exactly-once processing, and scales to high-throughput workloads, delivering accurate, low-latency results from streaming, telemetry, and other data. The web interface makes pipelines easy to configure and monitor.
Deduplication:
- Real-time deduplication of Kafka or OpenTelemetry streams before ingestion into ClickHouse
- Configurable deduplication keys and time windows of up to 7 days
- One-click setup for deduplicated data pipelines
- Prevents duplicate data from reaching ClickHouse
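Key-based deduplication over a time window can be sketched as follows. This is a minimal in-memory illustration, not GlassFlow's implementation: the class `WindowedDeduplicator`, the `event_id` and `ts` field names, and the "first occurrence wins" policy are assumptions made for the example. Only the 7-day maximum comes from the feature list above; a real stream processor would also need durable state and an explicit policy for late-arriving events.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, Iterator

# Mirrors the 7-day maximum window stated in the feature list.
MAX_WINDOW = timedelta(days=7)


class WindowedDeduplicator:
    """Hypothetical sketch: drop events whose key was first seen less than
    `window` ago (first occurrence wins). Not GlassFlow's implementation."""

    def __init__(self, key_field: str, window: timedelta, ts_field: str = "ts") -> None:
        if window > MAX_WINDOW:
            raise ValueError("deduplication window must not exceed 7 days")
        self.key_field = key_field
        self.ts_field = ts_field
        self.window = window
        self._first_seen: Dict[Any, datetime] = {}  # key -> timestamp of first occurrence

    def _evict_expired(self, now: datetime) -> None:
        # Forget keys whose window has closed, so memory stays bounded.
        expired = [k for k, t in self._first_seen.items() if now - t >= self.window]
        for k in expired:
            del self._first_seen[k]

    def process(self, events: Iterable[dict]) -> Iterator[dict]:
        for event in events:
            self._evict_expired(event[self.ts_field])
            key = event[self.key_field]
            if key in self._first_seen:
                continue  # duplicate inside the window: drop it
            self._first_seen[key] = event[self.ts_field]
            yield event


# Usage: the second "a" falls inside the 1-hour window and is dropped;
# the third "a" arrives after the window has expired and is kept.
dedup = WindowedDeduplicator(key_field="event_id", window=timedelta(hours=1))
t0 = datetime(2024, 1, 1, 12, 0)
events = [
    {"event_id": "a", "ts": t0},
    {"event_id": "a", "ts": t0 + timedelta(minutes=5)},
    {"event_id": "b", "ts": t0 + timedelta(minutes=10)},
    {"event_id": "a", "ts": t0 + timedelta(hours=2)},
]
kept = [e["event_id"] for e in dedup.process(events)]
print(kept)  # ['a', 'b', 'a']
```

The window bounds how much key state must be kept: a longer window catches duplicates that arrive further apart, at the cost of remembering more keys.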
Built-in Data Source Connectors:
- Automatic data extraction from multiple sources, with no manual data pulling required
- Integration with Kafka clusters, OTel connectors, and other sources
- Native support for JSON data types
Optimized ClickHouse Sink:
- Native ClickHouse connection for maximum performance
- Configurable batch sizes for efficient data ingestion
- Adjustable wait times for optimal throughput
- Built-in retry mechanisms
- Automatic schema detection and management
- Full support for JSON data types in ClickHouse
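Batch size and wait time typically work together in a sink: rows are buffered and flushed when either the batch is full or the oldest buffered row has waited long enough, and failed flushes are retried. The sketch below illustrates that general pattern under stated assumptions: `BatchingSink`, its parameter names, and the exponential-backoff retry policy are hypothetical and are not GlassFlow's configuration keys or sink code. `flush_fn` stands in for whatever actually writes a batch to ClickHouse.

```python
import time
from typing import Callable, List, Optional


class BatchingSink:
    """Hypothetical sketch of a batching sink: rows are buffered and handed
    to `flush_fn` when the batch is full or has waited long enough.
    Not GlassFlow's actual ClickHouse sink."""

    def __init__(
        self,
        flush_fn: Callable[[List[dict]], None],
        max_batch_size: int = 1000,
        max_wait_s: float = 1.0,
        max_retries: int = 3,
        backoff_s: float = 0.1,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.flush_fn = flush_fn  # assumption: would wrap a ClickHouse INSERT
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.max_retries = max_retries
        self.backoff_s = backoff_s
        self.clock = clock
        self._buffer: List[dict] = []
        self._first_row_at: Optional[float] = None

    def add(self, row: dict) -> None:
        if not self._buffer:
            self._first_row_at = self.clock()  # start the wait timer
        self._buffer.append(row)
        if len(self._buffer) >= self.max_batch_size:
            self.flush()  # size threshold reached

    def tick(self) -> None:
        """Call periodically so a partially filled batch is not held forever."""
        if self._buffer and self.clock() - self._first_row_at >= self.max_wait_s:
            self.flush()  # wait-time threshold reached

    def flush(self) -> None:
        if not self._buffer:
            return
        for attempt in range(self.max_retries + 1):
            try:
                self.flush_fn(list(self._buffer))
                break
            except Exception:
                if attempt == self.max_retries:
                    raise  # give up; rows stay buffered for a later flush
                time.sleep(self.backoff_s * 2 ** attempt)  # exponential backoff
        self._buffer.clear()
        self._first_row_at = None


# Usage with a fake clock: three rows flush on size, the fourth on wait time.
now = [0.0]
sent: List[List[dict]] = []
sink = BatchingSink(sent.append, max_batch_size=3, max_wait_s=5.0, clock=lambda: now[0])
for i in range(4):
    sink.add({"id": i})
now[0] = 6.0
sink.tick()
print([len(b) for b in sent])  # [3, 1]
```

Flushing on whichever threshold is reached first bounds both buffered memory and end-to-end delay, while larger batches mean fewer, more efficient inserts into ClickHouse.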
User-Friendly Interface: Web-based UI for pipeline configuration and management
Local Development: Includes a demo setup with local Kafka and ClickHouse instances
Docker Support: Easy deployment with Docker and Docker Compose
Self-Hosted: Open-source solution that you can host on your own infrastructure
To get started with GlassFlow, visit our main repository at glassflow/clickhouse-etl. The repository contains:
- Complete documentation
- Quick start guide
- Example configurations
- Docker setup instructions
- API documentation
Clone the repository to get started:
git clone https://github.com/glassflow/clickhouse-etl.git