feat(connectors): add Apache Doris sink connector#3215
Draft
ryankert01 wants to merge 1 commit intoapache:masterfrom
Draft
feat(connectors): add Apache Doris sink connector#3215ryankert01 wants to merge 1 commit intoapache:masterfrom
ryankert01 wants to merge 1 commit intoapache:masterfrom
Conversation
a9f3652 to
b5434dd
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3215 +/- ##
=============================================
- Coverage 74.46% 51.83% -22.64%
Complexity 943 943
=============================================
Files 1183 1182 -1
Lines 105866 92786 -13080
Branches 82899 69836 -13063
=============================================
- Hits 78835 48095 -30740
- Misses 24289 42123 +17834
+ Partials 2742 2568 -174
🚀 New features to boost your workflow:
|
68db081 to
cdc4af8
Compare
Sink connector that writes Iggy messages to Apache Doris via the HTTP
Stream Load API. v1 scope: JSON payloads only, HTTP Basic auth,
pre-created tables only (no DDL).
Behaviour:
- Manual 307/308 redirect following (capped at 5) so the Authorization
header survives the FE -> BE hop, which reqwest strips by default.
- Deterministic per-batch label
({prefix}-{stream}-{topic}-{partition}-{first_offset}-{last_offset})
so replays are deduplicated by Doris within label_keep_max_second.
- Response body Status field drives error classification: Success and
"Label Already Exists" -> Ok; Publish Timeout -> CannotStoreData
(transient); Fail or any unknown status -> PermanentHttpError so the
runtime DLQs the batch instead of looping.
- Optional columns / where / max_filter_ratio / batch_size / timeout
forwarded as Stream Load headers.
- Password held as secrecy::SecretString; auth header wrapped in
SecretString so Debug derivation never leaks the base64 credential.
- Client built in open() with InitError on failure; fe_url validated
there too so a bad config fails at startup rather than first batch.
Tests: 6 integration tests under core/integration/tests/connectors/doris
backed by an apache/doris all-in-one testcontainer (FE HTTP + FE MySQL).
Coverage includes happy path, 1k-row bulk, max_filter_ratio skip path,
label-replay dedupe, missing-target-table (proves no auto-create), and
the columns derived-expression header. The container must bind host:8040
1:1 because the FE 307-redirects to 127.0.0.1:8040; tests are serialized
via a 'doris' nextest test-group (max-threads = 1) so concurrent test
processes don't race for that port.
cdc4af8 to
efc7a9e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #3112
Rationale
Adds an Apache Doris sink so Iggy streams can be written into Doris for analytical querying.
What changed?
Iggy had no path to land messages in Apache Doris. A new
iggy_connector_doris_sinkcrate consumes JSON payloads and writes them via Doris's HTTP Stream Load API (PUT /api/{db}/{table}/_stream_load).The non-obvious bits the connector handles: re-attaching
Authorizationacross the FE→BE 307 redirect (whichreqweststrips by default), parsing the JSONStatusbody to classify success /Label Already Exists/ transient (Publish Timeout, 5xx) / permanent (Fail, 4xx, unknown), and emitting a deterministic per-batch label so replays are deduplicated by Doris's label-keep window. v1 is sink-only, JSON-only, HTTP Basic auth only, and assumes pre-created tables — no DDL.Local Execution
AI Usage
quickwit_sink/influxdb_sink, testcontainer fixture, and iteration on the Stream Load redirect +Status-body classification.apache/doris:doris-all-in-one-2.1.0container, covering happy path, 1k-row bulk,max_filter_ratio, label-replay dedupe, missing-target-table, andcolumnsderived expressions; row state verified via the MySQL frontend.