feat(connectors): add Apache Doris sink connector #3215

Draft
ryankert01 wants to merge 1 commit into apache:master from
ryankert01:feat/doris-sink-connector

@ryankert01 (Member) commented May 5, 2026

Which issue does this PR close?

Closes #3112

Rationale

Adds an Apache Doris sink so Iggy streams can be written into Doris for analytical querying.

What changed?

Iggy had no path to land messages in Apache Doris. A new iggy_connector_doris_sink crate consumes JSON payloads and writes them via Doris's HTTP Stream Load API (PUT /api/{db}/{table}/_stream_load).
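As a minimal sketch of how the Stream Load endpoint above can be assembled, assuming a frontend base URL plus database and table names taken from the sink configuration. The function name and the example values are hypothetical and are not taken from the crate.

```rust
/// Builds the Stream Load endpoint `/api/{db}/{table}/_stream_load`.
/// `fe_url` is the Doris frontend base URL; a trailing slash is tolerated.
/// Hypothetical helper, not the connector's actual API.
fn stream_load_url(fe_url: &str, db: &str, table: &str) -> String {
    format!(
        "{}/api/{}/{}/_stream_load",
        fe_url.trim_end_matches('/'),
        db,
        table
    )
}
```

For example, `stream_load_url("http://localhost:8030/", "analytics", "events")` yields `http://localhost:8030/api/analytics/events/_stream_load`; the host, port, and names here are placeholder values.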

The connector handles three non-obvious details. First, it re-attaches the Authorization header across the FE→BE 307 redirect, which reqwest strips by default. Second, it parses the Status field of the JSON response body to classify the result as success, Label Already Exists, transient (Publish Timeout, 5xx), or permanent (Fail, 4xx, unknown). Third, it emits a deterministic per-batch label so that replays are deduplicated by Doris within its label-keep window. v1 is sink-only, JSON-only, and HTTP Basic auth only, and it assumes pre-created tables (no DDL).
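The redirect handling can be pictured as a small decision function that the send loop consults after every response. The sketch below is an assumption about the shape of that logic, not the crate's code: `NextStep`, `next_step`, and the `max_hops` parameter are hypothetical names, and it presumes the HTTP client's automatic redirect following is disabled so that the caller can re-send the PUT with the Authorization header attached.

```rust
/// What the caller should do after receiving one HTTP response.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum NextStep {
    /// Re-send the same PUT (body and Authorization header included) to this URL.
    Follow(String),
    /// Not a redirect: hand the response over to status classification.
    Done,
    /// Redirect that cannot be followed (hop cap reached or no Location).
    Abort(&'static str),
}

/// Decides how to proceed during manual redirect following.
/// `hops_so_far` counts redirects already followed; `max_hops` is the cap.
fn next_step(status: u16, location: Option<&str>, hops_so_far: u32, max_hops: u32) -> NextStep {
    match status {
        // Only 307/308 are followed, since both preserve the method and body.
        307 | 308 => match location {
            _ if hops_so_far >= max_hops => NextStep::Abort("too many redirects"),
            Some(url) if !url.is_empty() => NextStep::Follow(url.to_string()),
            _ => NextStep::Abort("redirect without Location header"),
        },
        _ => NextStep::Done,
    }
}
```

Keeping the decision separate from the I/O makes the hop cap and the missing-Location case testable without a running Doris instance.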

Local Execution

  • Passed
  • Pre-commit hooks ran. Pre-push C#/Java hooks skipped (no dotnet/JDK locally; contribution is Rust-only).

AI Usage

  1. Claude Code (Anthropic).
  2. Crate scaffolding against quickwit_sink / influxdb_sink, testcontainer fixture, and iteration on the Stream Load redirect + Status-body classification.
  3. 14 unit tests + 6 integration tests against a real apache/doris:doris-all-in-one-2.1.0 container, covering happy path, 1k-row bulk, max_filter_ratio, label-replay dedupe, missing-target-table, and columns derived expressions; row state verified via the MySQL frontend.
  4. Yes.

@ryankert01 marked this pull request as draft May 5, 2026 17:58
@ryankert01 changed the title from "feat(connectors): add Apache Doris sink connector (#2753)" to "feat(connectors): add Apache Doris sink connector" May 5, 2026
@ryankert01 force-pushed the feat/doris-sink-connector branch from a9f3652 to b5434dd May 5, 2026 18:17

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 85.40925% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.83%. Comparing base (fbf3885) to head (efc7a9e).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/connectors/sinks/doris_sink/src/lib.rs | 85.40% | 34 Missing and 7 partials ⚠️ |

Additional details and impacted files

@@              Coverage Diff              @@
##             master    #3215       +/-   ##
=============================================
- Coverage     74.46%   51.83%   -22.64%     
  Complexity      943      943               
=============================================
  Files          1183     1182        -1     
  Lines        105866    92786    -13080     
  Branches      82899    69836    -13063     
=============================================
- Hits          78835    48095    -30740     
- Misses        24289    42123    +17834     
+ Partials       2742     2568      -174     

| Components | Coverage Δ |
|---|---|
| Rust Core | 45.52% <85.40%> (-30.22%) ⬇️ |
| Java SDK | 60.14% <ø> (ø) |
| C# SDK | 69.07% <ø> (-0.31%) ⬇️ |
| Python SDK | 81.43% <ø> (ø) |
| Node SDK | 91.53% <ø> (ø) |
| Go SDK | 39.60% <ø> (ø) |

| Files with missing lines | Coverage Δ |
|---|---|
| core/connectors/sinks/doris_sink/src/lib.rs | 85.40% <85.40%> (ø) |

... and 299 files with indirect coverage changes

@ryankert01 force-pushed the feat/doris-sink-connector branch 2 times, most recently from 68db081 to cdc4af8 May 6, 2026 03:05
Sink connector that writes Iggy messages to Apache Doris via the HTTP
Stream Load API. v1 scope: JSON payloads only, HTTP Basic auth,
pre-created tables only (no DDL).

Behaviour:
- Manual 307/308 redirect following (capped at 5) so the Authorization
  header survives the FE -> BE hop, which reqwest strips by default.
- Deterministic per-batch label
  ({prefix}-{stream}-{topic}-{partition}-{first_offset}-{last_offset})
  so replays are deduplicated by Doris within label_keep_max_second.
- Response body Status field drives error classification: Success and
  "Label Already Exists" -> Ok; Publish Timeout -> CannotStoreData
  (transient); Fail or any unknown status -> PermanentHttpError so the
  runtime DLQs the batch instead of looping.
- Optional columns / where / max_filter_ratio / batch_size / timeout
  forwarded as Stream Load headers.
- Password held as secrecy::SecretString; auth header wrapped in
  SecretString so Debug derivation never leaks the base64 credential.
- Client built in open() with InitError on failure; fe_url validated
  there too so a bad config fails at startup rather than first batch.
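
The Status-driven classification in the list above can be sketched as a single match. This is an illustrative reduction, not the crate's implementation: it takes the already-extracted Status string (JSON parsing is omitted), and `LoadOutcome` and `classify_status` are hypothetical names standing in for the connector's Ok / CannotStoreData / PermanentHttpError results.

```rust
/// Outcome of one Stream Load attempt (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum LoadOutcome {
    /// Batch is stored, either now or by an earlier attempt with the same label.
    Ok,
    /// Worth retrying; corresponds to CannotStoreData in the commit message.
    Transient,
    /// Retrying will not help; corresponds to PermanentHttpError.
    Permanent,
}

/// Maps the `Status` field of a Stream Load response to an outcome.
fn classify_status(status: &str) -> LoadOutcome {
    match status {
        "Success" | "Label Already Exists" => LoadOutcome::Ok,
        "Publish Timeout" => LoadOutcome::Transient,
        // "Fail" and any unrecognized status are treated as permanent,
        // so the batch goes to the DLQ instead of being retried forever.
        _ => LoadOutcome::Permanent,
    }
}
```

Defaulting unknown statuses to the permanent branch is the conservative choice described above: a status the connector does not understand should not trigger an endless retry loop.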

Tests: 6 integration tests under core/integration/tests/connectors/doris
backed by an apache/doris all-in-one testcontainer (FE HTTP + FE MySQL).
Coverage includes happy path, 1k-row bulk, max_filter_ratio skip path,
label-replay dedupe, missing-target-table (proves no auto-create), and
the columns derived-expression header. The container must bind host:8040
1:1 because the FE 307-redirects to 127.0.0.1:8040; tests are serialized
via a 'doris' nextest test-group (max-threads = 1) so concurrent test
processes don't race for that port.
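
The label-replay dedupe test depends on the label being a pure function of the batch coordinates. A minimal sketch of the `{prefix}-{stream}-{topic}-{partition}-{first_offset}-{last_offset}` format follows; the function name, parameter types, and example values are assumptions, not taken from the crate.

```rust
/// Builds a deterministic Stream Load label for one batch.
/// The same inputs always produce the same label, so a replayed batch
/// reuses the label of its first attempt. Hypothetical helper.
fn batch_label(
    prefix: &str,
    stream: &str,
    topic: &str,
    partition: u32,
    first_offset: u64,
    last_offset: u64,
) -> String {
    format!(
        "{}-{}-{}-{}-{}-{}",
        prefix, stream, topic, partition, first_offset, last_offset
    )
}
```

For example, `batch_label("iggy", "orders", "events", 1, 100, 199)` gives `iggy-orders-events-1-100-199`, and calling it again for the same batch gives the same string, which is what lets Doris answer a replay with "Label Already Exists".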
@ryankert01 force-pushed the feat/doris-sink-connector branch from cdc4af8 to efc7a9e May 6, 2026 03:33

Development

Successfully merging this pull request may close these issues.

Add Apache Doris connector

1 participant