
input_metric: split large metric contexts into chunk-sized batches#11554

Open
thewillyhuman wants to merge 1 commit into fluent:master from thewillyhuman:gfacundo/fix-metric-chunk-splitting

Conversation


@thewillyhuman thewillyhuman commented Mar 13, 2026

Addresses #9653

Summary

  • Add size-aware batch splitting to flb_input_metrics_append() so large metric payloads are automatically split into multiple chunks respecting the 2MB chunk size limit
  • Add unit tests for the splitting logic (6 test cases)

Problem

When using input plugins that collect a large number of metrics in a single operation (e.g., prometheus_scrape with 200K+ metrics), flb_input_metrics_append() encodes the entire CMetrics context into one monolithic msgpack buffer and writes it as a single chunk.

This chunk can far exceed the 2MB chunk size limit (FLB_INPUT_CHUNK_FS_MAX_SIZE), causing downstream plugins — notably out_opentelemetry — to fail when processing the oversized payload.

The existing 2MB chunk limit only locks a chunk after it is already oversized (preventing future writes to it), but does nothing to prevent a single write from exceeding it in the first place. There was simply no mechanism to split a large metric payload across multiple chunks.

This did not affect in_prometheus_remote_write because the sender controls batch size — each HTTP POST is a separate flb_input_metrics_append() call that naturally produces small chunks. But any input that collects all metrics in one shot (prometheus_scrape, node_exporter_metrics, etc.) is affected.

Root cause

flb_input_metrics_append() unconditionally:

  1. Encodes the full cmt context into a single msgpack buffer via cmt_encode_msgpack_create()
  2. Calls flb_input_chunk_append_raw() with the entire buffer as one unit
  3. Passes n_records = 0 for metrics, so the chunk system has no way to make record-level decisions

No size checking, no splitting, no batching existed.

Fix

A size-aware splitting path is added to input_metrics_append() in src/flb_input_metric.c:

Fast path (unchanged behavior): Encode the full cmt context. If the encoded size is <= 2MB, append as-is. Zero behavioral change for small payloads.

Slow path (new): When encoded size exceeds 2MB:

  1. Count total metric families across all 6 type lists (counters, gauges, untypeds, histograms, exp_histograms, summaries)
  2. Estimate families_per_batch proportionally: total_families * chunk_limit / encoded_size (minimum 1, using 64-bit arithmetic to avoid overflow on 32-bit platforms)
  3. Iterate all families using cmt_cat_*() to copy each into a temporary cmt batch context
  4. When a batch reaches the target family count, encode to msgpack and append as its own chunk
  5. Flush any remaining partial batch at the end

Static labels are copied to each batch. The splitting granularity is at the metric family level — individual families are never broken apart. If a single family still exceeds the chunk limit, a warning is logged since no further splitting is possible.

The fix is in flb_input_metrics_append() so it benefits all metric inputs, not just prometheus_scrape.

New helper functions

| Function | Purpose |
| --- | --- |
| copy_static_labels() | Copies static labels from the source to the batch cmt context |
| input_metrics_split_and_append() | Core splitting logic: counts families, estimates batch size, iterates and flushes batches |

Design decisions

  • 64-bit arithmetic (uint64_t) for the batch size numerator to avoid overflow on 32-bit platforms (e.g., embedded ARM where Fluent Bit also runs)
  • cfl_list_foreach_safe used for list iteration as a defensive measure in case cmt_cat_*() functions ever modify source list nodes
  • goto-based centralized error cleanup instead of return from macro, making resource management explicit and debugger-friendly
  • Partial-append-on-error: if a mid-batch failure occurs, previously appended batches are NOT rolled back. This is acceptable because metrics consumers (Prometheus, OTLP) handle partial scrapes gracefully
  • No metadata deep-copy: internal/external kvlists are not copied to batches to avoid complexity; the primary consumers do not depend on per-chunk metadata

Testing

All 81 internal tests pass (80 existing + 1 new test file).

New test file tests/internal/input_metric.c with 6 test cases:

| Test | What it verifies |
| --- | --- |
| static_labels_copy | Static labels are correctly copied between cmt contexts |
| cat_single_counter | cmt_cat_counter produces identical encoded output |
| batch_split_preserves_families | 20 families split into batches of 5 -> 4 batches, all 20 recovered via an encode-decode round-trip |
| batch_split_mixed_types | Mixed counters + gauges split correctly across batch boundaries |
| empty_context | An empty cmt has 0 families in all 6 lists |
| families_per_batch_calculation | Batch size formula correctness, including edge cases (single family, large counts) |

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Automatic splitting of oversized metric batches to stay within encoding limits, preventing dropped data.
    • Preservation of static labels and metric metadata across split batches so metrics remain consistent.
    • Robust per-batch processing that continues appending successfully encoded batches even if a later batch fails.
  • Tests

    • Comprehensive unit tests covering batch splitting, mixed metric types, label/metadata preservation, empty contexts, and batch-size calculations.


coderabbitai bot commented Mar 13, 2026

📝 Walkthrough


Adds internal batch-splitting logic to the input metrics path to handle oversized encoded metric batches by copying static labels/metadata and iteratively creating, encoding, and appending smaller batches; also adds unit tests exercising copy, splitting, mixed-type batching, and size calculations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core Batching Implementation: src/flb_input_metric.c | Introduces static helpers to copy static labels/metadata, adds input_metrics_split_and_append to build/encode/append multiple batches when encoded size exceeds the chunk limit, refactors input_metrics_append to select fast/slow paths, adds a PROCESS_METRIC_LIST macro, adjusts out_context lifecycle and logging. |
| Test Build: tests/internal/CMakeLists.txt | Adds tests/internal/input_metric.c to UNIT_TESTS_FILES so the new tests are built. |
| Unit Tests: tests/internal/input_metric.c | New tests for static label copying, single-family encode/decode, batch splitting (homogeneous and mixed types), empty context behavior, and families-per-batch size calculations using msgpack validation. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Input as Input Metrics
    participant Append as input_metrics_append
    participant Fast as Fast Path
    participant Split as input_metrics_split_and_append
    participant Builder as Batch Builder (tmp cmt)
    participant Encoder as Encoder (msgpack)
    participant Out as Output Context

    Input->>Append: request append metric batch
    Append->>Encoder: attempt encode whole batch
    alt encoded size <= chunk
        Encoder-->>Append: encoded data
        Append->>Fast: fast append to Out
        Fast->>Out: append chunk
        Out-->>Append: success
    else encoded size > chunk
        Encoder-->>Append: over limit
        Append->>Split: split and append
        Split->>Builder: compute families_per_batch & create batch contexts
        loop for each batch
            Builder->>Encoder: encode batch families
            Encoder-->>Builder: encoded chunk
            Builder->>Out: append encoded chunk (copy labels/metadata)
            Out-->>Builder: success/error (no rollback)
            Builder->>Builder: destroy batch context
        end
        Split-->>Append: return aggregated result
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

backport to v4.2.x

Suggested reviewers

  • edsiper
  • cosmo0920

"I’m a rabbit, nibbling code so neat,
batches split in tidy rows, a treat,
labels hop along, preserved and bright,
tests verify each little flight,
now metrics dance into the night." 🐇✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding batch splitting for oversized metric contexts to respect chunk size limits. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.91%, which is sufficient. The required threshold is 80.00%. |




@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b580c0b001


```c
                  "could not create batch cmt context"); \
    goto error; \
} \
ret = copy_static_labels(batch, src); \
```


P1: Preserve metadata when splitting oversized metric contexts

Each split batch is created from cmt_create() and only static labels are copied (copy_static_labels), so the original internal_metadata/external_metadata are dropped whenever the slow-path splitter is used. That is a functional regression for large contexts because OTLP encoding relies on cmt->external_metadata (resource/scope/data-point metadata) to preserve grouping and attributes, so oversized OpenTelemetry payloads lose metadata that is preserved for non-split payloads.


thewillyhuman (Author) replied:

Fixed. Added copy_metadata() helper that deep-copies both internal_metadata and external_metadata kvlists to each batch context. It iterates each kvpair and copies by variant type (string, int64, uint64, double, bool, bytes). This is now called alongside copy_static_labels() when creating each batch, so OTLP resource/scope metadata is preserved across splits.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/internal/input_metric.c (1)

127-223: The split tests don't exercise the production path.

Both tests hand-roll the batching loop instead of calling flb_input_metrics_append(), and they only count recovered families. That misses regressions in the real helper such as dropped static labels/metadata, append-path cleanup, or batches that still exceed FLB_INPUT_CHUNK_FS_MAX_SIZE. Please add at least one end-to-end test with uneven family sizes and assert every emitted chunk stays within the limit.

Also applies to: 231-355

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/internal/input_metric.c` around lines 127 - 223, The test currently
hand-rolls batching logic in test_batch_split_preserves_families() instead of
exercising the real production helper; update or add an end-to-end test that
builds a src cmt with uneven family sizes and then calls
flb_input_metrics_append() (the real append/splitting path) to produce chunks,
verify every emitted chunk size is <= FLB_INPUT_CHUNK_FS_MAX_SIZE and that
static labels/metadata from the original counters are preserved in each decoded
chunk (use cmt_decode_msgpack_create to inspect chunks), and assert the total
recovered families equals the source; target symbols: flb_input_metrics_append,
FLB_INPUT_CHUNK_FS_MAX_SIZE,
cmt_encode_msgpack_create/cmt_decode_msgpack_create, and the test function
test_batch_split_preserves_families (or add a new test function) so the real
append logic is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/flb_input_metric.c`:
- Around line 149-158: The new batch creation path (inside the batch == NULL
branch where cmt_create() is called and copy_static_labels(batch, src) is
invoked) currently drops cmt-level internal/external metadata present on src
before cmt_cat_*() is used, causing payload divergence with the fast path; fix
this by copying the src cmt-level KV lists into the new batch after cmt_create()
(or detect their presence and fail early) — add a call to copy the
internal/external cmt metadata into batch (or bail out with flb_plg_error and
goto error when such metadata exists) so that batch contains the same cmt-level
metadata as src prior to any cmt_cat_*() operations.
- Around line 117-125: The current heuristic computes families_per_batch from a
global average (numerator/total_encoded_size) which can still produce mt_size >
FLB_INPUT_CHUNK_FS_MAX_SIZE for skewed family sizes; replace that estimate with
an actual-encoded-size loop: start with families_per_batch = 0 (but ensure at
least 1), then iteratively encode/measure the batch payload (using the same
encoding path that produces mt_size) adding one family at a time until adding
the next family would exceed FLB_INPUT_CHUNK_FS_MAX_SIZE, and use that count for
the batch; reference and remove reliance on numerator/total_encoded_size and
ensure the mt_size check is done proactively before appending the chunk rather
than only warning after append.

In `@tests/internal/input_metric.c`:
- Around line 105-112: The test only asserts size_src == size_dst which doesn't
prove cmt_cat_counter() copied data correctly; instead decode and compare
payloads or decoded structures: after calling cmt_encode_msgpack_create(src,
&buf_src, &size_src) and cmt_encode_msgpack_create(dst, &buf_dst, &size_dst) use
the msgpack decoder used in tests (or a deterministic byte-for-byte comparison
if the encoder is deterministic) to decode buf_dst and buf_src and assert that
the decoded family/counter entries (labels, sample values, timestamps) are
equal; alternatively, decode buf_dst and validate the specific counter
family/metric values expected from the source to ensure cmt_cat_counter()
preserved labels and samples rather than relying on size equality.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8bcc8f49-d66f-4815-bd0c-29549da5f423

📥 Commits

Reviewing files that changed from the base of the PR and between acad02a and b580c0b.

📒 Files selected for processing (3)
  • src/flb_input_metric.c
  • tests/internal/CMakeLists.txt
  • tests/internal/input_metric.c

Addresses fluent#9653

Large metric scrapes (e.g., 200K+ metrics from prometheus_scrape)
were encoded as a single chunk exceeding the 2MB limit, breaking
downstream plugins like out_opentelemetry. Add a size-aware split
path to flb_input_metrics_append() that batches metric families
into chunk-sized cmt contexts using cmt_cat_*() before encoding.
Small payloads are unaffected (fast path preserved).

Signed-off-by: Guillermo Facundo Colunga <guillermo.facundo.colunga@cern.ch>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
thewillyhuman force-pushed the gfacundo/fix-metric-chunk-splitting branch from b580c0b to eb84b31 on March 13, 2026 at 17:42.

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
src/flb_input_metric.c (1)

108-111: Consider logging when unsupported kvlist types are skipped.

Silently skipping arrays and nested kvlists may cause subtle data loss if OTLP resource/scope metadata uses these types. A debug-level log could help with troubleshooting when expected metadata is missing.

💡 Suggested enhancement
```diff
         default:
-            /* Skip unsupported types (arrays, nested kvlists, references) */
-            ret = 0;
+            /* Skip unsupported types (arrays, nested kvlists, references).
+             * Log at trace level for debugging metadata preservation issues. */
+            flb_trace("[copy_kvlist] skipping unsupported type %d for key '%s'",
+                      pair->val->type, pair->key);
+            ret = 0;
             break;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/flb_input_metric.c` around lines 108 - 111, In the switch's default
branch (where it currently does "ret = 0;" and breaks) add a debug-level log
that notes an unsupported kvlist type was skipped, include context like the key
name and the type encountered; update the handling in the same function in
src/flb_input_metric.c (the default switch case that sets ret = 0) to call the
module logger at debug level before returning so operators can trace missing
OTLP resource/scope metadata without changing behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e3355b91-6230-4e5a-9f96-09fb91a72661

📥 Commits

Reviewing files that changed from the base of the PR and between b580c0b and eb84b31.

📒 Files selected for processing (3)
  • src/flb_input_metric.c
  • tests/internal/CMakeLists.txt
  • tests/internal/input_metric.c
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/internal/CMakeLists.txt
