enhancement(tag_cardinality_limit transform): A setting for per-metric vs global tag cardinality tracking #25372
Conversation
(`tracking_scope` setting f… force-pushed from 8044081 to 9a136f2)
```diff
-        Some((metric_namespace, metric_name.clone()))
-    } else {
-        None
+        let metric_key = match self.config.tracking_scope {
```
One concern about `tracking_scope: per_metric`: the `accepted_tags` map can only grow (no cap, TTL, or eviction). In `per_metric` mode every distinct (namespace, name) seen on the wire becomes a permanent bucket, and within each bucket every tag key allocates its own `AcceptedTagValueSet`.

The pre-existing code had this growth pattern too, but it was bounded by the user's `per_metric_limits` config. With `per_metric` scope the bound becomes dynamic and controlled by upstream metric names, so if a source emits high-cardinality metric names (an anti-pattern, but one we see in the wild), the transform's memory grows monotonically for the lifetime of the process.

There are a few options here, but adding a `max_tracked_metrics` (or similar) knob seems reasonable. When we hit this limit, we can reject new metric IDs. I am open to discussing an LRU strategy too.
@pront I'd say technically this problem also existed before, because even with a global tag counter the number of tags being tracked is still unbounded (though I definitely agree it's more likely to be an issue with this new per-metric tracking scope).

Will go with a "max tracked tags" approach that can be used for either tracking scope: it caps the total number of items that can be tracked (either (metric, tag) pairs in the case of per-metric scope, or just (None, tag) in the case of global tracking scope).

Will leave any strategies like an LRU cache out for now.
Also will leave this field as optional for those that do not want to set it
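The cap described above can be sketched as a toy model. All names here (`TagTracker`, `max_tracked`) are hypothetical illustrations of the comment's idea, not the transform's actual implementation:

```python
# Toy sketch of a "max tracked tags" cap: a single optional limit on the
# total number of tracked entries, shared by both tracking scopes.
class TagTracker:
    def __init__(self, max_tracked, per_metric):
        self.max_tracked = max_tracked  # optional cap; None means unbounded
        self.per_metric = per_metric    # tracking_scope: per_metric vs global
        self.accepted = {}              # (metric_key, tag_key) -> set of tag values

    def track(self, metric, tag_key, tag_value):
        """Track a tag value; returns False when a NEW entry would exceed the cap."""
        key = (metric if self.per_metric else None, tag_key)
        if key not in self.accepted:
            if self.max_tracked is not None and len(self.accepted) >= self.max_tracked:
                return False  # reject new IDs instead of growing without bound
            self.accepted[key] = set()
        self.accepted[key].add(tag_value)
        return True

t = TagTracker(max_tracked=2, per_metric=True)
print(t.track("m1", "host", "a"))  # True: first tracked entry
print(t.track("m2", "host", "b"))  # True: second entry, reaching the cap
print(t.track("m3", "host", "c"))  # False: a new entry is rejected at the cap
```

Leaving the field optional (as noted above) corresponds to `max_tracked=None` here: existing behavior is preserved unless the user opts in.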
(force-pushed from 9a136f2 to b402066)
…or per-metric vs global tag tracking

When metrics do not have an explicit `per_metric_limits` entry, their tag values were always pooled into a single shared bucket. The new `tracking_scope` setting lets users opt into per-metric tracking buckets instead, providing isolation at the cost of higher memory. Default is `global` (current behavior); `per_metric` gives every distinct (namespace, name) its own bucket regardless of `per_metric_limits` membership.

Co-Authored-By: Claude Sonnet 4.6 (1M context) &lt;noreply@anthropic.com&gt;
(force-pushed from b402066 to 445abd6)
Summary
When metrics do not have an explicit `per_metric_limits` entry, their tag values were always pooled into a single shared bucket. This can lead to scenarios such as:

- `metric1` and `metric2` both have the `host` tag, but `metric1` has a high cardinality for the `host` tag (above the limit); the `host` tag will then be dropped on `metric2` (even if the tag on `metric2` only has 1-2 unique values).
- Many metrics each have the `host` tag with only 1-2 unique values per metric; a cardinality limit of 50 will still drop this tag across all metrics once the shared bucket fills.

The new `tracking_scope` setting lets users opt into per-metric tracking buckets instead, providing isolation at the cost of higher memory. Default is `global` (current behavior); `per_metric` gives every distinct (namespace, name) its own bucket regardless of `per_metric_limits` membership.

Vector configuration
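The configuration used in testing was not captured in this excerpt; a sketch of what a `tracking_scope` config might look like follows. The component and input names are hypothetical, the limits illustrative; the other fields are the transform's existing options:

```yaml
transforms:
  cardinality_guard:            # hypothetical component name
    type: tag_cardinality_limit
    inputs: ["otel_metrics"]    # hypothetical upstream source
    mode: exact
    value_limit: 50
    limit_exceeded_action: drop_tag
    tracking_scope: per_metric  # new in this PR; default is "global"
```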
How did you test this PR?
Tested with the configuration above. Simulated an OTel Collector with the following Python script:
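The actual script was not captured in this excerpt. As a hypothetical stand-in (not the author's script, and it does not speak OTLP), a simulator could emit JSON metric-like events, one per line, where many metrics each carry a low-cardinality `host` tag, reproducing the scenario from the Summary:

```python
# Hypothetical simulator stand-in: prints JSON metric events, one per line,
# e.g. for piping into a line-oriented Vector source. Each metric has only
# a couple of distinct "host" values, but there are many metrics overall.
import json

def make_events(num_metrics, values_per_metric):
    events = []
    for m in range(num_metrics):
        for v in range(values_per_metric):
            events.append({
                "name": f"metric{m}",
                "namespace": "otel",  # assumed namespace for illustration
                "tags": {"host": f"host-{m}-{v}"},
                "value": 1.0,
            })
    return events

if __name__ == "__main__":
    # 60 metrics x 2 host values each: 120 distinct host values in total,
    # but only 2 per metric -- enough to overflow a shared global bucket
    # with value_limit 50 while each per-metric bucket stays tiny.
    for event in make_events(60, 2):
        print(json.dumps(event))
```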
Change Type
Is this a breaking change?
Does this PR include user facing changes?
If not, please add the `no-changelog` label to this PR.

References
Notes
- Use `@vectordotdev/vector` to reach out to us regarding this PR.
- To run CI checks locally via the `pre-push` hook, please see this template.
  - `make fmt`
  - `make check-clippy` (if there are failures it's possible some of them can be fixed with `make clippy-fix`)
  - `make test`
- Avoid force-pushing once review has started; use `git merge origin master` and `git push`.
- If dependencies changed (`Cargo.lock`), please run `make build-licenses` to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.