Use linear search within unique label validation by geekswaroop · Pull Request #887 · prometheus/common

geekswaroop · 2026-03-15T19:19:03Z

Unique label validation was added in #263.

A couple of our Prometheus scraper services were spending a lot of time in this function, mostly on map rehashing and growth. Profiling showed that the map used within the unique label validation accounts for most of this CPU usage.

The duplicate label check in startLabelName allocates a new map[string]struct{} on every call, and startLabelName is called once per label, not once per metric line. For a metric with N labels, this creates N throwaway maps, resulting in O(N²) map operations per metric line.

This PR replaces the map-based check with a linear scan of currentLabelPairs before appending. Since the check runs incrementally on each label addition, the labels already in currentLabelPairs are known to be unique, so only the new label needs to be compared against the previous ones. This preserves the same early-return error behavior.

Benchmark

The existing benchmark test data has 2-3 labels per metric, and no regression was observed there. I added a separate benchmark with 7+ labels (representative of the real-world metrics we emit) and see the following:

% benchstat many_labels_before.txt many_labels_after.txt
goos: linux
goarch: amd64
pkg: github.com/prometheus/common/expfmt
cpu: AMD EPYC 7B13
                      │ many_labels_before.txt │        many_labels_after.txt        │
                      │         sec/op         │   sec/op     vs base                │
ParseTextManyLabels-48              671.0µ ± 2%   535.3µ ± 2%  -20.23% (p=0.000 n=10)

                      │ many_labels_before.txt │     many_labels_after.txt      │
                      │          B/op          │     B/op      vs base          │
ParseTextManyLabels-48             219.5Ki ± 0%   219.5Ki ± 0%  ~ (p=0.209 n=10)

                      │ many_labels_before.txt │      many_labels_after.txt      │
                      │       allocs/op        │  allocs/op   vs base            │
ParseTextManyLabels-48              8.276k ± 0%   8.276k ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

rabbbit

I would indeed prefer it if we were able to show this with existing benchmarks, but the logic sgtm.

(the O(n^2) doesn't change, I think, but it is fewer maps and many fewer map accesses)

geekswaroop · 2026-04-02T13:45:17Z

@roidelapluie @gotjosh

could you PTAL?

jdhudson3 · 2026-05-12T19:03:29Z

#369 (probably out of date by now) also targeted this issue due to similar performance issues I was encountering.
Commenting as a +1: the original approach scales quite poorly when dealing with high label volumes, which are pretty common.

startLabelName() - linear search

d3a3525

rabbbit approved these changes Mar 17, 2026

geekswaroop wants to merge 1 commit into prometheus:main from geekswaroop:startLabel-linear-search
