Gracefully degrade on metrics encoding failures #213

Merged: TheJokr merged 1 commit into main from lblocher/metrics-soft-error on Jun 3, 2026
@TheJokr (Collaborator) commented Jun 2, 2026

By default, `prometheus-client` aborts metrics collection/encoding on any error. This means a single error from the `EncodeMetric` implementation of one metric in a `Registry` makes all the metrics in that `Registry` unavailable.

To avoid this, we introduce an `EncodeMetric` wrapper that swallows errors after reporting them via logging/`eprintln!`. Due to the way `prometheus-client` is designed, this could leave a partially written metrics line in the output. We fix this by introducing `RewindableWriter`, a `Write` implementation that we can rewind to the last newline if we do see an error.
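The rewind idea can be illustrated with a minimal sketch. This is not the code from this PR: it assumes the writer implements `std::fmt::Write` and buffers into a `String`, and the method names `rewind_to_last_newline` and `into_inner` as well as the `demo_rewind` helper are hypothetical.

```rust
use std::fmt::{self, Write};

/// Hypothetical sketch: buffers all output so a trailing partial line can be dropped.
pub struct RewindableWriter {
    buf: String,
}

impl RewindableWriter {
    pub fn new() -> Self {
        Self { buf: String::new() }
    }

    /// Truncates the buffer to just after the last newline.
    /// If no newline has been written yet, the buffer is emptied.
    pub fn rewind_to_last_newline(&mut self) {
        let keep = self.buf.rfind('\n').map_or(0, |i| i + 1);
        self.buf.truncate(keep);
    }

    pub fn into_inner(self) -> String {
        self.buf
    }
}

impl Write for RewindableWriter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.buf.push_str(s);
        Ok(())
    }
}

/// Writes one complete line and one partial line, then rewinds.
pub fn demo_rewind() -> String {
    let mut w = RewindableWriter::new();
    w.write_str("requests_total 42\n").unwrap();
    // Simulates a metric that failed halfway through its line.
    w.write_str("latency_seconds{quantile=").unwrap();
    w.rewind_to_last_newline();
    w.into_inner()
}
```

In this sketch, `demo_rewind()` keeps the complete `requests_total 42` line and discards the unfinished `latency_seconds` line, so later metrics start on a clean line.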

`prometheus-client` does not give us access to the underlying writer in `EncodeMetric` impls, so we are forced to use a side channel via thread-local storage to activate the rewind behavior.
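A rough sketch of such a side channel follows, under stated assumptions. It does not use the real `prometheus-client` traits: `encode_softly` stands in for the `EncodeMetric` wrapper, and `REWIND_REQUESTED`, `FlagAwareWriter`, and `demo_side_channel` are hypothetical names. The point is only that the failing code sees an opaque `dyn Write`, so it signals the concrete writer through a thread-local flag.

```rust
use std::cell::Cell;
use std::fmt::{self, Write};

thread_local! {
    // Hypothetical flag: set when a metric fails, consumed by the writer.
    static REWIND_REQUESTED: Cell<bool> = Cell::new(false);
}

/// Hypothetical writer that honors the thread-local flag before each write.
struct FlagAwareWriter {
    buf: String,
}

impl FlagAwareWriter {
    fn apply_pending_rewind(&mut self) {
        // `replace` reads and clears the flag in one step.
        if REWIND_REQUESTED.with(|f| f.replace(false)) {
            let keep = self.buf.rfind('\n').map_or(0, |i| i + 1);
            self.buf.truncate(keep);
        }
    }
}

impl Write for FlagAwareWriter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.apply_pending_rewind();
        self.buf.push_str(s);
        Ok(())
    }
}

/// Stand-in for the soft-error wrapper: it cannot reach the concrete
/// writer, so on failure it reports the error and raises the flag.
fn encode_softly(
    out: &mut dyn Write,
    encode: impl FnOnce(&mut dyn Write) -> fmt::Result,
) -> fmt::Result {
    if let Err(err) = encode(out) {
        eprintln!("metric encoding failed, skipping: {err}");
        REWIND_REQUESTED.with(|f| f.set(true));
    }
    Ok(()) // The error is swallowed so later metrics still get encoded.
}

fn demo_side_channel() -> String {
    let mut w = FlagAwareWriter { buf: String::new() };
    encode_softly(&mut w, |out| out.write_str("good 1\n")).unwrap();
    encode_softly(&mut w, |out| {
        out.write_str("bad{label=")?; // partial line, then failure
        Err(fmt::Error)
    })
    .unwrap();
    encode_softly(&mut w, |out| out.write_str("also_good 2\n")).unwrap();
    w.apply_pending_rewind(); // covers a failure in the last metric
    w.buf
}
```

Here the partial `bad{label=` output is dropped when the next write arrives, and the two healthy metrics survive. Because the flag is thread-local, this only works if encoding and writing happen on the same thread, which this sketch assumes.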

@TheJokr TheJokr requested a review from fisherdarling June 2, 2026 16:01
@TheJokr TheJokr self-assigned this Jun 2, 2026
Comment thread foundations/src/telemetry/metrics/internal.rs
@TheJokr TheJokr force-pushed the lblocher/metrics-soft-error branch from b99a32e to c96fc66 Compare June 2, 2026 16:39
@TheJokr TheJokr force-pushed the lblocher/metrics-soft-error branch from c96fc66 to 0de2a0a Compare June 2, 2026 16:47
@TheJokr TheJokr merged commit 1893de8 into main Jun 3, 2026
20 checks passed
@TheJokr TheJokr deleted the lblocher/metrics-soft-error branch June 3, 2026 09:24