Skip to content

collectors: consolidate config and introduce Runtime#500

Open
kgeckhart wants to merge 3 commits into
prometheus-community:masterfrom
kgeckhart:kgeckhart/add-runtime-config-validate
Open

collectors: consolidate config and introduce Runtime#500
kgeckhart wants to merge 3 commits into
prometheus-community:masterfrom
kgeckhart:kgeckhart/add-runtime-config-validate

Conversation

@kgeckhart
Copy link
Copy Markdown
Contributor

@kgeckhart kgeckhart commented May 6, 2026

The exporter's config + assembly logic is currently spread across config/ (mixing pure config with OTel-bridge plumbing — Option, AllOptions, OTelComponentDefaults), main() (project resolution, monitoring service creation), and the HTTP handler (collector cache). There's no single entry point for building collectors from a config.

This PR cleans up the layering:

  • config/: pure value-type package — Config, defaults, NewConfigWithDefaults, Validate. No GCP or collectors deps.
  • collectors.NewRuntime: single assembly entry point. Returns a *Runtime with API Collectors / CollectorsForPrefixes(prefix) / WithCache.
  • utils/ deleted: helpers now live unexported in collectors/ next to their callers.
  • Filter-path scrape errors return HTTP 500 instead of os.Exit(1) — one project's collector-build failure no longer crashes the process.

The public Go API surface is reduced. The only known external consumer (grafana/alloy and its forks) is unaffected — verified against their gcp_exporter.go and self_pruning_store.go.

@kgeckhart kgeckhart requested review from ArthurSens and SuperQ May 6, 2026 18:53
Copy link
Copy Markdown
Contributor

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but do you mind opening a PR in https://github.com/prometheus/prometheus-opentelemetry-collector with the decoder implementation?

@kgeckhart kgeckhart marked this pull request as draft May 6, 2026 21:06
@kgeckhart
Copy link
Copy Markdown
Contributor Author

I'm drafting it while I work through the end-to-end semantics. It was all working cleanly but the decoder on the other side ended up being a bit of a mess.

@ArthurSens ArthurSens self-requested a review May 7, 2026 19:28
@kgeckhart kgeckhart force-pushed the kgeckhart/add-runtime-config-validate branch from 6909981 to d2cd4e8 Compare May 11, 2026 18:59
@kgeckhart kgeckhart changed the title config: add Validate to RuntimeConfig, remove OTel-specific fields from Option collectors: consolidate config and introduce Runtime May 11, 2026
@kgeckhart kgeckhart marked this pull request as ready for review May 11, 2026 19:38
Copy link
Copy Markdown
Contributor

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed you chose to perform the Config validation inside NewRuntime rather than in a separate func (c *config) Validate() function. Was that intentional?

I was also a bit confused by the removal of the config package, but after reading your comment here prometheus/prometheus-opentelemetry-collector#13 (comment), it's clear this was a conscious decision.

I'm not strongly opinionated about it. Since the config will always be used by the package that generates the Prometheus Registry, it sounds ok to keep both in the same package. I don't see anything bad in keeping them separate, either.

Comment thread collectors/config.go Outdated
Comment thread collectors/config.go Outdated
Comment thread stackdriver_exporter.go Outdated
Comment thread collectors/config.go Outdated
Comment thread collectors/config.go Outdated
@kgeckhart
Copy link
Copy Markdown
Contributor Author

I'm not strongly opinionated about it. Since the config will always be used by the package that generates the Prometheus Registry, it sounds ok to keep both in the same package. I don't see anything bad in keeping them separate, either.

I think I'm on board with keeping them separated at this point. It can leave us open to some interesting things where we could have a consumer that only needs the config package to do automated config rendering for high level use cases like "collect metrics from GCP VMs". Most of the awkwardness I found during this further refactor was caused by the various functions living in different places. Resolving some of your comments and taking a closer look at where the functions live should clear that part up.

Signed-off-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
@kgeckhart kgeckhart force-pushed the kgeckhart/add-runtime-config-validate branch from d2cd4e8 to 54607be Compare May 13, 2026 18:30
Comment on lines +39 to +48
func normalizeMetricName(metricName string) string {
var parts []string
for _, word := range camelcase.Split(metricName) {
safe := strings.Trim(safeNameRE.ReplaceAllLiteralString(word, "_"), "_")
lower := strings.TrimSpace(strings.ToLower(safe))
if lower != "" {
parts = append(parts, lower)
}
}
return strings.Join(parts, "_")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're just moving a function from one place to another, so completely out of scope for this PR, but... in prometheus/otlptranslator we have an API that does the same thing and doesn't depend on regex matching. It's probably a lot faster :)

Copy link
Copy Markdown
Contributor

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one suggestion, otherwise LGTM!

Comment thread stackdriver_exporter.go Outdated
Comment on lines +260 to +267
// Validate up front so CLI users get a clean error before any GCP client
// setup. NewRuntime also calls Validate so embedded callers don't have to.
if err := cfg.Validate(); err != nil {
logger.Error("invalid configuration", "err", err)
os.Exit(1)
}

runtime, err := collectors.NewRuntime(ctx, logger, cfg, delta.NewInMemoryCounterStore, delta.NewInMemoryHistogramStore)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're calling NewRuntime, which also calls Validate(), right after the Validate() call here. I think we should choose one. Either call Validate outside of NewRuntime, and therefore delete the Validate call from there, or remove this call of Validate from stackdriver_exporter.go

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about 7f7bd26? I realized I have the same double validate dance in the receiver side. I mostly wanted to protect from unvalidated configs being sent through. Someone can still mutate after the fact but then they are intentionally breaking it vs unknowingly.

Signed-off-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
Comment thread collectors/runtime.go Outdated
Signed-off-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after the doc fixes

Copy link
Copy Markdown
Contributor

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one nit actually

Comment thread collectors/runtime.go
// to derive a sibling that does.
func NewRuntime(ctx context.Context, logger *slog.Logger, cfg *config.Config, counterFactory CounterStoreFactory, histogramFactory HistogramStoreFactory) (*Runtime, error) {
if !cfg.Validated() {
return nil, fmt.Errorf("config has not been validated; call cfg.Validate before NewRuntime")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe panic here? There's no action item for the end-user if we ever hit this. It is clearly a bug on our side

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not strongly opinionated though, I know panics are very very bad

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 fair point, hmm the CLI is going to end in error either way. An embedder might not want to crash if the component fails even for a programming error like this. I think I like an error since it's less forceful. I know in some early refactorings of other exporters the first thing we often did was expose errors vs panics especially for use cases that involve multi-tenancy panic'ing can be really impactful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor Stackdriver exporter

2 participants