30 changes: 22 additions & 8 deletions helm/docs/monitoring-infrastructure.md
@@ -8,14 +8,20 @@ how they were validated.

The chart provides two categories of monitoring integration:

1. **Prometheus `prometheus.io/*` annotations** on all Services (always enabled).
These allow standard Prometheus installations using `kubernetes_sd_configs`
to auto-discover and scrape CloudZero Agent metrics without any CRDs.
1. **Prometheus `prometheus.io/*` annotations** on all Services (enabled by
default, controlled by `components.monitoring.scrapeAnnotations`). These
allow standard Prometheus installations using `kubernetes_sd_configs` to
auto-discover and scrape CloudZero Agent metrics without any CRDs.

2. **Prometheus Operator CRDs** (opt-in via `components.monitoring.enabled`).
When enabled, the chart creates `ServiceMonitor` and `PrometheusRule`
resources that the Prometheus Operator automatically picks up.

When both are active simultaneously, Prometheus deployments that honor both
annotation-based discovery and ServiceMonitors may scrape each target twice.
Set `components.monitoring.scrapeAnnotations: false` to disable the annotations
when using ServiceMonitors.
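
As a minimal sketch of the override described above, assuming the Prometheus Operator CRDs are already installed in the cluster, a values file might look like the following. The file name `values-override.yaml` is hypothetical; the keys are the ones documented in this section.

```yaml
# values-override.yaml (hypothetical file name)
components:
  monitoring:
    # Create ServiceMonitor and PrometheusRule resources.
    enabled: true
    # Drop the prometheus.io/* annotations so that annotation-based
    # discovery does not scrape the same targets a second time.
    scrapeAnnotations: false
```

A file like this could be passed to Helm with the `-f` flag at install or upgrade time.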

These resources are designed to be useful regardless of the customer's
monitoring stack. The `ServiceMonitor` and `PrometheusRule` CRDs are the
standard interoperability format understood by the Prometheus Operator, but
@@ -32,6 +38,10 @@ components:
# false = never install CRDs (default while feature is being validated)
enabled: false

# true (default) = keep prometheus.io/* annotations on Services
# false = remove redundant annotations from Services
scrapeAnnotations: true

# Override namespace for CRDs (default: same as agent namespace)
namespace: ""

@@ -288,11 +298,15 @@ Validated using multiple test scenarios on the `bach` cluster:

Tested via `helm template` with all three modes:

| `components.monitoring.enabled` | ServiceMonitors | PrometheusRules | `prometheus.io/*` annotations |
| ------------------------------- | --------------- | --------------- | ----------------------------- |
| `null` (no CRDs in cluster) | 0 | 0 | 3 (always) |
| `true` | 4 | 1 | 3 (always) |
| `false` | 0 | 0 | 3 (always) |
| `components.monitoring.enabled` | ServiceMonitors | PrometheusRules | `prometheus.io/*` annotations<sup>†</sup> |
| ------------------------------- | --------------- | --------------- | ----------------------------------------- |
| `null` (no CRDs in cluster) | 0 | 0 | 3 |
| `true` | 4 | 1 | 3 |
| `false` | 0 | 0 | 3 |

<sup>†</sup> Annotation count assumes `components.monitoring.scrapeAnnotations: true`
(default). Set it to `false` to omit the annotations, for example when `enabled`
is `true` or `"auto"`, to avoid duplicate scraping.
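
For orientation, with the default `scrapeAnnotations: true` the agent Service would be expected to carry annotations along these lines. This is a sketch rather than captured output: the values are taken from the `agent-service.yaml` template, and any other annotations (for example from `defaults.annotations`) are omitted.

```yaml
metadata:
  annotations:
    # Present only while components.monitoring.scrapeAnnotations is not false.
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
```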

### Test Suite

6 changes: 5 additions & 1 deletion helm/templates/agent-service.yaml
@@ -11,11 +11,15 @@ metadata:
.Values.commonMetaLabels
)
) | nindent 2 }}
{{- $promAnnotations := dict -}}
{{- if not (eq .Values.components.monitoring.scrapeAnnotations false) -}}
{{- $promAnnotations = dict "prometheus.io/scrape" "true" "prometheus.io/port" "9090" "prometheus.io/path" "/metrics" -}}
{{- end -}}
{{- include "cloudzero-agent.generateAnnotations" (dict
"root" .
"annotations" (list
.Values.defaults.annotations
(dict "prometheus.io/scrape" "true" "prometheus.io/port" "9090" "prometheus.io/path" "/metrics")
$promAnnotations
)
) | nindent 2 }}
spec:
6 changes: 5 additions & 1 deletion helm/templates/aggregator-service.yaml
@@ -12,12 +12,16 @@ metadata:
.Values.components.aggregator.labels
)
) | nindent 2 }}
{{- $promAnnotations := dict -}}
{{- if not (eq .Values.components.monitoring.scrapeAnnotations false) -}}
{{- $promAnnotations = dict "prometheus.io/scrape" "true" "prometheus.io/port" (.Values.aggregator.collector.port | quote) "prometheus.io/path" "/metrics" -}}
{{- end -}}
{{- include "cloudzero-agent.generateAnnotations" (dict
"root" .
"annotations" (list
.Values.defaults.annotations
.Values.components.aggregator.annotations
(dict "prometheus.io/scrape" "true" "prometheus.io/port" (.Values.aggregator.collector.port | quote) "prometheus.io/path" "/metrics")
$promAnnotations
)
) | nindent 2 }}
spec:
6 changes: 5 additions & 1 deletion helm/templates/webhook-service.yaml
@@ -11,13 +11,17 @@ metadata:
.Values.components.webhookServer.labels
)
) | nindent 2 }}
{{- $promAnnotations := dict -}}
{{- if not (eq .Values.components.monitoring.scrapeAnnotations false) -}}
{{- $promAnnotations = dict "prometheus.io/scrape" "true" "prometheus.io/port" "8443" "prometheus.io/path" "/metrics" "prometheus.io/scheme" "https" -}}
{{- end -}}
{{- include "cloudzero-agent.generateAnnotations" (dict
"root" .
"annotations" (list
.Values.defaults.annotations
.Values.components.webhookServer.annotations
(dict "nginx.ingress.kubernetes.io/ssl-redirect" "false")
(dict "prometheus.io/scrape" "true" "prometheus.io/port" "8443" "prometheus.io/path" "/metrics" "prometheus.io/scheme" "https")
$promAnnotations
)
) | nindent 2 }}
namespace: {{ .Release.Namespace }}
54 changes: 54 additions & 0 deletions helm/tests/defaults_service_test.yaml
@@ -2,15 +2,18 @@
#
# This test validates that Service resources properly inherit
# defaults.labels and defaults.annotations from the chart's defaults section.
# Also tests that components.monitoring.scrapeAnnotations controls the
# prometheus.io/* annotations.
#
# Services only support metadata-level defaults (labels and annotations).
# PodSpec defaults (affinity, tolerations, etc.) do not apply to Services.
#
# Templates tested:
# - agent-service.yaml
# - aggregator-service.yaml
# - webhook-service.yaml
suite: defaults.* properties apply to Service resources
templates:
- agent-service.yaml
- aggregator-service.yaml
- webhook-service.yaml
tests:
@@ -91,3 +94,54 @@ tests:
- equal:
path: metadata.annotations.test-defaults-annotation
value: sentinel-value-annotation

# ============================================================================
# monitoring.scrapeAnnotations tests
# ============================================================================
- it: should include prometheus.io annotations on agent-service by default
template: agent-service.yaml
asserts:
- equal:
path: metadata.annotations["prometheus.io/scrape"]
value: "true"

- it: should omit prometheus.io annotations on agent-service when scrapeAnnotations is false
template: agent-service.yaml
set:
components.monitoring.scrapeAnnotations: false
asserts:
- isNull:
path: metadata.annotations["prometheus.io/scrape"]

- it: should include prometheus.io annotations on aggregator-service by default
template: aggregator-service.yaml
asserts:
- equal:
path: metadata.annotations["prometheus.io/scrape"]
value: "true"

- it: should omit prometheus.io annotations on aggregator-service when scrapeAnnotations is false
template: aggregator-service.yaml
set:
components.monitoring.scrapeAnnotations: false
asserts:
- isNull:
path: metadata.annotations["prometheus.io/scrape"]

- it: should include prometheus.io annotations on webhook-service by default
template: webhook-service.yaml
set:
insightsController.enabled: true
asserts:
- equal:
path: metadata.annotations["prometheus.io/scrape"]
value: "true"

- it: should omit prometheus.io annotations on webhook-service when scrapeAnnotations is false
template: webhook-service.yaml
set:
insightsController.enabled: true
components.monitoring.scrapeAnnotations: false
asserts:
- isNull:
path: metadata.annotations["prometheus.io/scrape"]
4 changes: 4 additions & 0 deletions helm/values.schema.json
@@ -6309,6 +6309,10 @@
}
]
},
"scrapeAnnotations": {
"default": true,
"type": "boolean"
},
"sharedSecret": {
"default": false,
"type": "boolean"
20 changes: 18 additions & 2 deletions helm/values.yaml
@@ -859,11 +859,27 @@ components:
#
# To opt in now, set to "auto" or true.
#
# Regardless of this setting, prometheus.io/* annotations are always added to
# Services for customers using standard Prometheus service discovery.
# By default, prometheus.io/* annotations are added to Services for customers
# using standard Prometheus service discovery. Set monitoring.scrapeAnnotations:
# false to disable them when using Prometheus Operator ServiceMonitors to avoid
# Prometheus scraping each target twice.
monitoring:
enabled: null

# Controls whether prometheus.io/* annotations are added to Services.
#
# Background: When monitoring.enabled is true, the chart creates
# ServiceMonitor CRDs that instruct the Prometheus Operator to scrape
# CloudZero Agent metrics. In that setup, the prometheus.io/* annotations on
# Services become redundant, and in clusters where both annotation-based
# and CRD-based discovery are active, the same metrics could be scraped twice.
#
# - true (default): Keep the prometheus.io/* annotations set on Services.
# This value ensures backward compatibility.
#
# - false: Remove the redundant prometheus.io/* annotations from Services.
scrapeAnnotations: true

# Namespace override for PrometheusRule and ServiceMonitor CRDs.
# null (default) = same namespace as the agent installation.
# Some Prometheus Operator deployments require CRDs to be in a specific
1 change: 1 addition & 0 deletions tests/helm/template/alloy.yaml
@@ -1104,6 +1104,7 @@ data:
enabled: null
labels: {}
namespace: null
scrapeAnnotations: true
sharedSecret: false
prometheus:
image: