
Fix the fallback for container metrics logic to query both container and pod metrics#789

Open
nabil-dbz wants to merge 4 commits into GoogleCloudPlatform:master from nabil-dbz:container-custom-metrics

Conversation

@nabil-dbz

The main idea here is to avoid failing outright when the pod-level metrics query throws an error. For example, for the metric type kubernetes.io/container/accelerator/duty_cycle, this query:

metric.type = "kubernetes.io/container/accelerator/duty_cycle" AND resource.type = "k8s_pod"

Throws the following error:

The supplied filter does not specify a valid combination of metric and monitored resource descriptors. 
The query will not return any time series.

This makes the adapter return an error before it ever tries to query the metrics using the k8s_container resource type.

The proposed solution is to add a new resource type, PodContainerType, for which we use the one_of operator to handle the fallback logic. However, since the response may then contain both k8s_pod and k8s_container time series, we also add a post-processing step that falls back to the k8s_container metrics only when no k8s_pod metrics are present.
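The fallback described above can be sketched roughly as follows. This is a simplified illustration with local stand-in types, not the adapter's actual code (the real implementation operates on *stackdriver.ListTimeSeriesResponse from the Monitoring API client); it shows the post-processing applied after a resource.type = one_of("k8s_pod", "k8s_container") query returns a mix of resource types:

```go
package main

import "fmt"

// Simplified stand-ins for the Stackdriver API response types the adapter uses.
type MonitoredResource struct {
	Type string
}

type TimeSeries struct {
	Resource *MonitoredResource
}

type ListTimeSeriesResponse struct {
	TimeSeries []*TimeSeries
}

// postProcessPodContainerResp keeps only the k8s_pod series when any are
// present, and falls back to the k8s_container series otherwise. This mirrors
// the post-processing step described above: a one_of("k8s_pod", "k8s_container")
// filter may match both resource types, and pod-level series take priority.
func postProcessPodContainerResp(resp *ListTimeSeriesResponse) *ListTimeSeriesResponse {
	var pods, containers []*TimeSeries
	for _, ts := range resp.TimeSeries {
		switch ts.Resource.Type {
		case "k8s_pod":
			pods = append(pods, ts)
		case "k8s_container":
			containers = append(containers, ts)
		}
	}
	if len(pods) > 0 {
		resp.TimeSeries = pods
	} else {
		resp.TimeSeries = containers
	}
	return resp
}

func main() {
	// Only container-level series came back, so the fallback keeps them.
	resp := &ListTimeSeriesResponse{TimeSeries: []*TimeSeries{
		{Resource: &MonitoredResource{Type: "k8s_container"}},
	}}
	out := postProcessPodContainerResp(resp)
	fmt.Println(len(out.TimeSeries), out.TimeSeries[0].Resource.Type) // prints: 1 k8s_container
}
```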

Also, to support resource label filters for custom metrics, we're adding resource.labels to the list of allowed prefixes for custom metrics.
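The allow-list check implied here can be sketched as follows. isAllowedLabel is a hypothetical helper name, not necessarily what the adapter calls it; the point is only that a label key passes if it either starts with an allowed prefix or matches an allowed full name exactly:

```go
package main

import (
	"fmt"
	"strings"
)

// Allow-lists as described above: metric.labels and resource.labels as
// prefixes, plus a few exact label names.
var (
	allowedCustomMetricsLabelPrefixes  = []string{"metric.labels", "resource.labels"}
	allowedCustomMetricsFullLabelNames = []string{"reducer"}
)

// isAllowedLabel reports whether a label selector key is accepted for custom
// metrics: either it begins with one of the allowed prefixes or it equals one
// of the allowed full label names.
func isAllowedLabel(label string) bool {
	for _, prefix := range allowedCustomMetricsLabelPrefixes {
		if strings.HasPrefix(label, prefix) {
			return true
		}
	}
	for _, name := range allowedCustomMetricsFullLabelNames {
		if label == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isAllowedLabel("resource.labels.container_name")) // true, via the resource.labels prefix
	fmt.Println(isAllowedLabel("project"))                        // false, not in either allow-list
}
```

This is also why, as noted in the review below, an exact entry for resource.labels.container_name becomes redundant once the resource.labels prefix is allowed.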

@CatherineF-dev
Contributor

CatherineF-dev commented Nov 5, 2024

qq: will this bring some breaking changes?

Currently, the latest custom-metrics-stackdriver-adapter is deployed on all existing Kubernetes versions, because users apply the latest production YAML inside this repo.

@nabil-dbz
Author

nabil-dbz commented Nov 5, 2024

I don't expect this to break anything, unless I'm missing some details I'm not aware of. It would probably be a good idea to have an expert review this in detail. I tested the changes locally with Workload Autoscaler and everything looked to be working just fine.

Let me know if you suspect my changes break anything.

@raywainman

I will take a close look at this too, adding this to my list for this week. Thanks @nabil-dbz!


func (p *StackdriverProvider) PostProcessPodContainerResp(response *stackdriver.ListTimeSeriesResponse, metricName string) (*stackdriver.ListTimeSeriesResponse, error) {


We should really document what this does because modifying a response object here could be really confusing.

I was initially going to suggest not returning a ListTimeSeriesResponse here but I see that this would cause a bunch of issues with the follow-up calls.

// allowedCustomMetricsLabelPrefixes and allowedCustomMetricsFullLabelNames specify all metric labels allowed for querying
-allowedCustomMetricsLabelPrefixes = []string{"metric.labels"}
+allowedCustomMetricsLabelPrefixes = []string{"metric.labels", "resource.labels"}
-allowedCustomMetricsFullLabelNames = []string{"reducer"}
+allowedCustomMetricsFullLabelNames = []string{"resource.labels.container_name", "reducer"}


I think you can omit resource.labels here, can't you? It will be picked up by the prefix above.

switch resourceType {
case PodType:
schema = PodSchema
case PodContainerType:


What if we named this something like PodOrContainerType?

Otherwise this gets confusing :(

@juli4n added the custom-metrics-stackdriver-adapter label on Aug 4, 2025