Fix the fallback for container metrics logic to query both container and pod metrics#789
Fix the fallback for container metrics logic to query both container and pod metrics#789nabil-dbz wants to merge 4 commits intoGoogleCloudPlatform:masterfrom
Conversation
|
qq: will this bring some breaking changes? Currently, the latest customer-metrics-stackdriver-adapter is deployed on all existing kubernetes versions. Because users apply the latest production yaml inside this repo. |
…starting with resource.labels
|
I don't expect this to be breaking anything unless I'm missing some details that I'm not aware of. It would probably be a better idea to have an expert review this in detail. I tested the changes locally with Workload Autoscaler and this looked to be working just fine. Let me know if you suspect my changes to be breaking things |
|
I will take a close look at this too, adding this to my list for this week. Thanks @nabil-dbz! |
| } | ||
| } | ||
|
|
||
| func (p *StackdriverProvider) PostProcessPodContainerResp(response *stackdriver.ListTimeSeriesResponse, metricName string) (*stackdriver.ListTimeSeriesResponse, error) { |
There was a problem hiding this comment.
We should really document what this does because modifying a response object here could be really confusing.
I was initially going to suggest not returning a ListTimeSeriesResponse here but I see that this would cause a bunch of issues with the follow-up calls.
| // allowedCustomMetricsLabelPrefixes and allowedCustomMetricsFullLabelNames specify all metric labels allowed for querying | ||
| allowedCustomMetricsLabelPrefixes = []string{"metric.labels"} | ||
| allowedCustomMetricsFullLabelNames = []string{"reducer"} | ||
| allowedCustomMetricsFullLabelNames = []string{"resource.labels.container_name", "reducer"} |
There was a problem hiding this comment.
I think you can omit resource.labels here, can't you? since it will be picked up by the prefix above?
| switch resourceType { | ||
| case PodType: | ||
| schema = PodSchema | ||
| case PodContainerType: |
There was a problem hiding this comment.
What if we named this something like PodOrContainerType?
Otherwise this gets confusing :(
The main idea here is to avoid failure when querying pod level metrics throw an error. For example for the metric type is
kubernetes.io/container/accelerator/duty_cycle, this query:Throws the following error:
Which makes the adapter return an error before trying to query the metrics using the
k8s_container.The proposed solution for this is to add a new resource type called
PodContainerTypefor which we use the operatorone_ofto handle the fallback logic. However, given that the response might contain bothk8s_podandk8s_containermetrics (time series), we're adding a post-processing step to considerk8s_containermetrics ifk8s_podmetrics are absent.Also, to support resource label filters for custom metrics, we're adding
resource.labelsto the list of allowed prefixes for custom metrics.