Skip to content

feat: allow regex patterns in promql/aggregate keep and strip parameters#1690

Closed
Nepomuk5665 wants to merge 2 commits intocloudflare:mainfrom
Nepomuk5665:feature/aggregate-keep-regex-509
Closed

feat: allow regex patterns in promql/aggregate keep and strip parameters#1690
Nepomuk5665 wants to merge 2 commits intocloudflare:mainfrom
Nepomuk5665:feature/aggregate-keep-regex-509

Conversation

@Nepomuk5665
Copy link
Copy Markdown
Contributor

Summary

This PR implements issue #509, allowing users to specify regex patterns in the keep and strip parameters for the promql/aggregate check.

Changes

  • Modified AggregationCheck to accept a regex pattern for label matching instead of requiring literal strings
  • Added extractLiteralLabels helper to extract individual labels from simple alternation patterns like job|instance
  • Updated config validation to verify regex patterns in keep/strip lists
  • For simple literal patterns, the behavior is fully backward compatible
  • For regex patterns, all labels in src.Labels that match the pattern are checked

Example Usage

rule {
  match {
    kind = "alerting"
  }

  aggregate ".+" {
    keep = ["job|instance"]  # Preserve either job or instance labels
  }
}

Fixes #509

This change implements issue cloudflare#509, allowing users to specify regex patterns
in the keep and strip parameters for the promql/aggregate check.

Changes:
- Modified AggregationCheck to accept a regex pattern for label matching
  instead of a literal string
- Added extractLiteralLabels helper to extract individual labels from
  simple alternation patterns like 'job|instance'
- Updated config validation to verify regex patterns in keep/strip lists
- For simple literal patterns, the behavior is fully backward compatible
- For regex patterns, all labels in src.Labels that match the pattern
  are checked

Example usage:
  aggregate ".+" {
    keep = ["job|instance"]  # Preserve either job or instance labels
  }

Fixes cloudflare#509
Comment thread internal/checks/promql_aggregation.go Outdated
// labelLiteralPattern matches simple label patterns that are either:
// - A single literal label name (e.g., "job")
// - An alternation of literal label names (e.g., "job|instance")
var labelLiteralPattern = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*(\|[a-zA-Z_][a-zA-Z0-9_]*)*$`)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand what this is trying to achieve.
This check already allows a list of names, so making values that are regexp which allows multiple values don't add anything.

This:

  aggregate ".+" {
    strip = ["job|instance"]
  }

can already be done using:

  aggregate ".+" {
    strip = ["job", "instance"]
  }

the ticket this references asks for:

  aggregate ".+" {
    strip = ["job_.+", ".*instance.*"]
  }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! You're absolutely right - the job|instance alternation pattern was redundant since ["job", "instance"] already achieves the same result.

I've updated the implementation to:

  1. Remove the extractLiteralLabels function that handled alternation patterns
  2. For true regex patterns like job_.+ or .*instance.*, iterate over src.Labels and match against the regex - this enables the use case from Allow regex to keep for promql/aggregate #509
  3. For literal label names (no regex metacharacters), maintain backward compatibility by also checking CanHaveLabel for labels not explicitly in the query

The new tests demonstrate the regex matching behavior:

  • keep = ["job_.+"] will catch labels like job_eu, job_us being removed
  • strip = [".*instance.*"] will catch labels like env_instance_id being present

Addresses review feedback by removing the redundant extractLiteralLabels
function that handled patterns like 'job|instance'. Such alternation
patterns don't add value since users can already specify multiple labels
as a list ['job', 'instance'].

Changes:
- Remove extractLiteralLabels and labelLiteralPattern that handled
  alternation patterns
- Add literalLabelName regex to detect single literal label names for
  backward compatibility with the original CanHaveLabel behavior
- For true regex patterns (like 'job_.+' or '.*instance.*'), only match
  against labels explicitly mentioned in the query
- For literal label names, maintain backward compatibility by also
  checking CanHaveLabel for labels not in src.Labels

This enables the feature requested in cloudflare#509: using regex patterns like
'job_.+' to match multiple labels dynamically, while maintaining full
backward compatibility for existing literal label configurations.
// isLiteralLabelName returns true if the pattern is a valid Prometheus label name
// (contains no regex metacharacters). This is used to determine whether to use
// the original CanHaveLabel behavior for backward compatibility.
var literalLabelName = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this is needed.
But I also feel that it's slightly awkward for me to comment here just so that you can paste that into whatever AI models/tools you're using to generate this PR. I can ask AI myself and cut the middle man out.

@Nepomuk5665
Copy link
Copy Markdown
Contributor Author

Hey, I apologize for the confusion and frustration. You're right to call this out.

I'm a real person (Nepomuk), but I have been using AI tools to help me find and fix bugs across open source projects. I should have been more upfront about that.

Looking at this PR more honestly - I think I misunderstood the issue #509. The original request was for regex matching against label names dynamically, but I'm not sure my implementation actually solves the real use case properly.

If you'd prefer, I'm happy to close this PR. Or if you could point me in the right direction for what the implementation should actually look like, I'd be glad to take another crack at it - but I'll make sure I actually understand the problem first before submitting changes.

Again, sorry for wasting your time with back-and-forth that wasn't productive. I appreciate you maintaining pint - it's a great tool.

@Nepomuk5665
Copy link
Copy Markdown
Contributor Author

Closing this PR as the implementation doesn't properly address the issue. Thanks for your patience and feedback. If I have a better understanding of what's needed in the future, I may open a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow regex to keep for promql/aggregate

2 participants