feat(allocation-policy): Per-org overrides for bytes-scanned policy#7975
Open
phacops wants to merge 2 commits into
Open
feat(allocation-policy): Per-org overrides for bytes-scanned policy#7975phacops wants to merge 2 commits into
phacops wants to merge 2 commits into
Conversation
…ngPolicy
Add two new Configuration entries on BytesScannedRejectingPolicy:
- organization_referrer_scan_limit_override, keyed by
(organization_id, referrer)
- organization_scan_limit_override, keyed by organization_id
Previously the only way to override the scan limit for the
organization branch was the per-referrer override that applied to
every organization, which made it impossible to tune the limit for a
single noisy org without affecting everyone else.
Overrides on the organization branch are now resolved in order of
specificity, with the first one set winning:
(organization_id, referrer)
> organization_id
> (all orgs, referrer)
> default
The project branch and cross-org behavior are unchanged.
Also fix pre-existing lint/mypy issues in the same files that the
pre-commit hook surfaces once the files are touched (E712 truthiness
asserts, untyped tenant_ids dicts, ResourceIdentifier arg type, and a
misplaced type: ignore exposed by ruff reformatting).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/xXAC6_vLoFmdoaI4mlEdxiC7CUhp2mohJV2MuJoaNPU
onewland
approved these changes
May 28, 2026
Add two new Configuration entries on BytesScannedRejectingPolicy that forward a hard max_bytes_to_read value to ClickHouse and bypass the sliding-window scan limit for the configured organization: - organization_referrer_max_bytes_to_read, keyed by (organization_id, referrer) - organization_max_bytes_to_read, keyed by organization_id When either is set the query runs at full threads with the configured cap and the sliding window is not consulted; ClickHouse aborts the query if it would scan more than the cap. (org_id, referrer) is more specific and wins over org_id. This complements the existing global limit_bytes_instead_of_rejecting flow, which only caps queries after a tenant exceeds its scan limit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/K7ElxY0inzZzY1icw8PnMWaO4jjRKEoV3AthvEUGg7E
Comment on lines
+282
to
+292
| max_bytes_to_read=org_cap, | ||
| explanation={ | ||
| "reason": f"organization_id {tenant_ids.get('organization_id')} runs with a per-org max_bytes_to_read cap of {org_cap}" | ||
| }, | ||
| is_throttled=False, | ||
| throttle_threshold=MAX_THRESHOLD, | ||
| rejection_threshold=MAX_THRESHOLD, | ||
| quota_used=0, | ||
| quota_unit=QUOTA_UNIT, | ||
| suggestion=NO_SUGGESTION, | ||
| ) |
Contributor
There was a problem hiding this comment.
Bug: The new organization-level max_bytes_to_read check runs before determining the query type, causing project-based queries to bypass their specific rate limits if an org-level cap is set.
Severity: HIGH
Suggested Fix
The organization cap check should be moved to after the call to _get_customer_tenant_key_and_value(). The check should only be applied when the customer_tenant_key is confirmed to be "organization_id", ensuring it only affects organization-based queries.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: snuba/query/allocation_policies/bytes_scanned_rejecting_policy.py#L277-L292
Potential issue: The `__get_organization_max_bytes_to_read()` check is applied to any
query with an `organization_id` in `tenant_ids`. This check occurs before the logic that
determines if a query is project-based or organization-based. Consequently, for a
project-based query that also includes an `organization_id`, if an organization-level
`max_bytes_to_read` cap is configured, the function will return early with the
organization's cap. This bypasses the intended project-level sliding-window rate limits,
potentially allowing the project to scan an unlimited number of bytes beyond its
configured limit.
Did we get this right? 👍 / 👎 to inform future reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add per-organization controls to
BytesScannedRejectingPolicyso individual noisy orgs can be tuned without affecting every other organization.Per-org scan-limit overrides
Two new configs override the sliding-window scan limit on the organization branch:
organization_referrer_scan_limit_override, keyed by(organization_id, referrer)organization_scan_limit_override, keyed byorganization_idOverrides are resolved in order of specificity, with the first one set winning:
Per-org max_bytes_to_read cap
Two more configs forward a hard
max_bytes_to_readvalue to ClickHouse and bypass the sliding-window check entirely:organization_referrer_max_bytes_to_read, keyed by(organization_id, referrer)organization_max_bytes_to_read, keyed byorganization_idWhen set, the query is allowed to run at full threads with the cap applied; ClickHouse aborts it if it would scan more than the cap.
(org_id, referrer)wins overorg_id. This complements the existing globallimit_bytes_instead_of_rejectingflow, which only caps queries after a tenant exceeds its scan limit.The project branch and cross-org behavior are unchanged.
Drive-by fixes
Touching the file caused the pre-commit hook to flag pre-existing lint/mypy issues in the same files (E712 truthiness comparisons, untyped
tenant_idsdicts,ResourceIdentifierarg type, and a# type: ignorethat drifted off the offending line after ruff reformatted it). These are fixed in the same PR since the hook blocks otherwise.