authorizer: cache PreparedEvalQuery per (policy path, decisions)#724
Open
Jura-Z wants to merge 1 commit into
Open
authorizer: cache PreparedEvalQuery per (policy path, decisions)#724Jura-Z wants to merge 1 commit into
Jura-Z wants to merge 1 commit into
Conversation
Each Is() call rebuilds a rego.PreparedEvalQuery from the runtime's
compiler and store, including ast.ParseBody / ast.ParseRef for the
query body. Under concurrent load this fights the OPA compiler's
internal locks and burns CPU on duplicate parsing.
Memoize the prepared query keyed on (policy_context.path,
policy_context.decisions). The compiler/store/policy bundle are stable
between bundle reloads, so the prepared query is reusable. Invalidation
is wired up via plugins.Manager.RegisterCompilerTrigger, which fires
on bundle activation / discovery update / any compiler rotation.
Concurrency-safe: sync.Map for read-mostly access, golang.org/x/sync
singleflight to collapse concurrent misses on the same key into one
PrepareForEval call.
Measured against a stub iap.egress.http policy, native Go gRPC client
(port 8282), Apple M-series Mac. Median of 3 back-to-back 3-second runs
after a 100-call warmup:
conc | before rps | after rps | gain
-----+------------+-----------+------
1 | 7,803 | 11,517 | +48%
4 | 18,474 | 28,831 | +56%
8 | 23,880 | 37,572 | +57%
16 | 29,929 | 46,980 | +57%
p50 latency drops correspondingly (e.g. conc=1: 113 µs -> 77 µs;
conc=16: 522 µs -> 301 µs). Decisions are byte-identical to the
unmodified path.
Tests:
- TestCacheKey: key derivation is order-sensitive on the decisions
list and stable across runs.
- TestGetOrPrepare_CachesAndDedupes: factory runs at most once across
200 concurrent goroutines for the same key; subsequent calls hit
the cache without invoking the factory.
- TestGetOrPrepare_FactoryError: factory errors are propagated and
not cached, so transient failures don't poison the cache.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
authorizer: cache PreparedEvalQuery per (policy path, decisions)
Summary
Each
Is()call currently rebuilds arego.PreparedEvalQueryfrom the runtime'scompiler and store, including
ast.ParseBody/ast.ParseReffor the query body.The prepared query is a pure function of the policy path, the decisions list, and
the active OPA compiler — all stable between bundle reloads — so we can memoize it.
This PR adds a small
preparedQueryCache(sync.Map + singleflight) onAuthorizerServerthat holds prepared queries keyed by(policy_path, decisions).Cache invalidation is wired via
plugins.Manager.RegisterCompilerTrigger, whichfires on bundle activation / discovery update / any compiler rotation, so a policy
change is reflected on the next request.
Why
Under sustained concurrent load, every goroutine inside
Is()repeats the sameparse + plan work and contends on the OPA compiler's internal locks. The cache
removes that duplicated work entirely on the hot path — typical Topaz deployments
see only a handful of unique
(path, decisions)tuples across millions of calls,so the cache is read-mostly and the lifetime of each entry is the lifetime of the
bundle.
Benchmark
Stub
iap.egress.httppolicy (allow := truefor two test inputs, nodirectory lookups), native Go gRPC client to port 8282 (bypasses the REST
gateway), Apple M-series Mac (12 P-cores). Each row is a 3-second
fixed-time run; numbers below are the median of 3 back-to-back runs after a
100-call warmup. p50 latency is per-call, measured client-side.
Implementation notes
preparedQueryCache.entriesis async.Mapkeyed by a stable stringderived from the policy path and the ordered decisions list (separators are
ASCII
\x1f/\x1e, neither of which appear in valid Rego identifiers).singleflight.Groupcollapses concurrent misses on the same key into onePrepareForEvalcall, so a thundering herd on first use of a new(path, decisions)tuple doesn't multiply work.ensureCompilerWatcherregisters the invalidation hook lazily on firstuse — exactly once per
*plugins.Manager. The hook drops the entire cachewhen the compiler is rotated; bundle reloads are rare relative to
Is()rate, so we don't try to be more precise.
PrepareForEvalfailures on a malformedrequest) are propagated and not cached, so a transient failure doesn't
poison the cache.
policyPathanddecisionsby reference; theshared parsing helpers in
aserto-dev/runtime(ValidateRule,ValidateQuery) are still invoked, just inside the factory rather than onevery request.
Tests
TestCacheKey: key derivation is order-sensitive on the decisions list,distinguishes paths and decision lists correctly, and is stable for inputs
that contain edge-case characters.
TestGetOrPrepare_CachesAndDedupes: factory runs at most ~once across 200concurrent goroutines for the same key; subsequent calls hit the cache
without invoking the factory; a fresh key still triggers exactly one
factory call.
TestGetOrPrepare_FactoryError: factory errors propagate to the callerand are not cached.
All three pass. Existing tests in
topazd/authorizer/impl/still pass.Compatibility
golang.org/x/sync/singleflightis already a transitivedep through OPA / grpc-go.
What this PR does NOT solve
This PR removes one of several CPU costs in the per-call path. It does
not deliver linear concurrency scaling — Topaz still scales sub-linearly
under N-way concurrent load. To document this honestly for reviewers, here
is the full investigation that informed this PR:
Profiling under sustained 16-way load (after this PR's cache)
CPU breakdown from
go tool pprofon a 10s sample at 47k rps:schedule/findRunnable)runtime.usleep)PreparedEvalQuery.Eval(the actual work)The mutex profile shows 96% of contention in
runtime.unlock/ schedulerinternals; no application-level lock is the bottleneck after this fix.
Architectural ceiling tested empirically
To confirm the limit, I tried the obvious additional optimizations and
measured. With this PR as baseline (47,494 rps at conc=16):
GOMAXPROCS=4(vs default 18)GOMAXPROCS=4outperforming the default is striking — the Go runtimescheduler thrashes when given more P's than the workload's effective
parallelism, since each P repeatedly tries to steal nonexistent work. This
is independent of this PR; tuning
GOMAXPROCSis a runtime knob, not acode change.
The 2-process number confirms the per-process ceiling is real: doubling
the process count nearly doubles total throughput. Topaz scales out
horizontally, not up. The within-process ceiling is dominated by Go
scheduler / GC / netpoll cost at this allocation rate, not by any lock or
serialization in Topaz code.
Suggestions for further work (not in this PR)
If the maintainers want to push the per-process ceiling higher, the
remaining levers are:
the per-request zerolog instance in
TracingMiddlewarewhen level >Trace; don't generate prometheus exemplars when no scraper is attached.
sync.Poolthe protobufIsRequest/IsResponseand theinputmap[string]any. Profile shows ~16% CPU in GC; pooling could cutthat meaningfully.
GOMAXPROCStuning for high-throughput deployments — it'sa free 22% on the same hardware.
Each of those is a separate PR if the maintainers are interested.