authorizer: cache PreparedEvalQuery per (policy path, decisions) by Jura-Z · Pull Request #724 · aserto-dev/topaz

Jura-Z · 2026-05-21T19:02:28Z

authorizer: cache PreparedEvalQuery per (policy path, decisions)

Summary

Each Is() call currently rebuilds a rego.PreparedEvalQuery from the runtime's
compiler and store, including ast.ParseBody / ast.ParseRef for the query body.
The prepared query is a pure function of the policy path, the decisions list, and
the active OPA compiler — all stable between bundle reloads — so we can memoize it.

This PR adds a small preparedQueryCache (sync.Map + singleflight) on
AuthorizerServer that holds prepared queries keyed by (policy_path, decisions).
Cache invalidation is wired via plugins.Manager.RegisterCompilerTrigger, which
fires on bundle activation / discovery update / any compiler rotation, so a policy
change is reflected on the next request.

Why

Under sustained concurrent load, every goroutine inside Is() repeats the same
parse + plan work and contends on the OPA compiler's internal locks. The cache
removes that duplicated work entirely on the hot path — typical Topaz deployments
see only a handful of unique (path, decisions) tuples across millions of calls,
so the cache is read-mostly and the lifetime of each entry is the lifetime of the
bundle.

Benchmark

Stub iap.egress.http policy (allow := true for two test inputs, no
directory lookups), native Go gRPC client to port 8282 (bypasses the REST
gateway), Apple M-series Mac (12 P-cores). Each row is a 3-second
fixed-time run; numbers below are the median of 3 back-to-back runs after a
100-call warmup. p50 latency is per-call, measured client-side.

concurrency	before rps	after rps	gain	before p50	after p50
1	7,803	11,517	+48%	113 µs	77 µs
2	12,885	19,545	+52%	121 µs	83 µs
4	18,474	28,831	+56%	153 µs	110 µs
8	23,880	37,572	+57%	284 µs	173 µs
16	29,929	46,980	+57%	522 µs	301 µs

Implementation notes

- preparedQueryCache.entries is a sync.Map keyed by a stable string
  derived from the policy path and the ordered decisions list (separators are
  ASCII \x1f / \x1e, neither of which appears in valid Rego identifiers).
- singleflight.Group collapses concurrent misses on the same key into one
  PrepareForEval call, so a thundering herd on first use of a new
  (path, decisions) tuple doesn't multiply work.
- ensureCompilerWatcher registers the invalidation hook lazily on first
  use, exactly once per *plugins.Manager. The hook drops the entire cache
  when the compiler is rotated; bundle reloads are rare relative to Is()
  rate, so we don't try to be more precise.
- Errors from the factory (e.g. PrepareForEval failures on a malformed
  request) are propagated and not cached, so a transient failure doesn't
  poison the cache.
- The factory closure captures policyPath and decisions by reference; the
  shared parsing helpers in aserto-dev/runtime (ValidateRule,
  ValidateQuery) are still invoked, just inside the factory rather than on
  every request.
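
As an illustration of the key derivation, here is a minimal sketch. The function name `cacheKey`, the choice of which separator goes where, and the decision names are assumptions made for the example; the PR only states that \x1f and \x1e are used as separators.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	pathSep     = "\x1f" // assumed: separates the policy path from the decisions
	decisionSep = "\x1e" // assumed: separates individual decisions
)

// cacheKey builds a string key from the policy path and the ordered decisions.
// Because the decisions are joined in order, reordering them changes the key.
func cacheKey(policyPath string, decisions []string) string {
	return policyPath + pathSep + strings.Join(decisions, decisionSep)
}

func main() {
	a := cacheKey("iap.egress.http", []string{"allowed", "visible"})
	b := cacheKey("iap.egress.http", []string{"visible", "allowed"})
	fmt.Printf("%q\n%q\n", a, b)
	fmt.Println(a == b) // false: the key is order-sensitive
}
```

One limitation of this particular sketch: an empty decisions list and a list containing a single empty string produce the same key, so a real implementation would need to rule one of them out or encode the length.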

Tests

- TestCacheKey: key derivation is order-sensitive on the decisions list,
  distinguishes paths and decision lists correctly, and is stable for inputs
  that contain edge-case characters.
- TestGetOrPrepare_CachesAndDedupes: factory runs at most once across 200
  concurrent goroutines for the same key; subsequent calls hit the cache
  without invoking the factory; a fresh key still triggers exactly one
  factory call.
- TestGetOrPrepare_FactoryError: factory errors propagate to the caller
  and are not cached.

All three pass. Existing tests in topazd/authorizer/impl/ still pass.

Compatibility

- No API change.
- No config change.
- No new dependency: golang.org/x/sync/singleflight is already a transitive
  dep through OPA / grpc-go.

What this PR does NOT solve

This PR removes one of several CPU costs in the per-call path. It does
not deliver linear concurrency scaling — Topaz still scales sub-linearly
under N-way concurrent load. To document this honestly for reviewers, here
is the full investigation that informed this PR:

Profiling under sustained 16-way load (after this PR's cache)

CPU breakdown from go tool pprof on a 10s sample at 47k rps:

category	CPU %
Go runtime scheduler (`schedule`/`findRunnable`)	34%
gRPC server framework + middlewares	23%
Cores parking/waking (`runtime.usleep`)	22%
GC	16%
Syscalls (kevent/netpoll)	11%
TLS write	9%
OPA `PreparedEvalQuery.Eval` (the actual work)	9%
Topaz authorizer code	11%

The mutex profile shows 96% of contention in runtime.unlock / scheduler
internals; no application-level lock is the bottleneck after this fix.

Architectural ceiling tested empirically

To confirm the limit, I tried the obvious additional optimizations and
measured. With this PR as baseline (47,494 rps at conc=16):

variant	conc=16 rps	gain over PR
this PR alone (cache)	47,494	—
+ plaintext gRPC (no TLS on loopback)	55,067	+16%
+ strip RequestID/Tracing/Error/Prometheus	57,142	+20%
+ `GOMAXPROCS=4` (vs default 18)	68,552	+44%
2 separate topazd processes, conc=16 each, summed	84,245	+77%

GOMAXPROCS=4 outperforming the default is striking — the Go runtime
scheduler thrashes when given more P's than the workload's effective
parallelism, since each P repeatedly tries to steal nonexistent work. This
is independent of this PR; tuning GOMAXPROCS is a runtime knob, not a
code change.

The 2-process number confirms the per-process ceiling is real: doubling
the process count nearly doubles total throughput. Topaz scales out
horizontally, not up. The within-process ceiling is dominated by Go
scheduler / GC / netpoll cost at this allocation rate, not by any lock or
serialization in Topaz code.

Suggestions for further work (not in this PR)

If the maintainers want to push the per-process ceiling higher, the
remaining levers are:

- Skip middlewares when their effects are unobserved. E.g. don't build
  the per-request zerolog instance in TracingMiddleware when level >
  Trace; don't generate prometheus exemplars when no scraper is attached.
- sync.Pool the protobuf IsRequest/IsResponse and the input
  map[string]any. Profile shows ~16% CPU in GC; pooling could cut
  that meaningfully.
- Document GOMAXPROCS tuning for high-throughput deployments; it's
  a free 22% on the same hardware.

Each of those is a separate PR if the maintainers are interested.
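
To make the pooling suggestion concrete, here is a minimal sketch for the input map only. The names `inputPool`, `acquireInput`, and `releaseInput` are hypothetical, and the sketch assumes nothing retains the map after it is released; whether that holds for the evaluation path would need to be verified before adopting this.

```go
package main

import (
	"fmt"
	"sync"
)

// inputPool recycles input maps so each request does not allocate a fresh one.
var inputPool = sync.Pool{
	New: func() any { return make(map[string]any, 8) },
}

// acquireInput returns an empty map, either recycled or newly allocated.
func acquireInput() map[string]any {
	return inputPool.Get().(map[string]any)
}

// releaseInput clears the map and hands it back to the pool.
// The caller must not use m after this call.
func releaseInput(m map[string]any) {
	for k := range m {
		delete(m, k)
	}
	inputPool.Put(m)
}

func main() {
	in := acquireInput()
	in["user"] = "alice"
	fmt.Println(len(in)) // 1
	releaseInput(in)

	again := acquireInput()
	fmt.Println(len(again)) // 0: recycled maps are always handed out empty
	releaseInput(again)
}
```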

Each Is() call rebuilds a rego.PreparedEvalQuery from the runtime's compiler and store, including ast.ParseBody / ast.ParseRef for the query body. Under concurrent load this fights the OPA compiler's internal locks and burns CPU on duplicate parsing.

Memoize the prepared query keyed on (policy_context.path, policy_context.decisions). The compiler/store/policy bundle are stable between bundle reloads, so the prepared query is reusable. Invalidation is wired up via plugins.Manager.RegisterCompilerTrigger, which fires on bundle activation / discovery update / any compiler rotation.

Concurrency-safe: sync.Map for read-mostly access, golang.org/x/sync singleflight to collapse concurrent misses on the same key into one PrepareForEval call.

Measured against a stub iap.egress.http policy, native Go gRPC client (port 8282), Apple M-series Mac. Median of 3 back-to-back 3-second runs after a 100-call warmup:

conc | before rps | after rps | gain
-----+------------+-----------+------
   1 |      7,803 |    11,517 | +48%
   4 |     18,474 |    28,831 | +56%
   8 |     23,880 |    37,572 | +57%
  16 |     29,929 |    46,980 | +57%

p50 latency drops correspondingly (e.g. conc=1: 113 µs -> 77 µs; conc=16: 522 µs -> 301 µs). Decisions are byte-identical to the unmodified path.

Tests:
- TestCacheKey: key derivation is order-sensitive on the decisions list and stable across runs.
- TestGetOrPrepare_CachesAndDedupes: factory runs at most once across 200 concurrent goroutines for the same key; subsequent calls hit the cache without invoking the factory.
- TestGetOrPrepare_FactoryError: factory errors are propagated and not cached, so transient failures don't poison the cache.

Jura-Z mentioned this pull request May 21, 2026

service/builder: use grpc.NumStreamWorkers for fixed goroutine pool #725

Open

Jura-Z wants to merge 1 commit into aserto-dev:main from Jura-Z:izakipnyi/cache-prepared-eval-query
