Skip to content

Plan + PlanQuery + ExecutePlan + PlanCache (cacheable execution shape)#740

Open
Nthalk wants to merge 17 commits into
graphql-go:masterfrom
IodeSystems:plan-cache
Open

Plan + PlanQuery + ExecutePlan + PlanCache (cacheable execution shape)#740
Nthalk wants to merge 17 commits into
graphql-go:masterfrom
IodeSystems:plan-cache

Conversation

@Nthalk
Copy link
Copy Markdown

@Nthalk Nthalk commented May 9, 2026

Summary

Adds a cacheable execution shape on top of the existing Execute API:

  • Plan, PlanQuery, ExecutePlan (commit 1) — precompute the per-(schema, document, operationName) work that Execute otherwise repeats every request: operati
    on identification, fragment collection, selection walking, *FieldDefinition resolution, literal-arg pre-coercion, constant @include/@skip evaluation. ExecutePlan
    walks the plan doing only inherently per-request work (variable substitution, abstract type resolution, resolver invocation, result materialization).
  • PlanCache (commit 4) — bounded LRU built on top, with an optional Normalize mode that extracts fully-literal field arguments into synthetic variables before ke
    ying the cache, so two queries that differ only in literal values share one plan.

Execute is unchanged in behavior — it now routes through PlanQuery + ExecutePlan internally. The full existing test suite passes unmodified, including TestMergesPar allelFragments, TestUnionInterfaceExecution, the TestExecutesResolveFunction* family, executor_resolve_test, executor_test, and abstract_test.

Why

For server hot loops, the same query strings show up over and over. Today every request re-parses, re-validates, re-collects fields, and re-resolves field definitions ag
ainst the schema. None of that work depends on the per-request variable values or context. Splitting it out lets servers cache the expensive half and pay only the cheap
half per request.

Numbers

Bench harness in plan_bench_test.go, schemas in benchutil/wide_schema.go. All numbers from go test -bench=. -benchmem.

                                              ms/op   allocs   bytes
Uncached graphql.Do (100-field wide)           5.55   28.9k    --
Cached *Plan + ExecutePlan, varied vars        1.48    9.9k    --
PlanCache native-vars hot loop                 1.47    9.9k    --
PlanCache normalized hot loop (literals)       2.10   14.9k    --
PlanCache no-normalize, literal-baked          5.21   28.9k    --

ListQuery_1K: cached vs Do, ~5% (completion-dominated, parse cost amortized)

Practical reading:

  • Clients that already use GraphQL variables: 3.55× rps win, 66% fewer allocs, no client changes. Just cache.Get + ExecutePlan.
  • Clients that bake literals into queries: 2.48× rps win with Normalize: true. ~70% of the optimal recovery without modifying clients.
  • Schema rebuilds: cache binds entries to *Schema pointer; rebuilds invalidate at lookup time. Reset() available for explicit drops.

API surface

New public types and functions, all additive:

  • type Plan struct { ... } — opaque plan handle.
  • func PlanQuery(schema *Schema, doc *ast.Document, operationName string) (*Plan, error)
  • func ExecutePlan(plan *Plan, params ExecuteParams) *Result
  • type PlanCache struct { ... } (concurrent-safe, bounded)
  • type PlanCacheOptions struct { MaxEntries, MaxQueryBytes int; Normalize bool }
  • type PlanResult struct { Plan *Plan; SynthArgs map[string]interface{}; Errors []FormattedError }
  • func NewPlanCache(opts PlanCacheOptions) *PlanCache
  • (*PlanCache).Get(schema *Schema, query, operationName string) PlanResult
  • (*PlanCache).HitsMisses() (hits, misses uint64)
  • (*PlanCache).Reset()

Execute and Do are unchanged externally — internally they now call PlanQuery + ExecutePlan. No signature changes, no behavior changes to existing code.

Scope (intentional cuts)

First-cut bounds, each individually expandable as follow-up:

  • Only field arguments are normalized (not directive args).
  • Only fully-literal arg values are extracted; mixed structures like {a: 1, b: $foo} stay literal.
  • Abstract-typed sub-selections (Interface/Union returns) aren't descended for arg normalization.
  • Named fragment definitions aren't rewritten — literals inside fragments stay as literals (still cached, just without dedup across literal variants).

Test plan

  • All upstream tests pass unmodified (go test ./... clean against this branch).
  • New plan_cache_test.go covers basic hits, normalize collapsing, end-to-end ExecutePlan of a normalized plan, Reset.
  • Production validation: iodesystems/go-api-gateway consumes PlanQuery + ExecutePlan in its gw.Handler() request path; measured +15% rps and −36% p99 versus its prior AST-only cache.
  • Bench parity: native-vars hot loop matches uncached-with-variables baseline within noise (1.47ms vs 1.48ms), confirming the cached path adds no per-request overhead.

Summary by CodeRabbit

  • New Features

    • Added a reusable query planning and execution pipeline plus a schema-aware plan cache to speed GraphQL requests.
    • Added query normalization so semantically identical queries with different literals can share cached plans.
    • Added utilities to generate and run wide/parametric queries for benchmarking.
  • Tests

    • Added benchmarks and unit tests covering planned vs uncached execution, cache behavior and normalization, normalized-plan execution, concurrency, and directive handling.
  • Chores

    • CI pipeline modernized and Go toolchain version updated.

Nthalk added 4 commits May 9, 2026 11:26
Adds a precomputed Plan tree per (schema, document, operationName)
triple. PlanQuery does the once-per-query work that Execute otherwise
repeats every request: identify the operation, collect fragments,
walk the selection tree, resolve every selection's *FieldDefinition
against the schema, pre-coerce literal arguments, and pre-compute
@include/@Skip predicates whose `if` is constant.

ExecutePlan walks the plan and only does work that's inherently
per-request: variable substitution, abstract type runtime resolution,
resolver invocation, and result materialization. Fields with all
literal arguments skip getArgumentValues entirely; sub-selections are
walked from the precomputed plan tree, never re-collected via
collectFields.

Plans are bound to the *Schema pointer they were planned against;
schema rebuilds invalidate plans (caller's responsibility).

Execute itself now flows through PlanQuery+ExecutePlan, so callers
that don't yet hold a Plan see no behavior change. Callers that issue
the same query repeatedly should hold the *Plan and pass it to
ExecutePlan to skip the per-call planning work.

Full upstream test suite passes unchanged, including:
  - TestMergesParallelFragments (multi-fragment same-key merging)
  - TestUnionInterfaceExecution (abstract type runtime resolution)
  - All TestExecutesResolveFunction* (literal + variable arg paths)
  - executor_resolve_test, executor_test, abstract_test
WideQuery_100_10:  2.54x faster, 49% fewer allocs, 39% less mem
ListQuery_1K:      ~5% faster (per-item completion dominates;
                   amortizes parse cost across 1000 items)
Adds an arged variant of the wide schema (every field takes
`value: String`) plus parametric benches:

  Planned + varied vars:    1.48 ms/op,  9.9k allocs
  Planned + static args:    1.48 ms/op,  9.9k allocs  (same)
  Uncached + varied lits:   5.55 ms/op, 28.9k allocs

3.75x faster than literal-baked queries through graphql.Do, and
varying the variable values per call has essentially zero overhead
vs static args. Confirms that real clients should always use GraphQL
variables: one cached *Plan handles arbitrary literal variations
because the plan stores the variable AST and getArgumentValues runs
at execute time only for variable-bearing args (literals are
pre-coerced at plan time).

Naive literal-baked-per-call clients pay parse + validate + plan +
execute every request. The 3.75x gap is the upper bound on what an
auto-de-var (literal-to-variable normalization) layer could recover
for those clients.
PlanCache encapsulates parse + validate + plan + LRU cache so a
gateway's hot loop is just `cache.Get + ExecutePlan`. Two modes:

  Normalize=false:  cache key = raw query string. Different literal
                    values miss; one entry per distinct query text.

  Normalize=true:   walk the parsed AST, extract fully-literal field
                    arguments into synth variables (__pcv0, __pcv1,
                    ...), key the cache by an FNV-1a fingerprint of
                    the normalized structural shape. Two queries
                    that differ only in literal values collapse to
                    one entry; the per-call literal values ride
                    along in PlanResult.SynthArgs and the caller
                    merges them into ExecutePlan's Args.

The fingerprint replaces a printer.Print key — at 100 fields the
print version cost ~5ms/op of normalize overhead and made the
normalized path slower than no cache at all. The hash drops that
to <1ms/op so the win is real.

Hot-loop benches (100-field wide schema, varied per call):

  HotLoop_NativeVars (vars in query):  1.47 ms/op,   9.9k allocs
  HotLoop_Normalized (auto-extract):   2.10 ms/op,  14.9k allocs
  HotLoop_NoNorm (literal, no rewrite):5.21 ms/op,  28.9k allocs
  Uncached graphql.Do:                 2.99 ms/op,  17.7k allocs

Native vars: 3.55x faster than naive Do(). Normalized: 2.48x
faster than naive Do(), recoups ~70% of the optimal rate without
client changes.

Scope (first cut, intentional):
  - Only field arguments are normalized (not directive args).
  - Only fully-literal arg values are extracted; mixed-variable
    structures (e.g. {a: 1, b: $foo}) stay literal.
  - Abstract-typed sub-selections (Interface/Union returns) aren't
    descended for arg normalization.
  - Fragment definitions aren't rewritten — literals inside named
    fragments stay as literals (still cached, just without
    de-duplication across literal variants).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds PlanQuery to build reusable Plans, ExecutePlan to run Plans per request, a concurrency-safe LRU PlanCache with optional normalization and per-call SynthArgs, benchutil helpers for arg-bearing wide schemas, benchmarks and unit tests, and refactors Execute to delegate to PlanQuery+ExecutePlan.

Changes

Query Planning and Caching System

Layer / File(s) Summary
Plan Type & Contracts
plan.go
Introduces Plan struct and internal plan types (selectionPlan, fieldPlan, argPlan) capturing precomputed execution graphs and skip predicates.
Query Planning
plan.go
Implements PlanQuery: validates inputs, selects operation, collects fragments, groups/merges fields by response key, plans arguments/directives, and pre-plans child selections for concrete objects and abstract types.
Plan Execution
plan.go
Implements ExecutePlan: initializes execution context, resolves variables, traverses planned selections, coerces args (static copy vs dynamic), invokes resolvers, and completes values (thunks, lists, objects, abstract resolution) with panic-to-error conversion.
Plan Cache Infrastructure
plan_cache.go
Adds PlanCache, PlanCacheOptions, PlanResult, concurrency-safe LRU storage, Get/Reset/HitsMisses, oversized-query bypass, and nil-receiver uncached behavior; caches plans and validation errors.
Document Normalization
plan_cache_normalize.go
Implements normalization: extracts literal field-argument values into synthetic variables, appends VariableDefinitions, fingerprints normalized operation via FNV-1a, and returns per-call SynthArgs.
Executor Integration
executor.go
Refactors Execute to call PlanQuery then ExecutePlan, returning formatted planning errors if planning fails and allowing callers to reuse Plan for repeated executions.
Benchmark Support
benchutil/wide_schema.go
Adds WideArgedSchemaWithXFieldsAndYItems for wide schemas with per-field value: String arguments; WideArgedSchemaQueryWithVariable and WideArgedSchemaQueryWithLiteral generate parametric and literal query variants.
Benchmarks
plan_bench_test.go
Adds benchmarks comparing planned execution, uncached execution, and PlanCache under fixed/varied args and literal normalization hot loops.
Tests
plan_cache_test.go, directives_test.go
Adds unit tests for PlanCache hit/miss behavior, normalization collapse and execution, normalization fallback, schema-pointer invalidation, concurrent Gets, Reset, and directive regression tests for @skip on fragments.
CI / Tooling
.circleci/config.yml, go.mod
Updates CircleCI job templates and images, modernizes coveralls step, and bumps module Go directive to Go 1.21.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant PlanCache
  participant PlanQuery
  participant ExecutePlan
  participant Resolver
  Client->>PlanCache: Get(schema, query, opName)
  alt cache hit
    PlanCache-->>Client: PlanResult (Plan + SynthArgs)
  else cache miss
    Client->>PlanQuery: PlanQuery(schema, doc, opName)
    PlanQuery-->>Client: Plan
    Client->>PlanCache: store PlanResult
  end
  Client->>ExecutePlan: ExecutePlan(Plan, params with Args/SynthArgs)
  ExecutePlan->>Resolver: invoke field resolvers per plan
  Resolver-->>ExecutePlan: field values
  ExecutePlan-->>Client: Result (data/errors)
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

In burrows deep the planner sows,
A map of fields where data flows.
With synth-vars stitched and cache held tight,
Many queries share one crafted light. 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 64.91% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically names all major new API components (Plan, PlanQuery, ExecutePlan, PlanCache) added in this pull request and clarifies the core purpose (cacheable execution shape), matching the substantial changes documented in the file summaries.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
plan.go (1)

617-644: 💤 Low value

serialFields parameter is unused.

The body never branches on serialFields; only _ = serialFields keeps the compiler quiet. The mutation/query distinction is correctly handled at the top level via dethunkMapDepthFirst vs dethunkMapWithBreadthFirstTraversal in ExecutePlan (Lines 591-595), so the parameter is genuinely dead. Either drop it from the signature or wire it to the dethunker so callers can honor mutation semantics for nested invocations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plan.go` around lines 617 - 644, The serialFields parameter on
executePlannedSelection is dead (only referenced by `_ = serialFields`) — remove
the parameter from executePlannedSelection and its internal references, and
update all callers (e.g., wherever ExecutePlan calls executePlannedSelection or
helper paths that pass serialFields) to stop passing that argument; keep the
existing mutation/query traversal behavior driven by dethunkMapDepthFirst and
dethunkMapWithBreadthFirstTraversal in ExecutePlan so no logic changes are
needed beyond the signature cleanup.
plan_cache_test.go (1)

13-121: ⚡ Quick win

Tests cover the happy path well; consider adding regression coverage for known gaps.

The four tests verify the documented behavior. To strengthen confidence in the planning layer, consider adding tests for:

  • An inline fragment / fragment spread carrying @skip(if: $v) or @include(if: $v) — would catch the directive-handling regression flagged in plan.go Lines 299-352.
  • A cached *Plan re-executed after Schema rebuild — confirms schema-pointer invalidation kicks in at lookup.
  • Concurrent Get calls on the same key — sanity-checks the LRU/mutex behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plan_cache_test.go` around lines 13 - 121, Add three regression tests: (1)
TestPlanCacheFragmentDirectives — construct a query with an inline fragment and
a fragment spread that include `@skip`(if: $v) and `@include`(if: $v) directives,
call cache.Get (graphql.NewPlanCache with Normalize true/false as appropriate)
and assert the correct plan reuse/miss behavior and that synthesized args
reflect the directive variable; (2) TestPlanCacheSchemaRebuildInvalidation —
create a plan via cache.Get, then rebuild/replace the Schema (create a new
schema instance via benchutil.WideArgedSchemaWithXFieldsAndYItems), call
cache.Get again for the same query and assert the returned *Plan pointer is
different (cache invalidated) and hits/misses reflect a miss after rebuild; (3)
TestPlanCacheConcurrentGets — spawn multiple goroutines calling cache.Get
concurrently for the same query and verify no panics, consistent plan pointer
equality on hits and that HitsMisses reports expected hits (use sync.WaitGroup
and repeated Get calls); in each test use cache.Get, cache.HitsMisses,
cache.Reset, graphql.ExecutePlan and compare plan pointers and results to
validate behavior.
plan_cache.go (1)

130-176: ⚖️ Poor tradeoff

Concurrent misses on the same key duplicate work.

If N goroutines simultaneously call Get with the same uncached key, each sees a miss under lookup, drops the lock, and independently runs parser.Parse/ValidateDocument/PlanQuery; only the last store survives. For a hot endpoint, the first burst after deploy can produce N redundant plans for the same query. Consider a per-key in-flight dedupe (e.g., golang.org/x/sync/singleflight) so concurrent misses share one planning result.

Not a correctness bug — the resulting plans are equivalent — so this is a deferable optimization.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plan_cache.go` around lines 130 - 176, Concurrent cache misses cause
duplicate work; introduce a per-cache singleflight.Group to dedupe in-flight
planning for the same cache key. Add a singleflight.Group field to the cache
struct (e.g., sf or inflight) and wrap the expensive work path (the path that
runs parser.Parse/normalizeDocument/ValidateDocument/PlanQuery for the
normalized branch and the plain string-key branch if desired) inside
sf.Do(cacheKey, func() (interface{}, error) { ... }) so only one goroutine
performs parsing/validation/planning and others receive the same PlanResult;
ensure the returned value is a PlanResult (or pointer) and after sf.Do returns
attach this call's SynthArgs before returning to the caller (use the existing
functions: normalizeDocument, parser.Parse, ValidateDocument, PlanQuery,
c.lookup, c.store). Handle errors by returning PlanResult with Errors through
the singleflight result and still store the result in c.store inside the
singleflight body.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plan_cache_normalize.go`:
- Around line 66-71: When op == nil or foundOps > 1 && operationName == "" we
currently return an empty cacheKey which causes unrelated queries to collide;
change those return branches to produce a deterministic fingerprint of the
original document instead of "" (e.g. hash the raw doc bytes/string with sha256
and return a stable encoded string like "raw:<hex>") so cacheKey is unique per
input doc; update the same return sites that currently do return doc, nil, "",
nil to return doc, nil, "<fingerprint>", nil (keep existing return shapes) and
ensure whatever format you choose won't collide with normal operation names when
combined in plan_cache (where cacheKey is concatenated with operationName).

In `@plan_cache.go`:
- Around line 1-11: The code uses sync/atomic.Uint64 (new in Go 1.19) while the
module declares go 1.13; either bump the module go directive to 1.19 or revert
to plain uint64 counters and function-style atomics: replace atomic.Uint64
fields (e.g., the counter fields in the plan cache struct) with uint64, update
all uses to atomic.AddUint64/atomic.LoadUint64/atomic.StoreUint64, and ensure
those uint64 fields are placed at the end of the struct (after mutex/maps) to
preserve 8-byte alignment on 32-bit platforms; pick one approach and apply it
consistently across functions that reference those atomic counters (methods that
currently call .Add/.Load/.Store).

In `@plan.go`:
- Around line 299-352: The code drops a non-nil predicate from planDirectives
for InlineFragment and FragmentSpread, causing variable-driven `@skip/`@include on
fragments to be ignored; fix by threading that pred into the recursive planning
so emitted child plans inherit it: change collectInto (and any helper that
creates fieldPlan.skipPredicate) to accept an optional parent predicate and,
when a fragment's pred != nil, pass it down and compose it with any child's own
skipPredicate using logical-AND (i.e., childPred = and(parentPred, childPred));
apply this threading/composition in both the *ast.InlineFragment and
*ast.FragmentSpread branches (where _ = pred is now used) so fragment-level
predicates are respected.
- Around line 535-585: ExecutePlan currently uses p.Schema for variable coercion
and in executionContext.Schema which can mismatch the schema the Plan was
created against (Plan.schema); change ExecutePlan to use the plan-bound schema
(dereference plan.schema) for getVariableValues and set executionContext.Schema
(or alternatively validate p.Schema equals plan.schema and return an error on
mismatch). Update the calls that reference the schema: replace p.Schema passed
to getVariableValues and the executionContext.Schema assignment so they use
*plan.schema (or add an equality check between p.Schema and plan.schema and
error if different) so variable coercion and abstract-type resolution use the
plan's bound schema.

---

Nitpick comments:
In `@plan_cache_test.go`:
- Around line 13-121: Add three regression tests: (1)
TestPlanCacheFragmentDirectives — construct a query with an inline fragment and
a fragment spread that include `@skip`(if: $v) and `@include`(if: $v) directives,
call cache.Get (graphql.NewPlanCache with Normalize true/false as appropriate)
and assert the correct plan reuse/miss behavior and that synthesized args
reflect the directive variable; (2) TestPlanCacheSchemaRebuildInvalidation —
create a plan via cache.Get, then rebuild/replace the Schema (create a new
schema instance via benchutil.WideArgedSchemaWithXFieldsAndYItems), call
cache.Get again for the same query and assert the returned *Plan pointer is
different (cache invalidated) and hits/misses reflect a miss after rebuild; (3)
TestPlanCacheConcurrentGets — spawn multiple goroutines calling cache.Get
concurrently for the same query and verify no panics, consistent plan pointer
equality on hits and that HitsMisses reports expected hits (use sync.WaitGroup
and repeated Get calls); in each test use cache.Get, cache.HitsMisses,
cache.Reset, graphql.ExecutePlan and compare plan pointers and results to
validate behavior.

In `@plan_cache.go`:
- Around line 130-176: Concurrent cache misses cause duplicate work; introduce a
per-cache singleflight.Group to dedupe in-flight planning for the same cache
key. Add a singleflight.Group field to the cache struct (e.g., sf or inflight)
and wrap the expensive work path (the path that runs
parser.Parse/normalizeDocument/ValidateDocument/PlanQuery for the normalized
branch and the plain string-key branch if desired) inside sf.Do(cacheKey, func()
(interface{}, error) { ... }) so only one goroutine performs
parsing/validation/planning and others receive the same PlanResult; ensure the
returned value is a PlanResult (or pointer) and after sf.Do returns attach this
call's SynthArgs before returning to the caller (use the existing functions:
normalizeDocument, parser.Parse, ValidateDocument, PlanQuery, c.lookup,
c.store). Handle errors by returning PlanResult with Errors through the
singleflight result and still store the result in c.store inside the
singleflight body.

In `@plan.go`:
- Around line 617-644: The serialFields parameter on executePlannedSelection is
dead (only referenced by `_ = serialFields`) — remove the parameter from
executePlannedSelection and its internal references, and update all callers
(e.g., wherever ExecutePlan calls executePlannedSelection or helper paths that
pass serialFields) to stop passing that argument; keep the existing
mutation/query traversal behavior driven by dethunkMapDepthFirst and
dethunkMapWithBreadthFirstTraversal in ExecutePlan so no logic changes are
needed beyond the signature cleanup.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6c095cdc-83ef-4e83-a6aa-734408bfdca7

📥 Commits

Reviewing files that changed from the base of the PR and between d5f8f3c and c31e90d.

📒 Files selected for processing (7)
  • benchutil/wide_schema.go
  • executor.go
  • plan.go
  • plan_bench_test.go
  • plan_cache.go
  • plan_cache_normalize.go
  • plan_cache_test.go

Comment thread plan_cache_normalize.go
Comment thread plan_cache.go
Comment thread plan.go
Comment thread plan.go Outdated
atomic.Uint64 (used in PlanCache) needs Go 1.19+. Raise go.mod to
1.21 and refresh CI: drop the 1.8/1.9/1.11 jobs and the
test_without_go_modules path, switch to cimg/go images at 1.21 and
1.26, and replace `go get goveralls` with `go install …@latest`.
@coveralls
Copy link
Copy Markdown

coveralls commented May 9, 2026

Coverage Status

coverage: 85.967% (-6.3%) from 92.284% — IodeSystems:plan-cache into graphql-go:master

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.circleci/config.yml (1)

25-25: ⚡ Quick win

Pin goveralls to a specific version for reproducible CI.

On Line 25, @latest can change underneath you and make CI flaky unexpectedly. Pin to a concrete release version instead.

Suggested fix
-      - run: go install github.com/mattn/goveralls@latest
+      - run: go install github.com/mattn/goveralls@v0.0.12
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.circleci/config.yml at line 25, Replace the dynamic install line that uses
`@latest` with a pinned release for reproducible CI: update the run command "go
install github.com/mattn/goveralls@latest" to use a concrete version tag (e.g.
`@vX.Y.Z`) in .circleci/config.yml, so the CI always installs the same goveralls
release; pick the desired stable tag and update the command accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.circleci/config.yml:
- Line 26: The CI test step currently runs "run: go test -v -cover -race
-coverprofile=coverage.out" which only tests the repo root; update this run
command to include the ./... package pattern (e.g. use "go test ./... -v -cover
-race -coverprofile=coverage.out") so tests and coverage include all subpackages
like language/, gqlerrors/, testutil/, examples/; modify the run entry that
contains the existing go test line accordingly.

---

Nitpick comments:
In @.circleci/config.yml:
- Line 25: Replace the dynamic install line that uses `@latest` with a pinned
release for reproducible CI: update the run command "go install
github.com/mattn/goveralls@latest" to use a concrete version tag (e.g. `@vX.Y.Z`)
in .circleci/config.yml, so the CI always installs the same goveralls release;
pick the desired stable tag and update the command accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04ce6a61-b1d2-4c94-97a3-779800d8975c

📥 Commits

Reviewing files that changed from the base of the PR and between c31e90d and 8c73931.

📒 Files selected for processing (2)
  • .circleci/config.yml
  • go.mod
✅ Files skipped from review due to trivial changes (1)
  • go.mod

Comment thread .circleci/config.yml Outdated
Nthalk added 4 commits May 9, 2026 13:00
planDirectives returns a non-nil predicate for variable-driven
@Skip(if:$v)/@include(if:$v) on inline fragments and fragment spreads.
collectInto used to discard that predicate (`_ = pred`) and walk the
fragment children unconditionally, so fragments gated on a runtime
variable were always emitted regardless of the variable's value. Since
Execute now routes through PlanQuery+ExecutePlan, this surfaced as a
spec-violating behavior change in the public Execute API.

Thread an enclosing predicate through collectInto and AND-compose it
with each new field's own skipPredicate. nil short-circuits to no
allocation in the common case (no enclosing fragment gate, no field
directive). Adds three regression tests covering the inline-fragment
and fragment-spread paths with variable-driven @Skip.
When normalizeDocument can't analyse the operation (op-not-found,
ambiguous multi-op without operationName), it returns an empty
normKey. Get builds cacheKey as `operationName + "\x00" + normKey`,
so unrelated documents with the same (or empty) operationName all
collapse to the same slot — and any cached parse/validate/plan
errors from the first such request bleed into every subsequent one.

When normKey is empty, fall back to a 64-bit FNV-1a fingerprint of
the raw query string, prefixed with `raw:` to keep it disjoint from
normalized fingerprints. Adds a regression test asserting two
distinct ambiguous-multi-op queries miss independently and then hit
their own slots on re-fetch.
Plan documents itself as bound to plan.schema — sub-plans,
abstractAlternatives, and field defs are all keyed off the *Object
pointers from the plan-time schema. ExecutePlan was reading p.Schema
for getVariableValues and the executionContext, so a caller passing
an ExecuteParams.Schema that diverged from the planned schema (e.g.
a rebuilt schema with the same shape but different *Object
pointers, different IsPossibleType set, or different extensions)
would silently coerce variables and resolve abstract types against
the wrong schema while sub-plans still ran against the original.

Use *plan.schema for both, so the binding documented at the top of
Plan is actually enforced.
Without ./... only the root package is exercised, which skips
language/, gqlerrors/, testutil/, etc.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
directives_test.go (1)

430-451: ⚡ Quick win

LGTM! Good regression coverage for variable-driven directives on fragment spreads.

The test correctly verifies that @skip(if: $s) with a variable set to true omits the fragment spread.

For completeness, you might consider adding a complementary test TestDirectivesWorksOnFragmentSpreadSkipVariableFalseIncludes to verify the s=false case (similar to how inline fragments have both test cases), but it's not essential since the behavior mirrors the already-tested inline fragment pattern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@directives_test.go` around lines 430 - 451, Add a complementary test named
TestDirectivesWorksOnFragmentSpreadSkipVariableFalseIncludes that mirrors
TestDirectivesWorksOnFragmentSpreadSkipVariableTrueOmits: parse the same query
(with fragment Frag and spread `@skip`(if: $s)), call testutil.TestExecute with
directivesTestSchema, directivesTestData and Args map setting "s": false, and
assert the Result includes both "a" and fragment field "b" (expected Data map
should contain "a" and "b"); use the same helpers TestParse, TestExecute,
testutil.EqualResults/testutil.Diff for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@directives_test.go`:
- Around line 430-451: Add a complementary test named
TestDirectivesWorksOnFragmentSpreadSkipVariableFalseIncludes that mirrors
TestDirectivesWorksOnFragmentSpreadSkipVariableTrueOmits: parse the same query
(with fragment Frag and spread `@skip`(if: $s)), call testutil.TestExecute with
directivesTestSchema, directivesTestData and Args map setting "s": false, and
assert the Result includes both "a" and fragment field "b" (expected Data map
should contain "a" and "b"); use the same helpers TestParse, TestExecute,
testutil.EqualResults/testutil.Diff for consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ad3b0c77-3794-4a67-8fdd-4513b0ed4dc9

📥 Commits

Reviewing files that changed from the base of the PR and between 8c73931 and d02d1b9.

📒 Files selected for processing (4)
  • directives_test.go
  • plan.go
  • plan_cache.go
  • plan_cache_test.go
✅ Files skipped from review due to trivial changes (1)
  • plan_cache_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • plan.go
  • plan_cache.go

Nthalk added 2 commits May 9, 2026 13:15
The parameter was a passthrough — only `_ = serialFields` referenced it.
Mutation vs. query traversal is handled at the top level of ExecutePlan
via dethunkMapDepthFirst / dethunkMapWithBreadthFirstTraversal, so the
walker doesn't need it.
TestPlanCacheSchemaRebuildInvalidation verifies the schema-pointer
guard: a structurally identical rebuilt *Schema must not collide on
the cached plan from the original.

TestPlanCacheConcurrentGets hammers a single key from many goroutines
to exercise the LRU/mutex paths under -race. We don't yet dedupe
in-flight planning across concurrent misses (that would want
singleflight), so this asserts safety + result consistency rather
than miss count.
@Nthalk
Copy link
Copy Markdown
Author

Nthalk commented May 9, 2026

CodeRabbit review follow-up

Actionable comments — all addressed:

File / line Fix
plan_cache_normalize.go:71 (empty cacheKey collision) 783fe2b
plan_cache.go:13 (atomic.Uint64 vs Go 1.13 floor) 8c73931
plan.go:343 (variable-driven @skip/@include on fragments dropped) da4b3f8
plan.go:585 (ExecutePlan ignored plan-bound schema) d02d1b9
.circleci/config.yml:26 (./... missing on coveralls test) 659c616

Nitpicks:

  • Dead serialFields parameter on executePlannedSelection removed in 1c17d41.
  • Additional plan-cache regression tests in a11569c (TestPlanCacheSchemaRebuildInvalidation, TestPlanCacheConcurrentGets); fragment-directive coverage came in alongside the fix in da4b3f8.

Deferred to follow-up:

  • Singleflight dedupe for concurrent misses on the same key. Not a correctness issue (concurrent planners produce equivalent plans, only the last store survives) — worth doing as a separate PR so the optimization can be benchmarked on its own.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
plan_cache_test.go (1)

166-169: ⚡ Quick win

Add explicit success checks after cache.Get before plan-pointer assertions.

At Line 166 and Line 218 (also similarly at Line 41), the test can pass pointer assertions even if Get regresses to returning errors or a nil plan. Add a success guard first so failures are surfaced directly.

Proposed patch
@@
 	r3 := cache.Get(&schema, `{ wide { a(value: "y") } }`, "")
+	if len(r3.Errors) > 0 || r3.Plan == nil {
+		t.Fatalf("third call: %v", r3.Errors)
+	}
 	if r1.Plan == r3.Plan {
 		t.Fatalf("un-normalized: different literals must miss the cache")
 	}
@@
 	r1b := cache.Get(&s1, q, "")
+	if len(r1b.Errors) > 0 || r1b.Plan == nil {
+		t.Fatalf("re-lookup against s1: %v", r1b.Errors)
+	}
 	if r1b.Plan == r2.Plan {
 		t.Fatalf("post-rebuild lookup against s1 returned the s2-bound plan")
 	}
@@
 	r := cache.Get(&schema, q, "")
+	if len(r.Errors) > 0 || r.Plan == nil {
+		t.Fatalf("post-Reset Get: %v", r.Errors)
+	}
 	_, misses := cache.HitsMisses()
 	if misses != 2 {
 		t.Fatalf("expected 2 misses after Reset, got %d", misses)
 	}
-	if r.Plan == nil {
-		t.Fatalf("expected new plan after Reset, got nil")
-	}

Also applies to: 218-225, 41-44

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plan_cache_test.go` around lines 166 - 169, After each call to cache.Get
(e.g., the calls that produce r1b at the s1 lookup and the similar calls
producing r2/r1 in other sections), add an explicit check that the returned
result is non-nil and contains a non-nil Plan and that err==nil before asserting
pointer equality/inequality; for example, verify r1b != nil, r1b.Plan != nil and
err == nil (or use t.Fatalf/t.Fatalf-style failures) immediately after cache.Get
so pointer assertions like r1b.Plan == r2.Plan cannot pass when Get returned an
error or nil plan.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@plan_cache_test.go`:
- Around line 166-169: After each call to cache.Get (e.g., the calls that
produce r1b at the s1 lookup and the similar calls producing r2/r1 in other
sections), add an explicit check that the returned result is non-nil and
contains a non-nil Plan and that err==nil before asserting pointer
equality/inequality; for example, verify r1b != nil, r1b.Plan != nil and err ==
nil (or use t.Fatalf/t.Fatalf-style failures) immediately after cache.Get so
pointer assertions like r1b.Plan == r2.Plan cannot pass when Get returned an
error or nil plan.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90c19461-fd96-4fde-b086-71ed8004928e

📥 Commits

Reviewing files that changed from the base of the PR and between d02d1b9 and a11569c.

📒 Files selected for processing (3)
  • .circleci/config.yml
  • plan.go
  • plan_cache_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • .circleci/config.yml
  • plan.go

Nthalk added 6 commits May 9, 2026 14:10
The earlier `./...` change (659c616, applied per a CodeRabbit nitpick
that framed it as restoring "incomplete coverage") actually dropped
coveralls from ~92% → 67.9%: with plain ./... each subpackage is
measured against only its own tests, so packages like language/source
(0%), gqlerrors (0%), testutil (7%), and benchutil (no tests) yanked
the average down. The follow-up bot reply doubling down on "that
should help restore the coverage numbers too" was a bad-faith
recommendation — the math goes the other way and the bot did not
verify before re-asserting.

Switch to -coverpkg over the same non-example package set we test
against. Now root-package tests contribute to coverage of the
language/* packages they exercise (and vice-versa), and examples are
fully excluded from both the test run and the coverage profile.
Local measurement: 82.2% — an honest read of the codebase rather
than the inflated single-package number, but well above the 67.9%
left by the prior change.
benchutil's code only runs under `go test -bench=.`, so including
it in the coverage profile measures it against test runs that don't
exercise it. Local total moves 82.2% → 83.2%.
Execute now routes unconditionally through PlanQuery + ExecutePlan
(see plan.go), so the entire pre-refactor execution machinery in
executor.go is unreachable. Delete it:

  executeOperation / executeOperationParams
  executeFieldsSerially / executeFields / executeSubFields / executeFieldsParams
  resolveField / resolveFieldResultState
  completeValueCatchingError / completeValue
  completeThunkValueCatchingError
  completeAbstractValue / completeObjectValue / completeListValue
  orderedField / orderedFields

Each has a 1:1 plan-based replacement in plan.go (executePlannedSelection,
resolvePlannedField, completePlannedValue family, etc.); plan.fields
is source-ordered at plan time so orderedFields is no longer needed.

Kept (still has live callers):
  Execute, buildExecutionContext, collectFields, getFieldDef,
  handleFieldError, DefaultResolveFn, defaultResolveTypeFn,
  completeLeafValue, getOperationRootType, dethunk* helpers,
  shouldIncludeNode, doesFragmentConditionMatch, getFieldEntryKey.

Subscriptions still go through buildExecutionContext + collectFields +
manual resolver invocation in subscription.go; that path is untouched.

-450 lines. Root-package coverage: 84.2% → 88.4%.
testutil is a test-helper package by design — measuring its
coverage at all is a category mistake.

language/ast is dominated by ~180 trivial interface-compliance
stubs (GetLoc / GetSelectionSet / GetOperation that return defaults
on AST node types that don't have those concepts). They're public
API — users can navigate the AST — but virtually none are called
directly from any test path, so including them measures boilerplate
rather than behavior. Coverage moves 82.8% → 87.8%.
Custom test schema with Int / Float / Bool / Enum / List / NonNull /
InputObject args + an interface and a fragment-spreadable object,
driving:

  - writeValue branches (Int/Float/Bool/Enum/List/Object) via
    list-with-nested-variable values that bypass tryExtract
  - writeFragmentBody both miss and visited-cache paths
  - normalizeSelectionSet's InlineFragment branch
  - writeType's NonNull and List arms via typed query variables
  - tryExtract's coercion path against varied scalar / input types

plan_cache_normalize coverage 60.6% -> ~80%, total 87.8% -> 88.7%.
resolvePlannedField unconditionally called handleExtensionsResolveFieldDidStart,
which allocates a map[string]ResolveFieldFinishFunc + a closure on
every resolved field — even when the schema has zero extensions
registered. With ~1000 fields per request that's 2000 wasted allocs
per request on the common no-extensions case.

Gate the call (and the matching finish-handler invocation) behind
`len(eCtx.Schema.extensions) > 0`.

Wide-query bench (100 fields × 10 items):
  before: 1.15ms / 9043 allocs/op
  after:  0.99ms / 7040 allocs/op   (-14% time, -22% allocs)

Hot-loop with native variables:
  before: 1.44ms / 9888 allocs/op
  after:  1.30ms / 7884 allocs/op   (-10% time, -20% allocs)

Schemas with extensions registered take the same path as before.
@Nthalk
Copy link
Copy Markdown
Author

Nthalk commented May 12, 2026

Seeing as the last pr merge was 5 months ago, I've forked this project, pulled in most of the common sense PRS and bug fixes, and have achieved a 20x speedup on certain workloads with various allocation and CPU reduction strategies.

Good luck with mainline! It would be nice to merge it in, but I don't see myself being to invested in my non fork.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants