Skip to content

(fix + perf): Recover Cubic PeriodicBC oneshot regression#143

Merged
mgyoo86 merged 16 commits into
masterfrom
fix/cubic_periodic_regression
May 19, 2026
Merged

(fix + perf): Recover Cubic PeriodicBC oneshot regression#143
mgyoo86 merged 16 commits into
masterfrom
fix/cubic_periodic_regression

Conversation

@mgyoo86
Copy link
Copy Markdown
Member

@mgyoo86 mgyoo86 commented May 18, 2026

Summary

This PR addresses the performance regression observed in Cubic PeriodicBC oneshot evaluation on the master branch. The regression was traced to per-iteration evaluation-kernel costs that dominate at larger query sizes.

Function barriers and type stability improvements fully recover performance for :inclusive periodic boundary conditions and halve the regression for :exclusive conditions. The remaining gap in :exclusive evaluations is tied to _get_h indirection through _ExclusivePeriodicAxis and is deferred to a future update.

Additionally, a hint propagation bug at seam boundaries is fixed, and the scalar cubic_interp! method is removed for consistency.

Size: 15 commits, 32 files, +321 / −591.

Implementation

  • 1D Batch Loop Function Barriers: _check_domain for boundary-extrapolating interpolations (Clamp/Fill/Wrap) returns a Union{InBounds, E}. Processing this union inline caused a fallback to dynamic dispatch per iteration. All 1D batch loops (Cubic, Linear, Quadratic, Hermite-family) are now split into outer and inner components. The outer loop resolves the Union once, restoring full type stability in the inner loop.
  • Searcher Simplification: The BC type parameter was removed from Searcher, changing it to Searcher{P,H}. Periodic seam handling is now entirely self-contained within _ExclusivePeriodicAxis, eliminating redundant compiler specializations.
  • WrapExtrap Minor Cleanup:
    • Merged _wrap_to_domain overloads into a single definition using _extract_primal.
    • Hoisted an unnecessary _check_domain block out of the Cubic periodic scalar entry path.
    • Removed stale WrapExtrap comments and outdated legacy constructor calls.

Correctness Fixes

  • Hint Propagation Bug: Fixed a logic error where search_interval(::Searcher, ::_ExclusivePeriodicAxis, xq) short-circuited at the seam but failed to update the search hint. A new _write_hint! helper ensures the hint is correctly updated without stalling at stale indices during monotonic streaming.

Breaking Changes

  • Removed Scalar In-place cubic_interp!: The scalar signature cubic_interp!(output, x_or_cache, y, xq::Real) that mutated a length-1 buffer was undocumented and inconsistent with other 1D methods. Users should rely on the allocating scalar entry point v = cubic_interp(x, y, 0.5) instead.

Performance

Measured on M1 Pro, Julia 1.12.6.

Loop-dominated regime, nq = 1000 (min μs):

Method Grid v0.4.9 master this PR Δ vs v0.4.9
PeriodicBC(:inclusive) Range 3.12 3.23 3.07 −2% (parity)
PeriodicBC(:inclusive) Vector 2.83 3.12 2.87 +1% (parity)
PeriodicBC(:exclusive) Range 3.12 3.85 3.53 +13%
PeriodicBC(:exclusive) Vector 2.87 4.29 3.76 +31%
NoBC (control) - 2.98 2.99 3.04 flat

Setup-dominated regime, nq = 20 (min ns):
Confirms no significant setup overhead.

Method Grid v0.4.9 master this PR Δ vs v0.4.9
PeriodicBC(:inclusive) Range 555 573 563 +1%
PeriodicBC(:inclusive) Vector 645 655 655 +2%
PeriodicBC(:exclusive) Range 579 571 572 −1%
PeriodicBC(:exclusive) Vector 701 667 668 −5%
NoBC (control) - 491 494 494 +1%

mgyoo86 added 15 commits May 17, 2026 20:52
…ad helpers

Three cleanups landing together in the Cubic 1D periodic / oneshot path,
each removing redundant indirection without behavior change:

1. `cubic_interp` (scalar) and `cubic_interp!` (vector batch) no longer
   thread `bc` into `_resolve_search`. Search at eval time runs against
   `cache.x` (the cache pool's wrapped axis form), so the seam fires via
   `periodic_axis.jl` axis-level dispatch that reads `g.period` directly
   — `Searcher.bc.period` is unused in this path. Flips Cubic's Searcher
   across all BCs to `Searcher{P,H,NoBC}`, shrinking specialization tree.

2. Drop the duplicate `_check_periodic_endpoints(bc, y)` call from
   `_cubic_periodic_solve!`. `_resolve_data(y, bc)` already performs the
   same `:inclusive` `y[1] ≈ y[end]` validation two lines later.

3. Inline `z_workspace[n+1] = z_workspace[1]` at the periodic seam in
   place of `_finalize_z_periodic_seam!` (3 identical method bodies for
   `:inclusive` / `:exclusive` / `:extended`).

Tier-1 unit tests (test_periodic_bc, test_wrapextrap_closed_boundary,
test_cubic_anchor, test_cubic_adjoint, test_1d_bc_consistency,
test_complex_cubic, test_1d_adjoint_periodic) all pass. Perf-neutral on
fresh-process per-case benches (the regression vs v0.4.9 has another
root cause — under investigation).
…1 of B-full)

21 call sites across 8 1D method oneshot files: the 5-arg
`_resolve_search(grid, q, search, hint, bc)` form is replaced by the
4-arg form. In every case below, the bc threaded into the Searcher
type-param is not consulted at search time:

 - axis is wrapped (`_resolve_axis(x, bc)`) so seam handling already
   fires via the axis-level dispatch in `periodic_axis.jl` that reads
   `g.period` directly (Cardinal/Akima/Pchip OnTheFly, Linear/Constant
   1D, Linear/Constant Series in-bounds entry).
 - axis is the extended-cycle output of `_periodic_extend_1d` and bc
   is `:extended` (Cardinal/Akima/Pchip PreCompute periodic) — never
   the `:exclusive` variant that `search.jl:879` dispatches on.
 - `cache.x` is wrapped from the cache pool (cubic_oneshot_series
   PreCompute periodic).

Net effect: every Searcher in these paths becomes
`Searcher{P, H, NoBC}`, sharing one specialization tree. No semantic
change. Tier-1 tests pass: test_periodic_bc, test_akima_periodic,
test_cardinal_periodic, test_constant_periodic, test_1d_bc_consistency,
test_wrapextrap_closed_boundary.

This is Phase 1 of a 3-phase cleanup to remove the BC type-param from
Searcher entirely. Phase 2 migrates Series oneshot raw-x periodic
entries to the axis-wrap pattern. Phase 3 drops the BC type-param from
the Searcher struct and removes the now-dead `<:PeriodicBC{:exclusive}`
dispatch on `AbstractVector` in search.jl.
…rch (Phase 2 of B-full)

Six Series oneshot entries (linear/constant/cubic scalar series, two
entries each) plus the ND `_search_axis_stencil` helper no longer pass
`bc` into `_resolve_search`. In every case the search target downstream
is a wrapped axis (`_resolve_axis(x, bc)` inside the periodic helper
for Linear/Constant, `cache.x` from the cache pool for Cubic, or
`grids_eff = map(_resolve_axis, grids, bcs)` for ND), so seam handling
already fires via the axis-level dispatch in `periodic_axis.jl`. The
`Searcher.bc.period` field was never consulted on these paths.

After this commit, no caller in the codebase invokes the 5-arg
`_resolve_search(grid, q, search, hint, bc)` form. The form itself
(and the BC type-param on `Searcher`) is removed in Phase 3.

`_search_axis_stencil` keeps its `bc` arg in the signature (renamed
`_bc` to mark unused) for caller compatibility; the signature itself
is collapsed in Phase 3 alongside `_search_all_intervals_stencil`'s
`bcs` parameter.

Tier-1 tests pass: test_periodic_bc, test_complex_cubic_series,
test_1d_bc_consistency.
After Phases 1 & 2, no caller threads `bc` into `_resolve_search` or
into `_to_searcher`. With the only consumer of `Searcher.bc.period`
gone, the BC type-param can be removed from the Searcher itself:

  Searcher{P, H, BC}  →  Searcher{P, H}

Net removal across `src/core/search.jl`, `src/core/periodic_axis.jl`,
`src/core/nd_utils.jl`, and the two ND oneshot files / hetero callers:

 - `Searcher`'s `bc::BC` field, `BC` type-param, and BC-aware outer
   ctors (`Searcher{P, H}(hint, bc::BC)` collapsed to single ctor).
 - `_to_searcher(...)` 6 method bodies — `bc` arg dropped.
 - `_resolve_search(grid, q, search, hint, bc)` 5-arg form — `bc`
   was unused; only the 4-arg form remains. `_resolve_searcher_for_grid`
   loses its BC-preservation branch.
 - `search_interval(::Searcher{...,<:PeriodicBC{:exclusive}}, ::AbstractVector, ::Real)`
   dispatch (search.jl) and the equivalent `<:PeriodicBC{:exclusive}` axis
   override (periodic_axis.jl) — both dead, collapsed into the generic
   `Searcher`/`AbstractVector` (or `_ExclusivePeriodicAxis`) variants.
 - `_writeback_seam_hint` — was used only inside the now-removed
   `:exclusive` AbstractVector seam fast path.
 - `_search_axis_stencil(grid, q, search, hint, _bc)` and
   `_search_all_intervals_stencil(..., bcs)` lose their unused `bcs`
   parameter. Callers in `linear_nd_oneshot.jl`, `constant_nd_oneshot.jl`,
   `hetero_eval.jl`, `hetero_oneshot.jl` drop their `nobcs = ntuple(_ -> NoBC(), Val(N))`
   / `bcs = map(_bc_for_periodic_check, …)` plumbing where the value was
   only used here.

`_resolve_bc_period` (in search.jl) is **retained** — it is still used by
`cached_vector.jl` and `cached_range.jl` to normalize unresolved periods
when constructing wrapped axes. Its role here was for the Searcher
seam field; now it is purely an axis-construction helper.

Net effect: ~140 line reduction. Tier-1 tests pass: test_periodic_bc,
test_wrapextrap_closed_boundary, test_1d_bc_consistency,
test_cubic_anchor, test_complex_cubic_series, test_1d_adjoint_periodic,
test_hetero_oneshot, test_cubic_nd_oneshot.
Reintroduce the periodic specialization of `_cubic_vector_loop!` that was
dropped during the WrapExtrap closed-domain refactor. The generic loop
dispatches per query to `_eval_cubic_at_point(..., ::WrapExtrap)`, which
calls `_wrap_to_domain` on every query even when all queries are in
domain — a noticeable overhead on batches where the wrap branch never
fires.

The periodic specialization hoists the domain check once: compute
`(qmin, qmax)` once and compare against the closed wrap domain
`[first(cache.x), last(cache.x)]`. When all queries are in domain,
dispatch to `_eval_cubic_at_point(..., ::InBounds, …)` and skip the
`_wrap_to_domain` branch entirely. Only the slow path pays per-query
wrap cost. Matches v0.4.9's behavior.

Measured on `scripts/inclusive_vs_cubicfit.jl` (fresh-process per case,
n=100, nq=1000, all in-bounds):

  Vector :inclusive  oneshot  3412 → 3068 ns  (~-10%, +7.6% vs v0.4.9)
  Vector :exclusive  oneshot  4446 → 3854 ns  (~-13%, +35% vs v0.4.9)
  Range  :inclusive  oneshot  3448 → 3234 ns  (~-6%,  +7%   vs v0.4.9)
  Range  :exclusive  oneshot  3964 → 3682 ns  (~-7%)
  CubicFit (Vector / Range)   unchanged (control)

Closes most of the `:inclusive` regression that prompted the
investigation (`:exclusive`'s residual gap stems from `_get_h`'s
seam-aware wrapping — structural, not addressed here).

Tier-1 tests pass: test_periodic_bc, test_wrapextrap_closed_boundary,
test_cubic_anchor, test_1d_bc_consistency. Method-ambiguity-free
against the generic `_cubic_vector_loop!` (matched `X<:AbstractVector{Tg}`
bound + `::E where E<:AbstractExtrap` shape on both methods).
…iodic spec

Replace the periodic specialization of `_cubic_vector_loop!` (added in
25b926e) with a function-barrier pattern in the single generic loop.
Both achieve the same effect — guarantee that the per-query inner loop
sees a concrete `ev` type — but the function barrier is more general
and reuses `_check_domain`'s existing in-bounds promotion (no manual
`qmin/qmax` check needed).

Mechanism:
  _cubic_vector_loop!(...)  # outer (caller-facing)
    ev_eff = _check_domain(cache.x, x_query, ev)  # Union{InBounds, E}
    return _cubic_vector_loop_inner!(..., ev_eff, ...)   # call-site union split

  _cubic_vector_loop_inner!(..., ev::E, ...) where E    # inner (concrete ev)
    @inbounds for k in eachindex(...)
        output[k] = _eval_cubic_at_point(cache.x, y, z, x_query[k], ev, op, searcher)
    end

Julia splits the Union at the function-call site (dispatch-level, not
loop-hoisting-level) — robust regardless of inner kernel size. The
previous all-in-one generic loop kept `ev::Union` flowing through the
per-query body, where LLVM's loop-hoisting heuristic gave up on the
union split (Cubic kernel size exceeds the threshold), producing
per-iteration dispatch overhead.

Ablation measurements (fresh-process per case, `inclusive_vs_cubicfit.jl`,
median of 5 trials, n=100, nq=1000, all in-bounds):

                       pre-fix  H1 spec  H1+barrier  barrier only
  Vector_CubicFit       2884    2931     2898        2866
  Vector_Inc            3412    3057     3005        3062
  Vector_Exc            4446    3896     3891        3901
  Range_CubicFit        3148    3151     3141        3141
  Range_Inc             3448    3167     3198        3224
  Range_Exc             3964    3672     3641        3588

Function barrier alone (last column) matches the H1-specialization
result within trial noise (~50-100 ns) — confirms the H1 spec is
redundant given the barrier pattern. Removing the specialization
collapses two methods into one + one inner, net -10 lines.

Tier-1 tests pass: test_periodic_bc, test_wrapextrap_closed_boundary,
test_cubic_anchor, test_1d_bc_consistency, test_complex_cubic_series,
test_cubic_nd_oneshot.
…apExtrap spec

Same pattern as a74893c (Cubic batch loop unification): replace the
`_linear_interp_loop!(::WrapExtrap)` 2-stage specialization with a
function-barrier in the single generic loop. `_check_domain`'s existing
in-bounds promotion (`Union{InBounds, E}`) flows through a function call
where Julia splits it at the dispatch site — the inner kernel sees a
concrete `extrap` type.

Mechanism:
  _linear_interp_loop!(...)  # outer
    extrap_eff = _check_domain(x, x_targets, extrap)  # Union{InBounds, E}
    return _linear_interp_loop_inner!(..., extrap_eff, ...)  # union split here

  _linear_interp_loop_inner!(..., extrap::E, ...) where E
    @inbounds for i in eachindex(...)
        output[i] = _linear_eval_at_point(x, y, x_targets[i], extrap, op, searcher)
    end

Ablation measurements (3 trials median, n=100, nq=1000):

                       A (with spec)  B (barrier only)  Δ
  Vector_Wrap_inbounds    1388              1417        +2.1% (borderline)
  Range_Wrap_inbounds     1212              1183        -2.4% (noise)
  Vector_Wrap_OOB         3932              3656        -7.0% ✓
  Range_Wrap_OOB          3536              3417        -3.4%

B is equal-or-better on 3 of 4 cases (OOB notably faster) — the OOB gain
likely from `_linear_eval_at_point(::WrapExtrap)`'s `xq → _wrap_to_domain
→ InBounds` chain optimizing better than the spec's manual `_wrap_to_domain`
+ `ExtendExtrap` dispatch (`ExtendExtrap` re-checks boundary per call).
Net -23 LOC, simpler dispatch table.

Tier-1 tests pass: test_periodic_bc, test_wrapextrap_closed_boundary,
test_1d_bc_consistency.
…rdinal/Akima/Pchip)

Same pattern as a74893c (Cubic) and f04d927 (Linear): outer
`_hermite_vector_loop!` resolves `_check_domain`, inner kernel sees a
concrete `extrap` type. Applied to both variants:

  1. pre-baked slopes: `_hermite_vector_loop!(..., dy::AbstractVector, ...)`
  2. on-the-fly slopes: `_hermite_vector_loop!(..., sm::AbstractSlopeMethod, ...)`

Both are shared by Cardinal/Akima/Pchip 1D oneshot+persistent paths.

Measurements (single-trial, n=100, nq=1000):

                              pre     post    Δ
  Cardinal_Wrap_inbounds      2713    2556    -5.8%
  Akima_Wrap_inbounds         2833    2625    -7.3%
  Pchip_Wrap_inbounds         3010    2810    -6.6%
  Cardinal_Wrap_OOB           4970    4958    -0.2% (noise)

Consistent 6-7% improvement on the in-bounds WrapExtrap fast path —
matches the union-splitting hypothesis (LLVM was failing to hoist the
Union out of the per-query loop for the Hermite kernels too).

Note: Constant kernel was also a candidate but showed a borderline
~3% regression on inbounds (3 trials) — likely because its kernel is
small enough that the function-call overhead of the barrier exceeds
the union-split saving. Constant left unchanged.

Tier-1 tests pass: test_akima_periodic, test_cardinal_periodic,
test_periodic_bc.
… _check_domain call

`_cubic_interp_periodic_scalar` was calling `_check_domain(cache.x, xq,
WrapExtrap())` whose result was discarded — the 3-arg AbstractExtrap
fallback returns `nothing` (an `@inline` no-op LLVM elides), so the
call was structurally dead. Then dispatched into
`_eval_cubic_at_point(..., ::WrapExtrap, …)`, which always routes
through `_wrap_to_domain` regardless of whether the query is in the
closed wrap domain.

Hoist the domain check explicitly so the in-domain query takes the
`InBounds` path directly and skips the `WrapExtrap` dispatch chain
(`_wrap_to_domain` → `_eval_cubic_at_point(::InBounds)`). OOB queries
fall to the wrap path unchanged. Mirrors the batch loop's
function-barrier pattern (a74893c).

Measured perf delta: noise-floor (~0 ns / 1 ns per call) — the
LLVM-elided `_check_domain` cost nothing and the wrap dispatch's fast
path was already getting inlined. Change is cleanup value only: dead
code removed and intent expressed explicitly.

Tier-1 tests pass: test_periodic_bc, test_cubic_anchor,
test_wrapextrap_closed_boundary.
Collapse the `(xi::Tg, x_min::Tg, x_max::Tg)` and `(xi::Real, x_min::Tg,
x_max::Tg)` overloads into a single `(xi::Real, ...)` method.
`_extract_primal` is the identity on plain numerics, so the homogeneous-Tg
case pays nothing extra; the in-domain branch returns the original `xi`
to preserve AD `Dual` carriers, and the cold `mod()` arithmetic propagates
the Dual through naturally (`d/dx[mod(x,p)] = 1`).

Also inline the previously hoisted `_wrap_to_domain_slow` back into the
body — it was already `@inline`, so the split was an organizational
artifact (the "@noinline" wording in the old comment was stale). Trim
the docstring to the closed-domain rule + AD-carrier note; the longer
`:inclusive`/`:exclusive` invariance discussion lives in the
closed-boundary test pin.

Net: ~50 lines -> ~25 lines, two overloads -> one. Test suite green.
`WrapExtrap` has been a tag struct (no type parameters, no `_x_min`/`_x_max`
fields) since the axis-as-truth refactor — the axis owns the wrap domain
via `first/last`, and eval kernels read it directly through
`_wrap_to_domain(xq, x)`. But many in-code comments still referenced the
old "materialize WrapExtrap{Nothing} → WrapExtrap{T}" wording, which
suggests an internal storage form that no longer exists.

Sweep across src/ and test/:
- src/cubic/{cubic_oneshot,cubic_interpolant,cubic_series_interp,nd/cubic_nd_oneshot}.jl
- src/quadratic/{quadratic_oneshot,quadratic_interpolant,nd/quadratic_nd_oneshot}.jl
- src/hetero/hetero_oneshot.jl
- src/core/{periodic,periodic_axis,nd_utils}.jl
- test/{test_allocation,test_linear_periodic}.jl

Replace stale wording with current intent:
- `_resolve_extrap(extrap, cache.bc, cache.x)` → "BC-aware extrap: PeriodicBC
  forces WrapExtrap, otherwise passthrough"
- `_resolve_extrap(extrap, x, Tv)` → "promote FillExtrap value type to Tv"
- `_resolve_extrap(extrap, grid)` → "passthrough (tag-struct identity)"

Also drop two leftover back-compat calls inside the library itself
(`WrapExtrap(cache.x)` → `WrapExtrap()`); the `(::AbstractVector)` ctor
shim stays in eval_ops.jl for external call sites.

No behavior change — comments + one redundant constructor argument only.
`cubic_interp!` had two scalar-query overloads that took an output vector
and wrote the single result to `output[1]`:

  cubic_interp!(output, cache, y, x_query::Tg; ...)        # 1
  cubic_interp!(output, x,     y, x_query::Real; ...)      # 2

Both were thin wrappers over `cubic_interp(cache/x, y, scalar; ...)` /
`cubic_interp_scalar(...)` that just forwarded and unpacked into a
length-1 buffer.

Why remove:
- No other 1D method (linear/quadratic/akima/pchip/cardinal/constant)
  exposes this shape — the scalar-returning `*_interp(...)` is the
  standard scalar API across the package.
- The forced `output = zeros(1); _interp!(output, ...); output[1]`
  pattern is strictly worse than `_interp(...)` at the call site.
- Not documented in `docs/src/api/cubic.md` ("In-place version" in the
  public table implies vector query).

Test cleanup: the three `@testset`s under "Cubic Spline - Uncovered Paths"
that existed *to cover these two methods* are dropped, plus the scalar
half of the "Real type wrappers" testset. Equivalent semantic coverage
is already present elsewhere:
  - `cubic_interp(cache, y, 1.5; extrap=ClampExtrap())` — test_cubic.jl:164
  - `cubic_interp(x, y, 0.5; bc=PeriodicBC())` — test_periodic_bc.jl:381,
    test_wrapextrap_closed_boundary.jl:81
  - `cubic_interp(x, y, 0.25; autocache=false)` — test_cubic_autocache.jl:159,
    test_thread_safety.jl:106

Series `cubic_interp!(out, x, s::Series, xq::Real)` is unaffected —
that variant produces one output per series and is a distinct API.

Net: -83 lines (src + test), no behavior change on any other path.
…ch loops

Linear and Quadratic `_*_vector_loop!` were doing `extrap = _check_domain(...)`
and looping in place — same pre-barrier shape that was fixed for Cubic /
Hermite-family / Linear-oneshot earlier in this branch.

`_check_domain` for Clamp/Fill/Wrap returns `Union{InBounds, E}`. With the
loop inline, the per-iteration `_*_eval_at_point` sees the union and pays
a runtime dispatch per element. Splitting into outer (resolves union) +
inner (sees concrete `extrap::E`) lets Julia union-split the inner loop,
giving one specialization per concrete extrap.

Files:
- src/linear/linear_interpolant.jl    : `_linear_vector_loop!` → `_linear_vector_loop_inner!`
- src/quadratic/quadratic_interpolant.jl : `_quadratic_vector_loop!` → `_quadratic_vector_loop_inner!`

Both mirror the established `_cubic_vector_loop!` / `_hermite_vector_loop!`
pattern. Quadratic's comment block already claimed "function barrier" —
now the implementation matches the comment.

Tests green on test_linear.jl + test_quadratic.jl.
The seam fast path in `search_interval(::Searcher, ::_ExclusivePeriodicAxis,
::Real)` returned the seam-cell 4-tuple immediately without going through
`_search_interval_real`, which is the function that updates `RefHint.idx[]`
for LinearSearch / LinearBinarySearch policies. Monotone hinted streaming
across the seam therefore left the hint stuck at a stale interior index,
breaking the O(1) locality guarantee on subsequent queries that returned
to the interior.

Cause: the BC-aware Searcher dispatch existed pre-`9ee83fd69` and the
seam handler in that overload did the write. When BC moved from Searcher
to the axis level, the seam shortcut was lifted to `_ExclusivePeriodicAxis`
without the write-back.

Fix:
- src/core/search.jl: add type-dispatched `_write_hint!(::NoHint, _) = nothing`
  / `_write_hint!(h::RefHint, idx) = (h.idx[] = idx; nothing)`. Trivial
  inline for NoHint paths, idx-write for RefHint paths. Reusable by any
  fast path that short-circuits `_search_interval_real`.
- src/core/periodic_axis.jl: call `_write_hint!(s.hint, n)` before the seam
  return.

Test cleanup:
- test/test_linear_periodic.jl: the existing regression guard at line 347
  ("Exclusive — seam hint write-back updates RefHint (P2 review)") was
  itself broken since `9ee83fd69` — it called the removed 5-arg
  `_resolve_search(x, xq, search, ref, bc)` form, so the testset errored
  at compile time and never reached its assertion. Migrate the setup
  to the axis-level API (`_resolve_axis(x, bc)` → `_ExclusivePeriodicAxis`,
  then 4-arg `_resolve_search`). With this commit the test transitions
  RED (assertion `1 == 5`) → GREEN, properly guarding against the
  regression it was originally written for.
- test/test_periodic_search_4value.jl: removed. The whole file tested the
  removed `Searcher{P, H, BC}` 3-param ctor and BC-aware `search_interval`
  dispatch. Both abstractions are gone — BC dispatch now lives on the axis.
  Seam 4-tuple contract is already covered end-to-end by
  test_linear_periodic.jl + test_wrapextrap_closed_boundary.jl.

Verified: test_linear_periodic.jl + test_wrapextrap_closed_boundary.jl both
green.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR recovers the Cubic PeriodicBC oneshot performance regression introduced on master, while simplifying the Searcher type and fixing a hint write-back bug at the periodic seam.

Changes:

  • Outer/inner function-barrier split in all 1D batch loops (Cubic/Linear/Quadratic/Hermite-family) so _check_domain's Union{InBounds, E} is resolved once and the inner kernel is type-stable.
  • Drops the BC type parameter from Searcher (now Searcher{P,H}); periodic seam handling is consolidated into _ExclusivePeriodicAxis, with a new _write_hint! helper ensuring RefHint is updated on the seam fast path.
  • Cleans up WrapExtrap (single unified _wrap_to_domain, no WrapExtrap{T} materialization), hoists the in-domain check in the scalar periodic cubic entry, and removes the undocumented scalar cubic_interp!(output, ..., xq::Real) method.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/core/search.jl Drops BC type-param from Searcher; simplifies _to_searcher/_resolve_search; adds _write_hint!.
src/core/periodic_axis.jl Wrapper-level search_interval now writes the hint on the seam short-circuit.
src/core/periodic.jl Merges _wrap_to_domain overloads via _extract_primal; trims stale comments.
src/core/nd_utils.jl Removes the per-axis bcs arg from _search_all_intervals_stencil.
src/cubic/cubic_oneshot.jl Drops scalar cubic_interp!; hoists in-domain check in periodic scalar path; removes redundant _check_periodic_endpoints call.
src/cubic/cubic_oneshot_series.jl Drops bc arg from _resolve_search.
src/cubic/cubic_eval.jl Splits vector loop into outer/inner barriers.
src/cubic/cubic_interpolant.jl, cubic_series_interp.jl Stores plain WrapExtrap() (no materialization).
src/cubic/cubic_solver.jl Inlines the periodic z[n+1] = z[1] mirror, drops the per-BC helper.
src/cubic/nd/cubic_nd_oneshot.jl Updates comment for _resolve_extrap semantics.
src/linear/linear_oneshot.jl, linear_interpolant.jl, linear_oneshot_series.jl, nd/linear_nd_oneshot.jl Function-barrier split; drops bc/nobcs plumbing.
src/quadratic/quadratic_interpolant.jl, quadratic_oneshot.jl, nd/quadratic_nd_oneshot.jl Function-barrier split; comment updates.
src/hermite/hermite_eval.jl Function-barrier split for both slope and pre-baked-slope variants.
src/hetero/hetero_eval.jl, hetero_oneshot.jl Removes bcs arg from _search_all_intervals_stencil calls.
src/pchip/pchip_oneshot.jl, src/cardinal/cardinal_oneshot.jl, src/akima/akima_oneshot.jl Drops bc arg from _resolve_search.
src/constant/constant_oneshot.jl, constant_oneshot_series.jl, nd/constant_nd_oneshot.jl Drops bc/nobcs plumbing.
test/test_cubic.jl, test_allocation.jl, test_linear_periodic.jl, test_periodic_axis.jl Updates tests for removed scalar cubic_interp! and new BC-less Searcher/_resolve_search.
test/test_periodic_search_4value.jl Deletes BC-aware 4-tuple unit tests (now obsolete with BC removed from Searcher).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

FastInterpolations.jl Benchmarks

All benchmarks (50 total, click to expand)
Benchmark Current: c2f0e9f Previous Imm. Ratio Grad. Ratio
10_nd_construct/bicubic_2d 37440.0 ns 39720.0 ns 0.943 0.991
10_nd_construct/bilinear_2d 627.2 ns 941.0 ns 0.666 0.909
10_nd_construct/tricubic_3d 352335.0 ns 334524.0 ns 1.053 1.034
10_nd_construct/trilinear_3d 1725.4 ns 2155.5 ns 0.8 0.909
11_nd_eval/bicubic_2d_batch 1436.7 ns 1526.3 ns 0.941 0.863
11_nd_eval/bicubic_2d_scalar 16.2 ns 15.9 ns 1.019 1.023
11_nd_eval/bilinear_2d_scalar 7.2 ns 7.0 ns 1.029 1.034
11_nd_eval/tricubic_3d_batch 3236.1 ns 3450.2 ns 0.938 0.937
11_nd_eval/tricubic_3d_scalar 33.7 ns 35.8 ns 0.939 0.967
11_nd_eval/trilinear_3d_scalar 13.7 ns 12.7 ns 1.08 1.061
12_cubic_eval_gridquery/range_random 4225.8 ns 4650.4 ns 0.909 0.942
12_cubic_eval_gridquery/range_sorted 4215.7 ns 4646.4 ns 0.907 0.942
12_cubic_eval_gridquery/vec_random 9529.2 ns 10250.0 ns 0.93 0.975
12_cubic_eval_gridquery/vec_sorted 3230.4 ns 3179.0 ns 1.016 1.009
13_nd_oneshot_gridquery/bicubic_2d_rand_rand 65863.3 ns 66431.0 ns 0.991 -
13_nd_oneshot_gridquery/bicubic_2d_sort_rand 62703.3 ns 61673.9 ns 1.017 -
13_nd_oneshot_gridquery/bicubic_2d_sort_sort 58746.0 ns 58907.7 ns 0.997 -
13_nd_oneshot_gridquery/bilinear_2d_rand_rand 15918.4 ns 16624.0 ns 0.958 -
13_nd_oneshot_gridquery/bilinear_2d_sort_rand 9241.7 ns 9404.2 ns 0.983 -
13_nd_oneshot_gridquery/bilinear_2d_sort_sort 5728.2 ns 4857.9 ns 1.179 -
14_series_oneshot_batch/constant_inplace_vec_k8_q1000_rand 18623.9 ns 19313.1 ns 0.964 -
14_series_oneshot_batch/linear_inplace_vec_k8_q1000_rand 17490.8 ns 18302.6 ns 0.956 -
1_cubic_oneshot/q00001 546.0 ns 571.7 ns 0.955 1.145
1_cubic_oneshot/q10000 43700.8 ns 47949.3 ns 0.911 0.7
2_cubic_construct/g0100 1401.8 ns 1535.3 ns 0.913 0.998
2_cubic_construct/g1000 12727.8 ns 14448.8 ns 0.881 0.943
3_cubic_eval/q00001 21.2 ns 23.5 ns 0.903 0.913
3_cubic_eval/q00100 443.4 ns 484.7 ns 0.915 0.94
3_cubic_eval/q10000 42733.9 ns 47033.9 ns 0.909 0.944
4_linear_oneshot/q00001 25.6 ns 27.1 ns 0.941 0.947
4_linear_oneshot/q10000 18645.9 ns 19410.3 ns 0.961 0.975
5_linear_construct/g0100 35.3 ns 44.3 ns 0.796 0.932
5_linear_construct/g1000 279.1 ns 377.3 ns 0.74 1.008
6_linear_eval/q00001 10.4 ns 10.3 ns 1.01 1.036
6_linear_eval/q00100 198.0 ns 209.7 ns 0.944 0.993
6_linear_eval/q10000 18548.8 ns 19824.9 ns 0.936 0.983
7_cubic_range/scalar_query 8.3 ns 9.1 ns 0.912 1.098
7_cubic_vec/scalar_query 10.4 ns 10.7 ns 0.972 0.966
8_cubic_multi/construct_s001_q100 643.4 ns 690.8 ns 0.931 1.113
8_cubic_multi/construct_s010_q100 4475.0 ns 5155.8 ns 0.868 0.97
8_cubic_multi/construct_s100_q100 40131.0 ns 44714.4 ns 0.897 0.946
8_cubic_multi/eval_s001_q100 820.7 ns 830.6 ns 0.988 1.125
8_cubic_multi/eval_s010_q100 1773.5 ns 1833.0 ns 0.968 1.003
8_cubic_multi/eval_s010_q100_scalar_loop 2336.6 ns 2483.3 ns 0.941 0.973
8_cubic_multi/eval_s100_q100 11364.3 ns 11632.6 ns 0.977 0.98
8_cubic_multi/eval_s100_q100_scalar_loop 3336.3 ns 3603.5 ns 0.926 0.966
9_nd_oneshot/bicubic_2d 45031.3 ns 46649.2 ns 0.965 1.122
9_nd_oneshot/bilinear_2d 540.6 ns 592.5 ns 0.912 0.924
9_nd_oneshot/tricubic_3d 415373.2 ns 388200.7 ns 1.07 1.125
9_nd_oneshot/trilinear_3d 1038.0 ns 1066.6 ns 0.973 0.959

⚠️ Performance Regression Confirmed ⚠️

After re-running 7 flagged benchmark(s) 10 time(s), 6 regression(s) confirmed.

Benchmark Current Previous Imm. Ratio Grad. Ratio Tier
13_nd_oneshot_gridquery/bilinear_2d_sort_sort 5728.2 ns 4857.9 ns 1.179 - immediate
1_cubic_oneshot/q00001 546.0 ns 571.7 ns 0.955 1.145 gradual
8_cubic_multi/construct_s001_q100 643.4 ns 690.8 ns 0.931 1.113 gradual
8_cubic_multi/eval_s001_q100 820.7 ns 830.6 ns 0.988 1.125 gradual
9_nd_oneshot/bicubic_2d 45031.3 ns 46649.2 ns 0.965 1.122 gradual
9_nd_oneshot/tricubic_3d 415373.2 ns 388200.7 ns 1.07 1.125 gradual

Thresholds: immediate > 1.1x (vs latest master), gradual > 1.1x (vs sliding window)

This comment was automatically generated by Benchmark workflow.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 97.93814% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.45%. Comparing base (b325b41) to head (c2f0e9f).

Files with missing lines Patch % Lines
src/core/search.jl 90.90% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
+ Coverage   96.43%   96.45%   +0.02%     
==========================================
  Files         143      143              
  Lines       11994    11957      -37     
==========================================
- Hits        11566    11533      -33     
+ Misses        428      424       -4     
Files with missing lines Coverage Δ
src/akima/akima_oneshot.jl 100.00% <100.00%> (ø)
src/cardinal/cardinal_oneshot.jl 100.00% <100.00%> (ø)
src/constant/constant_oneshot.jl 100.00% <100.00%> (ø)
src/constant/constant_oneshot_series.jl 100.00% <100.00%> (ø)
src/constant/nd/constant_nd_oneshot.jl 97.14% <100.00%> (-0.08%) ⬇️
src/core/nd_utils.jl 93.35% <100.00%> (ø)
src/core/periodic.jl 95.52% <ø> (+1.80%) ⬆️
src/core/periodic_axis.jl 94.59% <100.00%> (-0.35%) ⬇️
src/cubic/cubic_eval.jl 100.00% <100.00%> (ø)
src/cubic/cubic_interpolant.jl 98.66% <100.00%> (ø)
... and 17 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mgyoo86 mgyoo86 merged commit 5a4b99d into master May 19, 2026
14 checks passed
@mgyoo86 mgyoo86 deleted the fix/cubic_periodic_regression branch May 19, 2026 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants