(fix + perf): Recover Cubic PeriodicBC oneshot regression#143
Conversation
…ad helpers
Three cleanups landing together in the Cubic 1D periodic / oneshot path,
each removing redundant indirection without behavior change:
1. `cubic_interp` (scalar) and `cubic_interp!` (vector batch) no longer
thread `bc` into `_resolve_search`. Search at eval time runs against
`cache.x` (the cache pool's wrapped axis form), so the seam fires via
`periodic_axis.jl` axis-level dispatch that reads `g.period` directly
— `Searcher.bc.period` is unused in this path. Flips Cubic's Searcher
across all BCs to `Searcher{P,H,NoBC}`, shrinking specialization tree.
2. Drop the duplicate `_check_periodic_endpoints(bc, y)` call from
`_cubic_periodic_solve!`. `_resolve_data(y, bc)` already performs the
same `:inclusive` `y[1] ≈ y[end]` validation two lines later.
3. Inline `z_workspace[n+1] = z_workspace[1]` at the periodic seam in
place of `_finalize_z_periodic_seam!` (3 identical method bodies for
`:inclusive` / `:exclusive` / `:extended`).
Tier-1 unit tests (test_periodic_bc, test_wrapextrap_closed_boundary,
test_cubic_anchor, test_cubic_adjoint, test_1d_bc_consistency,
test_complex_cubic, test_1d_adjoint_periodic) all pass. Perf-neutral on
fresh-process per-case benches (the regression vs v0.4.9 has another
root cause — under investigation).
…1 of B-full)
21 call sites across 8 1D method oneshot files: the 5-arg
`_resolve_search(grid, q, search, hint, bc)` form is replaced by the
4-arg form. In every case below, the bc threaded into the Searcher
type-param is not consulted at search time:
- axis is wrapped (`_resolve_axis(x, bc)`) so seam handling already
fires via the axis-level dispatch in `periodic_axis.jl` that reads
`g.period` directly (Cardinal/Akima/Pchip OnTheFly, Linear/Constant
1D, Linear/Constant Series in-bounds entry).
- axis is the extended-cycle output of `_periodic_extend_1d` and bc
is `:extended` (Cardinal/Akima/Pchip PreCompute periodic) — never
the `:exclusive` variant that `search.jl:879` dispatches on.
- `cache.x` is wrapped from the cache pool (cubic_oneshot_series
PreCompute periodic).
Net effect: every Searcher in these paths becomes
`Searcher{P, H, NoBC}`, sharing one specialization tree. No semantic
change. Tier-1 tests pass: test_periodic_bc, test_akima_periodic,
test_cardinal_periodic, test_constant_periodic, test_1d_bc_consistency,
test_wrapextrap_closed_boundary.
This is Phase 1 of a 3-phase cleanup to remove the BC type-param from
Searcher entirely. Phase 2 migrates Series oneshot raw-x periodic
entries to the axis-wrap pattern. Phase 3 drops the BC type-param from
the Searcher struct and removes the now-dead `<:PeriodicBC{:exclusive}`
dispatch on `AbstractVector` in search.jl.
…rch (Phase 2 of B-full) Six Series oneshot entries (linear/constant/cubic scalar series, two entries each) plus the ND `_search_axis_stencil` helper no longer pass `bc` into `_resolve_search`. In every case the search target downstream is a wrapped axis (`_resolve_axis(x, bc)` inside the periodic helper for Linear/Constant, `cache.x` from the cache pool for Cubic, or `grids_eff = map(_resolve_axis, grids, bcs)` for ND), so seam handling already fires via the axis-level dispatch in `periodic_axis.jl`. The `Searcher.bc.period` field was never consulted on these paths. After this commit, no caller in the codebase invokes the 5-arg `_resolve_search(grid, q, search, hint, bc)` form. The form itself (and the BC type-param on `Searcher`) is removed in Phase 3. `_search_axis_stencil` keeps its `bc` arg in the signature (renamed `_bc` to mark unused) for caller compatibility; the signature itself is collapsed in Phase 3 alongside `_search_all_intervals_stencil`'s `bcs` parameter. Tier-1 tests pass: test_periodic_bc, test_complex_cubic_series, test_1d_bc_consistency.
After Phases 1 & 2, no caller threads `bc` into `_resolve_search` or
into `_to_searcher`. With the only consumer of `Searcher.bc.period`
gone, the BC type-param can be removed from the Searcher itself:
Searcher{P, H, BC} → Searcher{P, H}
Net removal across `src/core/search.jl`, `src/core/periodic_axis.jl`,
`src/core/nd_utils.jl`, and the two ND oneshot files / hetero callers:
- `Searcher`'s `bc::BC` field, `BC` type-param, and BC-aware outer
ctors (`Searcher{P, H}(hint, bc::BC)` collapsed to single ctor).
- `_to_searcher(...)` 6 method bodies — `bc` arg dropped.
- `_resolve_search(grid, q, search, hint, bc)` 5-arg form — `bc`
was unused; only the 4-arg form remains. `_resolve_searcher_for_grid`
loses its BC-preservation branch.
- `search_interval(::Searcher{...,<:PeriodicBC{:exclusive}}, ::AbstractVector, ::Real)`
dispatch (search.jl) and the equivalent `<:PeriodicBC{:exclusive}` axis
override (periodic_axis.jl) — both dead, collapsed into the generic
`Searcher`/`AbstractVector` (or `_ExclusivePeriodicAxis`) variants.
- `_writeback_seam_hint` — was used only inside the now-removed
`:exclusive` AbstractVector seam fast path.
- `_search_axis_stencil(grid, q, search, hint, _bc)` and
`_search_all_intervals_stencil(..., bcs)` lose their unused `bcs`
parameter. Callers in `linear_nd_oneshot.jl`, `constant_nd_oneshot.jl`,
`hetero_eval.jl`, `hetero_oneshot.jl` drop their `nobcs = ntuple(_ -> NoBC(), Val(N))`
/ `bcs = map(_bc_for_periodic_check, …)` plumbing where the value was
only used here.
`_resolve_bc_period` (in search.jl) is **retained** — it is still used by
`cached_vector.jl` and `cached_range.jl` to normalize unresolved periods
when constructing wrapped axes. Its role here was for the Searcher
seam field; now it is purely an axis-construction helper.
Net effect: ~140 line reduction. Tier-1 tests pass: test_periodic_bc,
test_wrapextrap_closed_boundary, test_1d_bc_consistency,
test_cubic_anchor, test_complex_cubic_series, test_1d_adjoint_periodic,
test_hetero_oneshot, test_cubic_nd_oneshot.
Reintroduce the periodic specialization of `_cubic_vector_loop!` that was
dropped during the WrapExtrap closed-domain refactor. The generic loop
dispatches per query to `_eval_cubic_at_point(..., ::WrapExtrap)`, which
calls `_wrap_to_domain` on every query even when all queries are in
domain — a noticeable overhead on batches where the wrap branch never
fires.
The periodic specialization hoists the domain check once: compute
`(qmin, qmax)` once and compare against the closed wrap domain
`[first(cache.x), last(cache.x)]`. When all queries are in domain,
dispatch to `_eval_cubic_at_point(..., ::InBounds, …)` and skip the
`_wrap_to_domain` branch entirely. Only the slow path pays per-query
wrap cost. Matches v0.4.9's behavior.
Measured on `scripts/inclusive_vs_cubicfit.jl` (fresh-process per case,
n=100, nq=1000, all in-bounds):
Vector :inclusive oneshot 3412 → 3068 ns (~-10%, +7.6% vs v0.4.9)
Vector :exclusive oneshot 4446 → 3854 ns (~-13%, +35% vs v0.4.9)
Range :inclusive oneshot 3448 → 3234 ns (~-6%, +7% vs v0.4.9)
Range :exclusive oneshot 3964 → 3682 ns (~-7%)
CubicFit (Vector / Range) unchanged (control)
Closes most of the `:inclusive` regression that prompted the
investigation (`:exclusive`'s residual gap stems from `_get_h`'s
seam-aware wrapping — structural, not addressed here).
Tier-1 tests pass: test_periodic_bc, test_wrapextrap_closed_boundary,
test_cubic_anchor, test_1d_bc_consistency. Method-ambiguity-free
against the generic `_cubic_vector_loop!` (matched `X<:AbstractVector{Tg}`
bound + `::E where E<:AbstractExtrap` shape on both methods).
…iodic spec Replace the periodic specialization of `_cubic_vector_loop!` (added in 25b926e) with a function-barrier pattern in the single generic loop. Both achieve the same effect — guarantee that the per-query inner loop sees a concrete `ev` type — but the function barrier is more general and reuses `_check_domain`'s existing in-bounds promotion (no manual `qmin/qmax` check needed). Mechanism: _cubic_vector_loop!(...) # outer (caller-facing) ev_eff = _check_domain(cache.x, x_query, ev) # Union{InBounds, E} return _cubic_vector_loop_inner!(..., ev_eff, ...) # call-site union split _cubic_vector_loop_inner!(..., ev::E, ...) where E # inner (concrete ev) @inbounds for k in eachindex(...) output[k] = _eval_cubic_at_point(cache.x, y, z, x_query[k], ev, op, searcher) end Julia splits the Union at the function-call site (dispatch-level, not loop-hoisting-level) — robust regardless of inner kernel size. The previous all-in-one generic loop kept `ev::Union` flowing through the per-query body, where LLVM's loop-hoisting heuristic gave up on the union split (Cubic kernel size exceeds the threshold), producing per-iteration dispatch overhead. Ablation measurements (fresh-process per case, `inclusive_vs_cubicfit.jl`, median of 5 trials, n=100, nq=1000, all in-bounds): pre-fix H1 spec H1+barrier barrier only Vector_CubicFit 2884 2931 2898 2866 Vector_Inc 3412 3057 3005 3062 Vector_Exc 4446 3896 3891 3901 Range_CubicFit 3148 3151 3141 3141 Range_Inc 3448 3167 3198 3224 Range_Exc 3964 3672 3641 3588 Function barrier alone (last column) matches the H1-specialization result within trial noise (~50-100 ns) — confirms the H1 spec is redundant given the barrier pattern. Removing the specialization collapses two methods into one + one inner, net -10 lines. Tier-1 tests pass: test_periodic_bc, test_wrapextrap_closed_boundary, test_cubic_anchor, test_1d_bc_consistency, test_complex_cubic_series, test_cubic_nd_oneshot.
…apExtrap spec Same pattern as a74893c (Cubic batch loop unification): replace the `_linear_interp_loop!(::WrapExtrap)` 2-stage specialization with a function-barrier in the single generic loop. `_check_domain`'s existing in-bounds promotion (`Union{InBounds, E}`) flows through a function call where Julia splits it at the dispatch site — the inner kernel sees a concrete `extrap` type. Mechanism: _linear_interp_loop!(...) # outer extrap_eff = _check_domain(x, x_targets, extrap) # Union{InBounds, E} return _linear_interp_loop_inner!(..., extrap_eff, ...) # union split here _linear_interp_loop_inner!(..., extrap::E, ...) where E @inbounds for i in eachindex(...) output[i] = _linear_eval_at_point(x, y, x_targets[i], extrap, op, searcher) end Ablation measurements (3 trials median, n=100, nq=1000): A (with spec) B (barrier only) Δ Vector_Wrap_inbounds 1388 1417 +2.1% (borderline) Range_Wrap_inbounds 1212 1183 -2.4% (noise) Vector_Wrap_OOB 3932 3656 -7.0% ✓ Range_Wrap_OOB 3536 3417 -3.4% B is equal-or-better on 3 of 4 cases (OOB notably faster) — the OOB gain likely from `_linear_eval_at_point(::WrapExtrap)`'s `xq → _wrap_to_domain → InBounds` chain optimizing better than the spec's manual `_wrap_to_domain` + `ExtendExtrap` dispatch (`ExtendExtrap` re-checks boundary per call). Net -23 LOC, simpler dispatch table. Tier-1 tests pass: test_periodic_bc, test_wrapextrap_closed_boundary, test_1d_bc_consistency.
…rdinal/Akima/Pchip) Same pattern as a74893c (Cubic) and f04d927 (Linear): outer `_hermite_vector_loop!` resolves `_check_domain`, inner kernel sees a concrete `extrap` type. Applied to both variants: 1. pre-baked slopes: `_hermite_vector_loop!(..., dy::AbstractVector, ...)` 2. on-the-fly slopes: `_hermite_vector_loop!(..., sm::AbstractSlopeMethod, ...)` Both are shared by Cardinal/Akima/Pchip 1D oneshot+persistent paths. Measurements (single-trial, n=100, nq=1000): pre post Δ Cardinal_Wrap_inbounds 2713 2556 -5.8% Akima_Wrap_inbounds 2833 2625 -7.3% Pchip_Wrap_inbounds 3010 2810 -6.6% Cardinal_Wrap_OOB 4970 4958 -0.2% (noise) Consistent 6-7% improvement on the in-bounds WrapExtrap fast path — matches the union-splitting hypothesis (LLVM was failing to hoist the Union out of the per-query loop for the Hermite kernels too). Note: Constant kernel was also a candidate but showed a borderline ~3% regression on inbounds (3 trials) — likely because its kernel is small enough that the function-call overhead of the barrier exceeds the union-split saving. Constant left unchanged. Tier-1 tests pass: test_akima_periodic, test_cardinal_periodic, test_periodic_bc.
… _check_domain call `_cubic_interp_periodic_scalar` was calling `_check_domain(cache.x, xq, WrapExtrap())` whose result was discarded — the 3-arg AbstractExtrap fallback returns `nothing` (an `@inline` no-op LLVM elides), so the call was structurally dead. Then dispatched into `_eval_cubic_at_point(..., ::WrapExtrap, …)`, which always routes through `_wrap_to_domain` regardless of whether the query is in the closed wrap domain. Hoist the domain check explicitly so the in-domain query takes the `InBounds` path directly and skips the `WrapExtrap` dispatch chain (`_wrap_to_domain` → `_eval_cubic_at_point(::InBounds)`). OOB queries fall to the wrap path unchanged. Mirrors the batch loop's function-barrier pattern (a74893c). Measured perf delta: noise-floor (~0 ns / 1 ns per call) — the LLVM-elided `_check_domain` cost nothing and the wrap dispatch's fast path was already getting inlined. Change is cleanup value only: dead code removed and intent expressed explicitly. Tier-1 tests pass: test_periodic_bc, test_cubic_anchor, test_wrapextrap_closed_boundary.
Collapse the `(xi::Tg, x_min::Tg, x_max::Tg)` and `(xi::Real, x_min::Tg, x_max::Tg)` overloads into a single `(xi::Real, ...)` method. `_extract_primal` is the identity on plain numerics, so the homogeneous-Tg case pays nothing extra; the in-domain branch returns the original `xi` to preserve AD `Dual` carriers, and the cold `mod()` arithmetic propagates the Dual through naturally (`d/dx[mod(x,p)] = 1`). Also inline the previously hoisted `_wrap_to_domain_slow` back into the body — it was already `@inline`, so the split was an organizational artifact (the "@noinline" wording in the old comment was stale). Trim the docstring to the closed-domain rule + AD-carrier note; the longer `:inclusive`/`:exclusive` invariance discussion lives in the closed-boundary test pin. Net: ~50 lines -> ~25 lines, two overloads -> one. Test suite green.
`WrapExtrap` has been a tag struct (no type parameters, no `_x_min`/`_x_max`
fields) since the axis-as-truth refactor — the axis owns the wrap domain
via `first/last`, and eval kernels read it directly through
`_wrap_to_domain(xq, x)`. But many in-code comments still referenced the
old "materialize WrapExtrap{Nothing} → WrapExtrap{T}" wording, which
suggests an internal storage form that no longer exists.
Sweep across src/ and test/:
- src/cubic/{cubic_oneshot,cubic_interpolant,cubic_series_interp,nd/cubic_nd_oneshot}.jl
- src/quadratic/{quadratic_oneshot,quadratic_interpolant,nd/quadratic_nd_oneshot}.jl
- src/hetero/hetero_oneshot.jl
- src/core/{periodic,periodic_axis,nd_utils}.jl
- test/{test_allocation,test_linear_periodic}.jl
Replace stale wording with current intent:
- `_resolve_extrap(extrap, cache.bc, cache.x)` → "BC-aware extrap: PeriodicBC
forces WrapExtrap, otherwise passthrough"
- `_resolve_extrap(extrap, x, Tv)` → "promote FillExtrap value type to Tv"
- `_resolve_extrap(extrap, grid)` → "passthrough (tag-struct identity)"
Also drop two leftover back-compat calls inside the library itself
(`WrapExtrap(cache.x)` → `WrapExtrap()`); the `(::AbstractVector)` ctor
shim stays in eval_ops.jl for external call sites.
No behavior change — comments + one redundant constructor argument only.
`cubic_interp!` had two scalar-query overloads that took an output vector
and wrote the single result to `output[1]`:
cubic_interp!(output, cache, y, x_query::Tg; ...) # 1
cubic_interp!(output, x, y, x_query::Real; ...) # 2
Both were thin wrappers over `cubic_interp(cache/x, y, scalar; ...)` /
`cubic_interp_scalar(...)` that just forwarded and unpacked into a
length-1 buffer.
Why remove:
- No other 1D method (linear/quadratic/akima/pchip/cardinal/constant)
exposes this shape — the scalar-returning `*_interp(...)` is the
standard scalar API across the package.
- The forced `output = zeros(1); _interp!(output, ...); output[1]`
pattern is strictly worse than `_interp(...)` at the call site.
- Not documented in `docs/src/api/cubic.md` ("In-place version" in the
public table implies vector query).
Test cleanup: the three `@testset`s under "Cubic Spline - Uncovered Paths"
that existed *to cover these two methods* are dropped, plus the scalar
half of the "Real type wrappers" testset. Equivalent semantic coverage
is already present elsewhere:
- `cubic_interp(cache, y, 1.5; extrap=ClampExtrap())` — test_cubic.jl:164
- `cubic_interp(x, y, 0.5; bc=PeriodicBC())` — test_periodic_bc.jl:381,
test_wrapextrap_closed_boundary.jl:81
- `cubic_interp(x, y, 0.25; autocache=false)` — test_cubic_autocache.jl:159,
test_thread_safety.jl:106
Series `cubic_interp!(out, x, s::Series, xq::Real)` is unaffected —
that variant produces one output per series and is a distinct API.
Net: -83 lines (src + test), no behavior change on any other path.
…ch loops
Linear and Quadratic `_*_vector_loop!` were doing `extrap = _check_domain(...)`
and looping in place — same pre-barrier shape that was fixed for Cubic /
Hermite-family / Linear-oneshot earlier in this branch.
`_check_domain` for Clamp/Fill/Wrap returns `Union{InBounds, E}`. With the
loop inline, the per-iteration `_*_eval_at_point` sees the union and pays
a runtime dispatch per element. Splitting into outer (resolves union) +
inner (sees concrete `extrap::E`) lets Julia union-split the inner loop,
giving one specialization per concrete extrap.
Files:
- src/linear/linear_interpolant.jl : `_linear_vector_loop!` → `_linear_vector_loop_inner!`
- src/quadratic/quadratic_interpolant.jl : `_quadratic_vector_loop!` → `_quadratic_vector_loop_inner!`
Both mirror the established `_cubic_vector_loop!` / `_hermite_vector_loop!`
pattern. Quadratic's comment block already claimed "function barrier" —
now the implementation matches the comment.
Tests green on test_linear.jl + test_quadratic.jl.
The seam fast path in `search_interval(::Searcher, ::_ExclusivePeriodicAxis,
::Real)` returned the seam-cell 4-tuple immediately without going through
`_search_interval_real`, which is the function that updates `RefHint.idx[]`
for LinearSearch / LinearBinarySearch policies. Monotone hinted streaming
across the seam therefore left the hint stuck at a stale interior index,
breaking the O(1) locality guarantee on subsequent queries that returned
to the interior.
Cause: the BC-aware Searcher dispatch existed pre-`9ee83fd69` and the
seam handler in that overload did the write. When BC moved from Searcher
to the axis level, the seam shortcut was lifted to `_ExclusivePeriodicAxis`
without the write-back.
Fix:
- src/core/search.jl: add type-dispatched `_write_hint!(::NoHint, _) = nothing`
/ `_write_hint!(h::RefHint, idx) = (h.idx[] = idx; nothing)`. Trivial
inline for NoHint paths, idx-write for RefHint paths. Reusable by any
fast path that short-circuits `_search_interval_real`.
- src/core/periodic_axis.jl: call `_write_hint!(s.hint, n)` before the seam
return.
Test cleanup:
- test/test_linear_periodic.jl: the existing regression guard at line 347
("Exclusive — seam hint write-back updates RefHint (P2 review)") was
itself broken since `9ee83fd69` — it called the removed 5-arg
`_resolve_search(x, xq, search, ref, bc)` form, so the testset errored
at compile time and never reached its assertion. Migrate the setup
to the axis-level API (`_resolve_axis(x, bc)` → `_ExclusivePeriodicAxis`,
then 4-arg `_resolve_search`). With this commit the test transitions
RED (assertion `1 == 5`) → GREEN, properly guarding against the
regression it was originally written for.
- test/test_periodic_search_4value.jl: removed. The whole file tested the
removed `Searcher{P, H, BC}` 3-param ctor and BC-aware `search_interval`
dispatch. Both abstractions are gone — BC dispatch now lives on the axis.
Seam 4-tuple contract is already covered end-to-end by
test_linear_periodic.jl + test_wrapextrap_closed_boundary.jl.
Verified: test_linear_periodic.jl + test_wrapextrap_closed_boundary.jl both
green.
There was a problem hiding this comment.
Pull request overview
This PR recovers the Cubic PeriodicBC oneshot performance regression introduced on master, while simplifying the Searcher type and fixing a hint write-back bug at the periodic seam.
Changes:
- Outer/inner function-barrier split in all 1D batch loops (Cubic/Linear/Quadratic/Hermite-family) so
_check_domain'sUnion{InBounds, E}is resolved once and the inner kernel is type-stable. - Drops the
BCtype parameter fromSearcher(nowSearcher{P,H}); periodic seam handling is consolidated into_ExclusivePeriodicAxis, with a new_write_hint!helper ensuringRefHintis updated on the seam fast path. - Cleans up
WrapExtrap(single unified_wrap_to_domain, noWrapExtrap{T}materialization), hoists the in-domain check in the scalar periodic cubic entry, and removes the undocumented scalarcubic_interp!(output, ..., xq::Real)method.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/core/search.jl | Drops BC type-param from Searcher; simplifies _to_searcher/_resolve_search; adds _write_hint!. |
| src/core/periodic_axis.jl | Wrapper-level search_interval now writes the hint on the seam short-circuit. |
| src/core/periodic.jl | Merges _wrap_to_domain overloads via _extract_primal; trims stale comments. |
| src/core/nd_utils.jl | Removes the per-axis bcs arg from _search_all_intervals_stencil. |
| src/cubic/cubic_oneshot.jl | Drops scalar cubic_interp!; hoists in-domain check in periodic scalar path; removes redundant _check_periodic_endpoints call. |
| src/cubic/cubic_oneshot_series.jl | Drops bc arg from _resolve_search. |
| src/cubic/cubic_eval.jl | Splits vector loop into outer/inner barriers. |
| src/cubic/cubic_interpolant.jl, cubic_series_interp.jl | Stores plain WrapExtrap() (no materialization). |
| src/cubic/cubic_solver.jl | Inlines the periodic z[n+1] = z[1] mirror, drops the per-BC helper. |
| src/cubic/nd/cubic_nd_oneshot.jl | Updates comment for _resolve_extrap semantics. |
| src/linear/linear_oneshot.jl, linear_interpolant.jl, linear_oneshot_series.jl, nd/linear_nd_oneshot.jl | Function-barrier split; drops bc/nobcs plumbing. |
| src/quadratic/quadratic_interpolant.jl, quadratic_oneshot.jl, nd/quadratic_nd_oneshot.jl | Function-barrier split; comment updates. |
| src/hermite/hermite_eval.jl | Function-barrier split for both slope and pre-baked-slope variants. |
| src/hetero/hetero_eval.jl, hetero_oneshot.jl | Removes bcs arg from _search_all_intervals_stencil calls. |
| src/pchip/pchip_oneshot.jl, src/cardinal/cardinal_oneshot.jl, src/akima/akima_oneshot.jl | Drops bc arg from _resolve_search. |
| src/constant/constant_oneshot.jl, constant_oneshot_series.jl, nd/constant_nd_oneshot.jl | Drops bc/nobcs plumbing. |
| test/test_cubic.jl, test_allocation.jl, test_linear_periodic.jl, test_periodic_axis.jl | Updates tests for removed scalar cubic_interp! and new BC-less Searcher/_resolve_search. |
| test/test_periodic_search_4value.jl | Deletes BC-aware 4-tuple unit tests (now obsolete with BC removed from Searcher). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
FastInterpolations.jl BenchmarksAll benchmarks (50 total, click to expand)
|
| Benchmark | Current | Previous | Imm. Ratio | Grad. Ratio | Tier |
|---|---|---|---|---|---|
13_nd_oneshot_gridquery/bilinear_2d_sort_sort |
5728.2 ns |
4857.9 ns |
1.179 |
- |
immediate |
1_cubic_oneshot/q00001 |
546.0 ns |
571.7 ns |
0.955 |
1.145 |
gradual |
8_cubic_multi/construct_s001_q100 |
643.4 ns |
690.8 ns |
0.931 |
1.113 |
gradual |
8_cubic_multi/eval_s001_q100 |
820.7 ns |
830.6 ns |
0.988 |
1.125 |
gradual |
9_nd_oneshot/bicubic_2d |
45031.3 ns |
46649.2 ns |
0.965 |
1.122 |
gradual |
9_nd_oneshot/tricubic_3d |
415373.2 ns |
388200.7 ns |
1.07 |
1.125 |
gradual |
Thresholds: immediate > 1.1x (vs latest master), gradual > 1.1x (vs sliding window)
This comment was automatically generated by Benchmark workflow.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #143 +/- ##
==========================================
+ Coverage 96.43% 96.45% +0.02%
==========================================
Files 143 143
Lines 11994 11957 -37
==========================================
- Hits 11566 11533 -33
+ Misses 428 424 -4
🚀 New features to boost your workflow:
|
Summary
This PR addresses the performance regression observed in Cubic
PeriodicBConeshot evaluation on themasterbranch. The regression was traced to per-iteration evaluation-kernel costs that dominate at larger query sizes.Function barriers and type stability improvements fully recover performance for
:inclusiveperiodic boundary conditions and halve the regression for:exclusiveconditions. The remaining gap in:exclusiveevaluations is tied to_get_hindirection through_ExclusivePeriodicAxisand is deferred to a future update.Additionally, a hint propagation bug at seam boundaries is fixed, and the scalar
cubic_interp!method is removed for consistency.Size: 15 commits, 32 files, +321 / −591.
Implementation
_check_domainfor boundary-extrapolating interpolations (Clamp/Fill/Wrap) returns aUnion{InBounds, E}. Processing this union inline caused a fallback to dynamic dispatch per iteration. All 1D batch loops (Cubic, Linear, Quadratic, Hermite-family) are now split into outer and inner components. The outer loop resolves theUniononce, restoring full type stability in the inner loop.BCtype parameter was removed fromSearcher, changing it toSearcher{P,H}. Periodic seam handling is now entirely self-contained within_ExclusivePeriodicAxis, eliminating redundant compiler specializations.WrapExtrapMinor Cleanup:_wrap_to_domainoverloads into a single definition using_extract_primal._check_domainblock out of the Cubic periodic scalar entry path.WrapExtrapcomments and outdated legacy constructor calls.Correctness Fixes
search_interval(::Searcher, ::_ExclusivePeriodicAxis, xq)short-circuited at the seam but failed to update the search hint. A new_write_hint!helper ensures the hint is correctly updated without stalling at stale indices during monotonic streaming.Breaking Changes
cubic_interp!: The scalar signaturecubic_interp!(output, x_or_cache, y, xq::Real)that mutated a length-1 buffer was undocumented and inconsistent with other 1D methods. Users should rely on the allocating scalar entry pointv = cubic_interp(x, y, 0.5)instead.Performance
Measured on M1 Pro, Julia 1.12.6.
Loop-dominated regime, nq = 1000 (min μs):
PeriodicBC(:inclusive)PeriodicBC(:inclusive)PeriodicBC(:exclusive)PeriodicBC(:exclusive)Setup-dominated regime, nq = 20 (min ns):
Confirms no significant setup overhead.
PeriodicBC(:inclusive)PeriodicBC(:inclusive)PeriodicBC(:exclusive)PeriodicBC(:exclusive)