(fix + refac): Method-aware output-eltype via kernel-shape op trait by mgyoo86 · Pull Request #146 · ProjectTorreyPines/FastInterpolations.jl

mgyoo86 · 2026-05-20T18:41:17Z

Closes #144.

Summary

linear_interp(x, y::Vector{SVector{3,Float64}})(xq_dual_vec) raised MethodError: Float64(::Dual). Root cause: the output-eltype trait predicted buffer types via a promote_type + _PromotableValue enumeration that couldn't see through SVector × Dual and similar non-promote_rule carrier chains. Allocated buffers under-predicted the kernel's actual return type.

This PR closes the carrier-propagation contract end-to-end and replaces the hand-coded enumeration with a method-aware kernel-shape op trait: each method declares its kernel's representative shape, and Julia inference (Base.promote_op) predicts the exact return type.

What this PR introduces

Kernel-shape op trait

@inline function _output_eltype(kernel_op::F, ::Type{Tv}, types::Type...) where {F, Tv}
    Top = Base.promote_op(kernel_op, Tv, types...)
    (Top === Union{} || Top === Any) && return Tv
    return Top
end

Two shapes cover every method:

_arithmetic_kernel_shape(h, yv, dL) = yv + yv*(dL/h) — Linear/Cubic/Quadratic/Hermite-family. Division by h Float-upgrades Int chains naturally.
_constant_kernel_shape(yv, q) = yv * one(q) — Constant (selection kernel; Int chains stay Int).

Wiring:

Default _output_eltype(::AbstractInterpolant1D, ::Type{Tq}) (and ND mirror) routes through _arithmetic_kernel_shape.
ConstantInterpolant 1D + ND override route through _constant_kernel_shape.
AbstractHermiteInterpolant1D override applies a two-call pattern over Tv and eltype(itp.dy) — disjoint promote chains so a duck-typed dy (e.g., Float y + Dual dy from AD on slopes) doesn't poison the y-chain.
All output-buffer sites canonically routed through the method-aware kernel-shape form. After the follow-up cleanup in this PR (linear series ×4, cubic cache oneshot ×1, cubic/quadratic ND scalar ×2, 1D + ND adjoint allocators ×6), no output-buffer allocation remains on the legacy 2-arg fallback. _series_output_type helper deleted (orphaned).

Net result: _PromotableValue enumeration is no longer reached from any output-buffer allocation path. The trait's prediction is kernel-faithful by construction.

Duck-Tv × duck-Tq end-to-end

Carrier propagation closed across every allocator that previously collapsed for non-promote_rule ducks:

1D + ND persistent batch (protocol-level)
1D + ND one-shot batch (3-arg)
Cubic/Linear/Quadratic/Constant series persistent + one-shot
Hermite one-shot for Float y + Vector{Dual} dy (AD on slopes)
ND deriv-zero short-circuits — per-axis carrier via prod(one, query), so heterogeneous queries like (Float, Dual) + mixed-deriv propagate the non-axis-1 carrier
ConstantInterpolant + QuadraticInterpolant anchored-vector callable

Sample-first allocator removed

Protocol's persistent batch path no longer evaluates the kernel for the first query just to read typeof(first_val) — trait-only sizing (n+1 → n evaluations per batch).

Behavior changes

constant_interp with Int y + Float xq returns Float (was Int); fully-Int chain stays Int. Scalar and batch paths agree.
hermite_interp(x, y::Float64, dy::Vector{Dual}, xq) no longer raises MethodError on setindex!.
CubicSeriesInterpolant persistent callable handles Series(Vector{SVector{Float}}) × Dual.
Heterogeneous-axis ND deriv-zero queries propagate the Dual carrier per-axis.
Rational/Rational/Rational arithmetic chain predicts Rational{Int} at the trait level (kernel-faithful — Rational/Rational = Rational). User-visible Vector{Rational} is still blocked by storage Float-forcing — see Out of scope. Cubic 3-arg batch oneshot is a partial outlier: the output buffer IS sized as Vector{Rational{Int}} (trait correct), but values are bit-exact convert(Rational, ::Float64) of the kernel's Float64 results — pinned by a semantic @test_broken (value equality, not just isa).

Out of scope (explicit follow-ups)

The kernel-shape trait now predicts kernel-faithful return types, but two storage-side mechanisms still apply manual Float-forcing — surfacing as a trait-vs-storage gap:

_promote_grid_float: arithmetic-method constructors (Linear/Cubic/Quadratic/Hermite-family) lift Rational/Int grids to Float64 at construction time. Relaxing this requires search/cache pathways (_CachedRange.inv_h etc.) to handle non-Float types.
Internal coefficient eltype (Tz for cubic z, Tc for quadratic a/d, hetero coefficients): still use the legacy _output_eltype(Tv, Tg) enumeration. Same theme as storage Float-forcing — natural promote via method-specific coefficient-shape op is achievable.

Both are pinned via @test_broken; they auto-surface as now passing once the follow-up lands. Performance benchmarking for non-Float storage paths is part of that follow-up's scope.

Treat Constant as `y[idx] + 0·(xq - x[idx])` so `Tq`'s carrier (Dual, Measurement, SVector, …) propagates naturally — the same rule Linear / Cubic / Quadratic / Hermite already follow. Drop the scalar `convert(_output_eltype, val)` wrapper and replace the persistent allocating batch + 1D Constant oneshot allocators with sample-first so the container fits whatever the kernel actually produces. Constant `_output_eltype` overrides and inline `T_out` expressions removed; ND Constant scalar oneshot drops its `::Tv` typeassert. Every method now uses the generic `_output_eltype(Tv, Tg, Tq)`. Constant kernels multiply the selection by `one(dL)` (carrier propagation); derivative branches use `0 * y_left * one(dL)` to preserve NaN/Inf propagation and keep the minimal `*(::Int, ::Tv)` + `*(::Tv, ::Real)` duck-type contract. Behavior change: `constant_interp` on `Int y` with a `Float xq` now returns `Float` (carrier propagates through zero-slope arithmetic). The fully-Int chain `(Int x, Int y, Int xq)` still returns `Int`. The 52a1f91 raw-Tv-for-promotable-Tq contract from #138 is intentionally relaxed to keep Constant on the same machinery as every other method — no Constant-specific helpers. Series / ND-oneshot-3-arg / hetero allocators remain on the old path and will migrate in a follow-up. New `test_duck_tv_dual_tq.jl` pins the Tv × Tq cross product across methods. Existing Constant / Complex / precision / promotion tests updated to the natural-promote contract. Closes #144.

Follow-up to #144. Close the deferred one-shot allocator gap (1D batch, ND batch, Hetero ND, Series) without touching the 12 individual call sites — and patch the Constant scalar/anchor short-circuits that broke `@inferred itp(1.0)` with Int y. * `_output_eltype` (src/core/utils.jl): the duck-type fallback now queries `Base.promote_op` on the universal kernel-shape op `yv * q + yv`, matching the Julia idiom used by `LinearAlgebra.mul` (`promote_op(matprod, eltype(A), eltype(B))`) and `Base.map` (`_return_type(f, eltype(A))`). Inference resolves the call to a constant at compile time (zero runtime cost — verified via LLVM IR: the function body is a single `ret`). For `SVector{Float64} × Dual` this returns `SVector{Dual}` where `promote_type` returned the non-concrete `Any` and the old fallback silently returned `Tv`. When the op is undefined (e.g., custom duck-type with no `*`), the legacy `Tv` fallback is preserved. * Constant short-circuits (constant_oneshot.jl, constant_anchor.jl): at `xi == last(x)` the scalar/anchor paths used to `return last(y)` raw — bypassing the kernel's `selected * one(dL)` carrier propagation. With `Int y` and `Float xq` that disagrees with the kernel's `Float` return, so `@inferred itp(1.0)` saw `Union{Float64, Int64}`. Short-circuits now multiply by `one(xi)` to match the kernel, restoring full type stability across both the point-based scalar path and the four anchor-dispatch branches. * linear_oneshot.jl: `_output_eltype(eltype(y), Tg)` was missing `Tq` entirely, so the trait couldn't see the carrier chain even after the duck fallback was correct. Now passes `eltype(x_targets)`. * hermite_oneshot.jl: dropping `eltype(dy)` from the trait call — including it collapses `promote_type(...)` to `Any` on duck inputs, which prevents the new `promote_op` fallback from resolving the carrier. `eltype(dy)` is typically the same shape as `Tv` so it contributes no extra carrier information. Side-effect fix: `_output_eltype(SVector{Int}, Float64, Float64)` now returns `SVector{Float64}` (was `SVector{Int}`). The linear/cubic/etc. kernels promote Int→Float internally via interval-slope division, so the old prediction caused `convert(::SVector{Int}, ::SVector{Float64})` errors that no test exercised — now correct. Tests: test_duck_tv_dual_tq.jl reorganized into three logical test items (Persistent path, Oneshot path, Type stability + raw-eltype contracts) covering scalar/batch/ND × duck-Tv × duck-Tq, the ForwardDiff.derivative MWE, `@inferred` Union-free assertions for the Constant short-circuit, the `_output_eltype` type table, and the Int→Int raw-eltype non-regression. Verified: smoke 42/42 PASS (master: 38/42, v0.4.9: 26/42); 35,587 tests pass on latest Julia. LTS run blocked by a pre-existing test/Manifest.toml resolution issue unrelated to this patch.

Surfaced by extended smoke covering Series / ND / Adjoint paths against duck-Tv (SVector) × duck-Tq (Dual) combinations. Three regressions fixed: * Constant ND oneshot batch (src/constant/nd/constant_nd_oneshot.jl): the allocator was `Vector{Tv}(undef, …)` and the outer wrapper carried a `::Vector{Tv}` typeassert — both bypass kernel carrier propagation, so `constant_interp((xg,yg), data_sv, queries_dual)` raised `Float64(::Dual)`. Replaced with sample-first allocation matching the 1D Constant pattern: pull `first_q` via `_extract_query_point`, run the scalar oneshot once, size from `typeof(first_val)`. The deriv-zero short-circuit now sizes from `0 * first(data) * one(eltype(first_q))` for the same reason. * 1D adjoint allocators (src/core/adjoint_protocol.jl:115, 125): `Tv = promote_type(eltype(y_bar), Tg)` returned `Any` for `SVector{3,Float64} × Float64` (no `promote_rule` between Base and StaticArrays), so `zeros(Any, n)` raised `zero(::Type{Any})`. Routing through `_output_eltype` reuses the `Base.promote_op` duck fallback from b675e17 and resolves to `SVector{3,Float64}`. Linear, Constant, Cubic, Quadratic, Hermite, Cardinal, PCHIP, Akima adjoints now all accept SVector y_bar. * ND adjoint allocators (src/core/adjoint_protocol.jl:353, 368): same root cause, same `promote_type → _output_eltype` fix. Linear, Constant, Cubic, Quadratic, Hetero ND adjoints all accept SVector y_bar (`Matrix{SVector{3,Float64}}` return). Tests: - test_duck_tv_dual_tq.jl gains an Adjoint-path testitem covering all 7 1D adjoints + 4 ND adjoints × {SVector, Dual} y_bar. - test_constant_eltype.jl updates two assertions that previously pinned the broken ND oneshot return as a known limitation; they now pin the carrier-propagated Float64 return. Verified: extended smoke 116/116 PASS (Series × 4 methods + ND forward + ND/1D adjoints × duck-Tv + slope-family OnTheFly); all affected adjoint test files still pass. Out of scope (separate follow-up): the ND scalar oneshot path (`constant_interp((xg,yg), data, query_tuple)`) returns `Float64` for all-Int inputs while the persistent path returns `Int` — pre-existing asymmetry unrelated to this patch.

Stage 3 of the Issue #144 cleanup. Series one-shot and persistent callable allocators previously composed two traits as `_series_output_type(_output_eltype(Tv, Tg), Tq)`. For duck-Tv × duck-Tq combinations (e.g., `Series(SVector{3,Float64}, …)` queried with a ForwardDiff.Dual) the outer step computes `promote_type(SVector{Float}, Dual) = Any`, falls through to its fallback (`Tv`), and the kernel's actual `SVector{Dual}` return then either errors on `setindex!(::Vector{SVector{Float}}, ::SVector{Dual})` or — for Constant's selection kernel — silently produces `Vector{Vector{Any}}`. Switching to a single `_output_eltype(Tv, Tg, Tq)` call uses the b675e17 `Base.promote_op` fallback, which resolves `SVector{Float} × Dual → SVector{Dual}` at compile time. Six sites across linear/cubic/quadratic series (scalar and batch, persistent and one-shot) migrate the chain to the unified form. Constant series keeps its raw-Tv contract for promotable Tq (Int y + Float xq → Vector{Int}, preserved) but routes duck Tq through `_output_eltype(Tv, Tq)` so SVector × Dual no longer collapses to `Vector{Any}`. Four allocator sites updated. Regression coverage added to test_duck_tv_dual_tq.jl: a new `Series path — duck-Tv × duck-Tq carrier propagates` testitem pins the four methods × {scalar, batch} × {SVector y, Dual xq} grid, and the existing Oneshot testitem now also asserts the Constant ND oneshot batch path (closed in acfc95a). Verified: extended smoke 132/132 PASS (was 126/132 with 6 series duck-Tv × duck-Tq fails before this commit); all touched test files green on latest Julia.

Follow-up to the original Issue #144 PR, addressing five user-visible paths where carrier propagation still collapsed despite the unified `_output_eltype` trait. All share the same root-cause family but live in branches the prior commits did not reach. Sites addressed: - `cubic_series_interp.jl` — `CubicSeriesInterpolant` build-then-call path still chained `_series_output_type(_output_eltype(Tv, Tg), Tq)`, collapsing `promote_type(SVector{F}, Dual)` to `Any`. Migrated to the unified `_output_eltype(Tv, Tg, Tq)` (mirrors the prior 6-site migration). - `interpolant_protocol.jl::_eval_nd_at_point` deriv-zero short-circuit and `constant_nd_eval.jl::_eval_at_cell` deriv branch and `constant_nd_oneshot.jl` scalar + batch deriv branches — all returned `0 * <data>` without query-carrier propagation. Now multiply by `prod(one, query)` so per-axis carriers (homogeneous Dual, heterogeneous `(Float, Dual)`) survive — symmetric with the forward kernel's per-axis `* one(dL_d)`. Affects every ND method at 2nd+ derivative paths. - `hermite_oneshot.jl` — original PR dropped `eltype(dy)` to avoid `promote_type` collapse on SVector + Dual inputs, which re-introduced the MethodError commit b3a3379 fixed for `Float64 y + Vector{Dual} dy`. Replaced with a two-call `_output_eltype` pattern keeping `Tv` and `eltype(dy)` in disjoint promote chains — handles both cases. Coverage: - `CubicSeriesInterpolant` persistent path × SVector × Dual (scalar + batch) - Constant ND deriv-zero × persistent + one-shot scalar - Heterogeneous-axis (Float, Dual) × mixed-deriv: Constant ND persistent scalar + one-shot scalar + one-shot batch, Linear ND persistent 2nd-deriv - Hermite Float32 y + Float64 dy precision pin - Hermite Float64 y + Vector{Dual} dy + ForwardDiff.derivative on slopes

…tput_type Constant and Quadratic 1D `itp(aq_vec)` (batch eval with pre-built anchors) allocated `Vector{Tg}` — only the grid type, ignoring `Tq` in the anchor parameterization. The Constant kernel now multiplies by `* one(dL::Tq)` and the Quadratic kernel mixes anchored deltas with `Tq`, so a duck-typed `Tq` (Dual, …) raised `MethodError` on `setindex!`. Linear's path already used `_output_eltype(Tv, Tg, Tq)`; mirrored here. `_series_output_type` is no longer called from any src file after the Cubic Series persistent path migrated to the unified `_output_eltype(Tv, Tg, Tq)` in commit 440be0b. Removed; its narrowed `promote_type → Tv` fallback is superseded by the duck-aware `Base.promote_op` fallback in `_output_eltype`. Coverage added to test/test_duck_tv_dual_tq.jl exercises both methods' anchored-vector callable with `_ConstantAnchoredQuery{Tg, Dual}` and `_QuadraticAnchoredQuery{Tg, Dual}` via `_fill_anchors!`.

…tion The PR series that restored duck-Tv × duck-Tq carrier propagation changed Constant's behavior from "selection kernel returns raw `Tv`" to "kernel multiplies by `* one(dL::Tq)` so the return widens via promote_type". Storage stays raw `{Tg, Tv}` (no `_promote_grid_float` indirection) — only the return type widens. Comments referencing the old "raw-Tv contract" / "Int in → Int out" return behavior, or the deleted `_output_eltype(itp, Tq)` override in `constant_nd_types.jl`, are updated to reflect the post-restoration model: storage is raw, the kernel handles return-type widening per-callable. No behavior change. Pure documentation alignment.

…fied) The protocol's 1D + ND batch allocating callables, and Constant's 1D + ND one-shot batch wrappers, called the scalar/persistent kernel for the first query just to read `typeof(first_val)` for the output buffer. That extra evaluation is per-batch overhead on a hot path — measurable at small N and strictly redundant once the trait predicts correctly. `_output_eltype(itp, Tq)` (and the 3-arg `_output_eltype(Tv, Tg, Tq)` used by one-shot wrappers) now drives all allocators directly: empty-batch and populated paths share one expression. The kernel still produces the same runtime values; only the container eltype is sized from types, not from sampling. Constant follows the same natural-promotion rule as every other method (no Constant-specific helper or override per the original Issue #144 PR's direction). The kernel's `* one(dL)` still preserves scalar-path semantics (Int×Int×Int scalar callable returns Int via kernel direct); batch and one-shot wrappers widen via the trait's Int→Float upgrade. Tests pinning the old sample-first `Vector{Int}` for fully-Int Constant batch are updated to `Vector{Float64}`. Locations: - src/core/interpolant_protocol.jl:60-79 (1D batch) - src/core/interpolant_protocol.jl:255-272 (ND batch) - src/constant/constant_oneshot.jl:277-298 (1D one-shot batch) - src/constant/nd/constant_nd_oneshot.jl:98-115 (ND one-shot batch)

@inline

`_output_eltype` previously enumerated standard numeric types via the `_PromotableValue` union and hand-applied an `Int→Float` upgrade — a manual proxy for "the arithmetic kernels divide". Selection kernels (Constant) don't divide, so the upgrade was wrong for them and was masked by sample-first allocators (one extra scalar eval per batch). Add a method-shape overload that lets the kernel itself drive inference: @inline _output_eltype(kernel_op::F, ::Type{Tv}, types::Type...) where {F, Tv} = ... Base.promote_op(kernel_op, Tv, types...) ... Constant declares its kernel shape and routes its trait through the overload: @inline _constant_kernel_shape(yv, q) = yv * one(q) @inline _output_eltype(::ConstantInterpolant{Tg, Tv}, ::Type{Tq}) where {Tg, Tv, Tq} = _output_eltype(_constant_kernel_shape, Tv, Tq) Julia infers the exact kernel return type — no `_PromotableValue` enumeration, no Float upgrade. Int×Int×Int matches the scalar callable (Int), SVector × Dual still resolves to SVector{Dual}. Constant ND mirrors the override. The `_PromotableValue` definition stays (other paths — grid promotion, autocache gating, FillExtrap auto-convert — use it independently). Other methods (Linear/Cubic/Quadratic/Hermite/etc.) keep the existing trait — their kernels divide, so the Float upgrade matches reality. Migrating them to their own kernel shape op is a follow-up. Tests: - Constant Int contract reverted: `itp([2,5,8]) isa Vector{Int}` again, matching `itp(3) === 30`. No scalar/batch asymmetry. - New `@testitem "Type stability — duck-Tv × duck-Tq inference"`: `@inferred` coverage for 1D persistent/oneshot × 5 methods (scalar + batch), ND persistent/oneshot × 3 methods, ND deriv-zero per-axis carrier, Hermite mixed eltype, CubicSeriesInterpolant persistent, anchored-vector callable. Catches silent Union returns and `Any` fallbacks `isa` checks would miss.

Linear/Cubic/Quadratic/Hermite all share the same kernel-shape pattern at the type level — every arithmetic kernel divides by `h`, which Float-upgrades Int chains and propagates duck carriers through `(dL / h)`. Declare one shared `_arithmetic_kernel_shape(yv, dL, h) = yv + yv * (dL / h)` and let Linear opt into it, validating the pattern at a second method beyond Constant. Migrated: - `LinearInterpolant` + `LinearInterpolantND` `_output_eltype(itp, Tq)` → `_output_eltype(_arithmetic_kernel_shape, Tv, Tq, Tg)` - 1D + ND one-shot wrappers (`linear_interp(x, y, x_targets)` and the ND scalar/batch variants) — buffer eltype from kernel-shape probe - Anchored-vector callable (`itp(aq_vec)`) Behavior parity for standard numerics (Int chain → Float, Float×Float → Float, Complex{Int} → ComplexF64, mixed Int×Float → Float — all match the previous `_PromotableValue`-driven Float upgrade). The trait's prediction now exactly matches the kernel reality, so empirical kernel output and trait-allocated buffer always agree. A side benefit: `Rational × Rational × Rational` chain now predicts `Rational{Int}` rather than `Float64` — the old trait over-predicted Float because `_PromotableValue` enumeration included `Rational`, but Julia's `Rational/Rational = Rational` (no Float upgrade) means the kernel actually preserves Rational. No existing test pinned the old over-prediction. Cubic/Quadratic/Hermite/PCHIP/Cardinal/Akima migration deferred to a follow-up — same shape op, mechanical override addition per method type.

… Rational `_arithmetic_kernel_shape` arguments reordered from `(yv, dL, h)` to `(h, yv, dL)` so callsites read `_output_eltype(shape, Tg, Tv, Tq)` — matching the codebase's standard `{Tg, Tv, Tq}` type-parameter order (`Interpolant{Tg, Tv, ...}`, where-clauses, etc.). Five Linear callsites updated correspondingly. Body unchanged — purely cosmetic. Tests pin the trait's Rational behavior at two layers: - Trait-level (`@inferred`): `_output_eltype(_arithmetic_kernel_shape, Rational{Int}, Rational{Int}, Rational{Int}) === Rational{Int}`, plus Int/Float/Complex baselines. - Constant end-to-end: storage is raw `{Tg, Tv}` so the trait's Rational prediction flows through to the user-visible Vector eltype. - Linear end-to-end: `@test_broken` — `_promote_grid_float` in the constructor lifts grid/values to Float64 at build time, shadowing the trait's Rational prediction. Marked broken so a future relaxation (storage following kernel reality) auto-surfaces as `now passing`. The `@test_broken` pin documents the design gap explicitly: trait-level correctness diverges from user-visible behavior at Linear/Cubic/etc. because the constructor's Float-forcing predates the kernel-shape trait. Resolving this is deferred to a follow-up.

Extend the kernel-shape op pattern (validated at Constant + Linear) to every remaining arithmetic method (Cubic, Quadratic, Hermite, PCHIP, Cardinal, Akima). One shared shape captures all of them — the division `y + y*(dL/h)` that Float-upgrades Int chains and propagates duck carriers. Trait wiring: - `interpolant_protocol.jl`: default `_output_eltype(::AbstractInterpolant1D, Tq)` (and ND mirror) now routes through `_arithmetic_kernel_shape`. Every method's persistent-batch path is migrated by inheritance. - `hermite_interpolant.jl`: `AbstractHermiteInterpolant1D` override applies the two-call pattern over `Tv` and `eltype(itp.dy)`, keeping disjoint promote chains so a duck-typed `dy` widens the result without poisoning the `y` chain. - `linear_interpolant.jl` + `linear_nd_types.jl`: explicit Linear overrides removed — now redundant with the default. Oneshot/series wrapper migrations (13 callsites): swap legacy `_output_eltype(Tv, Tg, Tq)` → `_output_eltype(_arithmetic_kernel_shape, Tg, Tv, Tq)`. Hermite oneshot's two-call already had the right shape; just swap to the shared shape op. Net effect: `_PromotableValue` enumeration + hand-coded Float upgrade are no longer reached from any output-buffer allocation path. Internal storage sites (coefficient/cache matrices, `Tz`/`Tc`) keep the legacy 2-arg trait — their role is different (intermediate compute, not kernel output). A latent consequence the pattern surfaces: Rational chains predict `Rational{Int}` rather than `Float64` at the trait level (kernel reality — Julia's `Rational/Rational = Rational` doesn't Float-upgrade). User-visible preservation is still blocked by `_promote_grid_float` in non-Constant constructors (pinned as `@test_broken` in test_duck_tv_dual_tq.jl) — that gap is the deferred follow-up.

- utils.jl 2-arg _output_eltype docstring: replace the inaccurate "arithmetic-kernel allocators (Linear/Cubic/Quadratic/Hermite)" audience claim with the actual current users (internal coefficient eltype, adjoint allocators, hetero ND legacy paths). Direct readers to the method-aware kernel-op overload for output-buffer sizing. - constant_nd_eval.jl: drop reference to the deleted "sample-first allocator"; the protocol now uses trait-only sizing via _output_eltype. - constant_series_interp.jl: ConstantSeriesInterpolant scalar callable docstring claimed `promote_type` widening for ducks; the code now uses `_output_eltype` (Base.promote_op on the kernel shape) so SVector × Dual resolves correctly. Doc-only — no runtime behavior change.

Two missed migration sites prevented correct output-eltype propagation under the kernel-shape op trait introduced earlier in this branch. - constant_anchor.jl: the `ConstantInterpolant(aq_vec)` anchored-vector callable was routing through the legacy 2-arg `_output_eltype(Tv, Tg, Tq)` which Float-upgrades any `Tr <: _PromotableValue && !<:AbstractFloat`. For a fully-Int chain (`Tv = Int, Tq = Int`) this allocated `Vector{Float64}` while the kernel actually returns `Int` — silent conversion and a contract violation. Every other Constant allocator uses `_output_eltype(_constant_kernel_shape, Tv, Tq)`. Aligned. - constant_series_interp.jl: `ConstantSeriesInterpolant`'s batch in-place dispatch had `outputs::AbstractVector{<:AbstractVector{Tv}}`. The parallel out-of-place form correctly widens `T_out` for duck `Tq` (e.g., `Vector{Vector{SVector{3, Dual}}}` when `Tv = SVector{3,Float64}`, `Tq = Dual`), but the widened buffer can't dispatch back into the in-place form. Linear's and Quadratic's parallel forms already used the relaxed `<:AbstractVector` constraint. Aligned. Regression coverage added: - "Constant Int-chain preservation" testset pins the anchored callable contract (`Vector{Int}` output for fully-Int chain). - "Linear/Quadratic/Constant SeriesInterpolant persistent — SVector × Dual carrier" testitem covers all 3 series persistent paths; the Constant block is the dispatch regression.

Additive coverage extending the duck-Tv × duck-Tq test surface to methods the PR description named but the test file missed. - "Hermite-family one-shot — duck-Tq carrier through trait": PCHIP/ Cardinal/Akima 3-arg oneshot allocators were rewired to `_output_eltype(_arithmetic_kernel_shape, ...)` in this branch but weren't exercised by any test. PCHIP/Akima slope formulas use scalar `sign`/`abs` on slope differences, so they can't accept `SVector y` (StaticArrays doesn't define `sign(::SVector)`); their `Tq` duck path is exercised via Float y + Dual xq. Cardinal's slopes are pure averages — kept the stronger SVector × Dual case there. - "PCHIP/Akima — custom scalar duck-Tv interface": minimal `DuckFloat` wrapper around Float64 implementing the required slope-formula interface (`sign`, `abs`, `zero`, `+`, `-`, `*`, `/`, ordering). Demonstrates that PCHIP/Akima impose NO `<: AbstractFloat` constraint on `Tv` — any duck type satisfying the interface flows through and the trait predicts the duck wrapper type as the kernel return. - "Cubic/Quadratic/Hermite-family — Rational user-visible blocked (BROKEN)" testsets mirror the Linear @test_broken pins. Cubic's 3-arg oneshot line is omitted with a note: its output buffer is correctly sized `Vector{Rational{Int}}` by the trait, but the kernel's internal z coefficients are still Float64; the result only round-trips through Rational for inputs whose Float values happen to be exactly representable. Asserting `isa Vector{Rational{Int}}` would pass for the wrong reason — pin removed until the coefficient-eltype follow-up lands.

The previous comment in the Cubic @test_broken block was imprecise: it suggested the 3-arg oneshot's `isa Vector{Rational{Int}}` would pass "by accident" for exactly-representable Floats. Empirical trace shows the actual mechanism is sharper and uniform — but the contract is still broken. Mechanism (verified via @inferred + runtime trace): - The kernel-shape trait `_output_eltype(_arithmetic_kernel_shape, Tg, Tv, Tq)` predicts `Rational{Int}` purely from input types at compile time. Type-stable, @inferred passes. - The buffer is allocated as `Vector{Rational{Int}}`. - BUT the in-place `cubic_interp!` builds an internal cache via `_promote_grid_float` which lifts grid + values to Float64. The kernel computes in Float64 throughout (z coefficients are Float64). - `output[i] = kernel_float_result` triggers `convert(Rational{Int}, ::Float64)`, which uses the bit-exact Float64 representation. For dyadic results (k/2 inputs on x²/2 data) the Float is exactly representable as a small-denominator Rational and the result LOOKS clean. For non-dyadic queries the result is a ~2^52-denominator binary fraction approximation, NOT the true Rational arithmetic value. The trait says one thing, storage lift says another — they don't coordinate. The next PR will decide between honoring the trait (relax _promote_grid_float for arithmetic methods) or honoring storage (make the trait aware of the lift). Either resolution is captured by the semantic pin below. Semantic pin: a cubic spline of `y = x` should reproduce `y = x` exactly. The new `@test_broken cubic_interp(x, x, [3//7]) == [3//7]` currently fails (Float64 path produces a 2^52-denom approximation) and auto-surfaces when the storage/trait mismatch is resolved.

…c contract Item A — canonical migration: route all surviving output-buffer sites through the method-aware kernel-shape form. After this commit no output-allocation path remains on the legacy 2/3-arg `_output_eltype(Tv, Tg, ...)` fallback. Migrated callsites: - linear_series_interp.jl: 2 sites (scalar + batch out-of-place callable) - linear_oneshot_series.jl: 2 sites (3-arg scalar + batch oneshot) - cubic_oneshot.jl:286: cache-variant 3-arg batch oneshot - cubic_nd_oneshot.jl:42: ND scalar oneshot (Tv_p binding retained for the separate `_resolve_extrap` use; trait uses raw Tv to match batch peer) - quadratic_nd_oneshot.jl:163: ND scalar oneshot - core/adjoint_protocol.jl: all 6 1D + ND adjoint allocator callables (using `Tg` as both grid and dL slot — Tq is anchor-baked at this layer) Item C — allocation contract pin: the persistent batch allocating callable `itp(xq)` must allocate exactly one `Vector{T_out}` and nothing else. New testitem in test_promotion_alloc.jl iterates over all 7 methods (linear/ cubic/quadratic/pchip/cardinal/akima/constant), comparing the call's `@allocated` against the bare-Vector baseline + ALLOC_THRESHOLD slack. A re-introduced sample-first eval (or any leaked scratch buffer) would exceed this by at least one kernel-result-sized allocation per call.

Replace the baseline-Vector subtraction approach with a direct in-place zero-alloc check. The persistent in-place callable `itp(out, xq)` must be zero-alloc across all methods — that single contract covers both paths, since the allocating form `itp(xq)` is just `Vector{T_out}(undef, n)` + this in-place call. Matches the existing `_bench_linear!` pattern in the same file (in-place + ALLOC_THRESHOLD), avoids baseline-subtraction noise, and pins a stronger invariant.

github-actions · 2026-05-20T18:49:06Z

FastInterpolations.jl Benchmarks

All benchmarks (50 total, click to expand)

Benchmark	Current: `c86ea3c`	Previous	Imm. Ratio	Grad. Ratio
`10_nd_construct/bicubic_2d`	`38749.0` ns	`37996.0` ns	`1.02`	`1.0`
`10_nd_construct/bilinear_2d`	`678.4` ns	`751.5` ns	`0.903`	`1.018`
`10_nd_construct/tricubic_3d`	`332771.0` ns	`337158.0` ns	`0.987`	`0.974`
`10_nd_construct/trilinear_3d`	`1945.9` ns	`1999.8` ns	`0.973`	`1.032`
`11_nd_eval/bicubic_2d_batch`	`1522.3` ns	`1529.2` ns	`0.995`	`0.924`
`11_nd_eval/bicubic_2d_scalar`	`15.9` ns	`15.9` ns	`1.0`	`0.985`
`11_nd_eval/bilinear_2d_scalar`	`7.0` ns	`7.0` ns	`1.0`	`0.972`
`11_nd_eval/tricubic_3d_batch`	`3452.2` ns	`3457.1` ns	`0.999`	`1.009`
`11_nd_eval/tricubic_3d_scalar`	`35.8` ns	`35.8` ns	`1.0`	`1.025`
`11_nd_eval/trilinear_3d_scalar`	`13.0` ns	`13.0` ns	`1.0`	`0.981`
`12_cubic_eval_gridquery/range_random`	`4650.8` ns	`4648.9` ns	`1.0`	`1.037`
`12_cubic_eval_gridquery/range_sorted`	`4647.8` ns	`4645.5` ns	`1.001`	`1.039`
`12_cubic_eval_gridquery/vec_random`	`10516.4` ns	`10054.4` ns	`1.046`	`1.094`
`12_cubic_eval_gridquery/vec_sorted`	`3186.0` ns	`3200.1` ns	`0.996`	`0.992`
`13_nd_oneshot_gridquery/bicubic_2d_rand_rand`	`65106.2` ns	`65912.3` ns	`0.988`	`-`
`13_nd_oneshot_gridquery/bicubic_2d_sort_rand`	`61168.1` ns	`61742.1` ns	`0.991`	`-`
`13_nd_oneshot_gridquery/bicubic_2d_sort_sort`	`58328.9` ns	`58935.1` ns	`0.99`	`-`
`13_nd_oneshot_gridquery/bilinear_2d_rand_rand`	`18019.8` ns	`16653.4` ns	`1.082`	`-`
`13_nd_oneshot_gridquery/bilinear_2d_sort_rand`	`9320.6` ns	`9097.8` ns	`1.024`	`-`
`13_nd_oneshot_gridquery/bilinear_2d_sort_sort`	`4962.9` ns	`4859.0` ns	`1.021`	`-`
`14_series_oneshot_batch/constant_inplace_vec_k8_q1000_rand`	`19184.9` ns	`20604.4` ns	`0.931`	`-`
`14_series_oneshot_batch/linear_inplace_vec_k8_q1000_rand`	`20042.2` ns	`18261.0` ns	`1.098`	`-`
`1_cubic_oneshot/q00001`	`565.3` ns	`567.0` ns	`0.997`	`1.18`
`1_cubic_oneshot/q10000`	`47895.2` ns	`47867.7` ns	`1.001`	`0.767`
`2_cubic_construct/g0100`	`1539.3` ns	`1535.9` ns	`1.002`	`1.092`
`2_cubic_construct/g1000`	`14128.3` ns	`14363.2` ns	`0.984`	`1.045`
`3_cubic_eval/q00001`	`23.9` ns	`23.3` ns	`1.026`	`1.064`
`3_cubic_eval/q00100`	`484.7` ns	`483.7` ns	`1.002`	`1.038`
`3_cubic_eval/q10000`	`47046.9` ns	`47029.5` ns	`1.0`	`1.039`
`4_linear_oneshot/q00001`	`27.1` ns	`27.1` ns	`1.0`	`0.998`
`4_linear_oneshot/q10000`	`19356.2` ns	`19406.7` ns	`0.997`	`1.011`
`5_linear_construct/g0100`	`34.4` ns	`40.0` ns	`0.862`	`0.897`
`5_linear_construct/g1000`	`251.3` ns	`270.2` ns	`0.93`	`0.934`
`6_linear_eval/q00001`	`10.8` ns	`10.3` ns	`1.048`	`1.046`
`6_linear_eval/q00100`	`208.1` ns	`199.7` ns	`1.042`	`1.046`
`6_linear_eval/q10000`	`19833.9` ns	`19164.3` ns	`1.035`	`1.05`
`7_cubic_range/scalar_query`	`9.2` ns	`9.2` ns	`1.0`	`1.041`
`7_cubic_vec/scalar_query`	`10.3` ns	`10.2` ns	`1.01`	`0.988`
`8_cubic_multi/construct_s001_q100`	`679.4` ns	`680.8` ns	`0.998`	`1.173`
`8_cubic_multi/construct_s010_q100`	`4879.4` ns	`4983.4` ns	`0.979`	`1.059`
`8_cubic_multi/construct_s100_q100`	`44345.8` ns	`44607.8` ns	`0.994`	`1.044`
`8_cubic_multi/eval_s001_q100`	`815.6` ns	`856.3` ns	`0.953`	`1.119`
`8_cubic_multi/eval_s010_q100`	`1850.6` ns	`1832.7` ns	`1.01`	`1.05`
`8_cubic_multi/eval_s010_q100_scalar_loop`	`2481.3` ns	`2474.3` ns	`1.003`	`1.031`
`8_cubic_multi/eval_s100_q100`	`11810.8` ns	`11814.5` ns	`1.0`	`1.024`
`8_cubic_multi/eval_s100_q100_scalar_loop`	`3508.2` ns	`3508.1` ns	`1.0`	`0.991`
`9_nd_oneshot/bicubic_2d`	`44711.5` ns	`46253.3` ns	`0.967`	`1.108`
`9_nd_oneshot/bilinear_2d`	`588.5` ns	`587.1` ns	`1.002`	`1.005`
`9_nd_oneshot/tricubic_3d`	`384598.0` ns	`386974.7` ns	`0.994`	`1.053`
`9_nd_oneshot/trilinear_3d`	`1066.6` ns	`1073.5` ns	`0.994`	`0.985`

⚠️ Performance Regression Confirmed ⚠️

After re-running 5 flagged benchmark(s) 10 time(s), 4 regression(s) confirmed.

Benchmark	Current	Previous	Imm. Ratio	Grad. Ratio	Tier
`1_cubic_oneshot/q00001`	`565.3` ns	`567.0` ns	`0.997`	`1.18`	gradual
`8_cubic_multi/construct_s001_q100`	`679.4` ns	`680.8` ns	`0.998`	`1.173`	gradual
`8_cubic_multi/eval_s001_q100`	`815.6` ns	`856.3` ns	`0.953`	`1.119`	gradual
`9_nd_oneshot/bicubic_2d`	`44711.5` ns	`46253.3` ns	`0.967`	`1.108`	gradual

Thresholds: immediate > 1.1x (vs latest master), gradual > 1.1x (vs sliding window)

This comment was automatically generated by Benchmark workflow.

Copilot

Pull request overview

This PR fixes incorrect output-element-type prediction for “duck” value/query type combinations (notably SVector × ForwardDiff.Dual), by replacing the legacy promote_type/_PromotableValue-based trait with a method-aware kernel-shape approach using Base.promote_op. It also refactors allocators to size output buffers from the trait (removing “sample-first” evaluation) and expands tests to pin carrier propagation, inference, and allocation guarantees.

Changes:

Introduces kernel-shape-based output eltype inference (_arithmetic_kernel_shape, _constant_kernel_shape) and routes allocator sizing through Base.promote_op.
Updates Constant interpolation to propagate query carriers via * one(dL) (and adjusts boundary/derivative short-circuits accordingly), plus ND deriv-zero carrier folding.
Adds extensive new test coverage for duck Tv × Tq (StaticArrays + ForwardDiff), allocation contracts, and Constant eltype behavior changes.

Reviewed changes

Copilot reviewed 44 out of 44 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
test/test_promotion_alloc.jl	Updates Constant batch eltype expectations; adds zero-allocation contract test for persistent in-place batch calls.
test/test_precision_vector_queries.jl	Updates Constant vector-call eltype expectation to track query carrier.
test/test_duck_tv_dual_tq.jl	New comprehensive duck-typing + AD + inference regression suite (SVector/Dual, Series, adjoints, ND carrier).
test/test_constant.jl	Updates Constant scalar behavior expectations for Int data + Float query; adds fully-Int-chain preservation checks.
test/test_constant_eltype.jl	Reworks Constant eltype contracts across 1D/ND, periodic modes, FillExtrap, and oneshot vs persistent behavior.
test/test_complex_constant.jl	Updates Constant Complex{Int} + Float query expectation (carrier propagation → `ComplexF64`).
test/test_bc_complex_int.jl	Updates Constant Complex{Int} + Float query expectation (carrier propagation → `ComplexF64`).
src/core/utils.jl	Refactors output-eltype trait logic and adds kernel-shape helpers for method-aware inference.
src/core/interpolant_protocol.jl	Routes default output trait through `_arithmetic_kernel_shape`; removes scalar-path `convert`; adjusts ND deriv-zero fill carrier folding.
src/core/adjoint_protocol.jl	Switches adjoint allocation eltype inference to kernel-shape-based `_output_eltype`.
src/linear/linear_interpolant.jl	Notes Linear uses default arithmetic kernel shape route.
src/linear/linear_anchor.jl	Anchored-vector allocator now uses kernel-shape output-eltype inference.
src/linear/linear_oneshot.jl	One-shot linear batch allocator uses kernel-shape output-eltype inference.
src/linear/linear_series_interp.jl	Linear Series allocators use kernel-shape output-eltype inference.
src/linear/linear_oneshot_series.jl	Linear Series oneshot allocators use kernel-shape output-eltype inference.
src/linear/nd/linear_nd_types.jl	Adds note that Linear ND inherits default arithmetic kernel shape route.
src/linear/nd/linear_nd_oneshot.jl	ND oneshot allocators use kernel-shape output-eltype inference.
src/quadratic/quadratic_anchor.jl	Quadratic anchored-vector callable now sizes output via kernel-shape trait.
src/quadratic/quadratic_oneshot.jl	Quadratic oneshot batch allocator uses kernel-shape output-eltype inference.
src/quadratic/quadratic_series_interp.jl	Quadratic Series allocators use kernel-shape output-eltype inference.
src/quadratic/quadratic_oneshot_series.jl	Quadratic Series oneshot allocators use kernel-shape output-eltype inference.
src/quadratic/nd/quadratic_nd_oneshot.jl	Quadratic ND oneshot allocators use kernel-shape output-eltype inference.
src/cubic/cubic_oneshot.jl	Cubic oneshot allocators use kernel-shape output-eltype inference.
src/cubic/cubic_series_interp.jl	Cubic Series allocators use kernel-shape output-eltype inference.
src/cubic/cubic_oneshot_series.jl	Cubic Series oneshot allocators use kernel-shape output-eltype inference.
src/cubic/nd/cubic_nd_oneshot.jl	Cubic ND oneshot allocators use kernel-shape output-eltype inference.
src/hermite/hermite_interpolant.jl	Adds Hermite-family `_output_eltype` override using a two-chain kernel-shape pattern over `y` and `dy`.
src/hermite/hermite_oneshot.jl	Mirrors Hermite-family two-chain output-eltype sizing in oneshot path.
src/pchip/pchip_oneshot.jl	PCHIP oneshot allocator uses kernel-shape output-eltype inference.
src/cardinal/cardinal_oneshot.jl	Cardinal oneshot allocator uses kernel-shape output-eltype inference.
src/akima/akima_oneshot.jl	Akima oneshot allocator uses kernel-shape output-eltype inference.
src/constant/constant_types.jl	Updates Constant storage commentary (raw storage; widening handled at evaluation).
src/constant/constant_interpolant.jl	Defines `_constant_kernel_shape` and routes Constant output-eltype trait through kernel shape.
src/constant/constant_kernels.jl	Changes Constant kernels to multiply selected value by `one(dL)` to propagate query carrier.
src/constant/constant_oneshot.jl	Updates right-edge short-circuit to preserve carrier; allocator sizes output via kernel-shape trait; updates docs.
src/constant/constant_anchor.jl	Ensures anchor short-circuits preserve carrier; anchored-vector allocator sizes output via kernel-shape trait.
src/constant/constant_series_interp.jl	Improves duck-query sizing for Series constant path; loosens in-place output container typing.
src/constant/constant_oneshot_series.jl	Uses `_output_eltype` fallback for duck queries in Constant Series oneshot.
src/constant/constant_adjoint.jl	Comment updates noting buffer eltype comes from adjoint protocol.
src/constant/nd/constant_nd_interpolant.jl	Comment updates on raw storage and kernel-based widening.
src/constant/nd/constant_nd_types.jl	Constant ND `_output_eltype` trait now routes through `_constant_kernel_shape`.
src/constant/nd/constant_nd_eval.jl	Propagates per-axis carrier in selection result; derivative-zero uses `prod(one, q_eval)`.
src/constant/nd/constant_nd_oneshot.jl	Derivative branches fold per-axis carrier; batch allocator eltype uses kernel-shape trait; removes overly-narrow type assertions.
src/constant/nd/constant_nd_adjoint.jl	Comment update to align with protocol-sized adjoint buffers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

codecov · 2026-05-20T19:10:04Z

Codecov Report

❌ Patch coverage is 96.03960% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.43%. Comparing base (51d3afb) to head (c86ea3c).

Files with missing lines	Patch %	Lines
src/core/utils.jl	81.25%	3 Missing ⚠️
src/constant/constant_interpolant.jl	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   96.45%   96.43%   -0.02%     
==========================================
  Files         143      143              
  Lines       11957    11978      +21     
==========================================
+ Hits        11533    11551      +18     
- Misses        424      427       +3

Files with missing lines	Coverage Δ
src/akima/akima_oneshot.jl	`100.00% <100.00%> (ø)`
src/cardinal/cardinal_oneshot.jl	`100.00% <100.00%> (ø)`
src/constant/constant_adjoint.jl	`100.00% <ø> (ø)`
src/constant/constant_anchor.jl	`98.59% <100.00%> (+0.02%)`	⬆️
src/constant/constant_kernels.jl	`100.00% <100.00%> (ø)`
src/constant/constant_oneshot.jl	`100.00% <100.00%> (ø)`
src/constant/constant_oneshot_series.jl	`100.00% <100.00%> (ø)`
src/constant/constant_series_interp.jl	`97.57% <100.00%> (+0.01%)`	⬆️
src/constant/constant_types.jl	`100.00% <ø> (ø)`
src/constant/nd/constant_nd_adjoint.jl	`100.00% <ø> (ø)`
... and 27 more

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…rait form `_constant_kernel_shape(yv, q) = yv * one(q)` only saw the query type, so `Dual` grid + `Float` xq paths under-predicted the kernel's true return type — `_constant_vector_loop!` then tried to store a `Dual` kernel result into a `Vector{Float64}` buffer (`MethodError` on Julia LTS Extensions CI). Mirror the actual kernel arithmetic: `(xL, yv, xq) = yv * one(xq - xL)`. `Base.promote_op` now sees `xq - xL` and infers the real `dL` carrier, so duck grids (Dual, Measurement, …) flow through `promote_type(Tg, Tq)` naturally. All callsites collapse to the canonical 4-arg `(kernel_op, Tg, Tv, Tq)` form that Linear's `_arithmetic_kernel_shape` already used — Constant was the lone outlier on the 2-arg `(Tv, Tq)` shape. Int×Int×Int still infers `Int`; SVector × Dual still infers SVector{Dual}; the Rational chain pin and the Int batch-vs-scalar contract are unchanged (only the trait's arg count changes, not its predictions).

…-arg trait `test/ext/test_series_dual_grid.jl` on Julia LTS still tripped `MethodError: Float64(::Dual)` after the kernel-shape rename — Constant series oneshot + persistent paths were the last holdouts of the `Tq <: _PromotableValue ? Tv : _output_eltype(Tv, Tq)` guard. That gate ignores `Tg`, so `Dual` grid + `Float` xq short-circuited to `Vector{Tv}` (Float-only) while the kernel returned `Dual` via `xq - xL`. Migrate all 4 series sites to the canonical 4-arg trait `_output_eltype(_constant_kernel_shape, Tg, Tv, Tq)`: - `constant_oneshot_series.jl:83` — scalar oneshot - `constant_oneshot_series.jl:288` — vector oneshot - `constant_series_interp.jl:409` — persistent scalar callable - `constant_series_interp.jl:461` — persistent vector callable Linear/Cubic/Quadratic series already use the canonical form; Hermite adds a disjoint Tdy promote chain on top. Constant was the only family still on the legacy gate. Test pins updated to match the new Series contract (1D plain path's `Int y + Float xq → Float` already established in PR #144; Series now agrees instead of pinning a stale `Vector{Int}` raw-Tv hold-out): - `Vector{Int}` → `Vector{Float64}` for `Float xq` cases (lines 496, 622-624) - Added explicit pin: fully-Int chain (`Int grid + Int y + Int xq`) stays `Vector{Int}` — Series ↔ plain agree on both regimes. PR description's "Out of scope" item that called out a Series storage Float-forcing follow-up is now closed for Constant. Cubic/Quadratic Tz/Tc coefficient widening is still on the legacy enumeration as documented.

PR #146 reviews (Codex P2 #1/#2 + Copilot) flagged that scalar OOB extrap paths returned raw `Tv` for non-Number value types — `SVector` y with `ClampExtrap` / `FillExtrap` + `Dual` xq returned `SVector{Float64}` scalar but `SVector{Dual}` from the trait-sized batch buffer, breaking the scalar/batch consistency the PR is meant to restore. Root cause: `_promote_extrap_val(val::Number, xq::Number)` carrier- propagates via `val + zero(xq) * zero(val)`, but the non-Number fallback `_promote_extrap_val(val, xq) = val` returns raw — the actual `SVector` boundary value passes through `_eval_extrapolation` → `_promote_extrap_val` → raw return, with no carrier added. ND scalar takes the same path via `_try_fill_oob` → `_fill_extrap_result` → `_promote_extrap_val`. Add an `AbstractArray` overload that broadcasts the same carrier-prop pattern (`val .+ zero(xq) .* zero(eltype(val))`); same idea for the derivative branch's `_promote_extrap_zero`. `SVector × Dual` now resolves to `SVector{Dual}` on scalar OOB, matching in-domain kernel output and the trait-sized batch buffer. Constant ND oneshot derivative empty-fast-path also missed the query carrier — `nq == 0 && return Vector{Tv}(undef, 0)` ignored Tq. The non-empty branch already synthesises `0 * first(data) * prod(one, first_q)`. Route the empty path through the same kernel-shape trait so eltypes agree. RED-then-GREEN pins added under "OOB carrier — scalar/batch consistency for non-Number Tv" + "ND oneshot derivative — empty vs non-empty query eltype" in `test/test_duck_tv_dual_tq.jl`. 36 OOB combinations (Linear/Cubic/Quadratic/Constant × Clamp/Fill × 1D scalar/batch + ND analogues) all green. Follow-up: the `AbstractArray` overload here is a stop-gap. The discussion under design item 1 in `claudedocs/TODO/output_eltype_carrier_followups.md` moves to a `Tr`-driven `_eval_extrapolation` that uses `convert(Tr, val)` instead of arithmetic carrier inference — eliminates broadcasting overhead, removes the `zero(Tr)` requirement, preserves NaN propagation via the Constant-kernel pattern `0 * convert(Tr, val)`. Out of scope here.

…pe contract Three pins in `test_complex_constant_series.jl` predated the Series canonical 4-arg trait migration (commit d28d7b0) and asserted the legacy raw-Tv contract: - "Integer grid + Complex values" (line 65, 69): `sitp(5.5)` with `Int` grid + `Complex{Int}` y → kernel-shape `yv * one(xq - xL)` promotes via `Float - Int = Float` → `ComplexF64`. Was pinned as `Vector{Complex{Int}}` / `5 + 10im`. Updated to `Vector{ComplexF64}` / `5.0 + 10.0im`. Added explicit fully-Int chain pin (`sitp(5)` → stays `Vector{Complex{Int}}`) mirroring the 1D plain-path pin in `test_constant_eltype.jl`. - "Eltype contract: selection kernel preserves Tv" (line 284): Float32 grid + Float32 y + Float64 xq → trait promotes to Float64. Was pinned as Float32 (legacy raw-Tv claim). Renamed to "kernel-shape trait promotes via xq - xL" and split into two pins: Float64 xq → Float64, Float32 xq → Float32 (fully-Float32 chain stays Float32). No source change — the new Series promote behavior was already established in commit d28d7b0; these tests just hadn't been refreshed. Local `test_complex_constant_series.jl` now passes on both Julia 1.x and the LTS-failure scenario reported on CI.

…contract Two more pins predated the Series canonical 4-arg trait migration (commit d28d7b0) and asserted the legacy raw-Tv contract under a Float-grid + Int-y + Float-xq chain — that chain now (correctly) promotes to Float via `xq - xL`. - "Eltype contract: Integer series stays Int" (line 144): `Float grid + Int y + Float xq` was pinned `Vector{Int}`. Renamed to "fully-Int chain stays Int" and migrated setup to `Int grid + Int y + Int xq` — matches the intent ("Int chain preserved") and the 1D plain-path pin in test_constant_eltype.jl's "Fully-Int chain — Series stays Int" testset. - "FillExtrap fill_value must be compatible with y eltype" (line 155): the InexactError contract requires an `Int` output buffer to catch the Float fill_value mismatch. Under the new promote rule, the original `Float grid + Int y + Float xq` setup gives a Float buffer that silently absorbs the 0.5 fill — exactly what this contract was trying to catch. Migrated to fully-Int chain so the Int buffer reinstates the strict-on-assign behavior. No source change — same theme as `test_complex_constant_series.jl` commit 82085fc.

mgyoo86 added 19 commits May 19, 2026 15:57

Runic formatting

198fef6

mgyoo86 requested a review from Copilot May 20, 2026 18:41

Copilot started reviewing on behalf of mgyoo86 May 20, 2026 18:41 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread src/core/adjoint_protocol.jl

Comment thread src/core/adjoint_protocol.jl

Comment thread src/core/interpolant_protocol.jl

Comment thread src/constant/constant_kernels.jl

Comment thread src/constant/nd/constant_nd_oneshot.jl Outdated

mgyoo86 added 6 commits May 20, 2026 12:20

Runic formatting

c86ea3c

mgyoo86 mentioned this pull request May 20, 2026

fix: add specialized promotion rule for SArray and MArray #145

Closed

mgyoo86 merged commit c47519b into master May 20, 2026
14 checks passed

mgyoo86 deleted the fix/output-eltype-duck-restoration branch May 20, 2026 21:51

mgyoo86 referenced this pull request May 20, 2026

bump version to 0.4.11 and update benchmark results in README.md

616b106

mgyoo86 mentioned this pull request May 27, 2026

(fix+refac): Cell-local NaN + Tq carrier through deriv-zero paths #147

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(fix + refac): Method-aware output-eltype via kernel-shape op trait#146

(fix + refac): Method-aware output-eltype via kernel-shape op trait#146
mgyoo86 merged 25 commits into
masterfrom
fix/output-eltype-duck-restoration

mgyoo86 commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mgyoo86 commented May 20, 2026

Summary

What this PR introduces

Kernel-shape op trait

Duck-Tv × duck-Tq end-to-end

Sample-first allocator removed

Behavior changes

Out of scope (explicit follow-ups)

Uh oh!

github-actions Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

FastInterpolations.jl Benchmarks

⚠️ Performance Regression Confirmed ⚠️

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented May 20, 2026 •

edited

Loading

codecov Bot commented May 20, 2026 •

edited

Loading