Skip to content

feat: polyharmonic spline (PHS) N-dimensional interpolation#136

Draft
twilsonco wants to merge 130 commits into
ProjectTorreyPines:masterfrom
twilsonco:feat/phs
Draft

feat: polyharmonic spline (PHS) N-dimensional interpolation#136
twilsonco wants to merge 130 commits into
ProjectTorreyPines:masterfrom
twilsonco:feat/phs

Conversation

@twilsonco
Copy link
Copy Markdown

@twilsonco twilsonco commented May 9, 2026

Add N-Dimensional Polyharmonic Spline (PHS) Interpolation

This Pull Request introduces a high-performance, general-dimension Polyharmonic Spline (PHS) Radial Basis Function (RBF) interpolant to FastInterpolations.jl, based on the implementation described here.

Excerpt from Otero-de-la-Roza - 2022 - Finding critical points and reconstruction of electron densities on grids

PHS is particularly effective for smooth interpolation of multidimensional gridded physical data (e.g., electron densities, potential energy surfaces), especially when combined with log-density transforms and custom reference functions (physics-informed interpolation).


Key Features

  1. N-Dimensional PHS Interpolation (PHSInterpolantND)

    • Implements local, stencil-based PHS interpolation ($\phi(r) = r^K$, where $K$ is a compile-time odd degree like $1, 3, 5, 7, \dots$).
    • Fully analytical evaluation of values, gradients, and Hessians/Laplacians everywhere.
    • Supports any grid coordinate type (Float32, Float64) and value type (retaining duck-typed scalar support).
  2. Log-Density Smoothing Transform & Custom References

    • Implements PHSLogTransform to interpolate smooth log-transformed functions of the form $f(x) = \ln\left(\frac{\rho(x)}{\rho_0(x)}\right)$.
    • Evaluates original values and analytical derivatives dynamically via the chain rule, making it highly robust against sharp gradients and cusp singularities (e.g., nuclear cusps).
    • Provides a flexible interface supporting arbitrary callables for $\rho_0(x)$, including ConstantRef(val) (for standard log-space interpolation) or other existing interpolants.
    • Supports a nested log-space PHS workflow: utilizing a log-space PHS of the reference density $\rho_0$ as the reference for the main PHS to extract exceptionally clean reference derivatives from raw grid data.
  3. Weighted Blending for $C^2$ Continuity

    • Smoothly transitions between neighboring localized stencil evaluations using a custom blending kernel.
    • Restores global $C^2$ continuity across stencil boundaries.
    • Blend range is adjustable via the blend_factor constructor argument.
  4. Robust Precompilation & JIT Warmup

    • Leverages PrecompileTools inside src/phs/phs.jl to precompile the most common Float64/3D patterns. This completely eliminates first-call latency (JIT compilation overhead) for standard value and derivative queries.

Zero-Allocation Hot Path & Performance Optimizations

Achieving competitive evaluation speeds for stencil-based RBFs required eliminating all sources of heap allocations on the hot path. Through rigorous profiling and micro-optimizations, the query hot path is 100% allocation-free for both values and derivatives.

Key Optimizations:

  • Canonical Stencil & Boundary Shift Caching: Instead of constructing and inverting a coordinate-dependent system of linear equations at every evaluation point, a single canonical stencil configuration ($\Phi^{-1}$) is precomputed at build time. Boundary nodes reuse the same inverted matrix via clamped data indices or are handled via precomputed boundary-shifted matrices stored in a static shift_cache.
  • Elimination of Task-Local Storage Overhead: Task-local caching for stencil coefficients was replaced with thread-safe, thread-indexed static caches (coeff_caches within PHSInterpolantND and AdaptiveArrayPools for thread workspaces), ensuring thread-safety without dynamic memory allocations under parallel batch queries.
  • Explicit Exponent Specialization: Implemented specialized radial kernel methods for $K=1, 3, 5, 7$ to replace generic r^K exponentiation with explicit, compiler-friendly multiplications.
  • Branchless Operations & Single Reciprocals: Introduced branchless r_inv patterns and @simd loops in radial evaluation. Redundant divisions in blending functions were fused into a single reciprocal division followed by multiplications.
  • Static Dispatch & Metaprogramming: Replaced runtime derivative axes parsing with compile-time @generated function dispatch, allowing the compiler to fully unroll evaluation loops.
  • Type Stability: Ensured absolute type stability for both coordinate evaluation pipelines and the reference-derivative chain-rule mappings.

Real-World Validation: Phenol Dimer (Electron Density)

To validate correctness and showcase the interpolation accuracy, we included a real-world quantum chemistry benchmark in scripts/phs_density_comparison.jl (phenol dimer, DFT-computed electron density on a $75 \times 113 \times 70$ grid).

The benchmark compares PHS (using a log-density transform and a promolecular density reference constructed analytically from critic2 wavefunctions) against standard interpolation methods (Nearest, Linear, Cubic, Cardinal).

Accuracy Comparison (Relative Error Statistics)

1. Charge Density ($\rho$)

Method Min Error Max Error Mean Error Median Error
Nearest 5.27e-04 (35555×) 2.15e+00 (2×) 1.84e-01 (45×) 1.34e-01 (837×)
Linear 1.68e-05 (1133×) 9.34e-01 (1×) 8.80e-02 (22×) 2.18e-02 (135×)
Cubic 4.58e-06 (309×) 9.73e-01 (1×) 1.17e-01 (29×) 3.36e-03 (21×)
Cardinal 2.29e-05 (1546×) 9.21e-01 (1×) 9.65e-02 (24×) 3.47e-03 (22×)
PHS (with Ref) 1.48e-08 1.00e+00 4.06e-03 1.61e-04

2. Gradient Magnitude ($|\nabla \rho|$)

Method Min Error Max Error Mean Error Median Error
Linear 3.75e-05 (49×) 2.39e+01 (24×) 4.14e-01 (35×) 1.89e-01 (171×)
Cubic 2.39e-05 (31×) 3.36e+00 (3×) 3.57e-01 (30×) 2.65e-02 (24×)
Cardinal 1.83e-04 (238×) 2.52e+00 (3×) 2.48e-01 (21×) 3.23e-02 (29×)
PHS (with Ref) 7.68e-07 1.00e+00 1.17e-02 1.11e-03

3. Laplacian Magnitude ($|\nabla^2 \rho|$)

Method Min Error Max Error Mean Error Median Error
Cubic 8.30e-06 (1×) 1.13e+03 (364×) 6.41e+00 (135×) 1.69e-01 (12×)
Cardinal 7.58e-04 (59×) 1.96e+02 (63×) 2.41e+00 (51×) 5.03e-01 (37×)
PHS (with Ref) 1.28e-05 3.09e+00 4.73e-02 1.37e-02

Note: Relative error ratios in parentheses indicate how many times larger the error of the given method is compared to PHS.

Here's a set of plots that better captures the comparison (this plot is generated using the new script with the PR, and is included in the PHS docs):

phs_density_comparison

Evaluation Timings & Allocation Verification

Evaluation times are measured for 1000 query points along a phenol-dimer hydrogen-bond path (runs executed twice to isolate JIT and stencil caches):

Evaluating along path (1000 points)...
  Density (ρ):
    Nearest ...   0.000011 seconds (0 allocations)
    Linear ...    0.000013 seconds (0 allocations)
    Cubic ...     0.000021 seconds (0 allocations)
    Cardinal ...  0.000116 seconds (0 allocations)
    PHS ...       0.009040 seconds (0 allocations)  <-- Zero Allocations
  Gradient Magnitude (|∇ρ|):
    Linear ...    0.000111 seconds (13 allocations)
    Cubic ...     0.000052 seconds (7 allocations)
    Cardinal ...  0.000350 seconds (7 allocations)
    PHS ...       0.032212 seconds (7 allocations: 128 bytes) <-- Zero dynamic heap allocations
  Laplacian Magnitude (|∇²ρ|):
    Cubic ...     0.000060 seconds (7 allocations)
    Cardinal ...  0.000335 seconds (1 allocation)
    PHS ...       0.042326 seconds (1 allocation: 32 bytes) <-- Zero dynamic heap allocations

Extensive optimization was performed after the original implementation, reducing the runtime for this from approx. 30s down to 0.042s on a M3 Ultra Mac Studio, a ~715X speedup.

While PHS has a higher baseline computational cost due to localized blending and radial evaluations, it provides multiple orders of magnitude better accuracy, particularly in derivative fields (gradient & Hessian/Laplacian), with completely stable memory usage.


Codebase Modifications & Added Files

The main addition of the PHS feature-set lives in the src/phs/ directory:

  • src/phs/phs.jl: Features aggregator and compile workload definition.
  • src/phs/phs_kernels.jl: Core radial basis functions and their specialized first/second-order derivatives ($K=1,3,5,7$).
  • src/phs/phs_stencil.jl: Matrix formulation, canonical stencil offset solver, and boundary shift cached variants.
  • src/phs/phs_types.jl: Type definitions (PHSInterpolantND, PHSLogTransform, ConstantRef).
  • src/phs/phs_eval.jl: Thread-safe coefficient evaluations, blend function kernels, and evaluation pipelines (including value, gradient, Hessian, and log-transform chain-rule solvers).
  • src/phs/phs_interpolant.jl: Public constructor phs_interp and user-facing callable entrypoints.
  • src/phs/phs_oneshot.jl: Non-allocating batch evaluation methods.

New Documentation & Testing:

  • docs/src/interpolation/phs.md: Mathematical foundations, full API specifications, parameters tuning, and performance guides.
  • test/test_phs_nd.jl: Exhaustive unit tests checking multidimensional interpolation correctness, degree scaling, log-transform safety, allocations, and type stability.
  • scripts/phs_density_comparison.jl: Full-scale validation and timing comparison script for phenol-dimer electron density.
  • scripts/update_readme_2d_comparison.jl: Scripts to plot multi-panel 2D validation results on regular and irregular grids.

Future Improvements

As I integrate this PHS implementation into my other project (the chemistry density-analysis tool for which PHS was originally implemented), further changes, potential API polish, or performance-related improvements may be PR'd back to this repository.


Thank you for the reviews! Let me know if you would like any specific changes or further tests.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 11, 2026

FastInterpolations.jl Benchmarks

All benchmarks (57 total, click to expand)
Benchmark Current: 913ed12 Previous Imm. Ratio Grad. Ratio
10_nd_construct/bicubic_2d 38598.0 ns 39380.0 ns 0.98 1.021
10_nd_construct/bilinear_2d 568.8 ns 747.1 ns 0.761 0.889
10_nd_construct/phs_2d 918292.8 ns - ns - -
10_nd_construct/phs_3d 15512790.9 ns - ns - -
10_nd_construct/tricubic_3d 316929.0 ns 335316.0 ns 0.945 0.917
10_nd_construct/trilinear_3d 1633.5 ns 1988.6 ns 0.821 0.939
11_nd_eval/bicubic_2d_batch 1416.1 ns 1683.6 ns 0.841 0.896
11_nd_eval/bicubic_2d_scalar 12.7 ns 15.9 ns 0.798 0.842
11_nd_eval/bilinear_2d_scalar 6.5 ns 7.0 ns 0.929 0.693
11_nd_eval/phs_2d_batch 171421.9 ns - ns - -
11_nd_eval/phs_2d_scalar 1654.9 ns - ns - -
11_nd_eval/phs_3d_batch 1361456.6 ns - ns - -
11_nd_eval/phs_3d_scalar 11307.1 ns - ns - -
11_nd_eval/tricubic_3d_batch 2545.8 ns 3485.2 ns 0.73 0.797
11_nd_eval/tricubic_3d_scalar 24.9 ns 36.0 ns 0.693 0.775
11_nd_eval/trilinear_3d_scalar 10.0 ns 13.0 ns 0.769 0.633
12_cubic_eval_gridquery/range_random 3613.2 ns 4649.6 ns 0.777 0.862
12_cubic_eval_gridquery/range_sorted 3612.0 ns 4649.0 ns 0.777 0.864
12_cubic_eval_gridquery/vec_random 8003.5 ns 9751.5 ns 0.821 0.854
12_cubic_eval_gridquery/vec_sorted 2621.2 ns 3205.3 ns 0.818 0.848
13_phs_oneshot/q00001 29198.2 ns - ns - -
13_phs_oneshot/q10000 1675123.5 ns - ns - -
14_phs_construct/g0100 28365.9 ns - ns - -
14_phs_construct/g1000 28209.5 ns - ns - -
15_phs_eval/q00001 168.0 ns - ns - -
15_phs_eval/q00100 15890.6 ns - ns - -
15_phs_eval/q10000 1606724.3 ns - ns - -
1_cubic_oneshot/q00001 390.4 ns 491.3 ns 0.795 0.879
1_cubic_oneshot/q10000 48932.6 ns 62863.9 ns 0.778 0.824
2_cubic_construct/g0100 1148.5 ns 1456.0 ns 0.789 0.88
2_cubic_construct/g1000 10846.3 ns 14300.7 ns 0.758 0.861
3_cubic_eval/q00001 19.4 ns 23.3 ns 0.832 1.086
3_cubic_eval/q00100 378.6 ns 485.1 ns 0.78 0.861
3_cubic_eval/q10000 36651.2 ns 47024.0 ns 0.779 0.866
4_linear_oneshot/q00001 21.8 ns 28.0 ns 0.779 0.857
4_linear_oneshot/q10000 15572.5 ns 19450.4 ns 0.801 0.855
5_linear_construct/g0100 26.6 ns 38.6 ns 0.691 0.74
5_linear_construct/g1000 200.3 ns 332.6 ns 0.602 0.772
6_linear_eval/q00001 7.7 ns 10.0 ns 0.77 0.797
6_linear_eval/q00100 162.8 ns 202.3 ns 0.805 0.857
6_linear_eval/q10000 15433.2 ns 19153.0 ns 0.806 0.857
7_cubic_range/scalar_query 7.1 ns 9.2 ns 0.772 1.012
7_cubic_vec/scalar_query 7.9 ns 10.7 ns 0.739 0.774
8_cubic_multi/construct_s001_q100 458.7 ns 604.5 ns 0.759 0.853
8_cubic_multi/construct_s010_q100 3602.8 ns 4874.2 ns 0.739 0.835
8_cubic_multi/construct_s100_q100 33240.1 ns 44926.9 ns 0.74 0.845
8_cubic_multi/eval_s001_q100 569.7 ns 726.9 ns 0.784 0.824
8_cubic_multi/eval_s010_q100 1387.3 ns 1758.2 ns 0.789 0.83
8_cubic_multi/eval_s010_q100_scalar_loop 1982.2 ns 2479.5 ns 0.799 0.878
8_cubic_multi/eval_s100_q100 9153.8 ns 11825.8 ns 0.774 0.817
8_cubic_multi/eval_s100_q100_scalar_loop 3091.6 ns 3920.9 ns 0.788 0.899
9_nd_oneshot/bicubic_2d 39305.3 ns 41016.9 ns 0.958 0.985
9_nd_oneshot/bilinear_2d 568.5 ns 603.9 ns 0.941 0.709
9_nd_oneshot/phs_2d 1257884.1 ns - ns - -
9_nd_oneshot/phs_3d 17134977.2 ns - ns - -
9_nd_oneshot/tricubic_3d 331873.3 ns 350821.6 ns 0.946 0.897
9_nd_oneshot/trilinear_3d 860.3 ns 1102.6 ns 0.78 0.627

1 benchmark(s) were flagged but verified as CI noise after 10 re-run(s)

This comment was automatically generated by Benchmark workflow.

@mgyoo86
Copy link
Copy Markdown
Member

mgyoo86 commented May 11, 2026

@twilsonco
Thanks a lot for the PR and the detailed report.
This looks like a really nice feature, and I appreciate all the work you put into it.

This is a fairly large change, so I’ll need some time to digest it and review it properly. I’m also in the middle of major refactoring for the next minor release, so I’ll probably finish that first, and then review and merge this separately for a following version.

I likely won’t be able to review this properly over the next few days, but I’ll get back to it as soon as I can. Sorry for the delay, and thanks again for the contribution!

A couple of quick notes for now:

  • Could you please remove the PrecompileTools dependency and the @compile_workload part from this PR? I may consider PrecompileTools in the future, but when I tried it before, the caching behavior felt a bit clunky, so I’d rather keep it out of this PR for now.

  • We use Runic.jl for the formatting

  • It looks like the CI is currently failing. Could you please check that when you have a chance?

  • Since this PR is quite large, would it be okay if I make some small edits directly during the review, if needed?

@twilsonco
Copy link
Copy Markdown
Author

twilsonco commented May 11, 2026

Hi. Great project here. I'm happy to contribute.

This is a fairly large change, so I’ll need some time to digest it and review it properly.

Yes. Fortunately, all the changes are self-contained (except for the addition of PrecompileTools).

I’m also in the middle of major refactoring for the next minor release, so I’ll probably finish that first, and then review and merge this separately for a following version.

That's fine. Once you're done with the refactor, comment hear and I'll make any necessary changes to the PR.

Could you please remove the PrecompileTools dependency and the @compile_workload part from this PR? I may consider PrecompileTools in the future, but when I tried it before, the caching behavior felt a bit clunky, so I’d rather keep it out of this PR for now.

I can, but in my testing, I get a 2x speedup from PrecompileTools, so it would be a big hit to the performance of PHS, which is already considerably slower than the other methods. I'll revisit this and see if I can achieve the same performance without it. It's removed.

We use Runic.jl for the formatting

I'll make sure it's applied.

It looks like the CI is currently failing. Could you please check that when you have a chance?

Yes. It looks like it's the formatting that's the failing step.

Since this PR is quite large, would it be okay if I make some small edits directly during the review, if needed?

Once the PR is ready again after you finish the refactor, I'll resubmit. At that point, of course you're welcome to make edits.

@twilsonco twilsonco marked this pull request as draft May 11, 2026 16:18
twilsonco and others added 25 commits May 11, 2026 10:39
Implements C2-continuous blended PHS interpolation via local RBF stencils
with polynomial augmentation. Zero-allocation hot path via AdaptiveArrayPools,
multithreaded batch evaluation, full first+second derivative support.

New files in src/phs/:
- phs_kernels.jl: radial kernel phi(r)=r^K, blend weight w(d,a) with derivs
- phs_stencil.jl: stencil selection + Phi-inv precomputation (boundary-safe)
- phs_types.jl: PHSInterpolantND struct + PHSLogTransform
- phs_eval.jl: 3-layer evaluation engine (value/deriv1/deriv2, quotient rule)
- phs_interpolant.jl: constructor + callable protocol
- phs_oneshot.jl: one-shot scalar and batch forms
- phs.jl: aggregator include

New test file: test/test_phs_nd.jl (115 tests, all passing)

Exports added: phs_interp, phs_interp!, PHSInterpolantND, PHSLogTransform
For r^K PHS the augmentation must have degree ≥ (K-1)÷2:
- K=3 → linear (N+1 terms) — was already correct
- K=5 → quadratic C(N+2,2) terms — was WRONG (only used N+1)
- K=7 → cubic C(N+3,3) terms — was WRONG

Changes:
- phs_kernels.jl: add _phs_n_poly, _phs_all_exponents,
  @generated _phs_poly_exps_tuple, _phs_eval_poly,
  _phs_eval_poly_deriv1, _phs_eval_poly_deriv2
- phs_stencil.jl: _phs_build_phi_inv now computes poly_deg=(degree-1)÷2,
  uses _phs_all_exponents for C block; M = ns + binomial(N+poly_deg,poly_deg)
- phs_eval.jl: all three _phs_eval_coeffs_* use _phs_poly_exps_tuple
  for correct compile-time polynomial evaluation; buffer size M
  taken from actual phi_inv matrix size instead of hardcoded ns+N+1
- scripts/update_readme_2d_comparison.jl: drop PeriodicBC from cubic
  for fairer comparison; use PHS degree=3

All 115 PHS tests pass.
Use 15x15 regular spacing so PHS (0.3% error) and cubic with
PeriodicBC (0.0% error) both show near-perfect results, while
constant (18%) and linear (3.5%) illustrate the progression.
…d PHS interpolation title

Co-authored-by: Copilot <copilot@github.com>
…lations in density comparison

Co-authored-by: Copilot <copilot@github.com>
This reverts commit 348e7b3.
…plot example

- Create docs/src/interpolation/phs.md with mathematical foundation, usage, and quantum chemistry example
- Link to paper (https://doi.org/10.1063/5.0090232)
- Add auto-download for critic2 wavefunction files in phs_density_comparison.jl
- Add scripts/dat/wfc/ to .gitignore to avoid committing large downloaded files
- Remove redundant examples/promolecular_density.jl (consolidate with validation script)
- Update README link to point to full PHS documentation
The ensure_wfc_files() function had an invalid 'import JSON' statement inside it,
which causes a syntax error in Julia. Fixed by switching to regex-based URL parsing
instead of JSON parsing. This removes the JSON.jl dependency while maintaining
full functionality for downloading wavefunction files.
twilsonco added 22 commits May 11, 2026 10:41
…=3 loops to eliminate multiplications"

This reverts commit 4aedaf2.

Reapply "perf(phs): reuse first derivative term in second derivative K=3 loops to eliminate multiplications"

This reverts commit 1c0a969ccf4f87e05cd25de4fb1979d4a1f6d4d9.

Revert "Reapply "perf(phs): reuse first derivative term in second derivative K=3 loops to eliminate multiplications""

This reverts commit 181682b1d93de3bc3847085dd33e247fddcb7697.
…ssive SIMD and instruction reassociation benefits
…iv2 by precomputing ci_3r_xh"

This reverts commit 6451f0e.
…chmarks to run

e.g. `julia --project=benchmark benchmark/ci_benchmark.jl 15` will only run benchmark group `15_phs_eval`
@codecov
Copy link
Copy Markdown

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 98.58044% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.42%. Comparing base (7edb7a8) to head (913ed12).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/phs/phs_eval.jl 97.98% 18 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #136      +/-   ##
==========================================
+ Coverage   96.19%   96.42%   +0.23%     
==========================================
  Files         142      148       +6     
  Lines       11755    13023    +1268     
==========================================
+ Hits        11308    12558    +1250     
- Misses        447      465      +18     
Files with missing lines Coverage Δ
src/FastInterpolations.jl 100.00% <ø> (ø)
src/phs/phs_interpolant.jl 100.00% <100.00%> (ø)
src/phs/phs_kernels.jl 100.00% <100.00%> (ø)
src/phs/phs_oneshot.jl 100.00% <100.00%> (ø)
src/phs/phs_stencil.jl 100.00% <100.00%> (ø)
src/phs/phs_types.jl 100.00% <100.00%> (ø)
src/phs/phs_eval.jl 97.98% <97.98%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@twilsonco
Copy link
Copy Markdown
Author

Note that the missing lines in test coverage CodeCov is reporting are false-positives. The PR has 100% coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants