
Conversation

@SKopecz
Collaborator

@SKopecz SKopecz commented Dec 29, 2025

All MPRK and SSPMPRK schemes are built upon a basic Patankar step with varying Patankar matrices, destruction vectors, and denominators. This PR introduces several functions (in-place and out-of-place) to make the current implementations more readable and to simplify the implementation of new schemes in #107. A rough sketch of such a basic Patankar step is given after the list below.

  1. Out-of-place implementations
  • MPE
  • MPRK22
  • MPRK43I & MPRK43II
  • SSPMPRK22
  • SSPMPRK43
  2. In-place implementations
  • MPE
  • MPRK22
  • MPRK43I & MPRK43II
  • SSPMPRK22
  • SSPMPRK43
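
For orientation, a rough out-of-place sketch of what such a basic Patankar step amounts to (illustrative only; the actual function introduced in this PR may have a different name and signature, and build_mprk_matrix refers to the package helper discussed further below):

using LinearAlgebra

# Illustrative sketch (not the exact API of this PR): assemble the Patankar/MPRK
# matrix from the production matrix P, the optional destruction vector d, and the
# Patankar denominators sigma, then solve the resulting linear system.
function basic_patankar_step_sketch(P, sigma, rhs, dt, d = nothing)
    M = build_mprk_matrix(P, sigma, dt, d)  # package helper, see benchmarks below
    return M \ rhs
end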

@SKopecz SKopecz marked this pull request as draft December 29, 2025 13:00
@codecov

codecov bot commented Dec 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@coveralls

coveralls commented Dec 29, 2025

Pull Request Test Coverage Report for Build 21437392042

Details

  • 262 of 262 (100.0%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.3%) to 97.493%

Totals Coverage Status
Change from base Build 20641694759: -0.3%
Covered Lines: 1789
Relevant Lines: 1835

💛 - Coveralls

@SKopecz SKopecz changed the title from "WIP: Simplified and clearer code using the function basic_patankar_step." to "WIP: Simplified and cleaner code using the function basic_patankar_step." Dec 29, 2025
@SKopecz
Collaborator Author

SKopecz commented Jan 4, 2026

Introduced a function to compute linear combinations of production matrices and destruction vectors for out-of-place computations. It is implemented recursively, but all calls are inlined, so the recursion is effectively unrolled at compile time.

using BenchmarkTools, StaticArrays 

# --- Base cases (End of recursion) ---

# For PDSFunctions (with destruction vectors)
@inline lincomb(c1::Number, P1, d1::AbstractArray) = (c1 .* P1, c1 .* d1)

# For ConservativePDSFunctions (without destruction vectors)
@inline lincomb(c1::Number, P1, d1::Nothing) = (c1 .* P1, nothing)


# --- Recursive steps ---

# For PDSFunctions: Processes triplets (coeff, P, d)
@inline function lincomb(c1::Number, P1, d1::AbstractArray, c2, P2, d2, args...)
    P_tail, d_tail = lincomb(c2, P2, d2, args...)
    return (c1 .* P1 .+ P_tail, c1 .* d1 .+ d_tail)
end

# For ConservativePDSFunctions: Processes triplets (coeff, P, nothing)
@inline function lincomb(c1::Number, P1, d1::Nothing, c2, P2, d2, args...)
    P_tail, _ = lincomb(c2, P2, d2, args...)
    return (c1 .* P1 .+ P_tail, nothing)
end
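
# Example: for three stages the recursion above inlines to
#   lincomb(c1, P1, d1, c2, P2, d2, c3, P3, d3)
#     == (c1 .* P1 .+ c2 .* P2 .+ c3 .* P3, c1 .* d1 .+ c2 .* d2 .+ c3 .* d3)
# which is why the benchmarks below report zero allocations for StaticArrays.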

function run_comprehensive_benchmark(N)
    println("\n" * "█"^70)
    println("  BENCHMARKING SYSTEM SIZE N = $N")
    println("█"^70)

    # Setup test data (3 stages)
    P1, P2, P3 = rand(SMatrix{N,N}), rand(SMatrix{N,N}), rand(SMatrix{N,N})
    d1, d2, d3 = rand(SVector{N}), rand(SVector{N}), rand(SVector{N})
    c1, c2, c3 = 0.5, 0.3, 0.2

    # --- CASE A: Standard PDS (with destruction vectors) ---
    println("\n>>> CASE A: Standard PDS (d is AbstractArray)")
    
    println("\n[A.1] Direct Input (Manual):")
    b_direct_a = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3, 
                             $c1 .* $d1 .+ $c2 .* $d2 .+ $c3 .* $d3)
    display(b_direct_a)

    println("\n[A.2] lincomb Function:")
    b_func_a = @benchmark lincomb($c1, $P1, $d1, $c2, $P2, $d2, $c3, $P3, $d3)
    display(b_func_a)


    # --- CASE B: Conservative PDS (d is nothing) ---
    println("\n" * "-"^70)
    println(">>> CASE B: Conservative PDS (d is nothing)")
    
    println("\n[B.1] Direct Input (Manual):")
    # In manual case, we only calculate P
    b_direct_b = @benchmark ($c1 .* $P1 .+ $c2 .* $P2 .+ $c3 .* $P3, nothing)
    display(b_direct_b)

    println("\n[B.2] lincomb Function:")
    b_func_b = @benchmark lincomb($c1, $P1, nothing, $c2, $P2, nothing, $c3, $P3, nothing)
    display(b_func_b)
    
    return nothing
end

Results:

julia> run_comprehensive_benchmark(2)

██████████████████████████████████████████████████████████████████████
  BENCHMARKING SYSTEM SIZE N = 2
██████████████████████████████████████████████████████████████████████

>>> CASE A: Standard PDS (d is AbstractArray)

[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  6.135 ns … 183.600 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.063 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.266 ns ±   3.170 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅ █ ▇ ▄  ▁                                                 
  ▇██▃█▃█▃█▆▃█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▃
  6.14 ns         Histogram: frequency by time        14.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  5.700 ns … 152.339 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.569 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.755 ns ±   2.887 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     █ █     ▁ ▂ ▅ ▆                                           
  ▇▅▅█▄█▄█▆▇▅█▅█▃█▄█▄▃▇▃▇▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▂▁▁▂▂▂▂▂▂▁▁▂▂ ▃
  5.7 ns          Histogram: frequency by time        10.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)

[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  4.823 ns … 174.098 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.560 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.677 ns ±   2.636 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅  ▆  ▅  ▂   █                                               
  █▃▄█▃▃█▄▃█▆▃▃█▇▃▃▅▇▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▁▂▂▂▂▂▂▂▁▁▂▂▁▂▂▁▂ ▃
  4.82 ns         Histogram: frequency by time        9.31 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  4.823 ns … 70.316 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.562 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.758 ns ±  1.852 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂█▄▇▄▃▇▃▆█▄▃▆▃▂▁▁▁▁▁                                      ▃
  █████████████████████▇▇█▇▇██▇██████▇▇▇▇▇▇▆▇▆▅▆▆▁▅▅▅▅▄▄▄▃▃▄ █
  4.82 ns      Histogram: log(frequency) by time     10.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> run_comprehensive_benchmark(3)

██████████████████████████████████████████████████████████████████████
  BENCHMARKING SYSTEM SIZE N = 3
██████████████████████████████████████████████████████████████████████

>>> CASE A: Standard PDS (d is AbstractArray)

[A.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):   9.181 ns … 153.478 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.074 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.712 ns ±   3.368 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▃▆▃▆▂▆▄▁▅▁                                                 ▂
  █████████████▇▇▆▅▆▅▄▆▆▆▇▇▇▅▆▇▇▇▇▆▇▆▆▅▅▆▂▆▅▆▅▅▆▄▅▅▅▅▄▃▃▄▂▄▃▂▃ █
  9.18 ns       Histogram: log(frequency) by time      24.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[A.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  8.308 ns … 175.174 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.814 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.438 ns ±   3.640 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂▇▃▂▆▂▁▅▁  ▇▂▁▄▅                                           ▂
  ███████████▆██████▇▇▆▆▆▄▄▅▄▅▅▃▅▅▃▄▄▄▂▅▅▄▄▅▄▅▃▃▄▄▄▄▃▄▃▄▂▄▂▄▄ █
  8.31 ns      Histogram: log(frequency) by time      16.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

----------------------------------------------------------------------
>>> CASE B: Conservative PDS (d is nothing)

[B.1] Direct Input (Manual):
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  6.126 ns … 166.082 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.728 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.217 ns ±   3.308 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂▇▁▆▁▅▅▁▇▁▁▅▁▁                                             ▂
  ███████████████▇▇▇▆▅▆▅▅▆▇▇▅▆▅▆▆▆▆▇▇▆▇▇▆▇▆▆▇▅▅▅▅▆▆▅▅▆▇▅▃▄▅▅▄ █
  6.13 ns      Histogram: log(frequency) by time      14.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

[B.2] lincomb Function:
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  6.127 ns … 149.530 ns  ┊ GC (min … max): 0.00% … 0.00%

@SKopecz SKopecz changed the title from "WIP: Simplified and cleaner code using the function basic_patankar_step." to "WIP: Simplified and cleaner code." Jan 9, 2026
@SKopecz
Collaborator Author

SKopecz commented Jan 9, 2026

The following shows the efficiency of the new build_mprk_matrix method for StaticMatrix input.

using BenchmarkTools
using StaticArrays
using MuladdMacro
using Test
using LinearAlgebra

# --- IMPLEMENTATIONS ---
### OLD #################
function build_mprk_matrix_old(P, sigma, dt, d = nothing)
    # re-use the in-place version implemented below
    M = similar(P)
    build_mprk_matrix!(M, P, sigma, dt, d)

    if P isa StaticArray
        return SMatrix(M)
    else
        return M
    end
end
@muladd function build_mprk_matrix!(M, P, sigma, dt, d = nothing)
    # M[i,i] = (sigma[i] + dt * d[i] + dt * sum_j≠i P[j,i]) / sigma[i]
    # M[i,j] = -dt * P[i,j] / sigma[j]
    # TODO: the performance of this can likely be improved
    Base.require_one_based_indexing(M, P, sigma)
    @assert size(M, 1) == size(M, 2) == size(P, 1) == size(P, 2) == length(sigma)
    if !isnothing(d)
        Base.require_one_based_indexing(d)
        @assert length(sigma) == length(d)
    end

    zeroM = zero(eltype(P))

    # Set sigma on diagonal
    @inbounds for i in eachindex(sigma)
        M[i, i] = sigma[i]
    end
    # Add nonconservative destruction terms to diagonal (PDSFunctions only!)
    if !isnothing(d)
        @inbounds for i in eachindex(d)
            M[i, i] = M[i, i] + dt * d[i]
        end
    end

    # Run through P and fill M accordingly.
    # If P[i,j] ≠ 0 set M[i,j] = -dt*P[i,j] and add dt*P[i,j] to M[j,j].
    @fastmath @inbounds @simd for I in CartesianIndices(P)
        if I[1] != I[2]
            if !iszero(P[I])
                dtP = dt * P[I]
                M[I] = -dtP / sigma[I[2]]
                M[I[2], I[2]] += dtP
            else
                M[I] = zeroM
            end
        end
    end

    # Divide diagonal elements by Patankar weights denominators
    @fastmath @inbounds @simd for i in eachindex(sigma)
        M[i, i] /= sigma[i]
    end

    return nothing
end

### NEW #################
@inline function sum_offdiagonal_col(P::StaticMatrix{N, N, T}, j) where {N, T}
    res = zero(T)
    for i in 1:N
        if i != j
            res += P[i, j]
        end
    end
    return res
end
@muladd @inline function build_mprk_matrix_new(P::StaticMatrix{N, N, T}, sigma, dt,
                                               d = nothing) where {N, T}
    return SMatrix{N, N, T}((i == j) ?
                            # diagonal: sum over P[k, i] where k != i
                            (sigma[i] +
                             dt *
                             (sum_offdiagonal_col(P, i) + (d === nothing ? zero(T) : d[i]))) /
                            sigma[i] :
                            # off-diagonal
                            -(dt * P[i, j]) / sigma[j]
                            for i in 1:N, j in 1:N)
end
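
# Note: the SMatrix is constructed directly from the generator, so no mutable
# intermediate (such as the `similar(P)` buffer in the old version above) is
# needed; this removes the 208-byte allocation reported in the benchmarks below.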

# --- TEST & BENCHMARK SUITE ---

function run_benchmark(N)
    println("\n" * "█"^70)
    println("  COMPREHENSIVE BENCHMARK N = $N")
    println("█"^70)

    # Setup
    P = rand(SMatrix{N, N})
    sigma = rand(SVector{N})
    d_vec = rand(SVector{N})
    dt = 0.1

    # Validation
    println("Checking Correctness...")
    M_old = build_mprk_matrix_old(P, sigma, dt, d_vec)
    M_new = build_mprk_matrix_new(P, sigma, dt, d_vec)
    M_old_cons = build_mprk_matrix_old(P, sigma, dt, nothing)
    M_new_cons = build_mprk_matrix_new(P, sigma, dt, nothing)

    @test M_old ≈ M_new
    @test M_old_cons ≈ M_new_cons
    println("✓ Results are identical.\n")

    # Benchmarks
    println(">>> CASE A: With destruction vector d")
    println("\n[OLD VERSION]")
    display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, $d_vec))

    println("\n[NEW VERSION]")
    display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, $d_vec))

    println("\n" * "-"^70)
    println(">>> CASE B: Conservative (d = nothing)")

    println("\n[OLD VERSION]")
    display(@benchmark build_mprk_matrix_old($P, $sigma, $dt, nothing))

    println("\n[NEW VERSION]")
    display(@benchmark build_mprk_matrix_new($P, $sigma, $dt, nothing))
end
julia> run_benchmark(5)

██████████████████████████████████████████████████████████████████████
  COMPREHENSIVE BENCHMARK N = 5
██████████████████████████████████████████████████████████████████████
Checking Correctness...
✓ Results are identical.

>>> CASE A: With destruction vector d

[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 988 evaluations per sample.
 Range (min … max):  39.109 ns … 11.639 μs  ┊ GC (min … max):  0.00% … 99.12%
 Time  (median):     48.098 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   72.646 ns ± 214.892 ns  ┊ GC (mean ± σ):  12.98% ±  6.77%

  ▇▇█▅▄▂▂▃▃▂▂ ▆▆▄▃▂▂▁▁▁▁▁▁  ▁                                  ▂
  █████████████████████████████▇▇▇▆▆▅▆▆▆▅▄▆▆▆▆▅▅▅▅▄▅▄▅▃▄▄▄▄▄▄▄ █
  39.1 ns       Histogram: log(frequency) by time       232 ns <

 Memory estimate: 208 bytes, allocs estimate: 1.

[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  13.862 ns … 183.346 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.916 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.972 ns ±   5.200 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▂▂▂▃▃▃ ▁▂▂ ▁▁ ▂ ▁ ▂ ▂                                      ▁
  ████████▇███▆████▆█▆█▆███▆▇▅▅▄▆▆▆▆▆▆▇▇▇▆▇▆▅▄▆▇▆▆▆▅▄▄▃▃▄▃▁▅▅▅ █
  13.9 ns       Histogram: log(frequency) by time      39.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

----------------------------------------------------------------------
>>> CASE B: Conservative (d = nothing)

[OLD VERSION]
BenchmarkTools.Trial: 10000 samples with 991 evaluations per sample.
 Range (min … max):  36.367 ns … 11.648 μs  ┊ GC (min … max):  0.00% … 98.85%
 Time  (median):     42.436 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   66.981 ns ± 206.688 ns  ┊ GC (mean ± σ):  12.98% ±  6.91%

  ▅█▅▂▂▁▁▂▁   ▄▅▃▂▁▁▁▁▁                                        ▁
  █████████▇▆▇███████████▇█▇▇▆▇▆▆▆▅▆▆▆▅▅▅▅▅▄▄▄▅▅▄▄▄▄▅▃▃▂▄▃▄▃▄▄ █
  36.4 ns       Histogram: log(frequency) by time       232 ns <

 Memory estimate: 208 bytes, allocs estimate: 1.

[NEW VERSION]
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  13.692 ns … 159.564 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.731 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.496 ns ±   4.104 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▃▃▂ ▂▂▁▃▃▁▂ ▁ ▁ ▂ ▁▁▂  ▁  ▂  ▁                             ▁
  █████▇███████▇█▅█▅██████▇█▆▆█▅▅█▆▆▆▅▅▅▆▅▄▄▅▁▅▅▅▅▆▆▇▆▇▆▆▆▅▆▅▅ █
  13.7 ns       Histogram: log(frequency) by time      32.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@SKopecz
Collaborator Author

SKopecz commented Jan 12, 2026

The following shows the efficiency of the new lincomb! methods for in-place computations. One is for AbstractArray and the other for SparseMatrixCSC.

using BenchmarkTools
using StaticArrays
using SparseArrays
using FastBroadcast
using MuladdMacro
using Test

# --- GENERATED IMPLEMENTATION ---

@muladd @generated function lincomb!(dest::AbstractArray, pairs...)
    n = length(pairs) ÷ 2
    expr = :(pairs[1] * pairs[2])
    for i in 2:n
        expr = :($expr + pairs[$(2 * i - 1)] * pairs[$(2 * i)])
    end
    return quote
        @.. broadcast=false dest=$expr
        return dest
    end
end

@muladd @generated function lincomb!(dest::SparseMatrixCSC, pairs...)
    n = length(pairs) ÷ 2
    nz_names = [Symbol("nz_", i) for i in 1:n]
    setup_block = Expr(:block)
    # We need to keep the structural nonzeros of the production terms.
    # However, this is not guaranteed by broadcasting, see
    # https://github.com/JuliaSparse/SparseArrays.jl/issues/190
    for i in 1:n
        push!(setup_block.args, :($(nz_names[i]) = nonzeros(pairs[$(2 * i)])))
    end
    expr = :(pairs[1] * $(nz_names[1]))
    for i in 2:n
        expr = :($expr + pairs[$(2 * i - 1)] * $(nz_names[i]))
    end
    return quote
        $setup_block
        nz_dest = nonzeros(dest)
        @.. broadcast=false nz_dest=$expr
        return dest
    end
end
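
# For three coefficient/array pairs, the AbstractArray method above generates a
# single fused broadcast (before muladd rewriting):
#   @.. broadcast=false dest = pairs[1] * pairs[2] + pairs[3] * pairs[4] + pairs[5] * pairs[6]
# The SparseMatrixCSC method applies the same expression to the nonzeros vectors,
# so the structural nonzeros of dest are preserved.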

# --- BENCHMARK FUNCTION ---

function run_comprehensive_benchmark(N)
    println("\n" * "█"^75)
    println("  LINCOMB! FULL PERFORMANCE REPORT (N = $N, 3 Components)")
    println("█"^75)

    c1, c2, c3 = 0.5, 0.8, -0.2

    # --- 1. DENSE CASE ---
    println("\n>>> CASE 1: Standard Dense Arrays (Matrix{Float64})")
    A1, A2 = rand(N, N), rand(N, N)
    A3_old = rand(N, N)
    A3_new = copy(A3_old)

    # Validation
    A3_old .= c1 .* A1 .+ c2 .* A2 .+ c3 .* A3_old
    lincomb!(A3_new, c1, A1, c2, A2, c3, A3_new)
    @test A3_old ≈ A3_new
    println("✓ Correctness: OK")

    println("\n[OLD] Manual Dense @..")
    display(@benchmark $A3_old .= $c1 .* $A1 .+ $c2 .* $A2 .+ $c3 .* $A3_old)

    println("\n[NEW] lincomb! Dense")
    display(@benchmark lincomb!($A3_new, $c1, $A1, $c2, $A2, $c3, $A3_new))

    # --- 2. SPARSE CASE ---
    println("\n" * "-"^75)
    println("\n>>> CASE 2: Sparse Matrices (SparseMatrixCSC)")
    S_ref = sprand(N, N, 0.01)
    S1 = SparseMatrixCSC(S_ref.m, S_ref.n, S_ref.colptr, S_ref.rowval, rand(nnz(S_ref)))
    S2 = SparseMatrixCSC(S_ref.m, S_ref.n, S_ref.colptr, S_ref.rowval, rand(nnz(S_ref)))
    S3_old = SparseMatrixCSC(S_ref.m, S_ref.n, S_ref.colptr, S_ref.rowval, rand(nnz(S_ref)))
    S3_new = copy(S3_old)

    # Validation
    nz1, nz2, nz3 = nonzeros(S1), nonzeros(S2), nonzeros(S3_old)
    @.. broadcast=false nz3 = c1 * nz1 + c2 * nz2 + c3 * nz3
    lincomb!(S3_new, c1, S1, c2, S2, c3, S3_new)
    @test S3_old ≈ S3_new
    println("✓ Correctness: OK")

    println("\n[OLD] Manual Sparse nz-broadcast")
    display(@benchmark begin
        nz_1 = nonzeros($S1)
        nz_2 = nonzeros($S2)
        nz_3 = nonzeros($S3_old)
        @.. broadcast=false nz_3 = $c1 * nz_1 + $c2 * nz_2 + $c3 * nz_3
    end)

    println("\n[NEW] lincomb! Sparse")
    display(@benchmark lincomb!($S3_new, $c1, $S1, $c2, $S2, $c3, $S3_new))
end
julia> run_comprehensive_benchmark(1000)

███████████████████████████████████████████████████████████████████████████
  LINCOMB! FULL PERFORMANCE REPORT (N = 1000, 3 Components)
███████████████████████████████████████████████████████████████████████████

>>> CASE 1: Standard Dense Arrays (Matrix{Float64})
✓ Correctness: OK

[OLD] Manual Dense @..
BenchmarkTools.Trial: 2111 samples with 1 evaluation per sample.
 Range (min … max):  1.824 ms … 4.594 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.084 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.351 ms ± 596.946 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █▃                                                          
  ███▇▇▇▅▄▅▅▄▄▃▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▁▂▂▂▁▁▁▁▂▁▁▁▁▁▁▁ ▂
  1.82 ms         Histogram: frequency by time        4.08 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

[NEW] lincomb! Dense
BenchmarkTools.Trial: 2434 samples with 1 evaluation per sample.
 Range (min … max):  1.437 ms … 3.640 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.751 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.035 ms ± 544.079 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▂▁▆██▄                                                   
  ▂▁▂▃████████▇▇▅▄▄▃▃▂▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▃▂▂▂▂▃▃▄▄▅▄▃▃▃▃▃▃▃▃▄▄▄▃ ▃
  1.44 ms         Histogram: frequency by time        3.26 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

---------------------------------------------------------------------------

>>> CASE 2: Sparse Matrices (SparseMatrixCSC)
✓ Correctness: OK

[OLD] Manual Sparse nz-broadcast
BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
 Range (min … max):  2.446 μs … 26.794 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.944 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.266 μs ±  1.174 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▅▅▇█▇▆▄▄▃▃▃▂▃▂▂▂▂▂▃▂▂▁▁                                   ▂
  ███████████████████████████▆█▇▆▇▆▆▅▆▅▆▅▅▅▆▆▄▅▅▄▅▄▄▄▃▃▂▂▄▄▄ █
  2.45 μs      Histogram: log(frequency) by time      8.1 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

[NEW] lincomb! Sparse
BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
 Range (min … max):  3.267 μs … 20.178 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.845 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.050 μs ±  1.044 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃▄▁ ▆▅ █                                                   
  ████▇██▆█▆█▃▃▂▂▃▂▂▃▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  3.27 μs        Histogram: frequency by time        8.24 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

@SKopecz
Collaborator Author

SKopecz commented Jan 14, 2026

Tried to optimize loops like

for i in eachindex(b)
    b[i] += dt * P[i, i]
end

for sparse matrices P, but there was no benefit.
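
For reference, the two variants compared below could look roughly like this (the exact definitions are not included above, so this is only a sketch based on the plot legend):

using SparseArrays

# Naive variant: P[i, i] performs a search within column i for every index.
function add_diagonal_naive!(b, P::SparseMatrixCSC, dt)
    for i in eachindex(b)
        b[i] += dt * P[i, i]
    end
    return nothing
end

# nzrange-based variant: walk the stored entries of each column once.
function add_diagonal!(b, P::SparseMatrixCSC, dt)
    rows = rowvals(P)
    vals = nonzeros(P)
    for j in axes(P, 2)
        for k in nzrange(P, j)
            if rows[k] == j
                b[j] += dt * vals[k]
            end
        end
    end
    return nothing
end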


# --- BENCHMARK RUNNER ---
# (assumes BenchmarkTools, SparseArrays, LinearAlgebra and Plots are loaded)

function run_scaling_analysis()
    # Problem sizes (N) from 10 to 1,000,000
    Ns = [10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000, 300000, 1000000]
    
    times_naive = Float64[]
    times_opt = Float64[]
    dt = 0.1

    println("Starting scaling benchmarks...")

    for n in Ns
        # Setup: 10 non-zeros per column + diagonal
        b = rand(n)
        P = sprand(n, n, 10/n) + 2I
        
        # Benchmark Naive
        t_naive = @belapsed add_diagonal_naive!($b, $P, $dt)
        push!(times_naive, t_naive)

        # Benchmark Optimized
        t_opt = @belapsed add_diagonal!($b, $P, $dt)
        push!(times_opt, t_opt)
        
        println("Completed N = $n")
    end

    # --- PLOTTING ---
    
    p = plot(Ns, [times_naive, times_opt],
        xlabel = "Problem Size (N)",
        ylabel = "Time (seconds)",
        label = ["Naive (P[i,i])" "Optimized (nzrange)"],
        title = "Scaling of Diagonal Addition: Naive vs Optimized",
        xscale = :log10,
        yscale = :log10,
        legend = :topleft,
        marker = :circle,
        lw = 2,
        grid = true
    )
    
    # Save or display
    display(p)
    return p
end
[Plot: "Scaling of Diagonal Addition: Naive vs Optimized" — log-log plot of time (seconds) vs. problem size N for the naive (P[i,i]) and nzrange-optimized variants; consistent with the observation above, both curves show no benefit for the optimized version.]

@SKopecz SKopecz marked this pull request as ready for review January 15, 2026 13:45
Member

@JoshuaLampert JoshuaLampert left a comment


I didn't go through everything in detail (especially the definition of lincomb), but I like the idea of bundling common steps in functions and I trust CI that the replacements are correct. I only have one question about type stability.

@ranocha ranocha changed the title from "WIP: Simplified and cleaner code." to "Simplified and cleaner code" Jan 18, 2026
Member

@ranocha ranocha left a comment


Thanks, I like the idea of defining functions for commonly-used functionality. However, this PR is quite long, so it will take some time to review everything in detail.
Did you perform some performance benchmarks for full time integrations (with save_everystep = false)?

@SKopecz
Collaborator Author

SKopecz commented Jan 19, 2026

Thanks, I like the idea of defining functions for commonly-used functionality. However, this PR is quite long, so it will take some time to review everything in detail.

Take your time. Once this is pushed, implementing additional schemes will be significantly simplified.

@SKopecz
Collaborator Author

SKopecz commented Jan 19, 2026

Did you perform some performance benchmarks for full time integrations (with save_everystep = false)?

Please find the benchmarks below.

SKopecz and others added 3 commits January 19, 2026 16:25
Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>
@SKopecz SKopecz changed the title from "Simplified and cleaner code" to "Improve code readability and reduce OOP allocations for StaticArrays" Jan 21, 2026
@SKopecz
Collaborator Author

SKopecz commented Jan 21, 2026

Here's the performance benchmark.

I used the following script to benchmark the different branches.

using PositiveIntegrators
using SparseArrays
using LinearAlgebra: Tridiagonal, I
using BenchmarkTools
using Serialization
using Printf

# [Problem definitions omitted for brevity - e.g., prob_heat_dir, etc.]
# Problems taken from https://numericalmathematics.github.io/PositiveIntegrators.jl/stable/heat_equation_neumann/
# and https://numericalmathematics.github.io/PositiveIntegrators.jl/stable/heat_equation_dirichlet/

# List of test problems
problems = Dict("NPZD" => prob_pds_npzd,
                "Robertson" => prob_pds_robertson,
                "StratReac" => prob_pds_stratreac,
                "Heat Eq. Neumann" => prob_heat_neu,
                "Heat Eq. Neumann Sparse" => prob_heat_neu_sparse,
                "Heat Eq. Neumann Tridiagonal" => prob_heat_neu_tridiagonal,
                "Heat Eq. Dirichlet" => prob_heat_dir,
                "Heat Eq. Dirichlet Sparse" => prob_heat_dir_sparse,
                "Heat Eq. Dirichlet Tridiagonal" => prob_heat_dir_tridiagonal)

# Define the algorithms to be tested
algs = [MPRK22(0.5), MPRK22(1.0), MPRK43I(1.0, 0.5), MPRK43I(0.5, 0.75),
    MPRK43II(2.0 / 3.0), MPRK43II(0.5), SSPMPRK22(0.5, 1.0)]

# Get branch name and handle potential directory slash issues
raw_branch = strip(read(`git rev-parse --abbrev-ref HEAD`, String))
safe_branch = replace(raw_branch, "/" => "_")

println(">>> Starting benchmarks on branch: $raw_branch")

results = Dict()

for (name, prob) in problems
    println("\nTest case: $name")
    results[name] = Dict()

    for (i, alg) in enumerate(algs)
        # We only use the index as the key to keep it simple
        alg_id = "Alg_$i"
        print("  -> Benchmarking $alg_id ... ")

        # Performance measurement with save_everystep=false per requirements
        b = @benchmark solve($prob, $alg; save_everystep = false)

        results[name][alg_id] = (time = median(b).time,
                                 allocs = b.allocs)
        println("Done.")
    end
end

filename = "bench_results_$(safe_branch).dat"
serialize(filename, results)
println("\nResults saved to: $filename")

To compare the results I used:

using Serialization, Printf  # needed when running the comparison in a fresh session

# ADJUST THESE FILENAMES TO MATCH YOUR GENERATED FILES
file_pr = "bench_results_sk_basic_patankar_step.dat"
file_main = "bench_results_main.dat"

if !isfile(file_pr) || !isfile(file_main)
    error("Benchmark files not found. Check filenames and paths.")
end

res_pr = deserialize(file_pr)
res_main = deserialize(file_main)

println("\n" * "="^55)
println(@sprintf("%-10s | %-12s | %-15s | %-10s", "ID", "Time %", "Time PR (ms)",
                 "Allocs Δ"))
println("="^55)

# Iterate through each problem (e.g., NPZD, Robertson)
for prob_name in sort(collect(keys(res_pr)))
    println("\n[ Problem: $prob_name ]")
    println("-"^55)

    # Extract IDs for this specific problem and sort them numerically
    alg_keys = sort(collect(keys(res_pr[prob_name])),
                    by = x -> parse(Int, split(x, "_")[2]))

    for id in alg_keys
        # Correct nested access: results[problem][algorithm_id]
        d_pr = res_pr[prob_name][id]
        d_main = res_main[prob_name][id]

        ratio = (d_pr.time / d_main.time) * 100
        t_ms = d_pr.time / 1e6
        a_diff = Int(d_pr.allocs - d_main.allocs)

        @printf("%-10s | %10.2f%% | %12.3f ms | %s\n",
                id, ratio, t_ms,
                a_diff == 0 ? "±0" : (a_diff > 0 ? "+$a_diff" : "$a_diff"))
    end
end
println("="^55)

The outcome is as follows:

=======================================================
ID         | Time %       | Time PR (ms)    | Allocs Δ  
=======================================================

[ Problem: Heat Eq. Dirichlet ]
-------------------------------------------------------
Alg_1      |     106.69% |        8.363 ms | ±0
Alg_2      |     106.03% |       11.029 ms | ±0
Alg_3      |     105.27% |        8.427 ms | ±0
Alg_4      |     105.45% |       12.669 ms | ±0
Alg_5      |     105.38% |       11.031 ms | ±0
Alg_6      |     108.41% |       10.908 ms | ±0
Alg_7      |     108.28% |        8.873 ms | ±0

[ Problem: Heat Eq. Dirichlet Sparse ]
-------------------------------------------------------
Alg_1      |      96.37% |        5.403 ms | ±0
Alg_2      |      96.07% |        6.267 ms | ±0
Alg_3      |      98.36% |        5.186 ms | ±0
Alg_4      |      94.26% |        7.135 ms | ±0
Alg_5      |      98.01% |        6.238 ms | ±0
Alg_6      |      98.27% |        6.109 ms | ±0
Alg_7      |      98.44% |        5.587 ms | ±0

[ Problem: Heat Eq. Dirichlet Tridiagonal ]
-------------------------------------------------------
Alg_1      |      95.48% |        0.470 ms | ±0
Alg_2      |      92.46% |        0.425 ms | ±0
Alg_3      |      94.81% |        0.410 ms | ±0
Alg_4      |      96.30% |        0.566 ms | ±0
Alg_5      |      97.45% |        0.697 ms | ±0
Alg_6      |      96.52% |        0.560 ms | ±0
Alg_7      |      97.97% |        0.489 ms | ±0

[ Problem: Heat Eq. Neumann ]
-------------------------------------------------------
Alg_1      |     102.00% |       12.173 ms | ±0
Alg_2      |     102.45% |        4.751 ms | ±0
Alg_3      |     102.86% |        6.047 ms | ±0
Alg_4      |     106.23% |       13.300 ms | ±0
Alg_5      |     105.59% |       12.252 ms | ±0
Alg_6      |     105.67% |       12.084 ms | ±0
Alg_7      |     105.53% |       16.204 ms | ±0

[ Problem: Heat Eq. Neumann Sparse ]
-------------------------------------------------------
Alg_1      |     106.11% |        7.848 ms | ±0
Alg_2      |     105.93% |        3.345 ms | ±0
Alg_3      |     105.79% |        4.159 ms | ±0
Alg_4      |     105.04% |        8.069 ms | ±0
Alg_5      |      95.75% |        6.843 ms | ±0
Alg_6      |      96.74% |        6.769 ms | ±0
Alg_7      |      99.70% |       11.235 ms | ±0

[ Problem: Heat Eq. Neumann Tridiagonal ]
-------------------------------------------------------
Alg_1      |     100.73% |        0.589 ms | ±0
Alg_2      |      97.96% |        0.194 ms | ±0
Alg_3      |      98.42% |        0.273 ms | ±0
Alg_4      |      98.01% |        0.543 ms | ±0
Alg_5      |     100.52% |        0.743 ms | ±0
Alg_6      |      98.35% |        0.577 ms | ±0
Alg_7      |     100.28% |        0.767 ms | ±0

[ Problem: NPZD ]
-------------------------------------------------------
Alg_1      |      91.66% |        0.256 ms | -1290
Alg_2      |      95.68% |        0.302 ms | -1762
Alg_3      |      96.65% |        0.214 ms | -1376
Alg_4      |      96.79% |        0.211 ms | -1316
Alg_5      |      98.41% |        0.312 ms | -1372
Alg_6      |      97.13% |        0.252 ms | -1372
Alg_7      |      97.02% |        0.426 ms | -2018

[ Problem: Robertson ]
-------------------------------------------------------
Alg_1      |      93.90% |        0.239 ms | -1580
Alg_2      |      87.52% |        0.059 ms | -474
Alg_3      |      84.31% |        0.110 ms | -1032
Alg_4      |      85.15% |        0.079 ms | -700
Alg_5      |      88.93% |        0.134 ms | -832
Alg_6      |      86.41% |        0.107 ms | -828
Alg_7      |      93.64% |        0.445 ms | -2824

[ Problem: StratReac ]
-------------------------------------------------------
Alg_1      |      98.33% |        6.282 ms | -13208
Alg_2      |      93.53% |       27.005 ms | -68600
Alg_3      |      93.12% |       27.239 ms | -72808
Alg_4      |      92.23% |       36.602 ms | -97524
Alg_5      |      93.64% |       21.580 ms | -47632
Alg_6      |      92.17% |       33.748 ms | -86564
Alg_7      |      61.86% |      193.555 ms | -669448
=======================================================

Discussion of the results:

The last three problems (NPZD, Robertson, StratReac) are out-of-place and use StaticArrays. We see a reduced number of allocations, which is due to the specialization of build_mprk_matrix for StaticMatrix.

The other problems are the heat equation problems from the tutorials. These are in-place and we see that the number of allocations remains unchanged. With respect to computation time, we see a difference of about 8% in both directions. I got similar differences when I compared two different benchmarks of the same branch.

I'd conclude that the PR improves the readability of the algorithms, saves unnecessary allocations when using StaticArrays and has no negative effect on performance.

Member

@ranocha ranocha left a comment


Thanks, this looks like a nice cleanup!

@ranocha ranocha requested a review from JoshuaLampert January 24, 2026 20:51
SKopecz and others added 2 commits January 25, 2026 11:38
Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>