vbcsr


Variable Block Compressed Sparse Row (VBCSR) Matrix library for high-performance distributed sparse matrix operations. It combines the speed of optimized C++ kernels with the flexibility of Python.

Why VBCSR?

  • Hardware-Accelerated Performance: Leverages SIMD (AVX/AVX2) instructions, prefetching, and threading to deliver state-of-the-art performance for block-sparse matrix operations.
  • Easy Integration: Header-only C++ core for easy inclusion in both Python and C++ projects.
  • Pythonic & Intuitive: Perform complex linear algebra using natural Python syntax (A * x, A + B, A @ B; see the sketch after this list) and standard NumPy arrays.
  • Scalable & Distributed: Built on MPI to handle massive datasets across distributed computing clusters.
  • SciPy Compatibility: Drop-in compatibility with SciPy solvers (scipy.sparse.linalg) for easy integration into existing workflows.
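
A minimal sketch of that syntax (assuming the operators behave as listed above; the construction calls mirror the Usage Examples later in this README):

import numpy as np
import vbcsr
from mpi4py import MPI

# Build a tiny 2x2 matrix from one dense block (details under Usage Examples)
comm = MPI.COMM_WORLD
A = vbcsr.VBCSR.create_serial(comm, global_blocks=1, block_sizes=[2], adjacency=[[0]])
A.add_block(0, 0, 2.0 * np.eye(2))
A.assemble()

x = np.ones(2)
y = A * x   # matrix-vector product
B = A + A   # matrix addition
C = A @ A   # matrix-matrix product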

Installation

First, ensure your environment has C and Fortran compilers, a BLAS library (OpenBLAS or MKL), OpenMP, and an MPI implementation. You can install these dependencies using:

# choose either openblas or mkl as the BLAS backend
conda install -c conda-forge openblas openmp openmpi mpi4py compilers

or

# OpenMP support ships with GCC; substitute libmkl-dev for libopenblas-dev to use MKL
sudo apt-get install build-essential gfortran libopenblas-dev libopenmpi-dev

Then, you can install the package using:

pip install .
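
To sanity-check the build (a quick, hedged check that only confirms the extension and its MPI bindings import):

python -c "import vbcsr; from mpi4py import MPI; print('vbcsr OK on rank', MPI.COMM_WORLD.Get_rank())"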

For advanced installation options (BLAS/MKL, OpenMP), please see doc/advanced_installation.md.

Documentation

Detailed documentation, including the advanced installation guide referenced above, lives in the doc/ directory of the repository.

VBCSR Performance Benchmark

Comparison of Matrix-Vector Multiplication (SpMV) performance between vbcsr (with MKL) and scipy.sparse.csr_matrix.

Test Configuration

  • Block Size: random, uniformly drawn from [16, 20]
  • Density: adaptive (targeting ~200 non-zero blocks per block row)
  • Data Type: float64
  • System: Linux, MKL backend, 13th Gen Intel(R) Core(TM) i9-13900K (32 threads)

Results

Blocks    Approx. Rows    VBCSR Time (s)    SciPy Time (s)    Speedup
500       ~9000           0.0049            0.0209            4.21x
1000      ~18000          0.0096            0.0392            4.07x
5000      ~90000          0.0468            0.2029            4.33x
10000     ~180000         0.0931            0.4151            4.46x
20000     ~360000         0.1866            0.8377            4.49x

Visualization

(Benchmark plot: SpMV wall time for VBCSR vs. SciPy across the matrix sizes listed above.)

Usage Examples

Basic Usage

import numpy as np
import vbcsr
from mpi4py import MPI

# Create a serial matrix: 2 block rows of size 2 each; adjacency lists
# the block columns that each block row may contain
comm = MPI.COMM_WORLD
mat = vbcsr.VBCSR.create_serial(comm, global_blocks=2, block_sizes=[2, 2], adjacency=[[0, 1], [0, 1]])

# Insert a dense 2x2 identity block at block position (0, 0), then assemble
mat.add_block(0, 0, np.eye(2))
mat.assemble()

# Matrix-vector multiplication; unset blocks are treated as zero,
# so this should print [1. 2. 0. 0.]
v = np.array([1.0, 2.0, 3.0, 4.0])
res = mat.mult(v)
print(res.to_numpy())

Distributed Usage

# Run with: mpirun -np 2 python script.py
import numpy as np
import vbcsr
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Define the distributed structure: 2 blocks in total, 1 owned per rank
owned_indices = [rank]
block_sizes = [2]
adjacency = [[0, 1]]  # the locally owned block row touches both block columns

mat = vbcsr.VBCSR.create_distributed(comm, owned_indices, block_sizes, adjacency)

# Fill the locally owned block row: identity blocks in columns 0 and 1
mat.add_block(rank, 0, np.eye(2))
mat.add_block(rank, 1, np.eye(2))
mat.assemble()

v = mat.create_vector()
v.set_constant(1.0)

# Each block row sums two identity blocks applied to a vector of ones,
# so every entry of the result is 2.0
res = mat.mult(v)
print(f"Rank {rank}: {res.to_numpy()}")

SciPy Integration

from scipy.sparse.linalg import cg

# Use a VBCSR matrix as a LinearOperator (mat and v come from the
# distributed example above)
x, info = cg(mat, v, rtol=1e-5)
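
A more self-contained sketch (hedged: it reuses the serial construction API from Basic Usage and assumes a VBCSR matrix is accepted wherever SciPy expects a LinearOperator; the 2 * I diagonal blocks simply make the system SPD so CG converges):

import numpy as np
import vbcsr
from mpi4py import MPI
from scipy.sparse.linalg import cg

comm = MPI.COMM_WORLD
A = vbcsr.VBCSR.create_serial(comm, global_blocks=2, block_sizes=[2, 2], adjacency=[[0, 1], [0, 1]])
A.add_block(0, 0, 2.0 * np.eye(2))  # SPD diagonal blocks
A.add_block(1, 1, 2.0 * np.eye(2))
A.assemble()

b = np.ones(4)
x, info = cg(A, b, rtol=1e-5)  # info == 0 indicates convergence
print(x)  # expected: [0.5 0.5 0.5 0.5]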

Filtered SpMM

To perform sparse matrix-matrix multiplication with filtering (dropping small blocks), use the spmm method directly:

# C = A * B, dropping blocks with Frobenius norm < 1e-6
C = A.spmm(B, threshold=1e-6)
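
For context, a minimal end-to-end sketch (assuming the serial construction API shown above; the tiny (1, 1) blocks are placed deliberately so their product falls below the threshold):

import numpy as np
import vbcsr
from mpi4py import MPI

comm = MPI.COMM_WORLD

def build(scale):
    m = vbcsr.VBCSR.create_serial(comm, global_blocks=2, block_sizes=[2, 2], adjacency=[[0, 1], [0, 1]])
    m.add_block(0, 0, scale * np.eye(2))
    m.add_block(1, 1, 1e-8 * np.ones((2, 2)))  # tiny block; its product should be filtered out
    m.assemble()
    return m

A = build(1.0)
B = build(2.0)

# C = A * B, dropping product blocks with Frobenius norm < 1e-6
C = A.spmm(B, threshold=1e-6)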
