88 commits
1424c87
Checkpoint from VS Code for cloud agent session
kento Apr 11, 2026
21a0767
Add OpenACC GEMM implementation in src/01_matmul
Copilot Apr 11, 2026
c359704
Merge pull request #60 from kento/copilot/vscode-mnu5s8bg-icce
kento Apr 11, 2026
00dd013
Add Kokkos ports for 5 HeCBench benchmarks + perf study
kento Apr 11, 2026
b95c829
Move kokkos_porting_study.pptx to 02_kokkos/Slides/ structure
kento Apr 11, 2026
b696219
Merge branch 'copilot/vscode-mnu5s8bg-icce'
kento Apr 11, 2026
26ef20a
Merge branch 'master' of https://github.com/kento/HeCBench
kento Apr 11, 2026
b0351d7
Add Kokkos ports for norm2, softmax, and wordcount benchmarks
Copilot Apr 11, 2026
aa55dac
Remove compiled binaries from Kokkos benchmark directories
Copilot Apr 11, 2026
f2eb77c
Fix missing newlines in norm2-kokkos error messages
Copilot Apr 11, 2026
855bf21
Remove compiled binaries again
Copilot Apr 11, 2026
c11000e
Add .gitignore files to Kokkos benchmark directories
Copilot Apr 11, 2026
cfa38db
Add Kokkos ports for stencil1d, michalewicz, and projectile benchmarks
Copilot Apr 11, 2026
a74a2cb
Remove compiled binaries from tracked files
Copilot Apr 11, 2026
d790810
Add Kokkos ports for haversine, damage, complex, and reverse benchmarks
Copilot Apr 11, 2026
7dfe6e1
Fix ScratchView memory trait in reverse-kokkos
Copilot Apr 11, 2026
74cdebc
Fix projectile max_height formula, add wordcount is_alpha comment, fi…
Copilot Apr 11, 2026
c67c687
Improve wordcount is_alpha comment to clarify intentional non-standar…
Copilot Apr 11, 2026
b1cb223
Merge pull request #61 from kento/copilot/port-benchmarks-for-kokkos
kento Apr 12, 2026
4fb676b
Port 6 benchmarks from OpenMP offload to Kokkos (OpenMP backend)
Copilot Apr 12, 2026
85c94ec
Port 7 benchmarks from OpenMP offload to Kokkos (OpenMP backend)
Copilot Apr 12, 2026
eadb059
Fix printf format specifier in goulash-kokkos: %lf -> %f
Copilot Apr 12, 2026
6f03014
Port 8 benchmarks from OpenMP offload to Kokkos (OpenMP backend)
Copilot Apr 12, 2026
46d9a9d
Port 10 more benchmarks to Kokkos: channelShuffle, chi2, fhd, gd, jac…
Copilot Apr 12, 2026
baec127
Merge pull request #62 from kento/copilot/port-all-kokkos-benchmarks
kento Apr 12, 2026
afecd09
Fix arithmetic bugs in aobench-kokkos and aop-kokkos
Copilot Apr 12, 2026
2c8a274
Fix arithmetic bugs in adam-kokkos and romberg-kokkos
Copilot Apr 12, 2026
cca7785
Add 7 new kokkos ports and fix 2 bugs in existing implementations
Copilot Apr 12, 2026
77713d9
Merge pull request #63 from kento/copilot/port-benchmarks-for-kokkos-…
kento Apr 12, 2026
0928c0e
Port mixbench, langevin, colorwheel, pnpoly, laplace3d, permute, conc…
Copilot Apr 12, 2026
fbc18d0
Port ne, convolution3D, threadfence, overlay to Kokkos
Copilot Apr 12, 2026
6ff6040
Port randomAccess, zeropoint, stddev, asmooth to Kokkos
Copilot Apr 12, 2026
686531c
Port popcount, cross, lombscargle to Kokkos
Copilot Apr 12, 2026
5d7bb31
Port cooling, layout, background-subtract to Kokkos
Copilot Apr 12, 2026
6e60f5e
Port flip, dense-embedding to Kokkos
Copilot Apr 12, 2026
e133249
Port atomicPerf, lif to Kokkos
Copilot Apr 12, 2026
ae9a6a1
Port tissue, rfs to Kokkos; commit atomicPerf and lif
Copilot Apr 12, 2026
596634b
Fix adam-kokkos epsilon bug; port burger-kokkos; review correctness
Copilot Apr 12, 2026
e1f52da
Port convolution1D, bsearch to Kokkos; fix adam-kokkos epsilon; port …
Copilot Apr 12, 2026
0e177c9
Merge pull request #64 from kento/copilot/port-benchmarks-for-kokkos-…
kento Apr 12, 2026
ce6e288
Port atomicReduction and contract benchmarks to Kokkos
Copilot Apr 12, 2026
97adf1d
Rename ambiguous loop variables in contract-kokkos kernel
Copilot Apr 12, 2026
7cde2a8
Add Kokkos ports of entropy and heat2d benchmarks
Copilot Apr 12, 2026
84519aa
Fix entropy-kokkos: use int for count/total, guard div-by-zero in opt…
Copilot Apr 12, 2026
6b8240f
Port car and bezier-surface CUDA benchmarks to Kokkos
Copilot Apr 12, 2026
aa6af03
Port nbody CUDA benchmark to Kokkos
Copilot Apr 12, 2026
ae37d2d
Add comment explaining GFLOPS flop count constants
Copilot Apr 12, 2026
78a0f1f
Port chacha20, bwt, and atomicCAS benchmarks to Kokkos
Copilot Apr 12, 2026
9172838
Port backprop and md benchmarks to Kokkos
Copilot Apr 12, 2026
f0984c9
Merge pull request #65 from kento/copilot/port-benchmarks-for-kokkos-…
kento Apr 12, 2026
f71a511
Port gamma-correction and lebesgue benchmarks to Kokkos
Copilot Apr 12, 2026
ef137f7
Address code review feedback in Kokkos ports
Copilot Apr 12, 2026
a08b4bd
Add Kokkos ports for floydwarshall and interleave benchmarks
Copilot Apr 12, 2026
9e4dd4a
Port heat and frechet benchmarks to Kokkos
Copilot Apr 12, 2026
eb2a21a
Add Kokkos ports of mandelbrot and nqueen benchmarks
Copilot Apr 12, 2026
22e9b83
Remove accidentally staged Kokkos build artifacts from other dirs
Copilot Apr 12, 2026
25beab8
Port hotspot3D benchmark to Kokkos
Copilot Apr 12, 2026
9096b2e
Remove accidentally committed Kokkos build artifacts and add .gitigno…
Copilot Apr 12, 2026
9d5af01
Merge pull request #66 from kento/copilot/port-kokkos-benchmarks
kento Apr 12, 2026
1621aba
Add Kokkos ports for resize, ising, pool, rainflow, hypterm, distort …
Copilot Apr 12, 2026
a10b392
Fix usage message formatting in resize-kokkos
Copilot Apr 12, 2026
35476ed
Add Kokkos ports for lrn, murmurhash3, extend2, geodesic, degrid, haccmk
Copilot Apr 12, 2026
21a2beb
Port page-rank benchmark to Kokkos
Copilot Apr 12, 2026
a830fdb
Merge pull request #67 from kento/copilot/port-benchmarks-for-kokkos-…
kento Apr 12, 2026
913f0c9
Add Kokkos ports: kalman, libor, particle-diffusion
Copilot Apr 12, 2026
a6dad8a
Fix code review feedback: logical && and L_b alias comment
Copilot Apr 12, 2026
a89a868
Port mallocFree, pitch, matrixT benchmarks to Kokkos
Copilot Apr 12, 2026
620f935
Address code review: simplify casts and use std::fabs
Copilot Apr 12, 2026
c80cbdd
Port 12 benchmarks to Kokkos
Copilot Apr 12, 2026
a025d31
Merge pull request #68 from kento/copilot/port-benchmarks-for-kokkos-…
kento Apr 12, 2026
e66a8bf
Port extrema benchmark from OMP to Kokkos
Copilot Apr 19, 2026
9713667
Add Kokkos port of iso2dfd benchmark
Copilot Apr 19, 2026
2d6594d
iso2dfd-kokkos: narrow MDRangePolicy to interior region, add comment
Copilot Apr 19, 2026
df4f808
Port jenkins-hash benchmark to Kokkos
Copilot Apr 19, 2026
cad662d
Fix comment accuracy for key slot size
Copilot Apr 19, 2026
8829585
Port laplace benchmark from OMP to Kokkos
Copilot Apr 19, 2026
21f4c0c
Port feynman-kac and henry benchmarks to Kokkos
Copilot Apr 19, 2026
3574d9f
Port vol2col, pointwise, and vmc benchmarks to Kokkos
Copilot Apr 19, 2026
a08ac0d
Address code review: remove redundant const View cast, improve commen…
Copilot Apr 19, 2026
6c4a86c
Port 14 new benchmarks to Kokkos (extrema, iso2dfd, jenkins-hash, lap…
Copilot Apr 19, 2026
810a10b
Merge pull request #69 from kento/copilot/port-benchmarks-for-kokkos-…
kento Apr 19, 2026
7eb2103
Port 19 CUDA benchmarks to Kokkos
kento Apr 19, 2026
c9ca339
Add Kokkos ports for 19 benchmarks (ring through snicit)
kento Apr 19, 2026
d2b4b32
Add Kokkos ports for 19 OMP benchmarks (l-series)
kento Apr 19, 2026
7bfab2f
Add Kokkos ports for 343 benchmarks
kento Apr 20, 2026
71eaea3
Merge pull request #70 from kento/copilot/port-all-benchmarks-to-kokkos
kento Apr 20, 2026
7b9a949
Align branch with RIKEN-RCCS/HeCBench upstream for kokkos-only PR
Copilot Apr 21, 2026
519c874
Merge pull request #72 from kento/copilot/add-kokkos-support-directory
kento Apr 21, 2026
The diff you're trying to view is too large. We only load the first 3000 changed files.
16 changes: 14 additions & 2 deletions .gitignore
@@ -1,5 +1,4 @@
*~
*.txt
*.bin
*.o
*.ppm
@@ -8,8 +7,21 @@
*.traindata
*.swp
*.out
*.json
*.log
*.yaml
dpct_output
*/main
main
*.tmp

# Exclude text files except CMake and documentation
*.txt
!CMakeLists.txt
!plan.md
!README*.md
!*.markdown

# Exclude JSON files except CMake presets
*.json
!CMakePresets.json
!CMakeUserPresets.json
350 changes: 350 additions & 0 deletions CMAKE_BUILD.md
@@ -0,0 +1,350 @@
# CMake Build System for HeCBench

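This document describes the new CMake-based build system for HeCBench, which replaces the previous Makefile-based approach for most benchmarks. A few benchmarks with complex dependencies still rely on their original Makefiles (see Migration Status).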

## Overview

The CMake build system provides:
- **Unified configuration** via CMake presets for common GPU architectures
- **Selective building** by benchmark, programming model, or category
- **Automatic compiler detection** for CUDA, HIP, SYCL, and OpenMP
- **Parallel builds** across benchmarks
- **IDE integration** (CLion, VS Code, etc.)

## Quick Start

### Prerequisites

CMake is always required. The other toolchains are needed only for the programming models you want to build:

- **CUDA**: NVIDIA CUDA Toolkit (11.0+)
- **HIP**: AMD ROCm (5.0+)
- **SYCL**: Intel oneAPI DPC++ or AdaptiveCpp (formerly hipSYCL)
- **OpenMP**: Intel oneAPI, NVIDIA HPC SDK, or AOMP
- **CMake**: 3.21 or later
- **Ninja** (recommended) or Make

### Build with a Preset

The simplest way to build is using a CMake preset:

```bash
# List available presets
cmake --list-presets

# Configure for NVIDIA Hopper GPUs with NVIDIA HPC SDK (version 25.7)
cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
  -DMPI_C_COMPILER=/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/comm_libs/mpi/bin/mpicc \
  -DMPI_CXX_COMPILER=/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/comm_libs/mpi/bin/mpicxx \
  -DCUDAToolkit_ROOT=/opt/nvidia/hpc_sdk/Linux_x86_64/25.7/cuda/12.9/ \
  --preset cuda-sm90

# Build all configured benchmarks
cmake --build build/cuda-sm90

# Or build in parallel (--parallel works with both Ninja and Make)
cmake --build build/cuda-sm90 --parallel
```

### Available Presets

#### NVIDIA GPUs (CUDA)
- `cuda-sm60` - Pascal (GTX 1080, P100)
- `cuda-sm70` - Volta (V100)
- `cuda-sm80` - Ampere (A100)
- `cuda-sm90` - Hopper (H100, H200)
- `cuda-sm120` - Blackwell
- `cuda-sm121` - Blackwell (GB10)

#### AMD GPUs (HIP)
- `hip-gfx908` - MI100
- `hip-gfx90a` - MI210, MI250X
- `hip-gfx942` - MI300A/X
- `hip-gfx1012` - Radeon RX 5500
- `hip-gfx1030` - Radeon RX 6900

#### SYCL
- `sycl-cuda-sm70` - SYCL with CUDA backend targeting Volta (Experimental)
- `sycl-cuda-sm80` - SYCL with CUDA backend targeting Ampere (Experimental)
- `sycl-cuda-sm90` - SYCL with CUDA backend targeting Hopper (Experimental)
- `sycl-hip-gfx908` - SYCL with HIP backend targeting MI100 (Experimental)
- `sycl-hip-gfx90a` - SYCL with HIP backend targeting MI210, MI250X (Experimental)
- `sycl-hip-gfx942` - SYCL with HIP backend targeting MI300A/X (Experimental)
- `sycl-cpu` - SYCL with CPU backend
- `sycl-xpu` - SYCL with XPU backend (Intel GPUs)

#### OpenMP offload
- `openmp-intel` - Intel compiler with OpenMP offload
- `openmp-nvidia-sm70` - NVIDIA compiler with OpenMP offload to Volta
- `openmp-nvidia-sm80` - NVIDIA compiler with OpenMP offload to Ampere
- `openmp-nvidia-sm90` - NVIDIA compiler with OpenMP offload to Hopper
- `openmp-amd-gfx908` - AMD compiler with OpenMP offload to MI100
- `openmp-amd-gfx90a` - AMD compiler with OpenMP offload to MI210, MI250X
- `openmp-amd-gfx942` - AMD compiler with OpenMP offload to MI300A/X

#### Multi-Model
- `all-models` - Build all programming models (requires all compilers)

## Building Specific Benchmarks

### Build a Single Benchmark (All Models)

```bash
cmake --preset all-models
cmake --build build/all-models --target jacobi-all
```

This builds `jacobi` for every enabled programming model; with the `all-models` preset, that means all of them.

### Build a Specific Model Variant

```bash
# Build only CUDA version of jacobi
cmake --preset cuda-sm80
cmake --build build/cuda-sm80 --target jacobi-cuda

# Build only HIP version of attention
cmake --preset hip-gfx90a
cmake --build build/hip-gfx90a --target attention-hip

# Build only SYCL XPU version of attention
source /opt/intel/oneapi/setvars.sh
cmake --preset sycl-xpu
cmake --build build/sycl-xpu --target attention-sycl
```

### Build by Category

```bash
# Build all machine learning benchmarks
cmake --build build/cuda-sm80 --target category-ml

# Build all graph benchmarks
cmake --build build/cuda-sm80 --target category-graph

# Build all simulation benchmarks
cmake --build build/cuda-sm80 --target category-simulation
```
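
Several targets can be passed to a single `cmake --build` call (CMake 3.15 and later accept multiple names after `--target`). The sketch below composes `<benchmark>-<model>` names following the target naming used in this document. The helper name `hecbench_targets` is hypothetical and not part of HeCBench.

```bash
# Hypothetical helper: print "<benchmark>-<model>" target names, separated by
# spaces, following the target naming used in this document.
hecbench_targets() {
  local model="$1"; shift
  local targets=()
  local bench
  for bench in "$@"; do
    targets+=("${bench}-${model}")
  done
  echo "${targets[*]}"
}
```

For example, `cmake --build build/cuda-sm80 --target $(hecbench_targets cuda jacobi attention)` would request `jacobi-cuda` and `attention-cuda` in one invocation, assuming both targets exist in that build directory.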

## Advanced Configuration

### Custom Configuration

If you need more control, configure without a preset:

```bash
cmake -B build/custom \
  -G Ninja \
  -DHECBENCH_ENABLE_CUDA=ON \
  -DHECBENCH_ENABLE_HIP=OFF \
  -DHECBENCH_ENABLE_SYCL=OFF \
  -DHECBENCH_ENABLE_OPENMP=OFF \
  -DHECBENCH_CUDA_ARCH=80

cmake --build build/custom
```
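
After a custom configure it can help to confirm which `HECBENCH_*` values CMake actually recorded. Values passed with `-D` are stored in the build directory's `CMakeCache.txt` as `NAME:TYPE=VALUE` entries, so a minimal sketch only needs to filter that file. The helper name `hecbench_cache_options` is hypothetical.

```bash
# Hypothetical helper: list the HECBENCH_* entries recorded in a build
# directory's CMakeCache.txt, printed as NAME=VALUE and sorted by name.
hecbench_cache_options() {
  local build_dir="$1"
  grep -E '^HECBENCH_[A-Za-z0-9_]+:' "${build_dir}/CMakeCache.txt" \
    | sed -E 's/^([^:]+):[^=]*=(.*)$/\1=\2/' \
    | sort
}
```

Running `hecbench_cache_options build/custom` after the configure step above should show the options that were set on the command line.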

### CMake Options

| Option | Default | Description |
|--------|---------|-------------|
| `HECBENCH_ENABLE_CUDA` | ON | Enable CUDA benchmarks |
| `HECBENCH_ENABLE_HIP` | ON | Enable HIP benchmarks |
| `HECBENCH_ENABLE_SYCL` | ON | Enable SYCL benchmarks |
| `HECBENCH_ENABLE_OPENMP` | ON | Enable OpenMP benchmarks |
| `HECBENCH_CUDA_ARCH` | 80 | CUDA architecture as a number (60, 70, 80, 90, etc.) |
| `HECBENCH_HIP_ARCH` | gfx90a | HIP architecture (gfx908, gfx90a, gfx942, etc.) |
| `HECBENCH_SYCL_TARGET` | (auto) | SYCL target backend |
| `HECBENCH_ENABLE_TESTING` | ON | Enable testing support |
| `HECBENCH_BUILD_ALL_BENCHMARKS` | ON | Build all vs. selective |

### Multi-Architecture Builds

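To build for multiple GPU architectures, run a separate configure-and-build cycle for each one. Every preset writes to its own `build/<preset>` directory, so the builds do not overwrite each other: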

```bash
# Build for A100
cmake --preset cuda-sm80
cmake --build build/cuda-sm80

# Build for V100
cmake --preset cuda-sm70
cmake --build build/cuda-sm70

# Build for MI250X
cmake --preset hip-gfx90a
cmake --build build/hip-gfx90a
```
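
The same cycle can be wrapped in a loop. This is a minimal sketch that assumes each preset builds into `build/<preset>`, as in the examples above. The function name `build_presets` and the `CMAKE` override variable are hypothetical.

```bash
# Hypothetical helper: configure and build each preset in turn, stopping at
# the first failure. Set CMAKE=echo for a dry run that only prints arguments.
build_presets() {
  local cmake_bin="${CMAKE:-cmake}"
  local preset
  for preset in "$@"; do
    "$cmake_bin" --preset "$preset" || return 1
    "$cmake_bin" --build "build/${preset}" --parallel || return 1
  done
}
```

For example, `build_presets cuda-sm80 cuda-sm70 hip-gfx90a` reproduces the three cycles shown above.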

## Output Structure

Compiled binaries are placed under `build/<preset>/bin/`, with one subdirectory per programming model:

```
build/<preset>/bin/
├── cuda/
│   ├── jacobi
│   ├── bfs
│   ├── softmax
│   └── ...
├── hip/
│   └── ...
├── sycl/
│   └── ...
└── omp/
    └── ...
```

## Running Benchmarks

```bash
# Run a benchmark directly
./build/cuda-sm80/bin/cuda/jacobi

# Run with specific GPU
CUDA_VISIBLE_DEVICES=0 ./build/cuda-sm80/bin/cuda/attention
```
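
To see which benchmarks a build actually produced, the binary directory can be listed directly. The sketch below assumes the `build/<preset>/bin/<model>/` layout shown in this document. The function name `list_built_benchmarks` and the `BUILD_ROOT` override are hypothetical.

```bash
# Hypothetical helper: print the names of executable files found in
# <BUILD_ROOT>/<preset>/bin/<model>, one per line (BUILD_ROOT defaults to "build").
list_built_benchmarks() {
  local preset="$1" model="$2"
  local bin_dir="${BUILD_ROOT:-build}/${preset}/bin/${model}"
  local f
  for f in "$bin_dir"/*; do
    # Skip non-files and anything without an executable bit.
    if [ -f "$f" ] && [ -x "$f" ]; then
      basename "$f"
    fi
  done
}
```

For example, `list_built_benchmarks cuda-sm80 cuda` prints one benchmark name per line for the CUDA build.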

## Migration Status

The CMake build system migration is **98% complete**:

| Metric | Count |
|--------|-------|
| Total benchmark implementations | 1,818 |
| Converted to CMake | **1,790** |
| Remaining | 28 |
| Coverage | **98.5%** |

### Converted Benchmarks

**497 of 508 unique benchmarks** now have CMake support across their implementations:
- CUDA: ~495 benchmarks
- HIP: ~490 benchmarks
- SYCL: ~475 benchmarks
- OpenMP: ~320 benchmarks

### Benchmarks Not Yet Converted

The following benchmarks have complex dependencies that require additional work. The totals above give 11 benchmarks (28 implementations); the table lists 10 of them, covering 27 implementations:

| Benchmark | Variants | Reason |
|-----------|----------|--------|
| `convolutionDeformable` | cuda, hip, sycl | Python/PyTorch extension (setup.py build) |
| `dwconv1d` | cuda, hip, sycl | Python/PyTorch extension (run.py build) |
| `gerbil` | cuda, hip | Requires Boost libraries |
| `halo-finder` | cuda, hip, sycl | MPI dependency + complex archive build |
| `hpl` | cuda, hip, sycl | HPL benchmark with external dependencies |
| `leukocyte` | cuda, hip, sycl, omp | External meschach library (requires pre-build) |
| `miniDGS` | cuda | MPI + ParMetis dependency |
| `miniFE` | cuda, hip, sycl, omp | Script-based build (get_common_files, generate_info_header) |
| `saxpy-ompt` | cuda, hip, sycl | Requires nvc++ compiler (OpenMP target offload) |
| `slu` | cuda | External nicslu library |

These benchmarks still work with their original Makefiles.

## Adding New Benchmarks

To convert a benchmark to CMake, add a `CMakeLists.txt` in each model directory:

```cmake
# src/mybench-cuda/CMakeLists.txt
add_hecbench_benchmark(
    NAME mybench
    MODEL cuda
    SOURCES main.cu kernel.cu
    CATEGORIES simulation physics
)
```

### Available Options

```cmake
add_hecbench_benchmark(
    NAME mybench                                      # Benchmark name (required)
    MODEL cuda                                        # Programming model: cuda, hip, sycl, omp (required)
    SOURCES main.cu kernel.cu                         # Source files (required)
    CATEGORIES simulation physics                     # Categories for grouping (optional)
    INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/include  # Additional include paths (optional)
    COMPILE_OPTIONS -maxrregcount=32                  # Extra compiler flags (optional)
    LINK_LIBRARIES CUDA::cublas                       # Libraries to link (optional)
)
```
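
When converting many benchmarks, the minimal `CMakeLists.txt` can be generated rather than typed. The sketch below writes only the three required arguments into `src/<name>-<model>/`, following the directory naming in the example above. The function name `scaffold_benchmark` and the `SRC_ROOT` override are hypothetical, and the function overwrites an existing `CMakeLists.txt`.

```bash
# Hypothetical helper: write <SRC_ROOT>/<name>-<model>/CMakeLists.txt with the
# required add_hecbench_benchmark() arguments (SRC_ROOT defaults to "src").
# Note: this overwrites any existing CMakeLists.txt in that directory.
scaffold_benchmark() {
  local name="$1" model="$2"; shift 2
  local dir="${SRC_ROOT:-src}/${name}-${model}"
  mkdir -p "$dir" || return 1
  cat > "${dir}/CMakeLists.txt" <<EOF
add_hecbench_benchmark(
    NAME ${name}
    MODEL ${model}
    SOURCES $*
)
EOF
}
```

For example, `scaffold_benchmark mybench cuda main.cu kernel.cu` creates `src/mybench-cuda/CMakeLists.txt`; optional arguments such as `CATEGORIES` would still be added by hand.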

## Troubleshooting

### Compiler Not Found

```
CMake Error: Could not find CUDA/HIP/SYCL compiler
```

**Solution**: Install the required compiler or disable that model:
```bash
cmake --preset cuda-sm80 -DHECBENCH_ENABLE_HIP=OFF
```
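
The disable flags can also be derived from what is installed. This is a rough sketch, not HeCBench's own detection logic: it assumes that `nvcc`, `hipcc`, and `icpx` are the compiler drivers for CUDA, HIP, and SYCL respectively, which may not match your toolchain. The function name `disable_missing_models` is hypothetical.

```bash
# Hypothetical helper: emit -DHECBENCH_ENABLE_<MODEL>=OFF for each model whose
# compiler driver is not on PATH. The driver names (nvcc, hipcc, icpx) are
# assumptions; adjust them to your toolchain.
disable_missing_models() {
  local flags=()
  command -v nvcc  >/dev/null 2>&1 || flags+=("-DHECBENCH_ENABLE_CUDA=OFF")
  command -v hipcc >/dev/null 2>&1 || flags+=("-DHECBENCH_ENABLE_HIP=OFF")
  command -v icpx  >/dev/null 2>&1 || flags+=("-DHECBENCH_ENABLE_SYCL=OFF")
  echo "${flags[*]}"
}
```

The output can be appended to a configure call, for example `cmake --preset cuda-sm80 $(disable_missing_models)`.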

### Architecture Mismatch

```
Error: Unsupported architecture sm_XX
```

**Solution**: Use a preset matching your GPU or set the architecture manually:
```bash
cmake --preset cuda-sm80 -DHECBENCH_CUDA_ARCH=86
```
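
If the right architecture value is unknown, it can be derived from the GPU's compute capability, for example 8.6 becomes 86. The sketch below assumes `nvidia-smi` supports the `compute_cap` query field, which requires a reasonably recent driver. Both function names are hypothetical.

```bash
# Hypothetical helper: convert a compute capability such as "8.6" into the
# numeric form used for HECBENCH_CUDA_ARCH in this document ("86").
cuda_arch_from_cc() {
  local cc="$1"
  case "$cc" in
    [0-9]*.[0-9]*) echo "${cc/./}" ;;
    *) echo "unrecognized compute capability: $cc" >&2; return 1 ;;
  esac
}

# Hypothetical helper: query the first GPU reported by nvidia-smi and print
# its architecture number. Fails if nvidia-smi is missing or reports nothing.
detect_cuda_arch() {
  local cc
  cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)" || return 1
  cuda_arch_from_cc "$cc"
}
```

For example, `cmake --preset cuda-sm80 -DHECBENCH_CUDA_ARCH="$(detect_cuda_arch)"` would configure for the first GPU in the machine.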

### Missing Dependencies

Some benchmarks may require additional libraries (oneDPL, TBB, cuFFT, cuBLAS, etc.). These will be detected automatically if present.

## IDE Integration

### Visual Studio Code

Install the CMake Tools extension, then:
1. Open the HeCBench folder
2. Select a CMake preset from the status bar
3. Click "Build" or press F7

### CLion

CLion automatically detects `CMakePresets.json`:
1. Open the HeCBench project
2. CLion will import presets automatically
3. Select a profile from the dropdown
4. Build → Build Project

## Comparison with Makefile Build

| Feature | Makefile | CMake |
|---------|----------|-------|
| Configuration | Edit 2,587 individual Makefiles | Single preset selection |
| Parallel builds | Per-benchmark only | Across all benchmarks |
| Selective building | Manual (cd + make) | Target-based (by name/category) |
| IDE support | Limited | Full integration |
| Multi-arch | Rebuild everything | Separate build dirs |
| Dependency tracking | Manual | Automatic |

## Future Enhancements

Planned improvements:
- [x] Migrate benchmarks to CMake (98% complete)
- [ ] Convert remaining 11 complex benchmarks
- [ ] CTest integration for automated testing
- [ ] CPack support for distribution
- [ ] Benchmark performance regression tracking
- [ ] Docker container presets
- [ ] GitHub Actions CI integration

## Getting Help

- Report issues: https://github.com/zjin-lcf/HeCBench/issues
- Main README: [README.md](README.md)
- Full renovation plan: [plan.md](plan.md)

---

**Last Updated**: 2025-12-07
**Status**: Phase 2 Nearly Complete (98% migration)