3 changes: 3 additions & 0 deletions .gitignore
@@ -41,3 +41,6 @@ biases2.csv

.venv/
experiments/src/bhd.scala

scripts/cellar
.cellar/
168 changes: 168 additions & 0 deletions gvecxt/cyfra docs/gpu-functions.md
@@ -0,0 +1,168 @@
---
sidebar_position: 3
---

# GPU Functions

The simplest way to use the Cyfra library is with a `GFunction`. In essence, it is a function that is applied to every element of an input array: it runs on the GPU and returns an array with the outputs.

```scala
import io.computenode.cyfra.dsl.{*, given}
import io.computenode.cyfra.foton.GFunction
import io.computenode.cyfra.runtime.VkCyfraRuntime

@main
def multiplyByTwo(): Unit =
  VkCyfraRuntime.using:
    val input = (0 until 256).map(_.toFloat).toArray

    val doubleIt: GFunction[GStruct.Empty, Float32, Float32] = GFunction: x =>
      x * 2.0f

    val result: Array[Float] = doubleIt.run(input)

    println(s"Output: ${result.take(10).mkString(", ")}...")
```

`doubleIt.run(input)` runs the GFunction on every element of the provided input. The result is an array of floats in which each entry is the corresponding input entry doubled.
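To check GPU results on the host, it can help to have a plain-Scala reference of what `doubleIt.run(input)` computes. The sketch below is not part of Cyfra; it assumes one GPU invocation per array index, each applying the lambda to its own element independently, so the result matches an element-wise map. The name `doubleItReference` is hypothetical.

```scala
// Hypothetical CPU reference for the doubleIt GFunction (not a Cyfra API).
// Each index i is handled independently, like one GPU invocation per element.
def doubleItReference(input: Array[Float]): Array[Float] =
  Array.tabulate(input.length)(i => input(i) * 2.0f)

val input = (0 until 256).map(_.toFloat).toArray
val expected = doubleItReference(input)
```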

## Cyfra DSL

When you use Cyfra, you enter a world of values that are entirely separate from standard Scala values: `Float` becomes `Float32`, `Double` becomes `Float64`, and so on. The table below lists more examples. It does not cover every type, but it includes the most important ground types (`Float32`, `Int32`, `GBoolean`, etc.). Any ground type can then be used in vectors, and any types can be composed into `GStruct`s (including other `GStruct`s).

| Scala Type | Cyfra Type |
|------------|------------|
| `Float` | `Float32` |
| `Double` | `Float64` |
| `Int` | `Int32` |
| `Int` | `UInt32` (unsigned) |
| `Boolean` | `GBoolean` |
| `(Float, Float)` | `Vec2[Float32]` |
| `(Float, Float, Float, Float)` | `Vec4[Float32]` |
| `(Int, Int)` | `Vec2[Int32]` |

The operators you use stay the same, but keep in mind that an operation only runs on the GPU if it involves a Cyfra value type.
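The table maps both `Int32` and `UInt32` to Scala's `Int`, which is always signed on the JVM. Assuming the 32 bits of a `UInt32` value are copied into an `Int` unchanged, values above `Int.MaxValue` will look negative on the host; the standard `java.lang.Integer` helpers reinterpret them as unsigned. A minimal sketch (the helper name `asUnsigned` is hypothetical):

```scala
// Reinterpret the 32 bits of a Scala Int as an unsigned value.
// Assumption: a UInt32 read back from the GPU arrives bit-for-bit in an Int.
def asUnsigned(bits: Int): Long = Integer.toUnsignedLong(bits)

val maxUInt32Bits: Int = -1 // all 32 bits set (0xFFFFFFFF)
val maxUInt32: Long = asUnsigned(maxUInt32Bits)
val maxUInt32Text: String = Integer.toUnsignedString(maxUInt32Bits)
```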

## Using Uniforms

In the previous example, the GFunction only took a float array as input. There is, however, a way to provide additional parameters to each run: the first type parameter of `GFunction`, which was set to `GStruct.Empty` in the previous example. It specifies the uniform structure, a value shared by all elements, that can be provided to each run of a GFunction.

```scala
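import io.computenode.cyfra.dsl.{*, given}
import io.computenode.cyfra.foton.GFunction
import io.computenode.cyfra.runtime.VkCyfraRuntime

// Uniform structure: the same value of `a` is used for every element
case class FunctionParam(a: Float32) extends GStruct[FunctionParam]

@main
def multiplyByTwo(): Unit =
  VkCyfraRuntime.using:
    val input = (0 until 256).map(_.toFloat).toArray

    val doubleIt: GFunction[FunctionParam, Float32, Float32] = GFunction:
      (params: FunctionParam, x: Float32) =>
        x * params.a

    val params = FunctionParam(2.0f)

    val result: Array[Float] = doubleIt.run(input, params)

    println(s"Output: ${result.take(10).mkString(", ")}...")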
```

The lambda passed to `GFunction` now takes a `FunctionParam` as its first argument. A `GStruct` case class can be any product of Cyfra values (including other structs).

## If-else becomes when-otherwise

Because Cyfra code runs in a different (GPU) world, you need to use alternative control expressions. The most basic one is `when`, followed by optional `elseWhen` branches and a final `otherwise`:
```scala
val multiplyIt: GFunction[FunctionParam, Float32, Float32] = GFunction:
  (params: FunctionParam, x: Float32) =>
    when(x < 100f):
      x * params.a
    .elseWhen(x < 200f):
      x * params.a * 2f
    .otherwise:
      x * params.a * 4f
```
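On the host, the same branching can be written as an `if`/`else if`/`else` chain. The sketch below is a hypothetical CPU reference (`multiplyItReference` is not part of Cyfra) that takes the uniform's `a` as a plain `Float`; like `when`...`otherwise`, the first matching condition wins.

```scala
// Hypothetical CPU reference for multiplyIt (not a Cyfra API).
// `a` stands in for params.a; the first condition that holds selects the branch.
def multiplyItReference(a: Float, x: Float): Float =
  if x < 100f then x * a
  else if x < 200f then x * a * 2f
  else x * a * 4f
```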

## GSeqs

For iteration and collections, Cyfra offers the `GSeq` type. It corresponds to Scala's `LazyList`: a lazily evaluated sequence that can be transformed and consumed with familiar functional operations.

### Creating a GSeq

Use `GSeq.of` to create a sequence from a known list of elements, or `GSeq.gen` to generate one from an initial value and a function that produces the next element:

```scala
// Create from a known list of elements (red, green and blue are defined elsewhere)
val colors = GSeq.of(List(red, green, blue))

// Generate integers: 0, 1, 2, 3, ...
val integers = GSeq.gen[Int32](0, n => n + 1)

// Generate Fibonacci-like pairs using Vec2: (0,1), (1,1), (1,2), (2,3), ...
val fibonacci = GSeq.gen[Vec2[Float32]]((0.0f, 1.0f), pair => (pair.y, pair.x + pair.y))

// Mandelbrot iteration: z = z² + c, where c = (cx, cy) is defined elsewhere
val mandelbrot = GSeq.gen(
  vec2(0.0f, 0.0f),
  z => vec2(z.x * z.x - z.y * z.y + cx, 2.0f * z.x * z.y + cy)
)
```

You must always call `.limit(n)` before consuming a GSeq to set a maximum iteration count (infinite sequences are not supported on GPU).
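Because `GSeq` corresponds to `LazyList`, the generator examples above have direct host-side analogues. This sketch uses only the Scala standard library: `LazyList.iterate` plays the role of `GSeq.gen`, and `take(n)` plays the role of `limit(n)`. Unlike a `GSeq`, a `LazyList` may stay infinite as long as you only force a finite prefix.

```scala
// LazyList.iterate(seed)(next) builds seed, next(seed), next(next(seed)), ...
val integers: LazyList[Int] = LazyList.iterate(0)(n => n + 1)

// Fibonacci-like pairs: (0,1), (1,1), (1,2), (2,3), ...
val fibonacciPairs: LazyList[(Float, Float)] =
  LazyList.iterate((0.0f, 1.0f)) { case (x, y) => (y, x + y) }

// take(n) bounds the sequence, like GSeq's limit(n)
val firstIntegers: List[Int] = integers.take(5).toList
val firstPairs: List[(Float, Float)] = fibonacciPairs.take(4).toList
```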

### Map, filter, takeWhile

Transform and filter sequences with familiar operations:

```scala
// Map: transform each element
val doubled = GSeq.gen[Int32](0, _ + 1).limit(100).map(_ * 2)

// Filter: keep only matching elements
val evens = GSeq.gen[Int32](0, _ + 1).limit(100).filter(n => n.mod(2) === 0)

// TakeWhile: stop when condition becomes false
val underTen = GSeq.gen[Int32](0, _ + 1).limit(100).takeWhile(_ < 10)
```
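The same three operations exist on `LazyList`, which makes it easy to check what each GSeq chain yields. A standard-library sketch mirroring the examples above; note that `filter` skips non-matching elements and keeps going, while `takeWhile` ends the sequence at the first element that fails the condition.

```scala
val naturals: LazyList[Int] = LazyList.iterate(0)(_ + 1)

// Map: 0, 2, 4, ..., 198
val doubled = naturals.take(100).map(_ * 2)

// Filter: 0, 2, 4, ..., 98 (skips odd numbers, continues to the end)
val evens = naturals.take(100).filter(n => n % 2 == 0)

// TakeWhile: 0, 1, ..., 9 (stops at the first element >= 10)
val underTen = naturals.take(100).takeWhile(_ < 10)
```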

These can be chained together:

```scala
// Julia set iteration: iterate until escape or limit
// (`uv` is the starting point and `const` the Julia constant, both defined elsewhere)
val iterations = GSeq
  .gen(uv, v => ((v.x * v.x) - (v.y * v.y), 2.0f * v.x * v.y) + const)
  .limit(1000)
  .map(length)         // Transform to magnitude
  .takeWhile(_ < 2.0f) // Stop once the magnitude reaches 2
```
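A host-side reference for this chain can be useful when validating a GPU render. The sketch below is hypothetical and uses only the standard library: a small `Complex` case class stands in for `Vec2[Float32]`, `juliaIterations` counts how many elements survive the `takeWhile`, and the escape radius of 2 follows the snippet above.

```scala
// Minimal complex number standing in for Vec2[Float32] (hypothetical helper)
final case class Complex(re: Float, im: Float):
  def squared: Complex = Complex(re * re - im * im, 2.0f * re * im)
  def +(other: Complex): Complex = Complex(re + other.re, im + other.im)
  def magnitude: Float = math.sqrt(re * re + im * im).toFloat

// Number of iterates z, z^2 + c, ... whose magnitude stays below 2,
// looking at no more than `limit` elements
def juliaIterations(start: Complex, c: Complex, limit: Int): Int =
  LazyList
    .iterate(start)(z => z.squared + c)
    .take(limit)
    .map(_.magnitude)
    .takeWhile(_ < 2.0f)
    .size
```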

### Fold, count, lastOr

Terminal operations consume the sequence and produce a result:

```scala
// Count: number of elements that passed through
val iterationCount: Int32 = GSeq
  .gen(vec2(0f, 0f), z => vec2(z.x*z.x - z.y*z.y + cx, 2f*z.x*z.y + cy))
  .limit(256)
  .takeWhile(z => z.x*z.x + z.y*z.y < 4.0f)
  .count

// Fold: reduce with accumulator
val sum: Int32 = GSeq.gen[Int32](1, _ + 1).limit(10).fold(0, _ + _)

// LastOr: get final element (or default if empty)
val finalValue: Int32 = GSeq.gen[Int32](0, _ + 1).limit(10).lastOr(0)
```
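For comparison, the standard-library `LazyList` counterparts are `foldLeft`, `size` and `lastOption.getOrElse`. The sketch below mirrors the three examples above (the `mandelbrotCount` name is hypothetical) and can serve as a host-side check of the GPU values.

```scala
// Fold: 1 + 2 + ... + 10
val sum: Int = LazyList.iterate(1)(_ + 1).take(10).foldLeft(0)(_ + _)

// LastOr: the last of 0..9, or 0 if the sequence were empty
val finalValue: Int = LazyList.iterate(0)(_ + 1).take(10).lastOption.getOrElse(0)

// Count: Mandelbrot iterations for c = (cx, cy) while |z|^2 < 4, at most `limit`
def mandelbrotCount(cx: Float, cy: Float, limit: Int = 256): Int =
  LazyList
    .iterate((0.0f, 0.0f)) { case (x, y) => (x * x - y * y + cx, 2.0f * x * y + cy) }
    .take(limit)
    .takeWhile { case (x, y) => x * x + y * y < 4.0f }
    .size
```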

**Every GSeq must have a hard `limit` on the maximum number of elements it can produce.**


## Example usage

GFunction may be a simple construct, but it is enough to accelerate many applications. An example is a raytracer that would otherwise take a very long time to run on a CPU. Here is the implementation of a raytracer with Cyfra:

![Animated Raytracing](https://github.com/user-attachments/assets/3eac9f7f-72df-4a5d-b768-9117d651c78d)

Source:
- [ImageRtRenderer.scala](https://github.com/ComputeNode/cyfra/blob/cab6b4cae3a3402a3de43272bc7cb50acf5ec67b/cyfra-foton/src/main/scala/io/computenode/cyfra/foton/rt/ImageRtRenderer.scala)
- [RtRenderer.scala](https://github.com/ComputeNode/cyfra/blob/cab6b4cae3a3402a3de43272bc7cb50acf5ec67b/cyfra-foton/src/main/scala/io/computenode/cyfra/foton/rt/RtRenderer.scala)
175 changes: 175 additions & 0 deletions gvecxt/cyfra docs/gpu-pipelines.md
@@ -0,0 +1,175 @@
---
sidebar_position: 5
---

# GPU Pipelines with GExecution

When you need to run multiple GPU programs in sequence, sharing data between them, `GExecution` provides a composable way to build GPU pipelines. The main benefit is that a pipeline, however complex, is materialized as a single unit on the GPU: its stages synchronize and pass data to each other without any work on the CPU side. Avoiding these CPU round-trips greatly improves performance.

## What is GExecution?

`GExecution` represents a sequence of GPU operations that can be composed together. It's a monadic abstraction that allows you to:

- Chain multiple `GProgram`s together
- Share buffers between pipeline stages
- Transform parameters and layouts between programs

Every `GProgram` is also a `GExecution`, making them directly composable.

## Building a Simple Pipeline

Let's build a pipeline that doubles values and then adds a constant:

```scala
import io.computenode.cyfra.core.{GBufferRegion, GExecution, GProgram}
import io.computenode.cyfra.core.GProgram.StaticDispatch
import io.computenode.cyfra.core.layout.Layout
import io.computenode.cyfra.dsl.{*, given}
import io.computenode.cyfra.runtime.VkCyfraRuntime

// Step 1: Define individual programs

case class DoubleLayout(input: GBuffer[Float32], output: GBuffer[Float32]) derives Layout

val doubleProgram: GProgram[Int, DoubleLayout] = GProgram[Int, DoubleLayout](
  layout = size => DoubleLayout(
    input = GBuffer[Float32](size),
    output = GBuffer[Float32](size)
  ),
  dispatch = (_, size) => StaticDispatch(((size + 255) / 256, 1, 1)),
  workgroupSize = (256, 1, 1),
): layout =>
  val idx = GIO.invocationId
  // Bounds guard: 256 matches the input size used in this example
  GIO.when(idx < 256):
    val value = GIO.read(layout.input, idx)
    GIO.write(layout.output, idx, value * 2.0f)

case class SumParams(value: Float32) extends GStruct[SumParams]
case class SumLayout(
  input: GBuffer[Float32],
  output: GBuffer[Float32],
  params: GUniform[SumParams]
) derives Layout

val sumProgram: GProgram[Int, SumLayout] = GProgram[Int, SumLayout](
  layout = size => SumLayout(
    input = GBuffer[Float32](size),
    output = GBuffer[Float32](size),
    params = GUniform[SumParams]()
  ),
  dispatch = (_, size) => StaticDispatch(((size + 255) / 256, 1, 1)),
  workgroupSize = (256, 1, 1),
): layout =>
  val idx = GIO.invocationId
  // Bounds guard: 256 matches the input size used in this example
  GIO.when(idx < 256):
    val value = GIO.read(layout.input, idx)
    val addValue = layout.params.read.value
    GIO.write(layout.output, idx, value + addValue)
```
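Both programs dispatch `(size + 255) / 256` workgroups of 256 invocations, which is integer ceiling division. Because the total number of invocations is rounded up to a multiple of 256, some invocations can fall past the end of the data, which is why each program body checks `idx` before reading and writing. The arithmetic, as hypothetical plain-Scala helpers:

```scala
val WorkgroupSize = 256

// Workgroups needed to cover `size` elements: ceil(size / WorkgroupSize)
def workgroupCount(size: Int): Int = (size + WorkgroupSize - 1) / WorkgroupSize

// Invocations actually launched; can exceed `size` when it is not a multiple of 256
def invocationCount(size: Int): Int = workgroupCount(size) * WorkgroupSize

// Invocations that must be masked out by the bounds check
def idleInvocations(size: Int): Int = invocationCount(size) - size
```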

## Composing Programs with addProgram

The key to building pipelines is the `addProgram` method. It takes:

1. A program to add to the execution
2. A function to map pipeline parameters to program parameters
3. A function to map the pipeline layout to the program's layout

```scala
// Step 2: Define the combined pipeline layout
case class PipelineLayout(
  input: GBuffer[Float32],
  doubled: GBuffer[Float32], // Intermediate buffer
  output: GBuffer[Float32],
  sumParams: GUniform[SumParams]
) derives Layout

// Step 3: Compose the pipeline
val doubleAndAddPipeline: GExecution[Int, PipelineLayout, PipelineLayout] =
  GExecution[Int, PipelineLayout]()
    .addProgram(doubleProgram)(
      size => size, // Map params: pipeline size -> program size
      layout => DoubleLayout(layout.input, layout.doubled) // Map layout
    )
    .addProgram(sumProgram)(
      size => size,
      layout => SumLayout(layout.doubled, layout.output, layout.sumParams)
    )
```

Notice how the `doubled` buffer connects the two programs - the first program writes to it, and the second reads from it.
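The data flow through `doubleAndAddPipeline` can be modeled on the CPU with one array per buffer. This is a hypothetical sketch for intuition only (`runPipelineOnCpu` is not a Cyfra API, and on the GPU both stages run without returning to the host): the first stage writes `doubled`, and the second stage reads it.

```scala
// CPU model of the pipeline's data flow: each "buffer" is a plain array
def runPipelineOnCpu(input: Array[Float], addValue: Float): (Array[Float], Array[Float]) =
  val doubled = Array.ofDim[Float](input.length) // intermediate buffer
  val output = Array.ofDim[Float](input.length)
  // Stage 1 (doubleProgram): reads `input`, writes `doubled`
  for idx <- input.indices do doubled(idx) = input(idx) * 2.0f
  // Stage 2 (sumProgram): reads `doubled`, writes `output`
  for idx <- doubled.indices do output(idx) = doubled(idx) + addValue
  (doubled, output)
```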

**Pipelines can be made from any number of GPrograms that form a directed acyclic graph.**

## Running the Pipeline

Execute the pipeline using `GBufferRegion`:

```scala
@main
def runDoubleAndSumPipeline(): Unit = VkCyfraRuntime.using:
  val size = 256
  val inputData = (0 until size).map(_.toFloat).toArray
  val results = Array.ofDim[Float](size)

  val region = GBufferRegion
    .allocate[PipelineLayout]
    .map: layout =>
      doubleAndAddPipeline.execute(size, layout)

  region.runUnsafe(
    init = PipelineLayout(
      input = GBuffer(inputData),
      doubled = GBuffer[Float32](size),
      output = GBuffer[Float32](size),
      sumParams = GUniform(SumParams(10.0f)),
    ),
    onDone = layout => layout.output.readArray(results),
  )
```
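After `onDone` has copied the output into `results`, you may want to compare it with the expected values. Since the pipeline doubles each input and then adds the uniform value, `results(i)` should equal `2 * i + 10` for this input. A hypothetical host-side check, comparing with a small tolerance as is usual for floating-point data:

```scala
// Expected output of the example pipeline: 2 * x + addValue for every element
def expectedOutput(input: Array[Float], addValue: Float): Array[Float] =
  input.map(x => 2.0f * x + addValue)

// Element-wise comparison with an absolute tolerance
def allClose(actual: Array[Float], expected: Array[Float], tol: Float = 1e-5f): Boolean =
  actual.length == expected.length &&
    actual.indices.forall(i => math.abs(actual(i) - expected(i)) <= tol)

val inputData = (0 until 256).map(_.toFloat).toArray
val expected = expectedOutput(inputData, 10.0f)
```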

## GExecution Operations

`GExecution` provides several composition methods:

### addProgram

Add another program to the pipeline, mapping parameters and layout:

```scala
execution.addProgram(program)(
  mapParams = pipelineParams => programParams,
  mapLayout = pipelineLayout => programLayout
)
```

### map

Transform the result layout:

```scala
execution.map(resultLayout => newResultLayout)
```

### flatMap

Sequence executions where the second depends on the first's result:

```scala
execution.flatMap: resultLayout =>
  anotherExecution
```
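To build intuition for the difference between `map` and `flatMap`, here is a deliberately simplified toy model in plain Scala. It is hypothetical and not how `GExecution` is implemented: an execution is modeled as a function from parameters and a layout to a result. `map` post-processes the result, while `flatMap` uses the result to choose the next execution, which then runs with the same parameters and layout.

```scala
// Toy model only: not Cyfra's GExecution
final case class ToyExec[P, L, R](run: (P, L) => R):
  // Transform the result without running anything new
  def map[R2](f: R => R2): ToyExec[P, L, R2] =
    ToyExec[P, L, R2]((p, l) => f(run(p, l)))

  // Run this execution, then the one chosen from its result
  def flatMap[R2](f: R => ToyExec[P, L, R2]): ToyExec[P, L, R2] =
    ToyExec[P, L, R2]((p, l) => f(run(p, l)).run(p, l))

// Stage that doubles the first `n` values of the "layout"
val doubleFirst = ToyExec[Int, Vector[Float], Vector[Float]]((n, buf) => buf.take(n).map(_ * 2))
```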

## Example: Case study on fs2 filtering

A great read for understanding pipelines is a report from our contributor `spamegg`. It describes the process of implementing filtering based on a [parallel prefix sum](https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda). We highly recommend reading it: [GSoC 2025: fs2 filtering through Cyfra](https://spamegg1.github.io/gsoc-2025/#fs2-filtering-through-cyfra).

The report also covers adapting GPrograms whose Layouts and Parameters don't match, using `contramap` and `contramapParams`. These make it possible to connect otherwise unrelated GPrograms into one pipeline.
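The core idea behind prefix-sum-based filtering (stream compaction) can be shown sequentially on the CPU: mark each element with 1 if it is kept, compute an exclusive prefix sum of the marks to get each kept element's output position, then scatter. On the GPU the scan itself runs in parallel, as described in the linked GPU Gems chapter; this sketch (with the hypothetical name `compact`) only illustrates the data flow, not Cyfra's or the report's implementation.

```scala
// Sequential sketch of prefix-sum-based filtering (stream compaction)
def compact(input: Array[Int], keep: Int => Boolean): Array[Int] =
  // 1. Flag kept elements
  val flags = input.map(x => if keep(x) then 1 else 0)
  // 2. Exclusive prefix sum: positions(i) = number of kept elements before i.
  //    scanLeft yields n + 1 values; the last one is the total kept count.
  val positions = flags.scanLeft(0)(_ + _)
  // 3. Scatter each kept element to its output position
  val out = new Array[Int](positions.last)
  for i <- input.indices if flags(i) == 1 do out(positions(i)) = input(i)
  out
```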

## Example: Navier-Stokes Fluid Simulation

The `cyfra-fluids` module implements a full 3D Navier-Stokes fluid solver using GExecution pipelines. Each simulation step chains multiple GPU programs: forces, advection, diffusion, pressure projection, and boundary conditions. The pipeline in this example is built from over 100 GPrograms.

![Fluid Simulation](/img/full_fluid_8s.gif)

[View the implementation](https://github.com/ComputeNode/cyfra/tree/cab6b4cae3a3402a3de43272bc7cb50acf5ec67b/cyfra-fluids/src/main/scala/io/computenode/cyfra/fluids)