
Generating Shader Code (part 1) #35

Recently I've been...

initial commit... 2 years ago
Well that's disturbing.

wow, ok, how about "in the last couple of years" I've been working on a new project: a game engine written in Rust, mostly from scratch. To get the obvious out of the way, I'm writing my own game engine because:

  • I want to learn more about modern low-level 3D graphics programming, and
  • I like building the game engine part more than I like putting the gameplay pieces together in an editor.

If I wanted to actually ship something in an anywhere-reasonable amount of time, I would be using Godot, or Unreal, or even Bevy, which I really admire and will refer to later on (Not Unity though. Not again). This should hopefully be clear.

I've had moderate success so far, implementing a few core systems like a flexible renderer with support for variable pipelines, a flexible system for defining game inputs, and dynamically scaled text rendering via multi-channel signed distance fields (TODO: future blog post, there's not enough information out there on this). The renderer uses WGPU to target multiple backends (Vulkan, DirectX12, Metal, WebGPU, ...) and relies heavily on instanced batches to draw tens or hundreds of thousands of sprites at 60fps on my laptop.

To flex and scrutinize some of the graphics pipeline abstractions, I'm working on implementing a few basic but satisfying effects in 3D scenes - shadow mapping, SSAO, depth of field.

What does it take to get high(er) quality screen captures? Asking for a friend...

The current hurdle I'm facing is keeping data types in sync between Rust code and shaders (WGSL). Duplicating and hand-editing both sides in tandem is tedious and error-prone; when you violate one of the unintuitive layout rules or forget to change a vec3 to a vec4 somewhere, it can be quite difficult to debug. It should be possible, if not straightforward, to automate some of this validation and generate some of the glue code for initializing the relevant layouts, buffers, bindings, and data structures. In simpler terms, I want to define types once and share them between Rust code and shader code.
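To make the layout hazard concrete, here's a stdlib-only Rust sketch of WGSL's struct layout rules (the sizes and alignments follow the WGSL spec: a vec3<f32> occupies 12 bytes but aligns to 16; the struct and field names are purely illustrative):

```rust
// Illustrative only: compute WGSL-style offsets for a struct like
//   struct Sun { light_dir: vec3<f32>, intensity: f32 }
// Each field's offset rounds up to that field's alignment, and the
// struct's size rounds up to the largest field alignment.
fn round_up(value: u32, align: u32) -> u32 {
    value.div_ceil(align) * align
}

fn main() {
    // (name, size, align) per the WGSL spec: vec3<f32> is size 12, align 16.
    let fields = [("light_dir: vec3<f32>", 12u32, 16u32), ("intensity: f32", 4, 4)];
    let (mut offset, mut max_align) = (0, 0);
    for (name, size, align) in fields {
        offset = round_up(offset, align);
        println!("{name} at offset {offset}");
        offset += size;
        max_align = max_align.max(align);
    }
    // intensity lands at offset 12, tucked into what looks like vec3
    // "padding" - exactly the detail that's easy to get wrong by hand.
    println!("struct size: {}", round_up(offset, max_align)); // 16
}
```

Swap the field order (intensity first) and the struct balloons to 32 bytes, with offsets a hand-written Rust mirror can silently get wrong.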

I started by researching what existing approaches did. Many WGPU or WebGPU examples are too simple to bother with streamlining these pieces, and those that are more complex are usually doing something highly specialized. One approach is "shader-first": writing WGSL and using that to generate Rust types. There are clear benefits - shader code is more specialized, so it helps to retain control over it. Generally if you can express a type in the shader, it should be representable in Rust.

The biggest drawback is that WGSL is very limited, missing things like a standardized way to organize and share code between multiple shader units. You end up copying types and functions between every shader, needing to keep them synchronized manually. Efforts like WESL exist to provide some of the missing pieces (standardized syntax for imports, for example), but they are early and in flux. I don't doubt the promise of these projects, but they don't solve my problem just yet.

WWBD?

We know the very flexible Bevy uses WGPU also - what does it do?

Bevy maintains the naga_oil crate as a preprocessor layer, which mostly consists of using regexes to parse custom directives from the WGSL source, transforming it, and emitting a complete shader.

See the post-processing example, which reuses an existing FullscreenVertexOutput builtin definition:

#import bevy_core_pipeline::fullscreen_vertex_shader::FullscreenVertexOutput

@group(0) @binding(0) var screen_texture: texture_2d<f32>;
@group(0) @binding(1) var texture_sampler: sampler;
struct PostProcessSettings {
    intensity: f32,
#ifdef SIXTEEN_BYTE_ALIGNMENT
    // WebGL2 structs must be 16 byte aligned.
    _webgl2_padding: vec3<f32>
#endif
}
@group(0) @binding(2) var<uniform> settings: PostProcessSettings;

@fragment
fn fragment(in: FullscreenVertexOutput) -> @location(0) vec4<f32> {
    // Chromatic aberration strength
    let offset_strength = settings.intensity;

    // Sample each color channel with an arbitrary shift
    return vec4<f32>(
        textureSample(screen_texture, texture_sampler, in.uv + vec2<f32>(offset_strength, -offset_strength)).r,
        textureSample(screen_texture, texture_sampler, in.uv + vec2<f32>(-offset_strength, 0.0)).g,
        textureSample(screen_texture, texture_sampler, in.uv + vec2<f32>(0.0, offset_strength)).b,
        1.0
    );
}

The shader also makes use of preprocessor #ifdef conditionals, and defines PostProcessSettings for bound pipeline data. This struct must also be defined on the Rust side:

#[derive(Component, Default, Clone, Copy, ExtractComponent, ShaderType)]
struct PostProcessSettings {
    intensity: f32,
    // WebGL2 structs must be 16 byte aligned.
    #[cfg(feature = "webgl2")]
    _webgl2_padding: Vec3,
}

So unfortunately we still have duplication, though Bevy does provide some helpful derive macros like AsBindGroup that eliminate some of the WGPU verbosity (bind group layout descriptor creation).

As an aside: I think this approach is probably the right tradeoff for something like Bevy - it's reasonably simple for the end-user, handling most of the complexity in the core engine.

If we take as our main objective automating the synchronization of types between Rust and WGPU, what can we accomplish?

Shader-First (Generating Rust)

There is some prior art in the area of using WGSL to generate Rust, particularly:

  • naga-to-tokenstream, which takes a shader as input and generates a Rust module containing types, constants, and other metadata like bind group info.
  • include-wgsl-oil, which builds on the former and naga-oil to preprocess included WGSL files before generating Rust code.

Ultimately this lets you do something like:

@export struct MyStruct {
    foo: u32,
    bar: i32
}

in your shader, and

#[include_wgsl_oil::include_wgsl_oil("path/to/shader.wgsl")]
mod my_shader { }

let my_instance = my_shader::types::MyStruct {
    foo: 12,
    bar: -7
};

in Rust.

Great, this is what we were after! I forked each to suit my needs a little better. naga-to-tokenstream supports encase, which generates functions for converting Rust types from their default representation into a GPU-compatible layout, specifically one matching WGSL's memory layout requirements (alignment, etc.). To avoid obscuring any performance issues, my engine instead prefers that types used in the renderer require no conversion at all, using bytemuck to "cast" them into byte slices for upload to the GPU.

// An example of using `encase` to generate conversion functions:
use encase::{ShaderType, UniformBuffer};

#[derive(ShaderType)]
struct AffineTransform2D {
    matrix: test_impl::Mat2x2f,
    translate: test_impl::Vec2f,
}

// UniformBuffer encapsulates (encases?) the actual data transformation 
let mut buffer = UniformBuffer::new(Vec::<u8>::new());
buffer.write(&AffineTransform2D {
    matrix: test_impl::Mat2x2f::IDENTITY,
    translate: test_impl::Vec2f::ZERO,
}).unwrap();
// byte_buffer can now be passed directly to WGPU
let byte_buffer = buffer.into_inner();
// ...

// An example of using bytemuck to copy types as-is:
#[repr(C)]
#[derive(Debug, Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
pub struct BasicInstanceData {
    pub subtexture: Rect,
    pub tint: Color,
    pub transform: Mat4,
}

let instances = &[
    BasicInstanceData{
        tint: Color::RED,
        transform: Mat4::from_translation(vec3(1.0, 0.0, 2.0)),
        ..Default::default()
    },
];
let byte_slice: &[u8] = bytemuck::cast_slice(instances);

I also modified my fork of include-wgsl-oil (renamed generate-wgsl-oil) to run in a build script rather than as an attribute macro:

use generate_wgsl_oil::generate_from_entrypoints;

// simplified build script example
fn main() {
    let dir = std::path::Path::new("res/shaders/");
    let mut shader_paths = vec![];
    for entry in dir.read_dir().expect("failed to open res/shaders/").flatten() {
        shader_paths.push(entry.path().to_string_lossy().into_owned());
    }
    let result = generate_from_entrypoints(&shader_paths);
    std::fs::write("src/renderer/shaders.rs", result).unwrap();
}

When the crate is compiled, build.rs reads included WGSL files and generates Rust structs for any types marked with @export.
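One practical note on the build script (not in the original snippet, but standard Cargo behavior): by default Cargo reruns a build script whenever any file in the package changes; printing a rerun-if-changed directive narrows that to just the shader directory, so codegen reruns only when shaders change:

```rust
// Hypothetical addition to build.rs: scope reruns to the shader directory.
// Note that once any rerun-if directive is emitted, Cargo's default
// "rerun on any package change" behavior is replaced by what you print.
fn rerun_directive(dir: &str) -> String {
    format!("cargo:rerun-if-changed={dir}")
}

fn main() {
    println!("{}", rerun_directive("res/shaders/"));
}
```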

Shader:

// deferred_lighting.wgsl
@export
struct Light {
    position: vec4<f32>,
    color: vec4<f32>,
    view_proj: mat4x4<f32>,
}

@export
struct LightsUniform {
    items: array<Light, 8>,
    count: u32,
}

// ...

Generated Rust:

pub mod deferred_lighting {
    ///Equivalent Rust definitions of the types defined in this module.
    pub mod types {
        #[allow(unused, non_camel_case_types)]
        #[repr(C)]
        #[derive(Debug, PartialEq, Copy, Clone, bytemuck::Pod, bytemuck::Zeroable)]
        pub struct Light {
            pub position: glam::f32::Vec4,
            pub color: glam::f32::Vec4,
            pub view_proj: glam::f32::Mat4,
        }
        #[allow(unused, non_camel_case_types)]
        #[repr(C)]
        #[derive(Debug, PartialEq, Copy, Clone, bytemuck::Pod, bytemuck::Zeroable)]
        pub struct LightsUniform {
            pub items: [Light; 8u32 as usize],
            pub count: u32,
            pub _pad: [u8; 12u32 as usize],
        }
    }
    // ...
}
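As a sanity check on the generated layout (a stdlib-only sketch with glam::Vec4 and glam::Mat4 swapped for plain arrays - not the generator's actual output): Light should be 16 + 16 + 64 = 96 bytes, and LightsUniform should be 8×96 + 4 + 12 = 784 bytes, the trailing _pad rounding the struct up to a multiple of 16 as WGSL's uniform layout rules require:

```rust
// Stand-in mirrors of the generated structs, using plain arrays in
// place of glam::Vec4 ([f32; 4]) and glam::Mat4 ([[f32; 4]; 4]).
#[allow(dead_code)]
#[repr(C)]
struct Light {
    position: [f32; 4],
    color: [f32; 4],
    view_proj: [[f32; 4]; 4],
}

#[allow(dead_code)]
#[repr(C)]
struct LightsUniform {
    items: [Light; 8],
    count: u32,
    _pad: [u8; 12],
}

fn main() {
    println!("{}", std::mem::size_of::<Light>()); // 96
    println!("{}", std::mem::size_of::<LightsUniform>()); // 784
}
```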

Aside: Automatic Alignment

One big pro of this approach is being able to generate padding easily - all of this nonsense can be hidden away:

(nonsense)
let mut layouter = Layouter::default();
layouter.update(module.to_ctx()).unwrap();

let members_have_names = members.iter().all(|member| member.name.is_some());
let mut last_field_name = None;
let mut total_offset = 0;
let mut largest_alignment = 0;

let mut members: Vec<_> = members
    .iter()
    .enumerate()
    .map(|(i_member, member)| {
        let member_name = if members_have_names {
            let member_name =
                member.name.as_ref().expect("all members had names").clone();
            syn::parse_str::<syn::Ident>(&member_name)
        } else {
            syn::parse_str::<syn::Ident>(&format!("v{}", i_member))
        };

        self.rust_type_ident(member.ty, module, args).and_then(|member_ty| {
            member_name.ok().map(|member_name| {
                let inner_type = module
                    .types
                    .get_handle(member.ty)
                    .expect("failed to locate member type")
                    .inner
                    .clone();
                let field_size = inner_type.size(module.to_ctx());
                let alignment = layouter[member.ty].alignment * 1u32;
                largest_alignment = largest_alignment.max(alignment);
                let padding_needed =
                    layouter[member.ty].alignment.round_up(total_offset)
                        - total_offset;
                let pad = if padding_needed > 0 {
                    let padding_member_name = format_ident!(
                        "_pad_{}",
                        last_field_name.as_ref().expect(
                            "invariant: expected prior member before padding field"
                        )
                    );
                    quote::quote! {
                        pub #padding_member_name: [u8; #padding_needed as usize],
                    }
                } else {
                    quote::quote! {}
                };
                total_offset += field_size + padding_needed;
                last_field_name = Some(member_name.clone());
                quote::quote! {
                    #pad
                    pub #member_name: #member_ty
                }
            })
        })
    })
    .collect();

// Some final padding may be needed to ensure a packed version of the struct
// has proper overall alignment, which is determined by the largest among
// individual field alignment requirements
let struct_alignment = Alignment::from_width(largest_alignment as u8);
if !struct_alignment.is_aligned(total_offset) {
    // struct needs padding to be aligned
    let padding_needed = struct_alignment.round_up(total_offset) - total_offset;
    members.push(Some(quote::quote! {
        pub _pad: [u8; #padding_needed as usize],
    }));
}

// ...

syn::parse_quote! {
    #[repr(C)]
    #[derive(Debug, PartialEq, Copy, Clone)]
    pub struct #struct_name {
        #(#members ,)*
    }
}

What's the catch?

I actually went quite far with this approach, eliminating more and more boilerplate and repetition with codegen, and extending what was now "WGSL" in air quotes to support more and more of the ergonomics you'd want: namespacing, imports, macros, etc. - trying to turn it into a first-class programming language, which it is not. I contemplated forking the language server to add support for the various preprocessor directives and stop my editor from complaining (we are about three levels deep in the Hal-changing-a-lightbulb recursion at this point).

Ultimately I found myself unsatisfied with one significant aspect of this approach - and (perhaps surprisingly) it wasn't the amount of busywork I had created.

The problem now is that despite the Rust types being what we'll interact with most, we have limited control over them and lose a lot of expressiveness; WGSL types are far simpler (by design). With this implementation, each basic WGSL type has exactly one corresponding Rust representation - a WGSL vec4<f32> is always a Rust glam::Vec4, for example. To make the difficulty this imposes more concrete, consider the following case.

Instancing Matrices

It is common in graphics programming to have per-vertex data, and optionally per-instance data, both of which are stored in vertex buffers. The per-vertex data the shader acts on changes with every - you guessed it - vertex, but the programmer has control over how these vertices are grouped into instances. This is a way to define "per-object" (or per-mesh) data without duplicating it for each vertex.

struct VertexInput {
    @location(0) position: vec4<f32>,
    @location(1) tex_coords: vec2<f32>,
}

struct InstanceInput {
    // example instance data: a color value to be applied to every vertex
    @location(2) tint: vec4<f32>,
}

@vertex
fn main(
    // when rendering triangles, we might have three different VertexInputs for
    // each InstanceInput
    vertex: VertexInput,
    instance: InstanceInput,
) -> @builtin(position) vec4<f32> {
    // ...
}

WebGPU vertex buffers place restrictions on the data structures they can contain - or at least on how shaders can access that data. Most notably for our case, vertex attributes can't be matrices. If you want, say, a 4x4 matrix, you can instead put four vec4<f32> attributes in the vertex data structure and reconstruct the matrix from those columns (or rows).

I wrote the renderer to rely heavily on instancing, with two 4x4 matrices in the instance data structure, so the structs for instance data in shaders look something like this:

struct Instance {
  ...
  model_1: vec4f,
  model_2: vec4f,
  model_3: vec4f,
  model_4: vec4f,
  normal_1: vec4f,
  normal_2: vec4f,
  normal_3: vec4f,
  normal_4: vec4f,
};

and I manually transform these into the desired mat4x4:

@vertex
fn vs_main(
    vertex: ModelVertexData,
    instance: InstanceInput,
) -> VertexOutput {
    let model_transform = mat4x4<f32>(
        instance.model_1,
        instance.model_2,
        instance.model_3,
        instance.model_4,
    );
    let normal_matrix = mat4x4<f32>(
        instance.normal_1,
        instance.normal_2,
        instance.normal_3,
        instance.normal_4,
    );
    // ...
}

Cumbersome, but we'll deal with it.
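For completeness, the CPU side mirrors the same column-splitting. A minimal stdlib-only sketch (the struct and helper names are hypothetical, the normal-matrix columns are elided, and a flat [f32; 16] stands in for glam::Mat4's column-major storage):

```rust
// Hypothetical CPU-side mirror of the Instance struct above: vertex
// attributes can't be matrices, so a column-major 4x4 matrix is stored
// as four [f32; 4] columns and reassembled in the vertex shader.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct InstanceRaw {
    model_1: [f32; 4],
    model_2: [f32; 4],
    model_3: [f32; 4],
    model_4: [f32; 4],
}

/// Split a column-major matrix (16 floats) into its four columns.
fn to_instance(m: &[f32; 16]) -> InstanceRaw {
    InstanceRaw {
        model_1: m[0..4].try_into().unwrap(),
        model_2: m[4..8].try_into().unwrap(),
        model_3: m[8..12].try_into().unwrap(),
        model_4: m[12..16].try_into().unwrap(),
    }
}

fn main() {
    // Identity matrix in column-major order.
    let mut m = [0.0f32; 16];
    for i in 0..4 {
        m[i * 4 + i] = 1.0;
    }
    let raw = to_instance(&m);
    println!("{:?}", raw.model_1); // [1.0, 0.0, 0.0, 0.0]
}
```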
