C++ game engine built to explore high-performance architecture.
Currently under active development, it serves as both a learning platform and a research project.
Or it might just be a playground to test my sanity.
Important: My original Bachelor's Thesis version is archived in the thesis branch.
Honestly? I just really love this stuff.
It started with my Bachelor's Thesis, where I designed a dual-renderer engine to benchmark Vulkan path tracing against traditional OpenGL PBR. The focus was purely on real-time graphics, so the underlying architecture was single-threaded. It worked, and I had a blast building it!
Then I watched Christian Gyrling’s GDC talk on Parallelizing the Naughty Dog Engine Using Fibers. Seeing how they saturated every single CPU core made me realize that my "simple loop" was basically running with the parking brake on.
So, I started Luth from scratch to explore high-performance architecture: fiber-based job systems, lock-free memory models, and bindless Vulkan rendering. It is absolutely over-engineered for a solo project, but that’s the point.
Prerequisites:
- OS: Windows 10 / 11
- Compiler: MSVC (v143+) or Clang (C++20-compliant)
- SDK: Vulkan SDK 1.3+. Needs dynamicRendering, timelineSemaphore, and descriptor indexing (any GPU 2018+)
Steps:
- Clone with submodules
git clone --recursive https://github.com/Hekbas/Luth.git
- Generate the VS solution
scripts/setup/setup_windows.bat
- Build: either open Luth.sln in Visual Studio 2022, or run the headless script:
scripts/build/build_windows.bat
The editor binary lands at bin/windows-x86_64/Debug/Runtime/Luthien.exe.
Luth moves away from standard C++ patterns (RAII everywhere, heavy STL usage, single-threaded contexts) in favor of Data-Oriented Design and Fiber-Based Concurrency.
Instead of dedicated OS threads per task ("Render Thread", "Audio Thread"), Luth treats the CPU as a generic worker pool.
- N:M Threading: One Worker Thread per CPU core. Logical tasks are wrapped in Fibers (lightweight user-mode execution contexts, each with its own stack) that migrate freely between workers.
- Zero Blocking: When a job waits on a dependency (or the GPU), it yields to the scheduler, which swaps in another fiber. CPU saturation stays near 100%.
- Synchronization: SpinLocks (test-and-set + _mm_pause()) and Atomic Counters keep critical sections short and never block in the OS.
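As a rough illustration of the synchronization bullet above, a test-and-set spin lock with _mm_pause() plus an atomic job counter might look like the sketch below. The names SketchSpinLock, SketchJobCounter, and SKETCH_CPU_PAUSE are hypothetical and not taken from Luth's source, and the memory orderings are one reasonable choice rather than the engine's confirmed ones.

```cpp
#include <atomic>
#include <cassert>

#if defined(_M_X64) || defined(_M_IX86) || defined(__x86_64__) || defined(__i386__)
  #include <immintrin.h>
  #define SKETCH_CPU_PAUSE() _mm_pause()   // hint to the core that we are spinning
#else
  #define SKETCH_CPU_PAUSE() ((void)0)     // non-x86 fallback: plain busy-wait
#endif

// Hypothetical spin lock: test-and-set in a loop, pausing between attempts.
class SketchSpinLock {
public:
    void lock() noexcept {
        // test_and_set returns the previous value; true means the lock is held.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            SKETCH_CPU_PAUSE();
        }
    }

    bool try_lock() noexcept {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    void unlock() noexcept {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical job counter: starts at the number of outstanding jobs and
// reaches zero when all of them have signalled completion.
class SketchJobCounter {
public:
    explicit SketchJobCounter(int pending) : pending_(pending) {
        assert(pending >= 0);
    }

    // Called by a job when it finishes. Returns true for the last job.
    bool signal_done() noexcept {
        return pending_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    bool is_done() const noexcept {
        return pending_.load(std::memory_order_acquire) == 0;
    }

private:
    std::atomic<int> pending_;
};
```

In a fiber scheduler, a counter like this would typically be what a waiting job yields on: the scheduler resumes the fiber once is_done() returns true instead of letting it spin.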
Three stages run in parallel. At any frame T, the engine is processing three frames at once:
    time ──► frame N       frame N-1      frame N-2
            ┌────────┐   ┌───────────┐   ┌─────────┐
    CPU →   │  Game  │ → │  Render   │ → │   GPU   │
            │ logic  │   │ recording │   │ execute │
            └────────┘   └───────────┘   └─────────┘
- Game (N): Physics, AI, transform updates.
- Render (N-1): Read last frame's results, record Vulkan command buffers in parallel.
- GPU (N-2): Execute the commands submitted previously.
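One way to picture the overlap is a ring of three frame slots, where each stage works on a different slot during the same tick. The sketch below only computes which slot each stage would touch; SketchStageSlots, SlotsForFrame, and kFramesInFlight are hypothetical names, and Luth's real hand-off mechanism may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed ring size: one slot per pipeline stage (Game, Render, GPU).
constexpr std::size_t kFramesInFlight = 3;

// Which ring slot each stage touches while the game stage simulates frame n.
struct SketchStageSlots {
    std::size_t game;    // slot being written by game logic (frame n)
    std::size_t render;  // slot being recorded into command buffers (frame n-1)
    std::size_t gpu;     // slot whose commands the GPU is executing (frame n-2)
    bool renderActive;   // false on the very first frame: nothing to record yet
    bool gpuActive;      // false on the first two frames: nothing submitted yet
};

constexpr SketchStageSlots SlotsForFrame(std::uint64_t n) {
    // (n - 1) mod 3 and (n - 2) mod 3, written without unsigned underflow.
    return SketchStageSlots{
        static_cast<std::size_t>(n % kFramesInFlight),
        static_cast<std::size_t>((n + 2) % kFramesInFlight),
        static_cast<std::size_t>((n + 1) % kFramesInFlight),
        n >= 1,
        n >= 2,
    };
}
```

Because the three indices are always distinct, the stages never write to the same slot in the same tick, which is what lets them run without locking each other out.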
new / delete are forbidden in the hot path. Two allocators handle everything that churns:
Page Pool (2 MB virtual pages)
├── TaggedPageAllocator ── tagged lifetime, bulk free
│ └── per-thread cache ── lock-free hot-path allocations
└── LinearAllocator ── per-frame, reset on Begin()
- Tagged Page Allocator: Naughty Dog-style. Allocations carry a tag (LevelGeometry, Frame_N, …) and are freed in bulk by tag.
- Linear Allocator: bump-allocates transient frame data (command lists, UI state); resets each frame, no per-object destructors.
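To make the bump-allocation idea concrete, here is a minimal sketch of a per-frame linear allocator. SketchLinearAllocator is a hypothetical stand-in: it owns a plain heap buffer allocated once at setup, whereas Luth's allocator draws from the page pool. Only the Begin() reset name is taken from the diagram above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

class SketchLinearAllocator {
public:
    // One up-front allocation at setup time; nothing is allocated per frame.
    explicit SketchLinearAllocator(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}

    // Start of frame: everything handed out last frame is discarded at once.
    // No destructors run, so only trivially destructible data belongs here.
    void Begin() noexcept { offset_ = 0; }

    // Bump-allocate `size` bytes at the requested power-of-two alignment.
    // Returns nullptr when the frame budget is exhausted.
    void* Allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) noexcept {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        const std::uintptr_t current = base + offset_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;

        const std::size_t newOffset = static_cast<std::size_t>(aligned - base) + size;
        if (newOffset > capacity_) {
            return nullptr;
        }
        offset_ = newOffset;
        return reinterpret_cast<void*>(aligned);
    }

    std::size_t Used() const noexcept { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_ = 0;
    std::size_t offset_ = 0;
};
```

An allocation is just an add and a compare, and freeing a whole frame is a single store to offset_, which is why this pattern suits data that dies at the end of the frame.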
Modern hardware, minimal driver overhead.
- Bindless Descriptors: VK_EXT_descriptor_indexing binds all engine textures to one global array (Set 0). Materials store an integer index, so any draw call can sample any texture without rebinding.
- Dynamic Rendering: No VkRenderPass / VkFramebuffer; passes use vkCmdBeginRendering directly.
- Timeline Semaphores: Replace vkWaitForFences. A dedicated Poller Job queries semaphore values and wakes dependent fibers only when the GPU finishes their workload.
- VMA: Vulkan Memory Allocator handles all device-memory placement (buffers, images, staging).
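The poller idea from the Timeline Semaphores bullet can be sketched without any Vulkan dependency by abstracting the counter read behind a callback. In a real renderer that callback would presumably wrap vkGetSemaphoreCounterValue; here it is injected so the example stays self-contained. SketchTimelinePoller and its members are hypothetical names, and the class is single-threaded for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

class SketchTimelinePoller {
public:
    using ReadCounterFn = std::function<std::uint64_t()>;  // current GPU timeline value
    using WakeFn = std::function<void()>;                  // resumes a waiting fiber/job

    explicit SketchTimelinePoller(ReadCounterFn readCounter)
        : readCounter_(std::move(readCounter)) {}

    // Register interest: `wake` runs once the timeline reaches `targetValue`.
    void WaitFor(std::uint64_t targetValue, WakeFn wake) {
        waiters_.push_back(Waiter{targetValue, std::move(wake)});
    }

    // One polling step. Returns how many waiters were woken.
    std::size_t Poll() {
        const std::uint64_t reached = readCounter_();

        // Keep still-pending waiters at the front, ready ones at the back.
        auto firstReady = std::stable_partition(
            waiters_.begin(), waiters_.end(),
            [reached](const Waiter& w) { return w.target > reached; });

        std::size_t woken = 0;
        for (auto it = firstReady; it != waiters_.end(); ++it) {
            it->wake();
            ++woken;
        }
        waiters_.erase(firstReady, waiters_.end());
        return woken;
    }

    std::size_t PendingCount() const { return waiters_.size(); }

private:
    struct Waiter {
        std::uint64_t target;
        WakeFn wake;
    };

    ReadCounterFn readCounter_;
    std::vector<Waiter> waiters_;
};
```

Since a timeline semaphore value only increases, comparing it against each waiter's target is enough to decide readiness, and no per-submission fence objects are needed.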
Each frame, Luth builds a DAG of render passes. Passes declare reads and writes through a RenderPassBuilder; the graph solves pipeline barriers, culls unused passes, and computes resource lifetimes automatically.
    graph.AddPass<GeometryPassData>("GeometryPass",
        [&](GeometryPassData& data, RG::RenderPassBuilder& builder) {
            data.depthTex  = builder.WriteDepth(sceneDepth, ...);
            data.outputTex = builder.Write(sceneColor);
            data.indirect  = builder.ReadIndirectBuffer(indirectBuffer);
        },
        [=](GeometryPassData& data, RG::RenderPassContext& ctx) {
            // record draw commands on ctx.commandBuffer
        });

Passes execute in topological order; command-buffer recording inside each pass parallelizes across worker threads.
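The ordering and culling steps can be illustrated with a small standalone sketch: passes declare the resources they read and write, unused passes are dropped by walking back from the final output, and the rest are ordered with Kahn's algorithm. This is not Luth's RenderPassBuilder API. SketchPass, SketchResourceId, and CompileSketchGraph are hypothetical, the sketch assumes each resource has a single writer, and it leaves out barriers and resource lifetimes entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

using SketchResourceId = int;

struct SketchPass {
    std::string name;
    std::vector<SketchResourceId> reads;
    std::vector<SketchResourceId> writes;
};

inline bool Contains(const std::vector<SketchResourceId>& ids, SketchResourceId id) {
    return std::find(ids.begin(), ids.end(), id) != ids.end();
}

// Returns pass indices in execution order, skipping passes that do not
// contribute (directly or indirectly) to `finalOutput`.
inline std::vector<std::size_t> CompileSketchGraph(const std::vector<SketchPass>& passes,
                                                   SketchResourceId finalOutput) {
    const std::size_t n = passes.size();

    // Edge j -> i whenever pass i reads a resource that pass j writes.
    std::vector<std::vector<std::size_t>> producers(n), consumers(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (SketchResourceId r : passes[i].reads) {
            for (std::size_t j = 0; j < n; ++j) {
                if (j != i && Contains(passes[j].writes, r)) {
                    producers[i].push_back(j);
                    consumers[j].push_back(i);
                }
            }
        }
    }

    // Culling: walk backwards from whoever writes the final output.
    std::vector<bool> live(n, false);
    std::vector<std::size_t> stack;
    for (std::size_t j = 0; j < n; ++j) {
        if (Contains(passes[j].writes, finalOutput)) {
            live[j] = true;
            stack.push_back(j);
        }
    }
    while (!stack.empty()) {
        const std::size_t i = stack.back();
        stack.pop_back();
        for (std::size_t p : producers[i]) {
            if (!live[p]) {
                live[p] = true;
                stack.push_back(p);
            }
        }
    }

    // Kahn's algorithm over the live passes only.
    std::vector<std::size_t> pending(n, 0);
    std::queue<std::size_t> ready;
    std::size_t liveCount = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!live[i]) continue;
        ++liveCount;
        pending[i] = producers[i].size();
        if (pending[i] == 0) ready.push(i);
    }

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t j = ready.front();
        ready.pop();
        order.push_back(j);
        for (std::size_t i : consumers[j]) {
            if (live[i] && --pending[i] == 0) ready.push(i);
        }
    }

    assert(order.size() == liveCount && "dependency cycle between passes");
    return order;
}
```

With a shadow pass feeding a geometry pass feeding a post-process pass, plus a debug pass whose output nobody reads, the function returns shadow, geometry, post-process and leaves the debug pass out.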
| Feature | Details |
|---|---|
| PBR | Cook-Torrance BRDF, metallic/roughness workflow, material SSBO with render mode variants (Opaque, Cutout, Transparent) |
| Lighting | 1 directional + up to 64 point lights from ECS, LightUBO (Set 3) |
| Shadows | 4-cascade PSSM (Sascha Willems bounding-sphere fit), per-cascade GPU cull, PCF 3×3 via sampler2DShadow, cascade blending + bias |
| Ambient Occlusion | GTAO half-res compute chain — depth prefilter → horizon integral → bilateral denoise (Jimenez 2016 slice integral, VS-normal reconstruction from depth) |
| GPU Culling | Compute frustum cull per cascade + main scene, GPUObjectData SSBO (Set 5), vkCmdDrawIndexedIndirect everywhere |
| IBL | HDR skybox, diffuse irradiance, pre-filtered specular (5 mips), BRDF LUT, split-sum ambient |
| Post-Processing | HDR pipeline, bloom, tonemapping (Reinhard/ACES/Uncharted 2/exposure), vignette, film grain, chromatic aberration |
| Shaders | Single-stage SPIR-V asset pipeline (.vert/.frag/.comp each one artifact + UUID), hot-reload on any stage via FileWatcher, SPIRV-Cross reflection |
| Pipeline Cache | Disk-persisted VkPipelineCache, lazy variant creation, targeted hot-reload invalidation |
| Mipmaps | Per-texture settings pipeline with sampler maxLod control |
| Sampling | Fiber-parallel keyframe evaluation across worker threads |
| GPU Skinning | Bone matrix SSBO, vertex shader skinning |
| Blending | SQT interpolation, crossfade transitions, layered override with bone masks |
| Root Motion | Automatic extraction and application to entity transform |
| Debug | Bone overlay visualization in editor viewport |
| Asset Database | UUID-based registry with .meta sidecar files, importers for shaders/textures/models/materials |
| Smart Import | Multi-strategy texture discovery, drag-and-drop with eager import, texture remap dialog |
| Hot Reload | FileWatcher-based live reload for shaders, textures, and project files |
| Scene Format | Custom JSON .luth format with dirty tracking and native file dialogs |
| Scene Interaction | Mouse picking (ID buffer), selection outlines with occluded fade, shade modes (Lit/Wireframe/Unlit) |
| Inspector | Material editor, animation controls, light/shadow settings, Add Component workflow |
| Undo / Redo | Command pattern with 14 command types, UUID-based entity resolution, gizmo drag coalescing, compound commands, material snapshot undo |
| Frame Debugger | Trigger-based capture, frozen-state model with auto-recapture on camera move, hierarchical event tree (Group/Pass/Cascade/Draw), per-draw replay-then-copy, archive sink + per-pass image staging, CSM cascade detail panel |
| Project Panel | Folder navigation, search, hot reload, context menus for entity/primitive creation |
| Profiler | Per-system timing breakdown with fiber-aware instrumentation |
| Persistence | Window layouts, editor settings, and panel state saved across sessions |
See the full development roadmap for completed phases and version history.
Rendering — Deferred GBuffer, Forward+ clustered lighting, FXAA/TAA, global illumination, volumetric fog, SSR
Gameplay — Physics (Jolt, jobified), GPU particle system, animation blend trees & IK, prefab system, scripting (C#/Lua)
Editor — Play mode, asset streaming, visual shader editor
LUTH Engine is built on the shoulders of giants:
| Library | Purpose |
|---|---|
| Vulkan SDK | Rendering backend |
| VMA | Vulkan memory allocator |
| shaderc | Runtime GLSL → SPIR-V compilation (ships with Vulkan SDK) |
| SPIRV-Cross | Shader reflection |
| EnTT | Entity-Component-System |
| ImGui | Editor GUI |
| ImGuizmo | Translate / rotate / scale gizmos |
| Tracy | Frame profiler |
| GLFW | Windowing + input |
| GLM | Math |
| spdlog | Logging |
| assimp | Model importing |
| stb_image | Image loading |
| nlohmann/json | JSON serialization |
Planned integrations:
- Jolt Physics — rigid body physics, jobified onto the fiber scheduler
Released under the MIT License.

