FORGE — Forward & Reverse Gradient Engine

High-performance JIT compilation for mathematical expressions with automatic differentiation

Forge compiles mathematical expressions to optimized x86-64 machine code with automatic gradient computation. It follows a record-once, compile-once, evaluate-many paradigm designed for workloads where the same computation is repeated with varying inputs.

Key Features

JIT Compilation: Generates native x86-64 machine code via AsmJit
Reverse-mode AD: Automatic gradient computation for all recorded operations
Graph Optimizations: Common subexpression elimination, constant folding, algebraic simplification
Instruction Set Backends: SSE2 scalar (default) and AVX2 packed (4-wide SIMD), with extensible backend interface
Branching Support: Record-time conditional evaluation via fbool and If() for data-dependent control flow

Pluggable Backend Architecture

Forge is designed to be backend-agnostic — the core compiler is decoupled from specific instruction sets, number types, and hardware. The AVX2 backend demonstrates this: it can be bundled at compile time (FORGE_BUNDLE_AVX2=ON) or loaded dynamically at runtime via InstructionSetFactory::loadBackend(). This architecture enables custom backends with their own register allocation strategies, machine code generation, and memory layouts. The compilation policy (ICompilationPolicy) controls whether intermediate values are stored to memory or kept in registers — enabling forward-optimized execution when gradients aren't needed, or storing values for backward forging when they are. See backends/ for implementation details and a step-by-step guide to creating custom backends.

When to Use Forge

Forge is designed for repeated evaluation scenarios:

Monte Carlo methods: Pricing, XVA, path-dependent calculations
Scenario analysis: Stress testing, what-if analysis, parameter sweeps
Sensitivities: Fast gradient computation across input variations
Model calibration: Repeated function/gradient evaluation during optimization

Trade-off: Forge incurs upfront compilation cost. For single evaluations, tape-based AD is faster. Break-even typically occurs after 10–50 evaluations depending on graph complexity.

Overview

Phase	Description
1. Graph API	Define computation graph using Direct API, operator overloading (`fdouble`), or transform from external sources (e.g., xad-forge)
2. Graph Pre-processing	ForgeEngine applies graph optimizations: common subexpression elimination, constant folding, algebraic simplification, and stability cleaning
3. Kernel Forging	ForgeEngine compiles optimized graph to native machine code via forward forging and optional backward forging (for gradients) using pluggable instruction set backends
4. Execution	Execute the ForgedKernel repeatedly with varying inputs; retrieve computed values and gradients

Extensibility: Custom graph transformations (1), optimization passes (2), instruction set backends with custom machine code and register management (3).

Example

#include <graph/graph.hpp>
#include <compiler/forge_engine.hpp>
#include <compiler/interfaces/node_value_buffer.hpp>

using namespace forge;

int main() {
    // 1. Graph API — Define f(x) = x² + sin(x) using Direct API
    Graph graph;
    NodeId x = graph.addInput();
    graph.diff_inputs.push_back(x);                        // Mark x for gradient computation

    NodeId x_squared = graph.addNode({OpCode::Mul, x, x}); // x²
    NodeId sin_x = graph.addNode({OpCode::Sin, x});        // sin(x)
    NodeId result = graph.addNode({OpCode::Add, x_squared, sin_x});
    graph.markOutput(result);

    // 2. Graph Pre-processing + 3. Kernel Forging — ForgeEngine compiles graph
    ForgeEngine engine;
    auto kernel = engine.compile(graph);
    auto buffer = NodeValueBufferFactory::create(graph, *kernel);

    // 4. Execution — Run ForgedKernel repeatedly with different inputs
    buffer->setValue(x, 2.0);
    kernel->execute(*buffer);

    double f_x = buffer->getValue(result);    // f(2.0)
    double df_dx = buffer->getGradient(x);    // f'(2.0)
}

Getting Started

git clone https://github.com/da-roth/forge.git
cd forge && mkdir build && cd build
cmake .. && cmake --build .

CMake integration:

add_subdirectory(forge)
target_link_libraries(your_target PRIVATE forge::forge)

Requires C++17 and CMake 3.20+. All dependencies are fetched automatically.

License

FORGE is licensed under the Zlib License. See LICENSE.md for details.

Related Projects

xad-forge — Forge JIT backend for XAD
QuantLib-Risks-Cpp-Forge — QuantLib-Risks with Forge JIT integration

Authors & Maintainers

da-roth

Acknowledgments

AsmJit — High-performance machine code generation
MathPresso — Mathematical expression JIT compilation inspiration
AutoDiffSharp — Automatic differentiation design influence
SLEEF — Vectorized math functions for SIMD operations
