FORGE — Forward & Reverse Gradient Engine

High-performance JIT compilation for mathematical expressions with automatic differentiation


Forge compiles mathematical expressions to optimized x86-64 machine code with automatic gradient computation. It follows a record-once, compile-once, evaluate-many paradigm designed for workloads where the same computation is repeated with varying inputs.

Key Features

  • JIT Compilation: Generates native x86-64 machine code via AsmJit
  • Reverse-mode AD: Automatic gradient computation for all recorded operations
  • Graph Optimizations: Common subexpression elimination, constant folding, algebraic simplification
  • Instruction Set Backends: SSE2 scalar (default) and AVX2 packed (4-wide SIMD), with extensible backend interface
  • Branching Support: Record-time conditional evaluation via fbool and If() for data-dependent control flow
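To make the graph optimizations listed above concrete, here is a minimal standalone sketch of constant folding and common subexpression elimination applied while nodes are added. `MiniGraph`, `Op`, and `addBinary` are hypothetical names invented for this illustration; they are not Forge's data structures, and Forge's actual passes may work quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical mini-graph for illustration only (not Forge's Graph class).
enum class Op { Input, Const, Add, Mul };

struct Node {
    Op op;
    int a;         // first operand id, or -1 if unused
    int b;         // second operand id, or -1 if unused
    double value;  // payload, only meaningful for Const nodes
};

class MiniGraph {
public:
    int addInput() { return push({Op::Input, -1, -1, 0.0}); }

    // Identical constants share one node.
    int addConst(double v) {
        auto it = consts_.find(v);
        if (it != consts_.end()) return it->second;
        int id = push({Op::Const, -1, -1, v});
        consts_[v] = id;
        return id;
    }

    int addBinary(Op op, int a, int b) {
        assert(op == Op::Add || op == Mul_());
        // Constant folding: both operands known, so compute the result now.
        if (nodes_[a].op == Op::Const && nodes_[b].op == Op::Const) {
            double x = nodes_[a].value, y = nodes_[b].value;
            return addConst(op == Op::Add ? x + y : x * y);
        }
        // Add and Mul are commutative, so order the operands canonically.
        if (a > b) std::swap(a, b);
        // Common subexpression elimination: reuse an identical existing node.
        auto key = std::make_tuple(op, a, b);
        auto it = seen_.find(key);
        if (it != seen_.end()) return it->second;
        int id = push({op, a, b, 0.0});
        seen_[key] = id;
        return id;
    }

    std::size_t size() const { return nodes_.size(); }
    const Node& node(int id) const { return nodes_[id]; }

private:
    static Op Mul_() { return Op::Mul; }
    int push(Node n) {
        nodes_.push_back(n);
        return static_cast<int>(nodes_.size()) - 1;
    }
    std::vector<Node> nodes_;
    std::map<std::tuple<Op, int, int>, int> seen_;
    std::map<double, int> consts_;
};
```

With this sketch, adding `x * (2 + 3)` and then `(2 + 3) * x` yields the same node id, and `2 + 3` never becomes an `Add` node at all.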

Pluggable Backend Architecture

Forge is designed to be backend-agnostic — the core compiler is decoupled from specific instruction sets, number types, and hardware. The AVX2 backend demonstrates this: it can be bundled at compile time (FORGE_BUNDLE_AVX2=ON) or loaded dynamically at runtime via InstructionSetFactory::loadBackend(). This architecture enables custom backends with their own register allocation strategies, machine code generation, and memory layouts. The compilation policy (ICompilationPolicy) controls whether intermediate values are stored to memory or kept in registers — enabling forward-optimized execution when gradients aren't needed, or storing values for backward forging when they are. See backends/ for implementation details and a step-by-step guide to creating custom backends.
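The storage decision described above can be pictured with a small policy interface. This is only a sketch under stated assumptions: `StoragePolicy`, `ForwardOnlyPolicy`, `GradientPolicy`, and `planStorage` are hypothetical names made up for this example and do not reflect the real `ICompilationPolicy` interface, whose methods are documented in backends/.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compilation policy (not Forge's ICompilationPolicy).
struct StoragePolicy {
    virtual ~StoragePolicy() = default;
    // Decide whether the value of a node is written to memory.
    virtual bool storeValue(std::size_t nodeIndex, bool isOutput) const = 0;
};

// Forward-only evaluation: keep intermediates in registers, store outputs only.
struct ForwardOnlyPolicy : StoragePolicy {
    bool storeValue(std::size_t, bool isOutput) const override { return isOutput; }
};

// Gradient evaluation: the backward pass reads intermediates, so store everything.
struct GradientPolicy : StoragePolicy {
    bool storeValue(std::size_t, bool) const override { return true; }
};

// Build a per-node storage plan from a policy.
std::vector<bool> planStorage(const StoragePolicy& policy,
                              const std::vector<bool>& isOutput) {
    std::vector<bool> stored(isOutput.size());
    for (std::size_t i = 0; i < isOutput.size(); ++i) {
        stored[i] = policy.storeValue(i, isOutput[i]);
        // Outputs must always be retrievable after execution.
        assert(!isOutput[i] || stored[i]);
    }
    return stored;
}
```

For a four-node graph with one output, the forward-only plan stores a single value while the gradient plan stores all four, which is the memory-traffic difference the policy is meant to control.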

When to Use Forge

Forge is designed for repeated evaluation scenarios:

  • Monte Carlo methods: Pricing, XVA, path-dependent calculations
  • Scenario analysis: Stress testing, what-if analysis, parameter sweeps
  • Sensitivities: Fast gradient computation across input variations
  • Model calibration: Repeated function/gradient evaluation during optimization

Trade-off: Forge incurs upfront compilation cost. For single evaluations, tape-based AD is faster. Break-even typically occurs after 10–50 evaluations depending on graph complexity.
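The break-even point can be estimated with a simple linear cost model: a one-off compilation cost plus a per-evaluation kernel cost, compared against a per-evaluation tape cost. The sketch below assumes that model; the function name and the timings in the usage note are hypothetical and are not measurements of Forge.

```cpp
#include <cassert>
#include <cmath>

// Smallest number of evaluations N for which
//   compileCost + N * kernelCostPerEval <= N * tapeCostPerEval.
// All costs share one time unit (for example microseconds).
// Returns -1 if the compiled kernel is not faster per evaluation,
// in which case compilation never pays off under this model.
long long breakEvenEvaluations(double compileCost,
                               double tapeCostPerEval,
                               double kernelCostPerEval) {
    assert(compileCost >= 0.0);
    const double savingPerEval = tapeCostPerEval - kernelCostPerEval;
    if (savingPerEval <= 0.0) return -1;
    return static_cast<long long>(std::ceil(compileCost / savingPerEval));
}
```

As an illustration with made-up numbers, a compile cost of 1000 units, a tape cost of 60 units per evaluation, and a kernel cost of 10 units per evaluation give a break-even of 20 evaluations.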

Overview

Phase | Description
1. Graph API | Define computation graph using Direct API, operator overloading (fdouble), or transform from external sources (e.g., xad-forge)
2. Graph Pre-processing | ForgeEngine applies graph optimizations: common subexpression elimination, constant folding, algebraic simplification, and stability cleaning
3. Kernel Forging | ForgeEngine compiles optimized graph to native machine code via forward forging and optional backward forging (for gradients) using pluggable instruction set backends
4. Execution | Execute the ForgedKernel repeatedly with varying inputs; retrieve computed values and gradients

Extensibility: Phases 1 to 3 can each be extended: custom graph transformations (phase 1), custom optimization passes (phase 2), and instruction set backends with custom machine code and register management (phase 3).
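The record-once, evaluate-many flow of the phases above can be illustrated without any JIT at all. The following sketch records f(x) = x² + sin(x) on a tiny tape and then runs a forward sweep followed by a reverse sweep for each input. It is an interpreter standing in for the compiled kernel; `Tape`, `Op`, and `evaluate` are hypothetical names for this example, not Forge types, and the single-input restriction is a simplifying assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tape for illustration only (not Forge's Graph or ForgedKernel).
enum class Op { Input, Add, Mul, Sin };

struct Node { Op op; int a; int b; };

struct Tape {
    std::vector<Node> nodes;
    int add(Op op, int a = -1, int b = -1) {
        nodes.push_back({op, a, b});
        return static_cast<int>(nodes.size()) - 1;
    }
};

struct Result { double value; double gradient; };

// Evaluate the tape for one value of the single input node `input`.
// The last recorded node is treated as the output.
Result evaluate(const Tape& tape, int input, double x) {
    const std::size_t n = tape.nodes.size();
    if (n == 0) return {0.0, 0.0};
    assert(input >= 0 && static_cast<std::size_t>(input) < n);
    std::vector<double> val(n, 0.0), adj(n, 0.0);

    // Forward sweep: compute and store every node value.
    for (std::size_t i = 0; i < n; ++i) {
        const Node& nd = tape.nodes[i];
        switch (nd.op) {
            case Op::Input: val[i] = x; break;
            case Op::Add:   val[i] = val[nd.a] + val[nd.b]; break;
            case Op::Mul:   val[i] = val[nd.a] * val[nd.b]; break;
            case Op::Sin:   val[i] = std::sin(val[nd.a]); break;
        }
    }

    // Reverse sweep: propagate adjoints from the output back to the input.
    adj[n - 1] = 1.0;
    for (std::size_t i = n; i-- > 0;) {
        const Node& nd = tape.nodes[i];
        switch (nd.op) {
            case Op::Input: break;
            case Op::Add:   adj[nd.a] += adj[i]; adj[nd.b] += adj[i]; break;
            case Op::Mul:   adj[nd.a] += adj[i] * val[nd.b];
                            adj[nd.b] += adj[i] * val[nd.a]; break;
            case Op::Sin:   adj[nd.a] += adj[i] * std::cos(val[nd.a]); break;
        }
    }
    return {val[n - 1], adj[input]};
}
```

The tape is built once and `evaluate` is then called for as many inputs as needed. Forge replaces the two interpreted loops with generated machine code, which is where the per-evaluation saving comes from.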

Example

#include <graph/graph.hpp>
#include <compiler/forge_engine.hpp>
#include <compiler/interfaces/node_value_buffer.hpp>

using namespace forge;

int main() {
    // 1. Graph API — Define f(x) = x² + sin(x) using Direct API
    Graph graph;
    NodeId x = graph.addInput();
    graph.diff_inputs.push_back(x);                        // Mark x for gradient computation

    NodeId x_squared = graph.addNode({OpCode::Mul, x, x}); // x²
    NodeId sin_x = graph.addNode({OpCode::Sin, x});        // sin(x)
    NodeId result = graph.addNode({OpCode::Add, x_squared, sin_x});
    graph.markOutput(result);

    // 2. Graph Pre-processing + 3. Kernel Forging — ForgeEngine compiles graph
    ForgeEngine engine;
    auto kernel = engine.compile(graph);
    auto buffer = NodeValueBufferFactory::create(graph, *kernel);

    // 4. Execution — Run ForgedKernel repeatedly with different inputs
    buffer->setValue(x, 2.0);
    kernel->execute(*buffer);

    double f_x = buffer->getValue(result);    // f(2.0)
    double df_dx = buffer->getGradient(x);    // f'(2.0)
}
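The values returned by the example can be checked against the closed form: f(x) = x² + sin(x) has derivative f'(x) = 2x + cos(x), so f(2.0) ≈ 4.909297 and f'(2.0) ≈ 3.583853. The standalone sketch below computes these references and adds a central finite difference as an independent cross-check. It does not use Forge, and the helper names are chosen for this illustration only.

```cpp
#include <cassert>
#include <cmath>

// Reference function from the example: f(x) = x^2 + sin(x).
double f(double x) { return x * x + std::sin(x); }

// Analytic derivative: f'(x) = 2x + cos(x).
double fPrime(double x) { return 2.0 * x + std::cos(x); }

// Central finite difference, useful as an independent gradient check.
double centralDifference(double (*fn)(double), double x, double h = 1e-6) {
    assert(h > 0.0);
    return (fn(x + h) - fn(x - h)) / (2.0 * h);
}
```

Comparing `buffer->getValue(result)` with `f(2.0)` and `buffer->getGradient(x)` with `fPrime(2.0)` to within a small tolerance is a simple way to validate a freshly compiled kernel.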

Getting Started

git clone https://github.com/da-roth/forge.git
cd forge && mkdir build && cd build
cmake .. && cmake --build .

CMake integration:

add_subdirectory(forge)
target_link_libraries(your_target PRIVATE forge::forge)

Requires C++17 and CMake 3.20+. All dependencies are fetched automatically.

License

FORGE is licensed under the Zlib License. See LICENSE.md for details.

Acknowledgments

  • AsmJit — High-performance machine code generation
  • MathPresso — Mathematical expression JIT compilation inspiration
  • AutoDiffSharp — Automatic differentiation design influence
  • SLEEF — Vectorized math functions for SIMD operations
