An open-source, full-stack AI inference platform.
Ember covers the entire inference stack from GPU kernels to serving, built on Triton, IREE, and SGLang.
AI Evolution Layer  Auto-Evolve (AI-driven continuous optimization)
─────────────────────────────────────────────────────────────────
Layer 8  Serve      SGLang-based, 3-process, continuous batching
Layer 7  Pipeline   Text generation / speculative / constrained decoding
Layer 6  KV Cache   Paged + multi-tier (GPU, CPU, SSD)
Layer 5  NN Module  Transformer, Attention, RoPE, weight loading
Layer 4  Graph      Static graph API + IREE compiler (custom passes)
Layer 3  Runtime    IREE Runtime + LLM scheduling extensions
Layer 2  Kernels    Triton kernels + FlashAttention + FlagGems
Layer 1  Compiler   MLIR/LLVM + Triton compiler + CUTLASS
─────────────────────────────────────────────────────────────────
Hardware            NVIDIA (PTX) | AMD (ROCm) | Apple (Metal)
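
To make the "Paged + multi-tier (GPU, CPU, SSD)" idea in the Layer 6 row more concrete, here is a minimal Python sketch of a page table that demotes the least recently used page to the next slower tier when a tier fills up. It is an illustration only and is not Ember's API: the class TieredPagedCache, its methods, the tier names, and the LRU demotion policy are all assumptions made for this example, and the "pages" are plain Python values rather than real device buffers.

```python
from collections import OrderedDict

# Tiers ordered fastest to slowest (hypothetical names for this sketch).
TIERS = ("gpu", "cpu", "ssd")


class TieredPagedCache:
    """Toy page table mapping (sequence id, page index) to a storage tier."""

    def __init__(self, capacity):
        # capacity: number of pages each tier can hold,
        # for example {"gpu": 2, "cpu": 2, "ssd": 2}.
        self.capacity = capacity
        # One LRU-ordered dict per tier; the first key is least recently used.
        self.pages = {tier: OrderedDict() for tier in TIERS}

    def _remove(self, key):
        # Drop the page from whichever tier currently holds it, if any.
        for tier in TIERS:
            if key in self.pages[tier]:
                return self.pages[tier].pop(key)
        return None

    def _insert(self, tier_idx, key, value):
        if tier_idx >= len(TIERS):
            # Fell off the slowest tier: the page is evicted entirely
            # (a real system would have to recompute it).
            return
        tier = TIERS[tier_idx]
        if len(self.pages[tier]) >= self.capacity[tier]:
            # Tier is full: demote its least recently used page one tier down.
            old_key, old_value = self.pages[tier].popitem(last=False)
            self._insert(tier_idx + 1, old_key, old_value)
        self.pages[tier][key] = value

    def put(self, seq_id, page_idx, value):
        # New or updated pages always land in the fastest tier.
        key = (seq_id, page_idx)
        self._remove(key)
        self._insert(0, key, value)

    def get(self, seq_id, page_idx):
        # Reading a page promotes it back to the fastest tier.
        key = (seq_id, page_idx)
        value = self._remove(key)
        if value is None:
            raise KeyError(key)
        self._insert(0, key, value)
        return value

    def tier_of(self, seq_id, page_idx):
        # Report which tier holds the page, or None if it is not cached.
        key = (seq_id, page_idx)
        for tier in TIERS:
            if key in self.pages[tier]:
                return tier
        return None
```

With a capacity of two pages per tier, writing three pages for one sequence pushes the oldest page from the "gpu" tier to the "cpu" tier, and reading that page again brings it back while demoting another. A real implementation would also need fixed-size token blocks, asynchronous copies between devices, and reference counting for shared prefixes, none of which this sketch attempts.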
Early development. Not yet functional.
Build with Bazel:

    bazel build //...

Licensed under Apache 2.0. See LICENSE.