sflow

A declarative workflow descriptor that separates what to deploy from where to deploy it. See our project page for details.

Describe your workflow once in a portable YAML -- tasks, dependencies, resources, and launch methods -- and sflow executes the DAG through swappable backends, leveraging each platform's native ecosystem. Write one sflow.yaml and run it across environments with minimal changes.

The current focus is Slurm, which lacks a built-in workflow orchestration layer. Docker and Kubernetes backends are planned.
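Conceptually, executing the DAG means topologically ordering tasks by their dependencies and dispatching each one to the backend once its prerequisites have finished. A minimal sketch of that ordering step in Python (the task names and edges are illustrative, not sflow's internal API):

```python
from graphlib import TopologicalSorter

# Illustrative dependency graph: each task maps to the tasks it depends on.
dag = {
    "etcd": set(),
    "nats": set(),
    "server": {"etcd", "nats"},
    "benchmark": {"server"},
}

# static_order() yields tasks so that every dependency precedes its dependents.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

A real executor would additionally run independent tasks (here, etcd and nats) concurrently rather than strictly in sequence.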

(Screenshot: the sflow live TUI)

Key Features

| Feature | Description |
| --- | --- |
| Modular Composition | Split workflows into reusable YAML fragments; merge at runtime with `sflow compose` or multi-file `sflow run -f` |
| Topology-aware GPU Allocation | Automatic node/GPU placement with `CUDA_VISIBLE_DEVICES` slicing across tasks and replicas |
| Probes | Readiness and failure gates: TCP port, HTTP, log watch with pattern matching |
| Replicas & Sweeps | Parallel or sequential replicas with Cartesian-product variable sweeps |
| Batch Mode | Generate sbatch scripts, CSV-driven bulk sweeps, parallel preflight validation |
| Expressions | Jinja2 `${{ }}` syntax for variables, backend info, and task metadata |
| Artifacts | Named URIs (`fs://`, `file://`, `http://`) with inline content generation |
| Live TUI | Rich terminal interface with task status, log tailing, and allocation maps |
| AI Agent Skills | Built-in skills that teach coding assistants (Cursor, Copilot) to write and debug sflow YAML |
| Preflight Validation | Container image checks, GPU oversubscription detection, dependency cycle analysis |
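A Cartesian-product sweep expands into one run per combination of swept values. Conceptually (the variable names below are illustrative, not taken from sflow's schema):

```python
from itertools import product

# Illustrative swept variables: every combination becomes one run.
sweep = {
    "BATCH_SIZE": [1, 8],
    "TP_SIZE": [2, 4],
}

combos = [dict(zip(sweep, values)) for values in product(*sweep.values())]
print(len(combos))  # 4 combinations: 2 batch sizes x 2 TP sizes
for combo in combos:
    print(combo)
```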

Production-Ready Samples

Modular workflow samples for LLM inference serving with NVIDIA Dynamo:

| Framework | Aggregated | Disaggregated (P/D) | Multi-Node |
| --- | --- | --- | --- |
| SGLang | Yes | Yes | Yes |
| vLLM | Yes | Yes | Yes |
| TRT-LLM | Yes | Yes | Yes |

All frameworks share a common infrastructure layer (etcd, NATS, frontend, nginx) -- only the server task files differ.

Workflow DAG Example
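As an illustration only (field names such as `depends_on` are assumptions, not confirmed sflow schema), a branching DAG might be declared along these lines:

```yaml
# Hypothetical sketch -- task and field names are illustrative; see the docs
# for the actual schema.
workflow:
  name: dag_example
  tasks:
    - name: etcd
      script:
        - etcd
    - name: server
      depends_on: [etcd]          # assumed dependency key
      script:
        - start_server.sh
    - name: benchmark
      depends_on: [server]
      script:
        - run_benchmark.sh
```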

CLI at a Glance

| Command | Purpose | Key Flags |
| --- | --- | --- |
| `sflow run` | Execute a workflow | `--dry-run`, `--tui`, `--set`, `-f` (multi-file) |
| `sflow batch` | Generate sbatch scripts | `--submit`, `--bulk-input`, `--row` |
| `sflow compose` | Merge multiple YAMLs | `--resolve`, `--missable-tasks`, `-o` |
| `sflow visualize` | Render DAG graph | `--format png/svg/mermaid` |
| `sflow sample` | List / copy examples | `--list`, `-o` |
| `sflow skill` | Export AI agent skills | `--list`, `-o` |

Documentation

Full user documentation: https://nvidia.github.io/nv-sflow/

Quickstart

Validate the workflow engine locally (no Slurm required):

```shell
uv venv
source .venv/bin/activate
uv pip install "sflow @ git+https://github.com/NVIDIA/nv-sflow.git@main"

sflow run --file examples/local_hello_world.yaml --tui
```

Minimal workflow:

```yaml
version: "0.1"

variables:
  WHO:
    description: "who to greet"
    value: Nvidia

workflow:
  name: hello_local
  tasks:
    - name: hello
      script:
        - echo "Hello ${WHO}"
```
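The task's script line is plain shell; with the variable's value injected, it behaves like:

```shell
# WHO comes from the workflow's variables block (value: Nvidia)
WHO=Nvidia
echo "Hello ${WHO}"
# prints: Hello Nvidia
```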

Run a modular multi-file workflow on Slurm:

```shell
sflow run \
  -f slurm_config.yaml -f common_workflow.yaml \
  -f sglang/prefill.yaml -f sglang/decode.yaml -f benchmark_aiperf.yaml \
  --missable-tasks agg_server --tui
```
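Conceptually, a multi-file run overlays later fragments onto earlier ones, so a framework-specific file can override just the fields it cares about. sflow's actual merge rules may differ; this recursive dict merge is only a sketch, and the keys shown are illustrative:

```python
def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively overlay one config fragment onto another.

    Values in `overlay` win; nested dicts are merged key by key.
    """
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

infra = {"workflow": {"name": "common", "tasks_dir": "common/"}}
override = {"workflow": {"name": "sglang_agg"}}
print(deep_merge(infra, override))
# {'workflow': {'name': 'sglang_agg', 'tasks_dir': 'common/'}}
```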

Export AI agent skills for your IDE:

```shell
sflow skill -o .cursor/skills/
```

Development Setup

Prerequisites

  • Python 3.10 or higher

  • uv (Python package installer and resolver)

    curl -LsSf https://astral.sh/uv/install.sh | sh

Install in Development Mode

```shell
git clone https://github.com/NVIDIA/nv-sflow.git
cd nv-sflow
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
pytest
```

Contributing

Please see CONTRIBUTING.md for details on how to contribute to this project.

License

This project is licensed under the Apache License 2.0.

