@markphelps (Contributor) commented Jan 22, 2026

Summary

This PR reorganizes and updates documentation for the FFI branch, including:

  1. Architecture documentation - Comprehensive docs for both legacy and FFI runtimes
  2. Tool management - Make mise the default way to install all development tools (Go, Rust, Python)

Changes

1. Architecture Documentation Restructure

Reorganized architecture/ directory to clearly separate the two runtime implementations:

architecture/
├── 00-overview.md              # Updated to explain both runtimes
├── legacy/                     # Python/FastAPI runtime (current default)
│   ├── README.md
│   ├── 03-prediction-api.md
│   └── 04-container-runtime.md
├── ffi/                        # Rust/PyO3 runtime (experimental)
│   ├── README.md
│   ├── 03-prediction-api.md
│   └── 04-container-runtime.md
└── [shared docs: 01-model-source, 02-schema, 05-build-system, 06-cli]

New FFI Documentation

Created comprehensive documentation for the FFI runtime:

  • Component Architecture: PredictionService, PredictionSupervisor, PermitPool, Orchestrator
  • Component Ownership: RAII patterns, PredictionSlot, PredictionHandle, SyncPredictionGuard
  • IPC Protocol: Control channel (stdin/stdout) and Slot channels (Unix sockets) with JSON framing
  • Health State Machine: STARTING → READY ↔ BUSY → DEFUNCT states
  • Prediction Flows: Detailed sequence diagrams for:
    • Sync predictions with automatic cancellation on connection drop
    • Async predictions with background processing
    • Idempotent PUT with concurrent-safe DashMap
    • Cancellation propagation via cancel tokens
  • File Structure: Complete mapping of crates/coglet/src/ components
  • Invocation Path: How USE_COGLET triggers FFI runtime
  • Performance Comparisons: Legacy vs FFI benchmarks and characteristics
  • Migration Notes: Behavioral differences and improvements
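
To make the "Idempotent PUT" flow in the list above concrete, here is a minimal sketch of a create-if-absent registry. It is not taken from coglet: `PredictionRegistrySketch` and `PutOutcome` are hypothetical names, and a `Mutex<HashMap>` from the standard library stands in for the DashMap that the PR mentions, so the example needs no external crates.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Mutex;

/// Result of an idempotent PUT in this sketch (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum PutOutcome {
    Created,
    AlreadyExists,
}

/// Hypothetical registry keyed by prediction ID.
pub struct PredictionRegistrySketch {
    // Assumption: the PR names DashMap; Mutex<HashMap> is a std-only stand-in.
    inner: Mutex<HashMap<String, String>>,
}

impl PredictionRegistrySketch {
    pub fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    /// Inserts the prediction only if the ID is unseen. The check and the
    /// insert happen under one lock, so concurrent PUTs with the same ID
    /// cannot both observe "vacant".
    pub fn put(&self, id: &str, input: &str) -> PutOutcome {
        let mut map = self.inner.lock().unwrap();
        match map.entry(id.to_string()) {
            Entry::Occupied(_) => PutOutcome::AlreadyExists,
            Entry::Vacant(slot) => {
                slot.insert(input.to_string());
                PutOutcome::Created
            }
        }
    }

    /// Number of registered predictions.
    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}
```

Calling `put` twice with the same ID returns `Created` the first time and `AlreadyExists` the second, and the registry still holds a single entry.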

FFI-Specific Behaviors Documented

Connection Drop Handling: Sync predictions automatically cancel when client disconnects (via RAII SyncPredictionGuard)
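
The RAII idea behind this behavior can be sketched as a guard that cancels on drop unless it was explicitly completed. This is only an illustration: `SyncGuardSketch` and `simulate_request` are hypothetical, and an `AtomicBool` stands in for whatever cancellation signal the real `SyncPredictionGuard` sends to the worker.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a sync-prediction guard: if it is dropped
/// before `complete()` is called, the prediction is flagged as cancelled.
pub struct SyncGuardSketch {
    cancelled: Arc<AtomicBool>,
    completed: bool,
}

impl SyncGuardSketch {
    pub fn new(cancelled: Arc<AtomicBool>) -> Self {
        Self { cancelled, completed: false }
    }

    /// Call once the response has been sent; this disarms the guard.
    pub fn complete(mut self) {
        self.completed = true;
        // `self` is dropped here, but Drop now sees `completed == true`.
    }
}

impl Drop for SyncGuardSketch {
    fn drop(&mut self) {
        if !self.completed {
            // Assumption: a flag stands in for the real cancellation signal.
            self.cancelled.store(true, Ordering::SeqCst);
        }
    }
}

/// Simulates a handler whose client either disconnects (guard dropped
/// early) or stays connected (guard completed). Returns whether the
/// cancellation flag was set.
pub fn simulate_request(client_disconnects: bool) -> bool {
    let cancelled = Arc::new(AtomicBool::new(false));
    let guard = SyncGuardSketch::new(Arc::clone(&cancelled));
    if client_disconnects {
        drop(guard);
    } else {
        guard.complete();
    }
    cancelled.load(Ordering::SeqCst)
}
```

The appeal of this pattern is that cancellation needs no explicit code on the disconnect path: any early exit from the handler drops the guard.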

Enhanced Health States: More granular health reporting (STARTING, READY, BUSY, SETUP_FAILED, DEFUNCT)
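
As a rough illustration, the states named here can be modeled as an enum with an explicit transition check. The transitions encoded below are the ones in the STARTING → READY ↔ BUSY → DEFUNCT summary earlier in this description; the STARTING → SETUP_FAILED edge is an assumption, and the `Health` type itself is hypothetical rather than coglet's actual definition.

```rust
/// Hypothetical model of the health states listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Starting,
    Ready,
    Busy,
    SetupFailed,
    Defunct,
}

impl Health {
    /// Returns true if moving from `self` to `next` is allowed in this sketch.
    pub fn can_transition_to(self, next: Health) -> bool {
        use Health::*;
        matches!(
            (self, next),
            (Starting, Ready)
                | (Ready, Busy)
                | (Busy, Ready)
                | (Busy, Defunct)
                // Assumption: a failed setup() moves STARTING to SETUP_FAILED.
                | (Starting, SetupFailed)
        )
    }

    /// The wire-style name used in the PR description.
    pub fn as_str(self) -> &'static str {
        match self {
            Health::Starting => "STARTING",
            Health::Ready => "READY",
            Health::Busy => "BUSY",
            Health::SetupFailed => "SETUP_FAILED",
            Health::Defunct => "DEFUNCT",
        }
    }
}
```

In this model `SetupFailed` and `Defunct` are terminal: no transition out of them is accepted.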

Backpressure: Returns 409 Conflict when all slots occupied (instead of queuing)

Slot-Based Concurrency: Predictable resource management with PermitPool
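
The backpressure and slot-based concurrency points can be illustrated together with a small non-blocking permit pool. This is a sketch under stated assumptions, not coglet's `PermitPool`: the names `PermitPoolSketch`, `PermitSketch`, and `status_for` are hypothetical, and 200 simply stands in for whatever status a successfully accepted prediction returns.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical fixed-size pool of prediction slots.
pub struct PermitPoolSketch {
    available: Arc<AtomicUsize>,
}

/// Hypothetical permit; returns its slot to the pool when dropped.
pub struct PermitSketch {
    available: Arc<AtomicUsize>,
}

impl PermitPoolSketch {
    pub fn new(slots: usize) -> Self {
        Self { available: Arc::new(AtomicUsize::new(slots)) }
    }

    /// Non-blocking acquire: returns None when every slot is taken,
    /// rather than queuing the caller.
    pub fn try_acquire(&self) -> Option<PermitSketch> {
        self.available
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
            .ok()
            .map(|_| PermitSketch { available: Arc::clone(&self.available) })
    }

    /// Number of currently free slots.
    pub fn available(&self) -> usize {
        self.available.load(Ordering::SeqCst)
    }
}

impl Drop for PermitSketch {
    fn drop(&mut self) {
        // Releasing is automatic: dropping the permit frees the slot.
        self.available.fetch_add(1, Ordering::SeqCst);
    }
}

/// Maps an acquisition attempt to an HTTP status: 409 when the pool is full.
/// Assumption: 200 is a placeholder for the normal success status.
pub fn status_for(permit: &Option<PermitSketch>) -> u16 {
    if permit.is_some() { 200 } else { 409 }
}
```

With a pool of one slot, a second concurrent `try_acquire` yields `None` (mapped to 409 here), and the slot becomes available again as soon as the first permit is dropped.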

Worker Isolation: Python crashes don't kill the server (marks health as DEFUNCT)

Architecture Diagrams

Incorporated all diagrams from PR #2641 including:

  • Component ownership model
  • Worker subprocess protocol
  • Health state machine
  • Complete prediction flows
  • Invocation path

Updated Existing Documentation

  • architecture/00-overview.md: Added section explaining both runtime implementations with clear navigation
  • docs/http.md: Added note about USE_COGLET environment variable to enable FFI runtime
  • README.md: Updated to mention both runtime implementations

2. Tool Management with mise

Made mise the default and recommended way to install all development tools.

Changes

  • mise.toml: Added rust = "latest" to managed tools
  • mise.lock: Added Rust 1.93.0 to lockfile
  • CONTRIBUTING.md:
    • Updated to clarify mise manages all tools (Go, Rust, Python/uv, cargo tools)
    • Added comprehensive "Tool management with mise" section
    • Documented common mise commands
    • Added migration guide from rustup to mise
  • AGENTS.md: Added "Tool Management" subsection documenting mise usage
  • README.md: Updated build-from-source instructions to mention script/setup

Tools Managed by mise

  • Go (latest) - CLI and build system
  • Rust (latest) - FFI runtime (coglet) ← NEW
  • uv (latest) - Python package manager
  • cargo-binstall - Fast binary installer
  • cargo:cargo-deny - License/advisory checking
  • cargo:cargo-insta - Snapshot testing
  • cargo:cargo-nextest - Next-gen test runner
  • cargo:maturin - Build Python wheels from Rust
  • ruff - Python linter/formatter
  • ty - Type checker

Benefits

  • Single Source of Truth: One command (script/setup) installs everything
  • Consistent Environments: Lockfile ensures same versions across dev/CI
  • Easier Onboarding: No manual Go/Rust/Python installation needed
  • No Conflicts: Isolated from system installations
  • Version Pinning: Easy to pin specific versions when needed

Developer Experience

Developers can now simply run:

script/setup        # Installs all tools via mise + sets up Python venv

And they're ready to contribute with Go, Rust, and Python all configured!

Testing

  • All documentation links verified
  • Mermaid diagrams render correctly
  • Code references match actual file paths
  • Consistent formatting and structure
  • mise install rust successfully installs Rust 1.93.0
  • mise exec -- cargo build compiles successfully with mise-managed Rust
  • All mise-managed tools show in mise ls

Related

This PR is meant to be merged into the FFI branch (#2641) to ensure the documentation and development tooling are up-to-date when FFI becomes the default runtime.

Commits

  1. docs: reorganize architecture docs for FFI and legacy runtimes - Architecture documentation restructure
  2. chore: make mise the default tool manager for all development tools - Tool management improvements

@markphelps markphelps marked this pull request as ready for review January 22, 2026 15:10
@markphelps markphelps requested a review from a team as a code owner January 22, 2026 15:10
@mfainberg-cf mfainberg-cf force-pushed the FFI branch 2 times, most recently from e7a3014 to e3d6061 Compare January 23, 2026 05:08
Base automatically changed from FFI to main January 23, 2026 14:26
Restructure architecture documentation to clearly separate the two runtime
implementations (legacy Python/FastAPI and experimental Rust/PyO3 FFI).

Changes:
- Move existing runtime docs to architecture/legacy/
- Create new architecture/ffi/ with comprehensive FFI documentation
- Add README files for both runtime implementations
- Update architecture/00-overview.md to explain both runtimes
- Document FFI components: PredictionService, Supervisor, PermitPool
- Document IPC protocol: control channel and slot channels
- Document health state machine and prediction flows
- Add migration notes and performance comparisons
- Update docs/http.md and README.md to mention USE_COGLET option

The FFI runtime provides significant improvements: faster HTTP layer,
better worker isolation, slot-based concurrency, and automatic
cancellation on connection drops.
- Fix cancellation propagation to show sync (SIGUSR1 + CancelableGuard) vs
  async (future.cancel()) mechanisms instead of incorrect cancel_token
- Fix control channel protocol messages and fields
- Fix file structure (worker.rs not worker/ subdirectory)
- Remove incorrect cancel_token from component ownership
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>

| Variable | Default | Purpose |
|----------|---------|---------|
| `USE_COGLET` | unset | Enable FFI runtime (set to any value) |

Member commented:

Is this the case? At one point we also needed to specify COGLET_RUST_WHEEL to load the wheel into the container. It's easy enough to guarantee in the near term, though, so it's fair to always install it once we do our first release and then toggle.
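
For reference, "set to any value" semantics can be sketched as a presence check on the variable. This is an illustration only: the function names are hypothetical, the real check may live elsewhere (and, per the comment above, may also have depended on COGLET_RUST_WHEEL), and treating an empty value as "set" is an assumption.

```rust
use std::ffi::OsStr;

/// Returns true when the variable is set at all, regardless of its value.
/// Assumption: an empty string still counts as "set".
pub fn ffi_runtime_enabled(use_coglet: Option<&OsStr>) -> bool {
    use_coglet.is_some()
}

/// Reads USE_COGLET from the process environment and applies the check.
pub fn ffi_runtime_enabled_from_env() -> bool {
    ffi_runtime_enabled(std::env::var_os("USE_COGLET").as_deref())
}
```

Separating the pure check from the environment lookup keeps the rule testable without mutating process-wide state.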


⚠️ **Behavioral differences**:
- Sync predictions cancel on connection drop
- 409 responses when at capacity (not queuing)

Member commented:

Did cog really queue? Oof, that's not great behavior IMO. 409 is much better (though TBH 429 feels more correct?). Nothing to change here.


| State | HTTP Behavior | Meaning |
|-------|--------------|---------|
| `STARTING` | 503 Service Unavailable | Worker subprocess initializing, `setup()` running |

Member commented:

I think we walked this back to a 200 with status STARTING. We should confirm these states are correct and match cog's behavior in Rust.

- **Isolation**: Python crashes/segfaults don't kill server
- **CUDA context**: Clean GPU initialization per worker
- **Memory**: Fresh address space for model loading
- **Restart**: Server can restart worker on fatal errors

Member commented:

Future (maybe) behavior.
