-
Notifications
You must be signed in to change notification settings - Fork 654
docs: Add comprehensive FFI architecture documentation #2648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e7a3014 to
e3d6061
Compare
Restructure architecture documentation to clearly separate the two runtime implementations (legacy Python/FastAPI and experimental Rust/PyO3 FFI). Changes: - Move existing runtime docs to architecture/legacy/ - Create new architecture/ffi/ with comprehensive FFI documentation - Add README files for both runtime implementations - Update architecture/00-overview.md to explain both runtimes - Document FFI components: PredictionService, Supervisor, PermitPool - Document IPC protocol: control channel and slot channels - Document health state machine and prediction flows - Add migration notes and performance comparisons - Update docs/http.md and README.md to mention USE_COGLET option The FFI runtime provides significant improvements: faster HTTP layer, better worker isolation, slot-based concurrency, and automatic cancellation on connection drops.
ddb96fb to
09d4189
Compare
- Fix cancellation propagation to show sync (SIGUSR1 + CancelableGuard) vs async (future.cancel()) mechanisms instead of incorrect cancel_token - Fix control channel protocol messages and fields - Fix file structure (worker.rs not worker/ subdirectory) - Remove incorrect cancel_token from component ownership
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
|
|
||
| | Variable | Default | Purpose | | ||
| |----------|---------|---------| | ||
| | `USE_COGLET` | unset | Enable FFI runtime (set to any value) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the case? There was a need to specify the COGLET_RUST_WHEEL as well (at one point) to load into the container. Easy enoujgh to guarantee in the near term though, so fair to always install it once we do our first release and then toggle.
|
|
||
| ⚠️ **Behavioral differences**: | ||
| - Sync predictions cancel on connection drop | ||
| - 409 responses when at capacity (not queuing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cog queued? oof that's not a great behavior IMO. 409 is much better [though TBH 429 feels more correct?] Nothing to change here.
|
|
||
| | State | HTTP Behavior | Meaning | | ||
| |-------|--------------|---------| | ||
| | `STARTING` | 503 Service Unavailable | Worker subprocess initializing, `setup()` running | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we walked this back to a 200 but status STARTING... we should confirm these states are correct and match cog's behavior in rust.
| - **Isolation**: Python crashes/segfaults don't kill server | ||
| - **CUDA context**: Clean GPU initialization per worker | ||
| - **Memory**: Fresh address space for model loading | ||
| - **Restart**: Server can restart worker on fatal errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future (maybe) behavior.
Summary
This PR reorganizes and updates documentation for the FFI branch, including:
Changes
1. Architecture Documentation Restructure
Reorganized
architecture/directory to clearly separate the two runtime implementations:New FFI Documentation
Created comprehensive documentation for the FFI runtime:
crates/coglet/src/componentsUSE_COGLETtriggers FFI runtimeFFI-Specific Behaviors Documented
✅ Connection Drop Handling: Sync predictions automatically cancel when client disconnects (via RAII
SyncPredictionGuard)✅ Enhanced Health States: More granular health reporting (STARTING, READY, BUSY, SETUP_FAILED, DEFUNCT)
✅ Backpressure: Returns 409 Conflict when all slots occupied (instead of queuing)
✅ Slot-Based Concurrency: Predictable resource management with PermitPool
✅ Worker Isolation: Python crashes don't kill the server (marks health as DEFUNCT)
Architecture Diagrams
Incorporated all diagrams from PR #2641 including:
Updated Existing Documentation
architecture/00-overview.md: Added section explaining both runtime implementations with clear navigationdocs/http.md: Added note aboutUSE_COGLETenvironment variable to enable FFI runtimeREADME.md: Updated to mention both runtime implementations2. Tool Management with mise
Made mise the default and recommended way to install all development tools.
Changes
rust = "latest"to managed toolsscript/setupTools Managed by mise
Benefits
✅ Single Source of Truth: One command (
script/setup) installs everything✅ Consistent Environments: Lockfile ensures same versions across dev/CI
✅ Easier Onboarding: No manual Go/Rust/Python installation needed
✅ No Conflicts: Isolated from system installations
✅ Version Pinning: Easy to pin specific versions when needed
Developer Experience
Developers can now simply run:
script/setup # Installs all tools via mise + sets up Python venvAnd they're ready to contribute with Go, Rust, and Python all configured!
Testing
mise install rustsuccessfully installs Rust 1.93.0mise exec -- cargo buildcompiles successfully with mise-managed Rustmise lsRelated
This PR is meant to be merged into the FFI branch (#2641) to ensure the documentation and development tooling are up-to-date when FFI becomes the default runtime.
Commits
docs: reorganize architecture docs for FFI and legacy runtimes- Architecture documentation restructurechore: make mise the default tool manager for all development tools- Tool management improvements