Breaking Changes
API Refactoring: The Python API has changed significantly. Methods such as write_batch and read_batch have been replaced by unified write and read interfaces, and arguments are now passed as vectorized parameters (e.g., vector<uintptr_t>) instead of lists of objects, reducing Python-side overhead.
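As a rough illustration of the move from object lists to vectorized parameters, the sketch below flattens per-transfer objects into parallel integer lists, which is the shape that maps onto a C++ vector<uintptr_t>. The Transfer class, the vectorize helper, and the remote address are hypothetical and not part of the dlslime API; the exact argument layout of the new write and read calls is not described in these notes.

```python
import ctypes
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transfer:
    """Hypothetical old-style descriptor: one Python object per RDMA op."""
    local_addr: int   # raw address of the local buffer slice
    remote_addr: int  # raw address on the peer side
    length: int       # number of bytes to move


def vectorize(transfers: List[Transfer]) -> Tuple[List[int], List[int], List[int]]:
    """Flatten per-op objects into parallel lists of plain integers.

    Each list maps directly onto a C++ std::vector<uintptr_t>, so the
    binding layer converts a few homogeneous int sequences instead of
    reading attributes from one Python object per transfer.
    """
    local_addrs = [t.local_addr for t in transfers]
    remote_addrs = [t.remote_addr for t in transfers]
    lengths = [t.length for t in transfers]
    return local_addrs, remote_addrs, lengths


# Slice a real 4 KiB local buffer into four 1 KiB transfers.
buf = (ctypes.c_char * 4096)()
base = ctypes.addressof(buf)
REMOTE_BASE = 0x10000  # made-up peer address, for illustration only
transfers = [Transfer(base + i * 1024, REMOTE_BASE + i * 1024, 1024) for i in range(4)]
local_addrs, remote_addrs, lengths = vectorize(transfers)
```

With a vectorized interface, callers would typically build such integer lists directly and skip the per-transfer objects altogether.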
Async Pattern Shift: The async_op flag has been removed in favor of a fully Future-based model. All communication operations (Send/Recv/Read/Write) now return Future objects (e.g., SlimeSendFuture), and callers must explicitly call .wait() to synchronize.
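A minimal sketch of the Future-based pattern, assuming nothing about dlslime internals: a send returns a future immediately, and the caller decides when to block by calling .wait(). SendFuture and send are hypothetical stand-ins (SlimeSendFuture is the real class named above), and a short-lived thread simulates the hardware finishing the transfer.

```python
import threading
import time
from typing import List, Optional


class SendFuture:
    """Hypothetical stand-in for a future such as SlimeSendFuture."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self._status: Optional[int] = None

    def _complete(self, status: int) -> None:
        # Called by whichever thread observes the completion.
        self._status = status
        self._done.set()

    def wait(self, timeout: Optional[float] = None) -> Optional[int]:
        # Block the caller until the operation completes, then return its status.
        if not self._done.wait(timeout):
            raise TimeoutError("operation did not complete in time")
        return self._status


def send(payload: bytes) -> SendFuture:
    """Hypothetical non-blocking send (payload is ignored in this toy)."""
    fut = SendFuture()

    def finish() -> None:
        time.sleep(0.01)  # pretend the transfer takes about 10 ms
        fut._complete(0)  # 0 means success in this sketch

    threading.Thread(target=finish, daemon=True).start()
    return fut


# Issue all sends first, then synchronize explicitly on each future.
futures: List[SendFuture] = [send(b"x" * 16) for _ in range(3)]
statuses = [f.wait(timeout=1.0) for f in futures]
```

Issuing every operation before waiting lets the transfers overlap; calling .wait() right after each call gives blocking behavior instead.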
🚀 New Features & Core Improvements
Unified Endpoint Architecture:
Introduced the RDMAEndpoint class, which unifies the previously separate RDMAIOEndpoint (one-sided RDMA read/write) and RDMAMsgEndpoint (two-sided messaging). A single endpoint instance can now handle both communication modes.
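To make "one endpoint, both modes" concrete, here is an in-process toy (no RDMA involved) in which a single object exposes one-sided write/read against the peer's registered memory and two-sided send/recv through the peer's inbox. ToyEndpoint and its method names are hypothetical and are not meant to mirror RDMAEndpoint's actual signatures.

```python
from collections import deque
from typing import Deque, Dict, Optional


class ToyEndpoint:
    """Hypothetical in-process toy of one endpoint offering both modes."""

    def __init__(self) -> None:
        self.memory: Dict[str, bytearray] = {}  # "registered" memory regions
        self.inbox: Deque[bytes] = deque()      # messages delivered by the peer's send()
        self.peer: Optional["ToyEndpoint"] = None

    def connect(self, other: "ToyEndpoint") -> None:
        self.peer, other.peer = other, self

    def register(self, name: str, size: int) -> None:
        self.memory[name] = bytearray(size)

    # One-sided ops (the RDMAIOEndpoint role): access peer memory directly.
    def write(self, region: str, offset: int, data: bytes) -> None:
        self.peer.memory[region][offset:offset + len(data)] = data

    def read(self, region: str, offset: int, length: int) -> bytes:
        return bytes(self.peer.memory[region][offset:offset + length])

    # Two-sided ops (the RDMAMsgEndpoint role): the peer must call recv().
    def send(self, data: bytes) -> None:
        self.peer.inbox.append(data)

    def recv(self) -> bytes:
        return self.inbox.popleft()


a, b = ToyEndpoint(), ToyEndpoint()
a.connect(b)
b.register("kv", 8)
a.write("kv", 0, b"hello")  # one-sided: b takes no action
a.send(b"done")             # two-sided: b consumes the message explicitly
msg = b.recv()
```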
Async Worker Runtime:
Added RDMAWorker and GlobalWorkerManager. A dedicated background thread now processes Completion Queue (CQ) events, so the main thread no longer needs to poll.
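The worker model can be sketched with Python threads: a dedicated thread blocks on a simulated completion queue and signals the waiter for each finished work request, so the main thread never polls. This is an assumption-laden toy: real code would drain a device CQ (for example with libibverbs' ibv_poll_cq) and might busy-poll or use a completion channel rather than a blocking queue.Queue. ToyWorker, post, and the (wr_id, status) entry format are hypothetical.

```python
import itertools
import queue
import threading
from typing import Dict, Tuple


class ToyWorker:
    """Hypothetical dedicated thread that drains a simulated completion queue."""

    def __init__(self) -> None:
        # Stand-in CQ: entries are (wr_id, status); None asks the thread to exit.
        self.cq: queue.Queue = queue.Queue()
        self._events: Dict[int, threading.Event] = {}
        self.results: Dict[int, int] = {}
        self._lock = threading.Lock()
        self._ids = itertools.count()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self) -> Tuple[int, threading.Event]:
        # Register a new work request; a Future object would wrap this event.
        done = threading.Event()
        with self._lock:
            wr_id = next(self._ids)
            self._events[wr_id] = done
        return wr_id, done

    def _run(self) -> None:
        while True:
            entry = self.cq.get()  # blocks here, so the main thread never polls
            if entry is None:
                return
            wr_id, status = entry
            with self._lock:
                self.results[wr_id] = status
                done = self._events.pop(wr_id)
            done.set()  # wakes whoever is waiting on this request

    def stop(self) -> None:
        self.cq.put(None)
        self._thread.join()


worker = ToyWorker()
posted = [worker.post() for _ in range(3)]
for wr_id, _ in reversed(posted):  # completions can arrive in any order
    worker.cq.put((wr_id, 0))
statuses = [worker.results[wr_id] for wr_id, done in posted if done.wait(1.0)]
worker.stop()
```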
Optimized Resource Management:
Introduced GlobalContextManager and pooled RDMAContext management, which allows contexts for multiple devices to be reused and reduces initialization overhead.
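Pooled context management can be pictured as a process-wide, thread-safe cache keyed by device name: the first request for a device pays the setup cost, and later requests get the same object back. ContextPool, ToyContext, and the device names (mlx5_0, mlx5_1, in the style of NVIDIA/Mellanox NIC names) are illustrative assumptions, not the GlobalContextManager API.

```python
import threading
from typing import Dict


class ToyContext:
    """Hypothetical stand-in for an RDMAContext; real setup (opening the
    device, allocating a protection domain, ...) is what makes reuse pay off."""

    instances_created = 0

    def __init__(self, device: str) -> None:
        self.device = device
        ToyContext.instances_created += 1


class ContextPool:
    """Hypothetical process-wide pool holding one context per device name."""

    _lock = threading.Lock()
    _contexts: Dict[str, ToyContext] = {}

    @classmethod
    def get(cls, device: str) -> ToyContext:
        with cls._lock:
            ctx = cls._contexts.get(device)
            if ctx is None:
                ctx = ToyContext(device)  # setup cost is paid only once per device
                cls._contexts[device] = ctx
            return ctx


a = ContextPool.get("mlx5_0")
b = ContextPool.get("mlx5_0")  # reused, no new setup
c = ContextPool.get("mlx5_1")  # different device, new context
```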
PyTorch Backend Upgrade:
Rewrote csrc/torch/slime_backend.cpp, migrating the PyTorch distributed backend to the new Unified Endpoint and Worker architecture and improving efficiency when it is used as a ProcessGroup.
Enhanced Developer Experience:
Added dlslime/_slime_c.pyi type stubs, significantly improving code completion and type checking in IDEs.