Describe your feature request
We are refactoring Mooncake Store to support multiple transfer engines in a clean and pluggable way.
Today, Store's TransferTask and TransferSubmitter are tightly coupled to a single C++ TransferEngine class. As we push TENT into the mainline and consider integrations with NIXL and other external transport libraries, we want Mooncake Store to depend on a generic transfer interface instead of a concrete TE implementation.
Concretely, we already have a working prototype in our WIP branch. However, all of this is wired directly against a TE& engine_ reference and has several #ifdef MOONCAKE_USE_V1 branches.
🎯 Goal
Decouple TransferTask / TransferSubmitter from the concrete TE class by introducing a pluggable ITransferEngine interface and a small factory that can bind TE, TENT, or NIXL at runtime (or configuration time).
In other words:
Store should only know about "a transfer engine" and not care whether that engine is TE, TENT, or NIXL.
🛠️ Suggested Design (AI-assisted)
1. Define a minimal ITransferEngine interface
Capture only what Store needs, e.g.:
    class ITransferEngine {
    public:
        using BatchID = uint64_t;
        using SegmentHandle = uint64_t;

        virtual ~ITransferEngine() = default;

        virtual BatchID allocateBatch(size_t batch_size) = 0;
        virtual Status submitTransfer(BatchID batch_id,
                                      const std::vector<Request>& requests) = 0;
        virtual void freeBatch(BatchID batch_id) = 0;
        virtual Status getTransferStatus(BatchID batch_id,
                                         size_t task_index,
                                         TStatus& out_status) = 0;
        virtual SegmentHandle openSegment(const std::string& endpoint) = 0;

        // Used to detect "local" transfers for memcpy optimization.
        virtual std::string localEndpoint() const = 0;
    };
2. Implement adapters for existing engines
- TransferEngineAdapter wrapping the current TE class
  - Hides MOONCAKE_USE_V1 differences (allocateBatch vs allocateBatchID, freeBatch vs freeBatchID, getSegmentName vs getLocalIpAndPort, etc.)
- TentEngineAdapter wrapping TENT’s C++ API
- Optionally: NixlEngineAdapter wrapping a NIXL-like library
All of the #ifdef logic should live inside these adapters, not inside Store.
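As a rough illustration of the adapter idea, the sketch below wraps a stand-in engine and keeps the `MOONCAKE_USE_V1` switch inside the adapter. `FakeEngineV1` and `FakeEngineV2` are hypothetical placeholders, not the real TE classes, and which method names belong to which version is an assumption made only for this example. The interface is trimmed to three methods to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Trimmed stand-in for the proposed interface; only three methods are shown.
class ITransferEngine {
public:
    using BatchID = uint64_t;
    virtual ~ITransferEngine() = default;
    virtual BatchID allocateBatch(size_t batch_size) = 0;
    virtual void freeBatch(BatchID batch_id) = 0;
    virtual std::string localEndpoint() const = 0;
};

// Hypothetical stand-ins for the two TE flavours. The mapping of method
// names to versions is an assumption, not taken from the Mooncake sources.
struct FakeEngineV1 {
    uint64_t next_id = 1;
    int live_batches = 0;
    uint64_t allocateBatchID(size_t) { ++live_batches; return next_id++; }
    void freeBatchID(uint64_t) { assert(live_batches > 0); --live_batches; }
    std::string getLocalIpAndPort() const { return "127.0.0.1:12345"; }
};

struct FakeEngineV2 {
    uint64_t next_id = 1;
    int live_batches = 0;
    uint64_t allocateBatch(size_t) { ++live_batches; return next_id++; }
    void freeBatch(uint64_t) { assert(live_batches > 0); --live_batches; }
    std::string getSegmentName() const { return "127.0.0.1:12345"; }
};

#ifdef MOONCAKE_USE_V1
using TE = FakeEngineV1;
#else
using TE = FakeEngineV2;
#endif

// The adapter is the only place that knows about the version switch.
class TransferEngineAdapter final : public ITransferEngine {
public:
    explicit TransferEngineAdapter(TE& engine) : engine_(engine) {}

    BatchID allocateBatch(size_t batch_size) override {
#ifdef MOONCAKE_USE_V1
        return engine_.allocateBatchID(batch_size);
#else
        return engine_.allocateBatch(batch_size);
#endif
    }

    void freeBatch(BatchID batch_id) override {
#ifdef MOONCAKE_USE_V1
        engine_.freeBatchID(batch_id);
#else
        engine_.freeBatch(batch_id);
#endif
    }

    std::string localEndpoint() const override {
#ifdef MOONCAKE_USE_V1
        return engine_.getLocalIpAndPort();
#else
        return engine_.getSegmentName();
#endif
    }

private:
    TE& engine_;  // Non-owning; the caller keeps the engine alive.
};
```

Callers hold an `ITransferEngine&` and never see the `TE` alias, so flipping `MOONCAKE_USE_V1` only recompiles the adapter.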
3. Refactor TransferEngineOperationState and TransferSubmitter
Change them to depend on ITransferEngine& instead of TE&:
- Replace all direct TE calls with ITransferEngine methods.
- Remove TE-specific #ifdef branches from:
  - submitTransfer()
  - submitTransferEngineOperation()
  - isLocalTransfer()
  - TransferEngineOperationState::check_task_status(), destructor, etc.
The goal is that transfer_task.cc does not include TE headers at all.
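To show what Store-side code might look like once it depends only on the interface, here is a minimal sketch of a local-transfer check and a batch status poll. `TaskState`, `BatchOutcome`, `checkBatch`, and `FakeEngine` are hypothetical names introduced for this example, and a plain `bool` stands in for the `Status` return type. The real `check_task_status()` logic may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TStatus.
enum class TaskState { kPending, kCompleted, kFailed };

// Trimmed interface; bool replaces Status to keep the sketch self-contained.
class ITransferEngine {
public:
    using BatchID = uint64_t;
    virtual ~ITransferEngine() = default;
    virtual bool getTransferStatus(BatchID batch_id, size_t task_index,
                                   TaskState& out_state) = 0;
    virtual std::string localEndpoint() const = 0;
};

// A transfer is treated as local when the target is the engine's own endpoint.
inline bool isLocalTransfer(const ITransferEngine& engine,
                            const std::string& target_endpoint) {
    return engine.localEndpoint() == target_endpoint;
}

enum class BatchOutcome { kInProgress, kCompleted, kFailed };

// Polls every task once. Any failed task or failed status query fails the batch.
inline BatchOutcome checkBatch(ITransferEngine& engine,
                               ITransferEngine::BatchID batch_id,
                               size_t task_count) {
    bool all_done = true;
    for (size_t i = 0; i < task_count; ++i) {
        TaskState state = TaskState::kPending;
        if (!engine.getTransferStatus(batch_id, i, state)) {
            return BatchOutcome::kFailed;
        }
        if (state == TaskState::kFailed) return BatchOutcome::kFailed;
        if (state != TaskState::kCompleted) all_done = false;
    }
    return all_done ? BatchOutcome::kCompleted : BatchOutcome::kInProgress;
}

// In-memory fake used only to exercise the helpers above.
class FakeEngine final : public ITransferEngine {
public:
    FakeEngine(std::string endpoint, std::vector<TaskState> states)
        : endpoint_(std::move(endpoint)), states_(std::move(states)) {}

    bool getTransferStatus(BatchID, size_t task_index,
                           TaskState& out_state) override {
        if (task_index >= states_.size()) return false;  // unknown task
        out_state = states_[task_index];
        return true;
    }

    std::string localEndpoint() const override { return endpoint_; }

private:
    std::string endpoint_;
    std::vector<TaskState> states_;
};
```

Nothing here includes a TE header, and the same helpers can be unit-tested against a fake engine without any transport set up.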
4. Introduce a simple factory for binding engines
Add a small factory that creates the appropriate ITransferEngine based on configuration (YAML, flags, or environment variables):
    struct TransferEngineConfig {
        std::string backend;  // "te", "tent", "nixl", ...
    };

    class TransferEngineFactory {
    public:
        static std::unique_ptr<ITransferEngine> Create(
            const TransferEngineConfig& cfg);
    };
Store (or the higher-level runtime) can then do:
    auto engine = TransferEngineFactory::Create(cfg);
    TransferSubmitter submitter(*engine, backend, &metrics);
✅ What’s Already There
- A working TransferSubmitter with:
  - Async memcpy (MemcpyWorkerPool)
  - Async disk reads (FilereadWorkerPool)
  - TE-based batch submission and status polling
- A compile-time switchable TE alias (v1/v2) that we want to wrap behind ITransferEngine.
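For the factory described in step 4, one possible shape of `Create` is a simple dispatch on the backend string. The adapter classes below are empty stubs, the endpoint strings are placeholders, and throwing `std::invalid_argument` for an unknown backend is an assumption made for this sketch. Returning `nullptr` or a `Status` would be equally reasonable.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Trimmed interface; a single method is enough to make the stubs concrete.
class ITransferEngine {
public:
    virtual ~ITransferEngine() = default;
    virtual std::string localEndpoint() const = 0;
};

// Stub adapters. Real ones would wrap TE and TENT respectively.
class TransferEngineAdapter final : public ITransferEngine {
public:
    std::string localEndpoint() const override { return "te-placeholder"; }
};

class TentEngineAdapter final : public ITransferEngine {
public:
    std::string localEndpoint() const override { return "tent-placeholder"; }
};

struct TransferEngineConfig {
    std::string backend;  // "te", "tent", "nixl", ...
};

class TransferEngineFactory {
public:
    // Maps the configured backend name to a concrete adapter.
    static std::unique_ptr<ITransferEngine> Create(
        const TransferEngineConfig& cfg) {
        if (cfg.backend == "te") {
            return std::make_unique<TransferEngineAdapter>();
        }
        if (cfg.backend == "tent") {
            return std::make_unique<TentEngineAdapter>();
        }
        // "nixl" would be added here once an adapter exists.
        throw std::invalid_argument("unknown transfer backend: " + cfg.backend);
    }
};
```

Keeping the string-to-adapter mapping in one function means adding a backend touches the factory and the new adapter, not Store.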
🤝 How to Join
If this sounds interesting, we’d love your help:
- Comment on this issue / start a discussion to claim a sub-task:
  - Interface design
  - Store refactor
  - Integration test
- We’re happy to review design sketches or partial PRs.
- Submit PRs referencing this Call for Contribution.
This refactor is a key step toward making Mooncake Store a truly transport-agnostic, future-proof KVCache backend that can run on TE, TENT,
NIXL, and beyond.