[Call for Contribution]: Decouple TransferTask from TE in Mooncake Store #1163

@alogfans

Describe your feature request

We are refactoring Mooncake Store to support multiple transfer engines in a clean and pluggable way.

Today, Store's TransferTask and TransferSubmitter are tightly coupled to a single C++ TransferEngine class. As we push TENT into the mainline and consider integrations with NIXL and other external transport libraries, we want Mooncake Store to depend on a generic transfer interface instead of a concrete TE implementation.

We already have a working prototype in our WIP branch, but it is wired directly against a TE& engine_ reference and contains several #ifdef MOONCAKE_USE_V1 branches.

🎯 Goal

Decouple TransferTask / TransferSubmitter from the concrete TE class by introducing a pluggable ITransferEngine interface and a small factory that can bind TE, TENT, or NIXL at runtime (or configuration time).

In other words:

Store should only know about "a transfer engine" and not care whether that engine is TE, TENT, or NIXL.

🛠️ Suggested Design (AI-assisted)

1. Define a minimal ITransferEngine interface

Capture only what Store needs, e.g.:

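#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Request, Status, and TStatus are not defined here; they are assumed to be
// the request/status types Store already uses.
class ITransferEngine {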
public:
    using BatchID = uint64_t;
    using SegmentHandle = uint64_t;

    virtual ~ITransferEngine() = default;

    virtual BatchID allocateBatch(size_t batch_size) = 0;
    virtual Status submitTransfer(BatchID batch_id,
                                  const std::vector<Request>& requests) = 0;
    virtual void freeBatch(BatchID batch_id) = 0;

    virtual Status getTransferStatus(BatchID batch_id,
                                     size_t task_index,
                                     TStatus& out_status) = 0;

    virtual SegmentHandle openSegment(const std::string& endpoint) = 0;

    // Used to detect "local" transfers for memcpy optimization.
    virtual std::string localEndpoint() const = 0;
};
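
To make the intended call sequence concrete (allocate a batch, submit requests, poll per-task status, free the batch), here is a minimal sketch of an in-process engine behind the interface. Everything besides ITransferEngine itself is an assumption for illustration: Request, Status, and TStatus are hypothetical stand-ins for the real Store/TE types, and LoopbackEngine is a made-up test double that completes every request with a memcpy at submit time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the real types (assumptions, not Mooncake's API).
enum class Status { OK, INVALID_ARGUMENT };
enum class TStatus { PENDING, COMPLETED };
struct Request { const void* source; void* target; size_t length; };

class ITransferEngine {
public:
    using BatchID = uint64_t;
    using SegmentHandle = uint64_t;
    virtual ~ITransferEngine() = default;
    virtual BatchID allocateBatch(size_t batch_size) = 0;
    virtual Status submitTransfer(BatchID batch_id,
                                  const std::vector<Request>& requests) = 0;
    virtual void freeBatch(BatchID batch_id) = 0;
    virtual Status getTransferStatus(BatchID batch_id, size_t task_index,
                                     TStatus& out_status) = 0;
    virtual SegmentHandle openSegment(const std::string& endpoint) = 0;
    virtual std::string localEndpoint() const = 0;
};

// Made-up in-process engine: each request is copied synchronously on submit.
class LoopbackEngine : public ITransferEngine {
public:
    BatchID allocateBatch(size_t batch_size) override {
        BatchID id = next_id_++;
        batches_.emplace(id, Batch{batch_size, 0});
        return id;
    }
    Status submitTransfer(BatchID batch_id,
                          const std::vector<Request>& requests) override {
        auto it = batches_.find(batch_id);
        if (it == batches_.end() ||
            it->second.submitted + requests.size() > it->second.capacity) {
            return Status::INVALID_ARGUMENT;  // unknown batch or batch overflow
        }
        for (const Request& r : requests) {
            assert(r.source != nullptr && r.target != nullptr);
            std::memcpy(r.target, r.source, r.length);
        }
        it->second.submitted += requests.size();
        return Status::OK;
    }
    void freeBatch(BatchID batch_id) override { batches_.erase(batch_id); }
    Status getTransferStatus(BatchID batch_id, size_t task_index,
                             TStatus& out_status) override {
        auto it = batches_.find(batch_id);
        if (it == batches_.end() || task_index >= it->second.capacity) {
            return Status::INVALID_ARGUMENT;
        }
        // Tasks below the submitted count were already copied.
        out_status = task_index < it->second.submitted ? TStatus::COMPLETED
                                                       : TStatus::PENDING;
        return Status::OK;
    }
    SegmentHandle openSegment(const std::string&) override { return 0; }
    std::string localEndpoint() const override { return "loopback"; }

private:
    struct Batch { size_t capacity; size_t submitted; };
    std::unordered_map<BatchID, Batch> batches_;
    BatchID next_id_ = 1;
};
```

A double like this could also back unit tests for the Store refactor without any real transport being present.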

2. Implement adapters for existing engines

  • TransferEngineAdapter wrapping the current TE class

    • Hides MOONCAKE_USE_V1 differences (allocateBatch vs allocateBatchID, freeBatch vs freeBatchID, getSegmentName vs getLocalIpAndPort, etc.)
  • TentEngineAdapter wrapping TENT’s C++ API

  • Optionally: NixlEngineAdapter wrapping a NIXL-like library

All of the #ifdef logic should live inside these adapters, not inside Store.
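
As a rough sketch of what "all #ifdef logic inside the adapter" could look like, the snippet below wraps a fake engine whose method names change with a compile-time switch. FakeTE and the DEMO_USE_V1 macro are hypothetical stand-ins for the real TE class and MOONCAKE_USE_V1, and which spelling belongs to which TE version is an arbitrary assumption made only for this demo. The interface is trimmed to the three methods the example needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Trimmed copy of the proposed interface (only what this sketch needs).
class ITransferEngine {
public:
    using BatchID = uint64_t;
    virtual ~ITransferEngine() = default;
    virtual BatchID allocateBatch(size_t batch_size) = 0;
    virtual void freeBatch(BatchID batch_id) = 0;
    virtual std::string localEndpoint() const = 0;
};

// Hypothetical stand-in for the concrete TE class. The assignment of method
// names to versions is an assumption for illustration only.
#ifdef DEMO_USE_V1
struct FakeTE {
    uint64_t allocateBatchID(size_t) { return next_id++; }
    void freeBatchID(uint64_t) { ++freed; }
    std::string getLocalIpAndPort() const { return "127.0.0.1:12345"; }
    uint64_t next_id = 1;
    int freed = 0;
};
#else
struct FakeTE {
    uint64_t allocateBatch(size_t) { return next_id++; }
    void freeBatch(uint64_t) { ++freed; }
    std::string getSegmentName() const { return "127.0.0.1:12345"; }
    uint64_t next_id = 1;
    int freed = 0;
};
#endif

// The adapter is the only place that knows about the version switch.
class TransferEngineAdapter : public ITransferEngine {
public:
    explicit TransferEngineAdapter(FakeTE& te) : te_(te) {}

    BatchID allocateBatch(size_t batch_size) override {
        assert(batch_size > 0 && "a batch must hold at least one request");
#ifdef DEMO_USE_V1
        return te_.allocateBatchID(batch_size);
#else
        return te_.allocateBatch(batch_size);
#endif
    }
    void freeBatch(BatchID batch_id) override {
#ifdef DEMO_USE_V1
        te_.freeBatchID(batch_id);
#else
        te_.freeBatch(batch_id);
#endif
    }
    std::string localEndpoint() const override {
#ifdef DEMO_USE_V1
        return te_.getLocalIpAndPort();
#else
        return te_.getSegmentName();
#endif
    }

private:
    FakeTE& te_;  // non-owning; the caller keeps the engine alive
};
```

Callers holding an ITransferEngine& compile identically with or without DEMO_USE_V1 defined, which is the property the refactor is after.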

3. Refactor TransferEngineOperationState and TransferSubmitter

Change them to depend on ITransferEngine& instead of TE&:

  • Replace all direct TE calls with ITransferEngine methods.

  • Remove TE-specific #ifdef branches from:

    • submitTransfer()
    • submitTransferEngineOperation()
    • isLocalTransfer()
    • TransferEngineOperationState::check_task_status(), destructor, etc.

The goal is that transfer_task.cc does not include TE headers at all.
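
For a sense of what the Store side might look like after this step, here is a small sketch that depends only on ITransferEngine&. It is not the code from the WIP branch: OperationState is a simplified, hypothetical stand-in for TransferEngineOperationState, the Status and TStatus enums are placeholders, the endpoint comparison in isLocalTransfer is an assumed (possibly too naive) locality check, and FakeEngine exists only to exercise the logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Placeholder types (assumptions, not the real Store/TE definitions).
enum class Status { OK, INTERNAL_ERROR };
enum class TStatus { PENDING, COMPLETED, FAILED };

// Trimmed copy of the proposed interface.
class ITransferEngine {
public:
    using BatchID = uint64_t;
    virtual ~ITransferEngine() = default;
    virtual void freeBatch(BatchID batch_id) = 0;
    virtual Status getTransferStatus(BatchID batch_id, size_t task_index,
                                     TStatus& out_status) = 0;
    virtual std::string localEndpoint() const = 0;
};

// Locality check with no TE headers and no #ifdef: compare endpoints.
inline bool isLocalTransfer(const ITransferEngine& engine,
                            const std::string& target_endpoint) {
    return engine.localEndpoint() == target_endpoint;
}

// Simplified operation state: polls every task and frees the batch on exit.
class OperationState {
public:
    OperationState(ITransferEngine& engine, ITransferEngine::BatchID batch_id,
                   size_t task_count)
        : engine_(engine), batch_id_(batch_id), task_count_(task_count) {
        assert(task_count_ > 0);
    }
    OperationState(const OperationState&) = delete;  // avoid double free
    OperationState& operator=(const OperationState&) = delete;
    ~OperationState() { engine_.freeBatch(batch_id_); }

    // FAILED if any task failed or could not be queried, PENDING if any task
    // is still in flight, COMPLETED only when every task completed.
    TStatus checkTaskStatus() {
        bool any_pending = false;
        for (size_t i = 0; i < task_count_; ++i) {
            TStatus s = TStatus::PENDING;
            if (engine_.getTransferStatus(batch_id_, i, s) != Status::OK ||
                s == TStatus::FAILED) {
                return TStatus::FAILED;
            }
            if (s == TStatus::PENDING) any_pending = true;
        }
        return any_pending ? TStatus::PENDING : TStatus::COMPLETED;
    }

private:
    ITransferEngine& engine_;
    ITransferEngine::BatchID batch_id_;
    size_t task_count_;
};

// Test double: task states are set directly by the test.
class FakeEngine : public ITransferEngine {
public:
    std::vector<TStatus> task_states;
    int freed_batches = 0;

    void freeBatch(BatchID) override { ++freed_batches; }
    Status getTransferStatus(BatchID, size_t task_index,
                             TStatus& out_status) override {
        if (task_index >= task_states.size()) return Status::INTERNAL_ERROR;
        out_status = task_states[task_index];
        return Status::OK;
    }
    std::string localEndpoint() const override { return "10.0.0.1:12345"; }
};
```

Because the state object only holds an interface reference, the same polling and cleanup path would serve TE, TENT, or any other adapter.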

4. Introduce a simple factory for binding engines

Add a small factory that creates the appropriate ITransferEngine based on configuration (YAML, flags, or environment variables):

struct TransferEngineConfig {
    std::string backend;  // "te", "tent", "nixl", ...
};

class TransferEngineFactory {
public:
    static std::unique_ptr<ITransferEngine> Create(
        const TransferEngineConfig& cfg);
};

Store (or the higher-level runtime) can then do:

auto engine = TransferEngineFactory::Create(cfg);
TransferSubmitter submitter(*engine, backend, &metrics);
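
One possible shape for the factory body, as a sketch under stated assumptions: the adapters here are empty placeholders, name() is a hypothetical method added only so the demo can observe which backend was chosen, DEMO_TRANSFER_BACKEND is a made-up environment variable, and throwing on an unknown backend is just one of several reasonable error-handling choices (returning nullptr or a Status would work too).

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <stdexcept>
#include <string>

// Trimmed interface; name() is hypothetical and exists only for this demo.
class ITransferEngine {
public:
    virtual ~ITransferEngine() = default;
    virtual std::string name() const = 0;
};

struct TransferEngineConfig {
    std::string backend;  // "te", "tent", "nixl", ...
};

// Placeholder adapters; the real ones would wrap TE and TENT.
class TransferEngineAdapter : public ITransferEngine {
public:
    std::string name() const override { return "te"; }
};
class TentEngineAdapter : public ITransferEngine {
public:
    std::string name() const override { return "tent"; }
};

class TransferEngineFactory {
public:
    // Returns an engine for the configured backend, or throws
    // std::invalid_argument if the backend string is not recognized.
    static std::unique_ptr<ITransferEngine> Create(
        const TransferEngineConfig& cfg) {
        if (cfg.backend == "te") {
            return std::make_unique<TransferEngineAdapter>();
        }
        if (cfg.backend == "tent") {
            return std::make_unique<TentEngineAdapter>();
        }
        throw std::invalid_argument("unknown transfer backend: " + cfg.backend);
    }
};

// Reads the backend from a made-up environment variable, defaulting to "te"
// when the variable is unset or empty.
inline TransferEngineConfig ConfigFromEnv() {
    const char* value = std::getenv("DEMO_TRANSFER_BACKEND");
    return TransferEngineConfig{(value && *value) ? value : "te"};
}

// Usage sketch: callers only ever see the interface type.
inline std::string BackendNameFor(const std::string& backend) {
    std::unique_ptr<ITransferEngine> engine =
        TransferEngineFactory::Create(TransferEngineConfig{backend});
    assert(engine != nullptr);  // Create() either returns an engine or throws
    return engine->name();
}
```

Adding a new backend then touches only the factory and its adapter; a registration map keyed by backend name would be a natural next step if out-of-tree engines need to plug in.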

✅ What’s Already There

  • A working TransferSubmitter with:

    • Async memcpy (MemcpyWorkerPool)
    • Async disk reads (FilereadWorkerPool)
    • TE-based batch submission and status polling
  • A compile-time switchable TE alias (v1/v2) that we want to wrap behind ITransferEngine.

🤝 How to Join

If this sounds interesting, we’d love your help:

  1. Comment on this issue / start a discussion to claim a sub-task:
    • Interface design
    • Store refactor
    • Integration test
  2. We’re happy to review design sketches or partial PRs.
  3. Submit PRs referencing this Call for Contribution.

This refactor is a key step toward making Mooncake Store a truly transport-agnostic, future-proof KVCache backend that can run on TE, TENT, NIXL, and beyond.

Before submitting a new issue...

  • Make sure you already searched for relevant issues and read the documentation
