[UR][L0v2] Add graph support for batched queue #21324

Draft
KFilipek wants to merge 4 commits into intel:sycl from KFilipek:07-graph_for_batched

Conversation

@KFilipek (Contributor)

This PR adds support for graph capture and execution in the Level Zero v2 batched queue implementation.

Changes:

  • Add command list determination mechanism that switches between immediate and regular command lists based on graph
    capture state
  • Implement previously unsupported graph API methods:
    • queueBeginGraphCaptureExp() - begin graph capture
    • queueBeginCaptureIntoGraphExp() - begin capture into an existing graph
    • queueEndGraphCaptureExp() - end graph capture
    • queueIsGraphCaptureEnabledExp() - check capture status
    • enqueueGraphExp() - execute captured graph
  • Update operations to use appropriate command list and event pool during graph capture

@KFilipek KFilipek requested a review from a team as a code owner February 19, 2026 13:07
@KFilipek KFilipek marked this pull request as draft February 19, 2026 13:07
@KFilipek KFilipek self-assigned this Feb 19, 2026
@KFilipek KFilipek force-pushed the 07-graph_for_batched branch from 560edb4 to 7c9779b Compare February 19, 2026 13:12
.borrow(hDevice->Id.value(), eventFlags);
}

ur_event_handle_t ur_queue_batched_t::createEventIfRequestedRegular(
Contributor:

It should no longer be createEventIfRequestedRegular, since there's an event_pool param that decides the origin of the created event.

ur_event_handle_t ur_queue_batched_t::createEventAndRetainRegular(
ur_event_handle_t *phEvent, ur_event_generation_t batch_generation) {
auto hEvent = eventPoolRegular->allocate();
event_pool *pool, ur_event_handle_t *phEvent,
Contributor:

Ditto.


ur_result_t queueBeginGraphCaptureExp() override {
return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
ur_result_t enqueueGraphExp(ur_exp_executable_graph_handle_t hGraph,
Contributor:

I think only helper methods are defined here. Implementations of the enqueue operations for the batched queue live in queue_batched.cpp.


markIssuedCommandInBatch(currentRegular);

UR_CALL(currentRegular->getActiveBatch().appendKernelLaunch(
Contributor:

getActiveBatch is no longer used anywhere.

@@ -189,14 +195,19 @@

TRACK_SCOPE_LATENCY("ur_queue_batched_t::enqueueKernelLaunch");
auto currentRegular = currentCmdLists.lock();
Contributor:

lockedBatch is more fitting than currentRegular.

TRACK_SCOPE_LATENCY("ur_queue_batched_t::enqueueKernelLaunch");
auto currentRegular = currentCmdLists.lock();
auto &eventPool =
currentRegular->isGraphCapture() ? eventPoolImmediate : eventPoolRegular;
Contributor:

You might also move the event pool choosing logic to a helper method getEventPool or something, like you did for getListManager. Perhaps there could also be getCurrentGeneration helper method in batched queue.

Contributor:

I would suggest a helper method for generating events, since the event pool is needed only to create an event to be passed as an argument for command list manager functions.

Such a helper could look like this:

ur_event_handle_t ur_queue_batched_t::getEvent(ur_event_handle_t *phEvent, ur_event_generation_t batch_generation) {
if (isGraphCapture()) {
  return createEventIfRequestedRegular(phEvent, batch_generation);
} else {
  return createEventIfRequested(eventPoolImmediate.get(), phEvent, this);
}
}

Where createEventIfRequested(...) is a default implementation from event_pool.hpp. There should also be a version with retaining events.

This way, enqueueing operations is simplified to:

UR_CALL(currentRegular->getListManager().appendKernelLaunch(
      hKernel, workDim, pGlobalWorkOffset, pGlobalWorkSize, pLocalWorkSize,
      launchPropList, waitListView,
      getEvent(phEvent, currentRegular->getCurrentGeneration())));

// capture. If the graph capture is active, the immediate command list manager
// handle is returned, otherwise the regular command list manager handle is
// returned.
ur_command_list_manager &getListManager() {
Contributor:

I would add a comment on why we need an immediate list when the graph capture is active


ur_event_handle_t
createEventAndRetainRegular(ur_event_handle_t *phEvent,
ur_event_generation_t batch_generation);
ur_event_handle_t createEventAndRetainRegular(
Contributor:

After this patch, this function is not used anywhere, so it could be removed.

return maxNumberOfEnqueuedOperations <= enqueuedOperationsCounter;
}

bool isGraphCapture() const { return isGraphCaptureActive; }
Contributor:

Maybe isGraphCaptureActive()?

batchLocked->isGraphCapture() ? eventPoolImmediate : eventPoolRegular;
return batchLocked->getListManager().appendGraph(
hGraph, waitListView,
createEventIfRequestedRegular(
Contributor:

Does it work when graph capturing is not active? Is it possible to enqueue a graph on a regular command list?

Contributor:

Is it possible to enqueue a graph on a regular command list?

Not sure, let's test it. What definitely doesn't work is enqueuing a command buffer (i.e., another command list) on a regular command list.
