Skip to content

Conversation

@aharon-abramson
Copy link
Contributor

Add cl_mobileye_reservation_sets extension and related API definitions

aharon-abramson and others added 30 commits November 6, 2023 14:53
Add type cl_mutable_dispatch_promises_khr and its possible values
Co-authored-by: Ewan Crawford <ewan.cr@gmail.com>
Co-authored-by: Ewan Crawford <ewan.cr@gmail.com>
Co-authored-by: Sun Serega <sunserega2@gmail.com>
Co-authored-by: Sun Serega <sunserega2@gmail.com>
replace error with undefined behavior
Co-authored-by: Sun Serega <sunserega2@gmail.com>
Co-authored-by: Ewan Crawford <ewan.cr@gmail.com>
Co-authored-by: Sun Serega <sunserega2@gmail.com>
Delete obsolete comment in cl_khr_semaphore.  Issue
…#991)

* Use hexapdf instead of ghostscript for PDF optimization

Resulting PDFs tend to be considerably smaller, and also runs about 15%
faster when doing a full PDF build (2:39 vs. 3:06 on my machine).

The hexapdf tool does need to be installed in the build environment - it
is in the khronosgroup/docker-images:asciidoctor-spec Docker image.

* Add hexapdf to Travis environment.
…hronosGroup#996)

* cl_khr_semaphore: Enforce one device semaphores (KhronosGroup#973)

Only permit semaphores to be associated with a single device.  Add an error code for invalid use.

* Changes wording according to review comments

* Change error code to CL_INVALID_PROPERTY if a context is multi-device, and no device is specified.
Since the layers spec is not published in the OpenCL extension spec
and is instead published on the OpenCL registry similar to EXT and
vendor extensions, it makes more sense to put it in the extensions
directory.
* cl_semaphore_khr: Query if semaphore is exportable

Add query to clGetSemaphoreInfoKHR that returns CL_TRUE if a semaphore is exportable.

* Change extension version to 0.9.1

* Add missing brackets around return types.
The default behavior when the device handle list is not specified
is now properly described, so the TODO comment can be removed.
Change-Id: I942c3ce47284e7aea93edc02cf0f327af95e4ed9
bashbaug and others added 22 commits February 27, 2025 10:37
…sGroup#1245)

* rephrase and correct the descriptions for clSetKernelExecInfo

* further wordsmithing
clarify that CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM does not
affect kernel arguments

* fix typo

* simplify CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
* initial change log for changes to 3.0.16

* update with last-minute changes

* add named NT handles to external memory in addition to semaphores
…Group#1285)

Relates to KhronosGroup#1280


Change-Id: I66b553b4708b913a219d803e45ab7dd6cfb8fe93

Signed-off-by: Kevin Petit <kevin.petit@arm.com>
* cl_khr_command_buffer sync-point capacity

Document the behaviour when command-buffer command capacity
is reached, and track under "Issues" the possible future
use-cases for being able to optimize based on the capacity
of a command-buffer.

Closes KhronosGroup#844

* Refine out-of-order command-buffer enqueue wording

* Update api/cl_khr_command_buffer.asciidoc

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

---------

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
* document img 1x2_2x2 matmul functions.

* Correct the description of matmul with saturation.

* Address review comments.
Reserving OpenCL Enumerant range for vendor Mastiṣka AI
…hronosGroup#1295)

* Integrate cl_ext_image_requirements_info into unified specification

Signed-off-by: Kevin Petit <kevin.petit@arm.com>
Change-Id: Ia249f78aa521a8d202dfafbb736c9887574e56f9

* Update api/opencl_runtime_layer.asciidoc

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

* Update api/opencl_runtime_layer.asciidoc

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

---------

Signed-off-by: Kevin Petit <kevin.petit@arm.com>
Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
This change introduces a new device query related to the
command-buffer extension -
`CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR`.

This is different from
`CL_DEVICE_COMMAND_BUFFER_REQUIRED_QUEUE_PROPERTIES_KHR`, as we want
to convey to the user that an implementation supports using a queue
property with a command-buffer, but is not *required* to use the
property.

This supersedes reporting queue related values from the
`CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR` query. The flaw
with `CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR` is that it contains bits
explicitly added by the command-buffer extension for reporting support for
queue properties. This is a brittle design, as any new queue property added
in future would need to have a new bit added here in the command-buffer extension
to report support when used with command-buffers.

Instead a better design is to have a new query reporting queue
properties supported,
`CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR`, and keeping
`CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR` for capabilities unrelated
to the command-queue properties.

The `CL_COMMAND_BUFFER_CAPABILITY_OUT_OF_ORDER_KHR` use-case can now be
covered by returning
`CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE` from
`CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR`, so it is
removed.
…rocal (KhronosGroup#1293)

* OpenCL C: Update ULP requirements for half-precision divide and reciprocal

Update ULP requirements for these builtins to 1.0 as per discussion
on KhronosGroup#1278

* Restrict update to divide for now

* Update SPIR-V environment spec to set fp-16 divide ULP to 1.0

* relax reciprocal ULP requirement also

---------

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
1. Add missing newline between double and half versions of 'ldexp'
2. Fix formatting of snippets in 'mix'
…sGroup#1299)

* Integrate cl_ext_image_from_buffer into unified specification

Also add version notes for all CL_IMAGE_REQUIREMENTS_* enums.

Signed-off-by: Kevin Petit <kevin.petit@arm.com>
Change-Id: I7b7b093034121a9215786beff7318b18e7d0c24a

* Update api/opencl_runtime_layer.asciidoc

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

* Update api/footnotes.asciidoc

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

* Update api/cl_ext_image_from_buffer.asciidoc

---------

Signed-off-by: Kevin Petit <kevin.petit@arm.com>
Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
* Refactor command-buffer queue compatability

As proposed in KhronosGroup#1142
the PR changes the semantics of the command-queues parameters used for
command-buffer creation and enqueue.

The queues used on command-buffer creation now only inform the
device and dependencies of commands, rather than restricting the
properties set on the queues used for command-buffer enqueue.

This is based ontop on the change in KhronosGroup#850
to add supported queue property semantics.

* Address review feedback

Clarify wording around default list of command-queues used for command-buffer
enqueue.

* Update XML version
* cl_ext_buffer_device_address

The basic cl_mem buffer API doesn't enable access to the underlying raw
pointers in the device memory, preventing its use in host side
data structures that need pointer references to objects. This API
adds a minimal increment on top of cl_mem that provides such
capabilities.

* BDA: Removed CL_MEM_DEVICE_SHARED_ADDRESS_EXT as unneeded.

Also made the enums globally unique.

* cl_ext_buffer_device_address to 1.0.0

The only content addition since the previous version is

"If the device supports SVM and {clCreateBufferWithProperties} is
 called with a pointer returned by {clSVMAlloc} as its _host_ptr_
 argument, and {CL_MEM_USE_HOST_PTR} is set in its _flags_ argument,
 the device-side address is guaranteed to match the _host_ptr."

* cl_ext_buffer_device_address: Revision 1.0.1

* Made it explicit that passing illegal pointers is legal as long
as they are not referenced.
* Removed CL_INVALID_ARG_VALUE as a possible error in
clSetKernelArgDevicePointerEXT() as there are no illegal pointer
cases when calling this function. Return CL_INVALID_OPERATION for
clGetMemObjectInfo() if the pointer is not a buffer device
pointer.
* clSetKernelExecInfo() and clSetKernelArgDevicePointerEXT() now
only error out if no devices in the context associated with kernel
support device pointers.

* cl_ext_buffer_device_address: Revision 1.0.2

Converted the clSetKernelArgDevicePointerEXT() address parameter to
a value instead of a pointer to the value.
…hronosGroup#1318)

Clarifies the unsafe math accuracy requirements for divide, reciprocal,
exp, and exp2 for the embedded profile.  The previous wording was
ambiguous, and could have been interpreted to mean that the embedded
profile had stricter accuracy requirements than the full profile.
The intent behind "resolvable at compile time" is that format cannot be a kernel argument, i.e. it can only be a string literal.
Add cl_mobileye_reservation_sets extension and related API definitions
@CLAassistant
Copy link

CLAassistant commented Sep 17, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 10 committers have signed the CLA.

✅ bashbaug
✅ aharon-abramson
✅ SunSerega
✅ frasercrmck
❌ tomasz-platek
❌ gowtham-sarc
❌ cycheng
❌ ssugumar-mstk
❌ gwawiork
❌ pjaaskel
You have signed the CLA already but the status is still pending? Let us recheck it.

@bashbaug
Copy link
Contributor

You may want to rebase so this PR only includes your changes. It looks like it picked up a bunch of other commits, which is confusing the CLA bot. There is a merge conflict that needs to be resolved now, also. Thanks!

@pjaaskel
Copy link
Contributor

Thanks for working on this. Did you consider improving/extending the current pipe spec instead? We wrote down some thoughts about it in this IWOCL publication.

Copy link
Contributor

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on pipes, but left a couple of minor command-buffer related thoughts.

Comment on lines +267 to +268
* `CL_INVALID_COMMAND_BUFFER_KHR` if _command_buffer_ is not a valid command
-buffer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it's a bit confusing using CL_INVALID_COMMAND_BUFFER_KHR for the error code with reservation set retain/release. You seem to be using CL_INVALID_RESERVATION_SET_MOBILEYE for a new clCommandNDRangeKernelReservationSetMOBILEYE error, so you could use that here too.

Also here "not a valid command-buffer" -> "not a valid reservation-set object"

--
To record an ND-range kernel command into a reservation-set, call the function:
include::{generated}/api/protos/clCommandNDRangeKernelReservationSetMOBILEYE.txt[]
include::{generated}/api/version-notes/clCommandNDRangeKernelReservationSetMOBILEYE.asciidoc[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, but instead of defining a new entry-point you could define a new cl_command_properties_khr property for clCommandNDRangeKernelKHR that lets the user specify the reservation set. Then rather than having to copy all the spec language for clCommandNDRangeKernelKHR and keep it up to date with any changes to clCommandNDRangeKernelKHR, you would automatically be in sync.

function is executed successfully. Otherwise, it returns the errors defined by
*clEnqueueNDRangeKernel* except::

* `CL_INVALID_RESERVATION_SET_MOBILEYE` if _reservation_set_ is not a valid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should grab a new enum value for this in the xml

<enums start="-1154" end="-9999" name="ErrorCodes.future" vendor="Khronos" comment="RESERVED FOR FUTURE ALLOCATIONS BY KHRONOS">

include::{generated}/api/version-notes/clCreateReservationSetMOBILEYE.asciidoc[]

* _command_buffer_ is a valid command buffer object.
* _affinity_domain_ is the level of the cache hierarchy that the work-groups
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is useful. Isn't it implicit from declaring kernels to the reservation set that the implementation wants to maximize locality between the connected work-groups? What if the implementation has, say, too small L2 but large enough L3. Or doesn't even have dynamic caches but pieces of scratch pad memory? Should it fail in that case when the affinity is defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation doesn't know if a certain cache level is big enough for the tiles used by the kernels; it's determined by the kernels' logic.
If the platform doesn't have the desired cache level, the API should return an error. Also, when recording an ND-range kernel to a device that doesn't share this cache level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of portability it might be a good idea to consider add a bit more freedom, perhaps by making it a hint, or at least by providing a default value which gives the implementation the freedom. AFAIU, the most essential information is that that there are kernels collaborating and should concurrently execute, thus the implementation should use this information for optimization and place them close to each other in terms of the memory hierarchy. In fact, I think the kernels have to make concurrent progress otherwise deadlocks might occur, right?

endif::cl_khr_command_buffer_mutable_dispatch[]
--

[open,refpage='clCreateReservationSetPipeMOBILEYE',desc='Create a reservation-set pipe.',type='protos']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a sake of avoiding more spec fragmentation and reusing good spec groundwork, I strongly suggest to expand the old pipe spec with the concept of that if they are used to connect kernels in a reservation set, they get one instance for each concurrently executing work-group. A new flag in pipe creation could do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add a similar "not used outside a command buffer" property to the pipe as we've outlined here to the basic buffers to cover the problem of needing to preserve the state outside the command buffer.


All ND-range kernel commands inside a reservation set must have the same number
of dimensions and the same number of work-groups in each dimension.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concurrency semantics need to be defined a bit clearer: What is expected to make progress concurrently? If WG 0 of kernel A starts running, also WG 0 of all other kernels in the set should make progress or just those that are connected with a pipe to declare a producer-consumer relation?

@pjaaskel
Copy link
Contributor

Could we represent the concept of "reservation sets" using command buffers or command queues with a special property flag that says the kernels enqueued there should make concurrent progress or similar? I'm a big fan of trying to reduce API fragmentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.