-
Notifications
You must be signed in to change notification settings - Fork 127
cl_mobileye_reservation_sets #1463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add type cl_mutable_dispatch_promises_khr and its possible values
Co-authored-by: Ewan Crawford <ewan.cr@gmail.com>
Co-authored-by: Ewan Crawford <ewan.cr@gmail.com>
Co-authored-by: Sun Serega <sunserega2@gmail.com>
Co-authored-by: Sun Serega <sunserega2@gmail.com>
replace error with undefined behavior
Co-authored-by: Ewan Crawford <ewan.cr@gmail.com>
Co-authored-by: Sun Serega <sunserega2@gmail.com>
Delete obsolete comment in cl_khr_semaphore. Issue
…#991) * Use hexapdf instead of ghostscript for PDF optimization Resulting PDFs tend to be considerably smaller, and also runs about 15% faster when doing a full PDF build (2:39 vs. 3:06 on my machine). The hexapdf tool does need to be installed in the build environment - it is in the khronosgroup/docker-images:asciidoctor-spec Docker image. * Add hexapdf to Travis environment.
…hronosGroup#996) * cl_khr_semaphore: Enforce one device semaphores (KhronosGroup#973) Only permit semaphores to be associated with a single device. Add an error code for invalid use. * Changes wording according to review comments * Change error code to CL_INVALID_PROPERTY if a context is multi-device, and no device is specified.
Since the layers spec is not published in the OpenCL extension spec and is instead published on the OpenCL registry similar to EXT and vendor extensions, it makes more sense to put it in the extensions directory.
* cl_semaphore_khr: Query if semaphore is exportable Add query to clGetSemaphoreInfoKHR that returns CL_TRUE if a semaphore is exportable. * Change extension version to 0.9.1 * Add missing brackets around return types.
The default behavior when the device handle list is not specified is now properly described, so the TODO comment can be removed.
Change-Id: I942c3ce47284e7aea93edc02cf0f327af95e4ed9
…sGroup#1245) * rephrase and correct the descriptions for clSetKernelExecInfo * further wordsmithing clarify that CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM does not affect kernel arguments * fix typo * simplify CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
* initial change log for changes to 3.0.16 * update with last-minute changes * add named NT handles to external memory in addition to semaphores
…Group#1285) Relates to KhronosGroup#1280 Change-Id: I66b553b4708b913a219d803e45ab7dd6cfb8fe93 Signed-off-by: Kevin Petit <kevin.petit@arm.com>
* cl_khr_command_buffer sync-point capacity Document the behaviour when command-buffer command capacity is reached, and track under "Issues" the possible future use-cases for being able to optimize based on the capacity of a command-buffer. Closes KhronosGroup#844 * Refine out-of-order command-buffer enqueue wording * Update api/cl_khr_command_buffer.asciidoc Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com> --------- Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
* document img 1x2_2x2 matmul functions. * Correct the description of matmul with saturation. * Address review comments.
Reserving OpenCL Enumerant range for vendor Mastiṣka AI
…hronosGroup#1295) * Integrate cl_ext_image_requirements_info into unified specification Signed-off-by: Kevin Petit <kevin.petit@arm.com> Change-Id: Ia249f78aa521a8d202dfafbb736c9887574e56f9 * Update api/opencl_runtime_layer.asciidoc Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com> * Update api/opencl_runtime_layer.asciidoc Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com> --------- Signed-off-by: Kevin Petit <kevin.petit@arm.com> Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
This change introduces a new device query related to the command-buffer extension - `CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR`. This is different from `CL_DEVICE_COMMAND_BUFFER_REQUIRED_QUEUE_PROPERTIES_KHR`, as we want to convey to the user that an implementation supports using a queue property with a command-buffer, but is not *required* to use the property. This supersedes reporting queue related values from the `CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR` query. The flaw with `CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR` is that it contains bits explicitly added by the command-buffer extension for reporting support for queue properties. This is a brittle design, as any new queue property added in future would need to have a new bit added here in the command-buffer extension to report support when used with command-buffers. Instead a better design is to have a new query reporting queue properties supported, `CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR`, and keeping `CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR` for capabilities unrelated to the command-queue properties. The `CL_COMMAND_BUFFER_CAPABILITY_OUT_OF_ORDER_KHR` use-case can now be covered by returning `CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE` from `CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR`, so it is removed.
…rocal (KhronosGroup#1293) * OpenCL C: Update ULP requirements for half-precision divide and reciprocal Update ULP requirements for these builtins to 1.0 as per discussion on KhronosGroup#1278 * Restrict update to divide for now * Update SPIR-V environment spec to set fp-16 divide ULP to 1.0 * relax reciprocal ULP requirement also --------- Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
1. Add missing newline between double and half versions of 'ldexp' 2. Fix formatting of snippets in 'mix'
…sGroup#1299) * Integrate cl_ext_image_from_buffer into unified specification Also add version notes for all CL_IMAGE_REQUIREMENTS_* enums. Signed-off-by: Kevin Petit <kevin.petit@arm.com> Change-Id: I7b7b093034121a9215786beff7318b18e7d0c24a * Update api/opencl_runtime_layer.asciidoc Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com> * Update api/footnotes.asciidoc Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com> * Update api/cl_ext_image_from_buffer.asciidoc --------- Signed-off-by: Kevin Petit <kevin.petit@arm.com> Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
* Refactor command-buffer queue compatability As proposed in KhronosGroup#1142 the PR changes the semantics of the command-queues parameters used for command-buffer creation and enqueue. The queues used on command-buffer creation now only inform the device and dependencies of commands, rather than restricting the properties set on the queues used for command-buffer enqueue. This is based ontop on the change in KhronosGroup#850 to add supported queue property semantics. * Address review feedback Clarify wording around default list of command-queues used for command-buffer enqueue. * Update XML version
* cl_ext_buffer_device_address
The basic cl_mem buffer API doesn't enable access to the underlying raw
pointers in the device memory, preventing its use in host side
data structures that need pointer references to objects. This API
adds a minimal increment on top of cl_mem that provides such
capabilities.
* BDA: Removed CL_MEM_DEVICE_SHARED_ADDRESS_EXT as unneeded.
Also made the enums globally unique.
* cl_ext_buffer_device_address to 1.0.0
The only content addition since the previous version is
"If the device supports SVM and {clCreateBufferWithProperties} is
called with a pointer returned by {clSVMAlloc} as its _host_ptr_
argument, and {CL_MEM_USE_HOST_PTR} is set in its _flags_ argument,
the device-side address is guaranteed to match the _host_ptr."
* cl_ext_buffer_device_address: Revision 1.0.1
* Made it explicit that passing illegal pointers is legal as long
as they are not referenced.
* Removed CL_INVALID_ARG_VALUE as a possible error in
clSetKernelArgDevicePointerEXT() as there are no illegal pointer
cases when calling this function. Return CL_INVALID_OPERATION for
clGetMemObjectInfo() if the pointer is not a buffer device
pointer.
* clSetKernelExecInfo() and clSetKernelArgDevicePointerEXT() now
only error out if no devices in the context associated with kernel
support device pointers.
* cl_ext_buffer_device_address: Revision 1.0.2
Converted the clSetKernelArgDevicePointerEXT() address parameter to
a value instead of a pointer to the value.
…hronosGroup#1318) Clarifies the unsafe math accuracy requirements for divide, reciprocal, exp, and exp2 for the embedded profile. The previous wording was ambiguous, and could have been interpreted to mean that the embedded profile had stricter accuracy requirements than the full profile.
The intent behind "resolvable at compile time" is that format cannot be a kernel argument, i.e. it can only be a string literal.
Add cl_mobileye_reservation_sets extension and related API definitions
|
|
|
You may want to rebase so this PR only includes your changes. It looks like it picked up a bunch of other commits, which is confusing the CLA bot. There is a merge conflict that needs to be resolved now, also. Thanks! |
…zes; remove unused API type
|
Thanks for working on this. Did you consider improving/extending the current pipe spec instead? We wrote down some thoughts about it in this IWOCL publication. |
EwanC
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on pipes, but left a couple of minor command-buffer related thoughts.
| * `CL_INVALID_COMMAND_BUFFER_KHR` if _command_buffer_ is not a valid command | ||
| -buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think it's a bit confusing using CL_INVALID_COMMAND_BUFFER_KHR for the error code with reservation set retain/release. You seem to be using CL_INVALID_RESERVATION_SET_MOBILEYE for a new clCommandNDRangeKernelReservationSetMOBILEYE error, so you could use that here too.
Also here "not a valid command-buffer" -> "not a valid reservation-set object"
| -- | ||
| To record an ND-range kernel command into a reservation-set, call the function: | ||
| include::{generated}/api/protos/clCommandNDRangeKernelReservationSetMOBILEYE.txt[] | ||
| include::{generated}/api/version-notes/clCommandNDRangeKernelReservationSetMOBILEYE.asciidoc[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought, but instead of defining a new entry-point you could define a new cl_command_properties_khr property for clCommandNDRangeKernelKHR that lets the user specify the reservation set. Then rather than having to copy all the spec language for clCommandNDRangeKernelKHR and keep it up to date with any changes to clCommandNDRangeKernelKHR, you would automatically be in sync.
| function is executed successfully. Otherwise, it returns the errors defined by | ||
| *clEnqueueNDRangeKernel* except:: | ||
|
|
||
| * `CL_INVALID_RESERVATION_SET_MOBILEYE` if _reservation_set_ is not a valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should grab a new enum value for this in the xml
Line 788 in 696e820
| <enums start="-1154" end="-9999" name="ErrorCodes.future" vendor="Khronos" comment="RESERVED FOR FUTURE ALLOCATIONS BY KHRONOS"> |
| include::{generated}/api/version-notes/clCreateReservationSetMOBILEYE.asciidoc[] | ||
|
|
||
| * _command_buffer_ is a valid command buffer object. | ||
| * _affinity_domain_ is the level of the cache hierarchy that the work-groups |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is useful. Isn't it implicit from declaring kernels to the reservation set that the implementation wants to maximize locality between the connected work-groups? What if the implementation has, say, too small L2 but large enough L3. Or doesn't even have dynamic caches but pieces of scratch pad memory? Should it fail in that case when the affinity is defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation doesn't know if a certain cache level is big enough for the tiles used by the kernels; it's determined by the kernels' logic.
If the platform doesn't have the desired cache level, the API should return an error. Also, when recording an ND-range kernel to a device that doesn't share this cache level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of portability it might be a good idea to consider add a bit more freedom, perhaps by making it a hint, or at least by providing a default value which gives the implementation the freedom. AFAIU, the most essential information is that that there are kernels collaborating and should concurrently execute, thus the implementation should use this information for optimization and place them close to each other in terms of the memory hierarchy. In fact, I think the kernels have to make concurrent progress otherwise deadlocks might occur, right?
| endif::cl_khr_command_buffer_mutable_dispatch[] | ||
| -- | ||
|
|
||
| [open,refpage='clCreateReservationSetPipeMOBILEYE',desc='Create a reservation-set pipe.',type='protos'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a sake of avoiding more spec fragmentation and reusing good spec groundwork, I strongly suggest to expand the old pipe spec with the concept of that if they are used to connect kernels in a reservation set, they get one instance for each concurrently executing work-group. A new flag in pipe creation could do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a similar "not used outside a command buffer" property to the pipe as we've outlined here to the basic buffers to cover the problem of needing to preserve the state outside the command buffer.
|
|
||
| All ND-range kernel commands inside a reservation set must have the same number | ||
| of dimensions and the same number of work-groups in each dimension. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concurrency semantics need to be defined a bit clearer: What is expected to make progress concurrently? If WG 0 of kernel A starts running, also WG 0 of all other kernels in the set should make progress or just those that are connected with a pipe to declare a producer-consumer relation?
|
Could we represent the concept of "reservation sets" using command buffers or command queues with a special property flag that says the kernels enqueued there should make concurrent progress or similar? I'm a big fan of trying to reduce API fragmentation. |
Add cl_mobileye_reservation_sets extension and related API definitions