Releases: IntelPython/dpctl

0.22.1

06 May 19:53
85c52be

This is a bug-fix release which fixes a memory leak in dpctl.RawKernelArg gh-2294.

0.22.0

06 May 19:52
0b49a4b

The highlight of this release is the full migration of the dpctl.tensor submodule to the sister project dpnp, which shrinks the package size by between 93% and 96%. The __sycl_usm_array_interface__ is still supported, with dpctl serving as curator of the protocol.
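
As a rough illustration of what remains in dpctl after the migration, the sketch below allocates USM memory through dpctl.memory and inspects its __sycl_usm_array_interface__. The helper name describe_usm_allocation is hypothetical, and the function returns None when dpctl or a usable SYCL device is not available, so treat it as a sketch rather than a reference usage.

```python
def describe_usm_allocation(nbytes=16):
    """Round-trip bytes through a USM-shared allocation, if possible."""
    try:
        import dpctl.memory as dpmem
    except ImportError:
        return None  # dpctl is not installed
    try:
        # Allocates on the default-selected device's queue.
        buf = dpmem.MemoryUSMShared(nbytes)
    except Exception:
        return None  # assumption: no usable SYCL device was found
    payload = bytearray(range(nbytes))
    buf.copy_from_host(payload)          # host -> USM
    roundtrip = buf.copy_to_host()       # USM -> host
    iface = buf.__sycl_usm_array_interface__  # protocol dictionary
    return {
        "nbytes": buf.nbytes,
        "interface_keys": sorted(iface),
        "roundtrip_ok": bytes(roundtrip) == bytes(payload),
    }


info = describe_usm_allocation()
print(info)
```

On a machine without dpctl the call simply prints None; with a working device it reports the keys that the protocol dictionary exposes.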

Additionally, dpctl build scripts were updated, removing use of python setup.py develop and python setup.py install, and dpctl documentation page now supports a version dropdown.

NOTE: Changes below which reference tensor were added to the tensor submodule prior to release, and therefore are included in the migrated tensor submodule in dpnp. They are included here for transparency and continuity of the submodule's history in the changelog.

Removed

  • Removed support for Python 3.9 gh-2180
  • Removed previously deprecated dpctl.tensor submodule, with all tensor functionality migrated to dpnp gh-2245

Added

  • Added various options to build scripts (e.g., --clean). Options can be listed with --help, for example, python scripts/build_locally.py --help from the repository root gh-2172
  • Added multiversioned documentation for dpctl using a custom drop-down for version selection and a --multiversion option for documentation build helper script gh-2276

Changed

  • Improved performance of tensor.astype for boolean arrays gh-2158
  • Updated tensor DLPack version to v1.2 gh-2193, gh-2219
  • Changed how libsyclinterface finds the Intel SYCL compiler to account for changes in versioning of the Intel SYCL Nightly compiler gh-2211
  • Disallowed scalar conversion for non-0D tensor.usm_ndarray per Python Array API specification gh-2223, gh-2229
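
The scalar-conversion rule in the last item can be modeled with a toy class. TinyArray below is a hypothetical stand-in, not dpctl or dpnp code, and the choice of TypeError and its message are assumptions made for illustration.

```python
class TinyArray:
    """Hypothetical array stand-in that only tracks values and shape."""

    def __init__(self, values, shape):
        self._values = list(values)
        self.shape = tuple(shape)

    @property
    def ndim(self):
        return len(self.shape)

    def __float__(self):
        # Only 0-D arrays convert; a shape of (1,) is rejected as well.
        if self.ndim != 0:
            raise TypeError(
                "only 0-dimensional arrays can be converted to Python scalars"
            )
        return float(self._values[0])


zero_d = TinyArray([2.5], shape=())
one_d = TinyArray([2.5], shape=(1,))

print(float(zero_d))
try:
    float(one_d)
except TypeError as exc:
    print("rejected:", exc)
```

Under such a rule, code that previously called float() on a one-element array has to index or reshape down to a 0-D array first.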

Fixed

  • Fixed a CMake warning from pybind11 gh-2162
  • Fixed a false positive warning for missing DPCTLSyclInterface.dll when building dpctl Conda package on Windows gh-2167
  • Fixed typos in SyclContextCreationError and SyclDeviceCreationError exception text gh-2169
  • Fixed undefined behavior in a tensor.roll test gh-2220
  • Added a work-around for a bug in the Intel Graphics Compiler that could cause tensor.cumulative_logsumexp to return incorrect results gh-2275

Maintenance

  • Resolved license deprecation warnings when building dpctl in-place gh-2160
  • Removed unnecessary CMake patching logic in Windows build script gh-2163
  • Updated documentation and build scripts to remove use of python setup.py develop and python setup.py install throughout the project, as they were deprecated and are no longer supported gh-2172, gh-2227
  • Updated examples to remove use of dpctl.tensor, opting to simplify the examples and rely on dpctl.memory objects which wrap USM allocations. Examples which could not be reasonably rewritten were removed gh-2245
  • Removed deprecated property syntax from Cython files gh-2277
  • Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts gh-2188, gh-2189, gh-2221, gh-2255, gh-2262, gh-2264

0.21.1

17 Dec 03:51
d585d9a

This release is identical to 0.21.0 in terms of features.

This release adjusts conda recipes and metadata to release dpctl for Python 3.14, and deprecates dpctl.tensor. In the future, all tensor functionality will be moved to dpnp. For now, a DeprecationWarning is shown when importing dpctl.tensor or importing from dpctl.tensor.
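
One way to check whether an import emits such a warning is to record warnings around the import. The helper name import_and_collect_deprecations is hypothetical; the sketch uses only the standard library and returns None when the module cannot be imported. Note that a module already present in sys.modules is not re-executed, so its import-time warnings would not be seen again.

```python
import importlib
import warnings


def import_and_collect_deprecations(module_name):
    """Import a module and return any DeprecationWarning messages seen."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones hidden by default filters.
        warnings.simplefilter("always")
        try:
            importlib.import_module(module_name)
        except ImportError:
            return None  # module (or one of its dependencies) is missing
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


# With dpctl 0.21.1 installed, this is expected to list the deprecation
# message; without dpctl (or with 0.22.0+, where the submodule is gone)
# it returns None.
print(import_and_collect_deprecations("dpctl.tensor"))
```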

Maintenance

  • Added Python 3.14 and python-gil to package metadata, as free-threaded Python is not yet supported gh-2173

Deprecated

  • Deprecated dpctl.tensor module pending move to dpnp gh-2191

v0.21.0

08 Oct 20:57
878cc19

This release features the addition of new function tensor.isin, indexing of tensor.usm_ndarray with numpy.ndarray, and support for building dpctl for specific CUDA architectures.

Improvements were also made to the build time and binary size of the project, and to the build driver script, making it more convenient when building for CUDA or AMD devices.

Added

  • Added tensor.isin per future Python Array API specification version gh-2098
  • numpy.ndarrays are now permitted when indexing on tensor.usm_ndarray gh-2128

Changed

  • Made a number of constexpr variables inline or static throughout the project, especially in headers, to reduce binary size and improve build time gh-2094, gh-2107
  • DPCTL_TARGET_CUDA and DPCTL_TARGET_HIP now permit specifying the CUDA or HIP architectures gh-2096, gh-2099
  • Extended build_locally.py build driver script to permit --target-cuda and --target-hip options, which match the behavior of DPCTL_TARGET_CUDA and DPCTL_TARGET_HIP gh-2109
  • Improved tensor.asnumpy and tensor.to_numpy for size-0 arrays gh-2120
  • Permit type casting size-0 tensor.usm_ndarray to arbitrary dtype via tensor.usm_ndarray constructor's buffer keyword (i.e., using the original memory as the buffer for the new size-0 array's underlying memory) gh-2123

Fixed

  • Fixed tensor.asarray failing when given device keyword with an input array of a dtype not supported by device gh-2097
  • Fixed undefined behavior in the radix sort algorithm and avoided calls to sorting algorithms when calling tensor.sort and tensor.argsort on size-1 arrays, or along a size-1 axis gh-2106
  • Fixed incorrect results when calling dpt.astype on tensor.usm_ndarray constructed from a boolean view into a numpy.ndarray gh-2122
  • Fixed dpctl imported in virtual environment on Windows failing to see devices or find DLLs gh-2130
  • Fixed Cythonization failure when testing the ability to create dpctl Cython API extensions with an editable install gh-2147

v0.20.2

26 Jun 21:35
0598395

This release is identical to 0.20.1 in terms of features.

This release adds metadata and conda recipe changes intended for releasing the dpctl package with Python 3.13.

v0.20.1

06 Jun 21:05
31d4c10

This is a bug-fix release which fixes missing event dependencies in roll and reshape Python bindings for size-1 input arrays; see gh-2095.

v0.20.0

06 Jun 20:55
de4b977

This release achieves compliance of dpctl.tensor with the Python Array API 2024.12 standard.

The dpctl namespace has also received a number of new features, including new Python classes dpctl.LocalAccessor, dpctl.WorkGroupMemory, and dpctl.RawKernelArg to be used as kernel argument types, support for peer access between dpctl.SyclDevice instances, and support for composite Level Zero devices.

Added

  • Added dpctl.WorkGroupMemory class representing sycl::ext::oneapi::experimental::work_group_memory, to be used as a kernel argument type gh-1984
  • Added dpctl.LocalAccessor class representing sycl::local_accessor, to be used as a kernel argument type gh-1991
  • Added dpctl.SyclPlatform.get_devices method for getting all dpctl.SyclDevices for the platform gh-1992
  • Added support for the composite devices extension for Level Zero devices, usable with some devices when setting ZE_FLAT_DEVICE_HIERARCHY=COMBINED gh-1993
  • Added out keyword to tensor.take gh-2010
  • Added dpctl.RawKernelArg class representing sycl::ext::oneapi::experimental::raw_kernel_arg, to be used as a kernel argument type gh-2038
  • Added dpctl.SyclDevice methods for querying, enabling, and disabling peer access between devices gh-2077, gh-2082
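
A minimal sketch of combining two of the additions above: setting ZE_FLAT_DEVICE_HIERARCHY=COMBINED and then listing devices per platform with dpctl.SyclPlatform.get_devices. The helper name list_platform_devices is hypothetical, dpctl 0.20.0 or later is assumed, and the assumption that the variable has to be set before the SYCL runtime initializes is noted in the code. The function returns None when dpctl is not installed.

```python
import os


def list_platform_devices():
    """Return [(platform name, [device names])], or None without dpctl."""
    # Assumption: the variable must be in the environment before the
    # SYCL runtime is loaded, hence it is set ahead of the import.
    os.environ.setdefault("ZE_FLAT_DEVICE_HIERARCHY", "COMBINED")
    try:
        import dpctl
    except ImportError:
        return None
    return [
        (platform.name, [dev.name for dev in platform.get_devices()])
        for platform in dpctl.get_platforms()
    ]


print(list_platform_devices())
```

Whether composite devices actually appear depends on the hardware and driver; as the item above notes, the extension is usable only with some devices.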

Changed

  • Updated Level Zero loader detection to no longer rely on reading libur_adapter_level_zero.so for the loader filename gh-2025
  • Updated integer array indexing to align with the 2024.12 array API specification gh-2032
  • Added support for the boolean data type to dpctl.tensor.ceil, dpctl.tensor.floor, and dpctl.tensor.trunc gh-2033
  • Changed implementation of DPCTLPlatform_GetDefaultContext from using deprecated ext_oneapi_get_default_context to khr_get_default_context gh-2042
  • Updated supported array API specification version to 2024.12 gh-2047
  • Implementation struct for tensor.imag now uses a static member value for the imaginary part of real-valued inputs gh-2063
  • Updated repr to show the shape of the abbreviated arrays and show the shape and data type of zero-size arrays gh-2067
  • Changed tensor.__array_namespace_info__().capabilities()["max dimensions"] to None gh-2071
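
Code that consumes the capabilities dictionary from the last item has to handle None. The sketch below treats None as "no fixed limit", which is my reading of the array API inspection utilities; the helper name check_ndim_supported is hypothetical and the dictionaries are hand-written examples rather than output from a real library.

```python
def check_ndim_supported(capabilities, ndim):
    """Return True if an array with `ndim` dimensions is within the limit."""
    # A missing key is treated like None here; that is an assumption.
    max_dims = capabilities.get("max dimensions")
    return max_dims is None or ndim <= max_dims


unlimited = {"max dimensions": None}   # hand-written example
limited = {"max dimensions": 64}       # hand-written example

print(check_ndim_supported(unlimited, 100))
print(check_ndim_supported(limited, 64))
print(check_ndim_supported(limited, 65))
```

Comparing ndim directly against the value without the None check would raise a TypeError, which is the main thing a consumer needs to guard against after this change.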

Fixed

  • Refactored code common to accumulation operations (dpt.cumulative_sum, dpt.cumulative_prod, dpt.cumulative_logsumexp) and removed unnecessary event initialization gh-2011
  • Fixed incorrect results for dpt.cumulative_sum and dpt.cumulative_prod when dtype=dpt.bool gh-2018
  • Fixed a typo in dpctl.SyclPlatform repr gh-2035
  • Fixed a bug in tensor.asarray where order="K" could fail to produce an array sufficient for the internal copy operation for some edge cases, including a contiguous array with permuted dimensions gh-2058
  • Fixed a typo in dpctl.memory.USMAllocationError gh-2072

Maintenance

  • Document dpctl.device_type, dpctl.backend_type, dpctl.event_status_type, and dpctl.global_mem_cache_type enums gh-2019
  • Updated SYCL_INCLUDE_DIR_HINT in Conda recipe gh-2039
  • Updated expected dtypes in element-wise function docstrings gh-2041, gh-2048
  • Set ARRAY_API_TESTS_VERSION=2024.12 when running array API conformity job in CI gh-2046
  • Install hwloc when running CI job for nightly SYCL compiler gh-2050
  • Added cython-lint to pre-commit to improve style and readability of Cython code gh-2056
  • Skip upload jobs when GitHub CI is called from a forked repo gh-2059
  • Disable nightly tests run from forked repos gh-2060
  • Fixed a typo in beginner's guide example gh-2061
  • Updated bandit version gh-2075
  • Updated Conda installation instructions gh-2080, gh-2081
  • Fixed an incorrect link to changelog in package metadata gh-2085
  • Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts gh-2020, gh-2034, gh-2043, gh-2044, gh-2065, gh-2066, gh-2068, gh-2070

v0.19.0

28 Feb 19:25
1336b31

This release features official, out-of-the-box support for compiling dpctl for specified AMD GPU architectures, the addition of new function tensor.top_k, a radix-sort-based implementation of sorting functions, and improvements to interoperability with DLPack through tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice.

A number of adjustments were also made to improve performance of dpctl reductions (e.g., sum, min, max), accumulators (e.g., cumulative_sum, cumulative_logsumexp), and copy-and-cast operations.

Added

  • Support for compiling dpctl for specified AMD GPU architecture with use of CodePlay oneAPI plug-in gh-1731
  • Added tensor.top_k per Python Array API specification gh-1921
  • Added functions tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice for converting between DLPack and sycl devices, and a method get_device_id to dpctl.SyclDevice to improve interoperability with DLPack protocol gh-1953
  • Added DPCTL_OFFLOAD_COMPRESS cmake option (set to OFF by default) to toggle --offload-compress linker option when building dpctl gh-1961

Changed

  • Improved performance of copy-and-cast operations from numpy.ndarray to tensor.usm_ndarray for contiguous inputs gh-1829
  • py_sort and py_argsort now throw py::value_error if inputs are not C-contiguous gh-1838
  • Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices gh-1850
  • Improved performance of tensor.argsort function for all types gh-1859
  • Improved performance of tensor.sort and tensor.argsort for short arrays in the range [16, 64] elements gh-1866
  • Implemented radix sort algorithm to be used in dpt.sort and dpt.argsort gh-1867, gh-1883
  • Extended dpctl.SyclTimer with device_timer keyword, implementing different methods of collecting device times gh-1872
  • dpctl changed to see GPU devices out of the box in virtual environment on Windows gh-1922
  • Improved performance of tensor.cumulative_sum, tensor.cumulative_prod, tensor.cumulative_logsumexp as well as performance of boolean indexing gh-1923, gh-1942
  • Improved performance of tensor.min, tensor.max, tensor.logsumexp, tensor.reduce_hypot for floating point type arrays by at least 2x gh-1932, gh-1937
  • Updated Cython examples to use scikit-build gh-1935
  • Reduced binary size of _tensor_accumulation_impl by 13 MB gh-1957
  • Extended tensor.asarray to support objects that implement __usm_ndarray__ property to be interpreted as usm_ndarray objects gh-1959
  • tensor.usm_ndarray object disallows implicit conversions to NumPy array gh-1964
  • stream arguments in tensor.usm_ndarray methods now raise an error if stream is not a dpctl.SyclQueue gh-1969
  • dpctl initialization sets subprocess to use SPAWN method on Linux to enable gdb-oneapi to debug kernels submitted from Python applications gh-1971
  • Reduced binary size of _tensor_elementwise_impl gh-1976
  • Allow dpctl.SyclQueue.memcpy to and from multi-dimensional buffers gh-1985
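
The switch to the spawn start method noted above has a practical consequence for applications that use multiprocessing: worker targets are pickled by reference, so they must be importable module-level functions. The sketch below uses only the standard library; the helper name spawn_ready is hypothetical and it checks picklability without starting any processes.

```python
import math
import multiprocessing as mp
import pickle


def spawn_ready(func):
    """Return True if `func` can be pickled, as spawn-based workers need."""
    try:
        pickle.dumps(func)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A spawn context can be requested explicitly, independent of the default.
ctx = mp.get_context("spawn")
print(ctx.get_start_method())

print(spawn_ready(math.sqrt))        # importable function
print(spawn_ready(lambda x: x * x))  # lambdas cannot be pickled by reference
```

Scripts that launch workers under spawn should also keep the launching code behind an `if __name__ == "__main__":` guard, since child processes re-import the main module.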

Fixed

  • Fixed a bug in tensor.roll for very large values of shift gh-1869
  • Fix for tensor.result_type when all inputs are Python built-in scalars gh-1877
  • Improved error in constructors tensor.full and tensor.full_like when provided a non-numeric fill value gh-1878
  • Added a check for pointer alignment when copying to C-contiguous memory gh-1890, gh-1891
  • Fixed dpctl installed into virtual environment not finding DPC++ runtime libraries by adding DPCTL_WITH_REDIST cmake option (set to OFF by default) gh-1893
  • Fixed incorrect result (issue gh-1901) in tensor.cumulative_sum and in advanced indexing gh-1902
  • Fixed __setitem__() for tensor.usm_ndarray when passed an empty boolean mask gh-1915
  • tensor.from_dlpack docstring now shows that return type can be NumPy array and stipulates when this will be the case gh-1919
  • Fixed docstring in helper class in DLPack tests gh-1920
  • Fixed a bug in tensor.astype where copy=False would not be respected for 1d arrays when order keyword is specified gh-1928
  • Replaced deprecated CL/sycl.hpp with recommended sycl/sycl.hpp in examples gh-1933
  • Fixed tensor.take_along_axis and tensor.put_along_axis raising an error for tensor.uint64 indices when given an array of dimension greater than 1 gh-1934
  • Fixed unexpected results of tensor.sum with a requested output type of bool gh-1958
  • Use std::move to avoid unnecessary copying of temporary in triul_ctor.cpp gh-1960
  • Make stream a keyword-only argument in tensor.usm_ndarray.to_device per requirement by array API specification gh-1966
  • Improve efficiency of copy implementation and avoid an unnecessary kernel invocation in tensor.argsort for 1d input gh-1967
  • Corrected uses of NumPy constructors with tensor.usm_ndarray inputs in test suite gh-1968
  • Fixed array API namespace inspection utilities showing complex128 as a valid dtype on devices without double precision and device keywords not working with dpctl.SyclQueue or filter strings gh-1979
  • Fixed a bug in test_sycl_device_interface.cpp which would cause compilation to fail with Clang version 20.0 gh-1989
  • Fixed memory leaks in smart-pointer-managed USM temporaries in synchronizing kernel calls gh-2002
  • UsmNDArray_MakeSimpleFromPtr and UsmNDArray_MakeFromPtr now raise an error when provided an invalid typenum before attempting to create the array gh-2003
  • Fixed typos in tensor.from_numpy and tensor.astype gh-2006

Maintenance

  • Revert pinning of cmake to 3.26 on Windows gh-1823
  • Update black version used in Python code style workflow gh-1828
  • Fixed CI/CD workflow for building conda packages on Windows gh-1831
  • Revert work-around in test_sycl_kernel_submit.py for problem in MKL 2024.2.0 gh-1836
  • Stopped using the Mambaforge variant of Miniforge, which is deprecated gh-1844
  • Use pybind11=2.13.6 gh-1845
  • Remove unnecessary include in C++ header file gh-1846
  • Build the translation unit "simplify_iteration_space.cpp", previously compiled multiple times, as a static library gh-1847
  • Add instructions for installing dpctl from Intel PyPI channel gh-1860
  • Fix warnings when generating docs gh-1855, gh-1861
  • Align conda recipe with conda-forge's {{ stdlib("c") }} migration gh-1868
  • Add missing include of SYCL header to "math_utils.hpp" gh-1899
  • Add support of CV-qualifiers in is_complex<T> helper gh-1900
  • Tuning work for elementwise functions with modest performance gains (under 10%) gh-1889
  • Reduce binary ...

v0.18.3

07 Dec 18:21
69be39d

This is a bug fix release which supports use of dpctl in virtual environment on Windows, resolving gh-1745.

v0.18.2

03 Dec 20:58
7bac769

This is a bug-fix release, see https://github.com/IntelPython/dpctl/milestone/15.

It backports fixes for

  • tensor.result_type behavior for scalars (see gh-1874) and
  • errors when using dpctl in virtual environment on Linux (gh-1892).

Changes from PR gh-1899 were also backported.