Releases: IntelPython/dpctl
0.22.1
0.22.0
The highlight of this release is the full migration of the dpctl.tensor submodule to the sister project dpnp, which shrinks the package size by between 93% and 96%. The __sycl_usm_array_interface__ protocol is still supported, with dpctl serving as its curator.
Additionally, the dpctl build scripts were updated to remove use of python setup.py develop and python setup.py install, and the dpctl documentation page now supports a version dropdown.
NOTE: Changes below which reference tensor were added to the tensor submodule prior to release, and therefore are included in the migrated tensor submodule in dpnp. They are included here for transparency and continuity of the submodule's history in the changelog.
Removed
- Removed support for Python 3.9 gh-2180
- Removed previously deprecated dpctl.tensor submodule, with all tensor functionality migrated to dpnp gh-2245
Added
- Various options to build scripts (e.g., --clean). Options can be seen by calling --help, for example, python scripts/build_locally.py --help from the repository root gh-2172
- Added multiversioned documentation for dpctl using a custom drop-down for version selection and a --multiversion option for documentation build helper script gh-2276
Changed
- Improved performance of tensor.astype for boolean arrays gh-2158
- Updated tensor DLPack version to v1.2 gh-2193, gh-2219
- Changed how libsyclinterface finds the Intel SYCL compiler to account for changes in versioning of the Intel SYCL Nightly compiler gh-2211
- Disallowed scalar conversion for non-0D tensor.usm_ndarray per Python Array API specification gh-2223, gh-2229
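The scalar-conversion change above (gh-2223, gh-2229) can be illustrated with a minimal pure-Python sketch. ToyArray is a hypothetical stand-in written for this note, not dpctl or dpnp code, and the choice of TypeError and the message text are assumptions rather than the exact behavior of tensor.usm_ndarray.

```python
class ToyArray:
    """Hypothetical stand-in for an array type; not dpctl or dpnp code."""

    def __init__(self, values, shape):
        self.values = list(values)
        self.shape = tuple(shape)

    @property
    def ndim(self):
        return len(self.shape)

    def _as_scalar(self, kind):
        # Only 0-dimensional arrays may be converted to a Python scalar.
        # Raising TypeError here is an assumption made for this sketch.
        if self.ndim != 0:
            raise TypeError(
                f"only 0-dimensional arrays can be converted to Python {kind}"
            )
        return self.values[0]

    def __int__(self):
        return int(self._as_scalar("int"))

    def __float__(self):
        return float(self._as_scalar("float"))

    def __bool__(self):
        return bool(self._as_scalar("bool"))


zero_d = ToyArray([3], shape=())    # 0-D: conversion is allowed
one_d = ToyArray([3], shape=(1,))   # 1-D with a single element: rejected

print(int(zero_d))
try:
    int(one_d)
except TypeError as exc:
    print("rejected:", exc)
```

Code that previously relied on converting a one-element, non-0D array to a scalar would need to index or reshape down to 0-D first.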
Fixed
- Fixed a CMake warning from pybind11 gh-2162
- Fixed a false positive warning for missing DPCTLSyclInterface.dll when building dpctl Conda package on Windows gh-2167
- Fixed typos in SyclContextCreationError and SyclDeviceCreationError exception text gh-2169
- Fixed undefined behavior in a tensor.roll test gh-2220
- Added a work-around for a bug in the Intel Graphics Compiler that could cause tensor.cumulative_logsumexp to return incorrect results gh-2275
Maintenance
- Resolved license deprecation warnings when building dpctl in-place gh-2160
- Removed unnecessary CMake patching logic in Windows build script gh-2163
- Updated documentation and build scripts to remove use of python setup.py develop and python setup.py install throughout the project, as they were deprecated and are no longer supported gh-2172, gh-2227
- Updated examples to remove use of dpctl.tensor, opting to simplify the examples and rely on dpctl.memory objects which wrap USM allocations. Examples which could not be reasonably rewritten were removed gh-2245
- Removed deprecated property syntax from Cython files gh-2277
- Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts gh-2188, gh-2189, gh-2221, gh-2255, gh-2262, gh-2264
0.21.1
This release is identical to 0.21.0 in terms of features.
This release adjusts conda recipes and metadata to release dpctl for Python 3.14, and deprecates dpctl.tensor. In the future, all tensor functionality will be moved to dpnp. For now, a DeprecationWarning is shown when importing dpctl.tensor or from dpctl.tensor.
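For projects that want to find their remaining dpctl.tensor imports, the standard warnings module can record or escalate the DeprecationWarning. The sketch below uses a hypothetical stand-in function, import_deprecated_module, in place of the real import so that it runs without dpctl installed; the warning text is illustrative, not the message dpctl emits.

```python
import warnings


def import_deprecated_module():
    """Hypothetical stand-in for `import dpctl.tensor`."""
    warnings.warn(
        "dpctl.tensor is deprecated",  # illustrative text only
        DeprecationWarning,
        stacklevel=2,
    )


# Record warnings, e.g. to assert on them in a test suite.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import_deprecated_module()

categories = [w.category for w in caught]
print(categories)

# Escalate the warning to an error, e.g. to fail a CI job on deprecated imports.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        import_deprecated_module()
        escalated = False
    except DeprecationWarning:
        escalated = True

print(escalated)
```

The same filters can be applied process-wide with the -W command-line option or the PYTHONWARNINGS environment variable.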
Maintenance
- Added Python 3.14 and python-gil to package metadata, as free-threaded Python is not yet supported gh-2173
Deprecated
- Deprecated dpctl.tensor module pending move to dpnp gh-2191
v0.21.0
This release features the addition of new function tensor.isin, indexing of tensor.usm_ndarray with numpy.ndarray, and support for building dpctl for specific CUDA architectures.
Improvements were also made to the build time and binary size of the project, and to the build driver script, making it more convenient when building for CUDA or AMD devices.
Added
- Added tensor.isin per future Python Array API specification version gh-2098
- numpy.ndarrays are now permitted when indexing on tensor.usm_ndarray gh-2128
Changed
- Made a number of constexpr variables inline or static throughout the project, especially in headers, to reduce binary size and improve build time gh-2094, gh-2107
- DPCTL_TARGET_CUDA and DPCTL_TARGET_HIP now permit specifying the CUDA or HIP architectures gh-2096, gh-2099
- Extended build_locally.py build driver script to permit --target-cuda and --target-hip options, which match the behavior of DPCTL_TARGET_CUDA and DPCTL_TARGET_HIP gh-2109
- Improved tensor.asnumpy and tensor.to_numpy for size-0 arrays gh-2120
- Permit type casting size-0 tensor.usm_ndarray to arbitrary dtype via tensor.usm_ndarray constructor's buffer keyword (i.e., using the original memory as the buffer for the new size-0 array's underlying memory) gh-2123
Fixed
- Fixed tensor.asarray failing when given device keyword with an input array of a dtype not supported by device gh-2097
- Fixed undefined behavior in radix sort algorithm and avoided calls to sorting algorithms when calling tensor.sort and tensor.argsort on size-1 arrays, or along a size-1 axis gh-2106
- Fixed incorrect results when calling dpt.astype on tensor.usm_ndarray constructed from a boolean view into a numpy.ndarray gh-2122
- Fixed dpctl imported in virtual environment on Windows failing to see devices or find DLLs gh-2130
- Fixed Cythonization failure when testing the ability to create dpctl Cython API extensions with an editable install gh-2147
Maintenance
- Revert restricting Cython to below 3.1.0 when building dpctl for Python 3.13 gh-2118
- Add a link to tensor.DLDeviceType documentation from __dlpack_device__ docstring gh-2127
- Update pybind11 to 3.0.1 gh-2145
- Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts gh-2043, gh-2044, gh-2065, gh-2066, gh-2068, gh-2070, gh-2088, gh-2104, gh-2151, gh-2154, gh-2155
v0.20.2
This release is identical to 0.20.1 in terms of features.
This release adds metadata and conda recipe changes intended for releasing the dpctl package with Python 3.13.
v0.20.1
v0.20.0
This release achieves compliance of dpctl.tensor with the Python Array API 2024.12 standard.
The dpctl namespace has also received a number of new features, including new Python classes dpctl.LocalAccessor, dpctl.WorkGroupMemory, and dpctl.RawKernelArg to be used as kernel argument types, support for peer access between dpctl.SyclDevice instances, and support for composite Level Zero devices.
Added
- Added dpctl.WorkGroupMemory class representing sycl::ext::oneapi::experimental::work_group_memory, to be used as a kernel argument type gh-1984
- Added dpctl.LocalAccessor class representing sycl::local_accessor, to be used as a kernel argument type gh-1991
- Added dpctl.SyclPlatform.get_devices method for getting all dpctl.SyclDevice instances for the platform gh-1992
- Added support for the composite devices extension for Level Zero devices, usable with some devices when setting ZE_FLAT_DEVICE_HIERARCHY=COMBINED gh-1993
- Added out keyword to tensor.take gh-2010
- Added dpctl.RawKernelArg class representing sycl::ext::oneapi::experimental::raw_kernel_arg, to be used as a kernel argument type gh-2038
- Added dpctl.SyclDevice methods for querying, enabling, and disabling peer access between devices gh-2077, gh-2082
Changed
- Updated Level Zero loader detection to no longer rely on reading libur_adapter_level_zero.so for the loader filename gh-2025
- Updated integer array indexing to align with the 2024.12 array API specification gh-2032
- Added support for the Boolean data type to dpctl.tensor.ceil, dpctl.tensor.floor, and dpctl.tensor.trunc gh-2033
- Changed implementation of DPCTLPlatform_GetDefaultContext from using deprecated ext_oneapi_get_default_context to khr_get_default_context gh-2042
- Updated supported array API specification version to 2024.12 gh-2047
- Implementation struct for tensor.imag now uses a static member value for the imaginary part of real-valued inputs gh-2063
- Updated repr to show the shape of the abbreviated arrays and show the shape and data type of zero-size arrays gh-2067
- Changed tensor.__array_namespace_info__().capabilities()["max dimensions"] to None gh-2071
Fixed
- Refactored code common to accumulation operations (dpt.cumulative_sum, dpt.cumulative_prod, dpt.cumulative_logsumexp) and removed unnecessary event initialization gh-2011
- Fixed incorrect results for dpt.cumulative_sum and dpt.cumulative_prod when dtype=dpt.bool gh-2018
- Fixed a typo in dpctl.SyclPlatform repr gh-2035
- Fixed a bug in tensor.asarray where order="K" could fail to produce an array sufficient for the internal copy operation for some edge cases, including a contiguous array with permuted dimensions gh-2058
- Fixed a typo in dpctl.memory.USMAllocationError gh-2072
Maintenance
- Document dpctl.device_type, dpctl.backend_type, dpctl.event_status_type, and dpctl.global_mem_cache_type enums gh-2019
- Updated SYCL_INCLUDE_DIR_HINT in Conda recipe gh-2039
- Updated expected dtypes in element-wise function docstrings gh-2041, gh-2048
- Set ARRAY_API_TESTS_VERSION=2024.12 when running array API conformity job in CI gh-2046
- Install hwloc when running CI job for nightly SYCL compiler gh-2050
- Added cython-lint to pre-commit to improve style and readability of Cython code gh-2056
- Skip upload jobs when GitHub CI is called from a forked repo gh-2059
- Disable nightly tests run from forked repos gh-2060
- Fixed a typo in beginner's guide example gh-2061
- Updated bandit version gh-2075
- Updated Conda installation instructions gh-2080, gh-2081
- Fixed an incorrect link to changelog in package metadata gh-2085
- Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts gh-2020, gh-2034, gh-2043, gh-2044, gh-2065, gh-2066, gh-2068, gh-2070
New Contributors
- @jharlow-intel made their first contribution in #2054
- @david-cortes-intel made their first contribution in #2080
v0.19.0
This release features official, out-of-the-box support for compiling dpctl for specified AMD GPU architectures, the addition of new function tensor.top_k, a radix-sort-based implementation of sorting functions, and improvements to interoperability with DLPack through tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice.
A number of adjustments were also made to improve performance of dpctl reductions (e.g., sum, min, max), accumulators (e.g., cumulative_sum, cumulative_logsumexp), and copy-and-cast operations.
Added
- Support for compiling dpctl for specified AMD GPU architecture with use of Codeplay oneAPI plug-in gh-1731
- Added tensor.top_k per Python Array API specification gh-1921
- Added functions tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice for converting between DLPack and SYCL devices, and a method get_device_id to dpctl.SyclDevice to improve interoperability with DLPack protocol gh-1953
- Added DPCTL_OFFLOAD_COMPRESS cmake option (set to OFF by default) to toggle --offload-compress linker option when building dpctl gh-1961
Changed
- Improved performance of copy-and-cast operations from numpy.ndarray to tensor.usm_ndarray for contiguous inputs gh-1829
- py_sort and py_argsort now throw py::value_error if inputs are not C-contiguous gh-1838
- Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices gh-1850
- Improved performance of tensor.argsort function for all types gh-1859
- Improved performance of tensor.sort and tensor.argsort for short arrays in the range [16, 64] elements gh-1866
- Implemented radix sort algorithm to be used in dpt.sort and dpt.argsort gh-1867, gh-1883
- Extended dpctl.SyclTimer with device_timer keyword, implementing different methods of collecting device times gh-1872
- dpctl changed to see GPU devices out of the box in virtual environment on Windows gh-1922
- Improved performance of tensor.cumulative_sum, tensor.cumulative_prod, tensor.cumulative_logsumexp as well as performance of boolean indexing gh-1923, gh-1942
- Improved performance of tensor.min, tensor.max, tensor.logsumexp, tensor.reduce_hypot for floating point type arrays by at least 2x gh-1932, gh-1937
- Updated Cython examples to use scikit-build gh-1935
- Reduced binary size of _tensor_accumulation_impl by 13 MB gh-1957
- Extended tensor.asarray to support objects that implement __usm_ndarray__ property to be interpreted as usm_ndarray objects gh-1959
- tensor.usm_ndarray object disallows implicit conversions to NumPy array gh-1964
- stream arguments in tensor.usm_ndarray methods now raise an error if stream is not a tensor.SyclQueue gh-1969
- dpctl initialization sets subprocess to use SPAWN method on Linux to enable gdb-oneapi to debug kernels submitted from Python applications gh-1971
- Reduced binary size of _tensor_elementwise_impl gh-1976
- Allow dpctl.SyclQueue.memcpy to and from multi-dimensional buffers gh-1985
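For readers unfamiliar with the radix sort mentioned in the list above (gh-1867, gh-1883), the following is a minimal sequential sketch of a least-significant-digit radix argsort in plain Python. It is not the dpctl implementation, which runs as SYCL kernels on a device; the function names, the bits_per_pass parameter, and the restriction to non-negative integers are all assumptions made for illustration.

```python
def radix_argsort(keys, bits_per_pass=8):
    """Return indices that stably sort non-negative integer keys (LSD radix sort)."""
    if any(k < 0 for k in keys):
        raise ValueError("this sketch handles non-negative integers only")
    order = list(range(len(keys)))
    if len(keys) < 2:
        return order  # size-0 and size-1 inputs need no sorting
    radix = 1 << bits_per_pass
    mask = radix - 1
    max_key = max(keys)
    shift = 0
    while (max_key >> shift) > 0:
        # Counting pass: histogram of the current digit.
        counts = [0] * radix
        for idx in order:
            counts[(keys[idx] >> shift) & mask] += 1
        # Exclusive prefix sum gives the start offset of each digit bucket.
        offsets = [0] * radix
        running = 0
        for digit in range(radix):
            offsets[digit] = running
            running += counts[digit]
        # Stable scatter: equal digits keep their order from the previous pass.
        scattered = [0] * len(order)
        for idx in order:
            digit = (keys[idx] >> shift) & mask
            scattered[offsets[digit]] = idx
            offsets[digit] += 1
        order = scattered
        shift += bits_per_pass
    return order


def radix_sort(keys, bits_per_pass=8):
    """Sort by gathering keys through the argsort permutation."""
    return [keys[i] for i in radix_argsort(keys, bits_per_pass)]


data = [170, 45, 75, 90, 802, 24, 2, 66, 45]
print(radix_sort(data))
print(radix_argsort([3, 1, 3, 0]))
```

Each pass is a histogram, a prefix sum, and a scatter, which is the structure that makes radix sort attractive for data-parallel hardware. Stability of every pass is what makes the digit-by-digit approach correct.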
Fixed
- Fixed a bug in tensor.roll for very large values of shift gh-1869
- Fix for tensor.result_type when all inputs are Python built-in scalars gh-1877
- Improved error in constructors tensor.full and tensor.full_like when provided a non-numeric fill value gh-1878
- Added a check for pointer alignment when copying to C-contiguous memory gh-1890, gh-1891
- Fixed dpctl installed into virtual environment not finding DPC++ runtime libraries by adding DPCTL_WITH_REDIST cmake option (set to OFF by default) gh-1893
- Fixed incorrect result (issue gh-1901) in tensor.cumulative_sum and in advanced indexing gh-1902
- Fixed __setitem__() for tensor.usm_ndarray when passed an empty boolean mask gh-1915
- tensor.from_dlpack docstring now shows that return type can be NumPy array and stipulates when this will be the case gh-1919
- Fixed docstring in helper class in DLPack tests gh-1920
- Fixed a bug in tensor.astype where copy=False would not be respected for 1d arrays when order keyword is specified gh-1928
- Replaced deprecated CL/sycl.hpp with recommended sycl/sycl.hpp in examples gh-1933
- Fixed tensor.take_along_axis and tensor.put_along_axis raising an error for tensor.uint64 indices when given an array of dimension greater than 1 gh-1934
- Fixed unexpected results of tensor.sum with a requested output type of bool gh-1958
- Use std::move to avoid unnecessary copying of temporary in triul_ctor.cpp gh-1960
- Make stream a keyword-only argument in tensor.usm_ndarray.to_device per requirement by array API specification gh-1966
- Improve efficiency of copy implementation and avoid an unnecessary kernel invocation in tensor.argsort for 1d input gh-1967
- Corrected uses of NumPy constructors with tensor.usm_ndarray inputs in test suite gh-1968
- Fixed array API namespace inspection utilities showing complex128 as a valid dtype on devices without double precision and device keywords not working with dpctl.SyclQueue or filter strings gh-1979
- Fixed a bug in test_sycl_device_interface.cpp which would cause compilation to fail with Clang version 20.0 gh-1989
- Fixed memory leaks in smart-pointer-managed USM temporaries in synchronizing kernel calls gh-2002
- UsmNDArray_MakeSimpleFromPtr and UsmNDArray_MakeFromPtr now raise an error when provided an invalid typenum before attempting to create the array gh-2003
- Fixed typos in tensor.from_numpy and tensor.astype gh-2006
Maintenance
- Revert pinning of cmake to 3.26 on Windows gh-1823
- Update black version used in Python code style workflow gh-1828
- Fixed CI/CD workflow for building conda packages on Windows gh-1831
- Revert work-around in test_sycl_kernel_submit.py for problem in MKL 2024.2.0 gh-1836
- Do not use Mambaforge variant of miniforge, as it is deprecated gh-1844
- Use pybind11=2.13.6 gh-1845
- Remove unnecessary include in C++ header file gh-1846
- Build translation unit "simplify_iteration_space.cpp" compiled multiple times as a static library gh-1847
- Add instructions for installing dpctl from Intel PyPi channel gh-1860
- Fix warnings when generating docs gh-1855, gh-1861
- Align conda recipe with conda-forge's {{ stdlib("c") }} migration gh-1868
- Add missing include of SYCL header to "math_utils.hpp" gh-1899
- Add support of CV-qualifiers in is_complex<T> helper gh-1900
- Tuning work for elementwise functions with modest performance gains (under 10%) gh-1889
- Reduce binary ...
v0.18.3
v0.18.2
This is a bug-fix release, see https://github.com/IntelPython/dpctl/milestone/15.
It backports fixes for
- tensor.result_type behavior for scalars (see gh-1874) and
- errors when using dpctl in virtual environment on Linux (gh-1892).
Changes from PR gh-1899 were also backported.