"ze_peak" freezes on DG1 with latest drm-tip kernel + drivers #20

@eero-t

Setup:

  • HW: CML-S / DG1 (0x4905)
  • OS: Ubuntu 22.04
  • Kernel: "drm-tip" head from yesterday
  • UMD: Latest releases of compute stack components, built with LLVM 12
  • App: "ze_peak" from level-zero-tests head

Bug:

./ze_peak freezes with 99% CPU usage after showing:
Single Precision Compute (GFLOPS)

(I.e. the half precision and global BW tests that run before it worked fine.)

It can be quit with ^C, so it's not in a 100% CPU loop.

Gdb shows:

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
0x00007f6fbca28cab in sched_yield () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f6fbca28cab in sched_yield () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f6fbc27cd63 in ?? () from target:/usr/local/lib/libze_intel_gpu.so.1
#2  0x00007f6fbc0572c2 in ?? () from target:/usr/local/lib/libze_intel_gpu.so.1
#3  0x0000564de2c87d3f in ?? ()
#4  0x0000564de2c88653 in ?? ()
#5  0x0000564de2c94ba1 in ?? ()
#6  0x0000564de2c86104 in ?? ()
#7  0x00007f6fbc949d90 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#8  0x00007f6fbc949e40 in __libc_start_main () from target:/lib/x86_64-linux-gnu/libc.so.6
#9  0x0000564de2c862e5 in ?? ()

perf showed most of the time being spent inside libze_intel_gpu.so.1, i.e. it could be a driver issue, but I thought it better to start from the app.
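For illustration only: a backtrace ending in sched_yield() with close to 100% CPU usage is what a yield-based polling wait looks like when the thing it waits for never gets signalled. The sketch below is NOT taken from libze_intel_gpu.so or ze_peak. `CompletionFlag` and `wait_with_yield` are hypothetical names, and whether the driver waits this way is an assumption (the frames above have no symbols). It only shows the pattern, plus a deadline so that the wait returns instead of spinning forever.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a completion flag that the GPU side would set.
using CompletionFlag = std::atomic<bool>;

// Polls `done`, yielding the CPU between checks, until it becomes true or
// `timeout` elapses. Returns true if completion was observed, false on timeout.
bool wait_with_yield(const CompletionFlag& done,
                     std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!done.load(std::memory_order_acquire)) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // never signalled: report it instead of looping forever
        }
        // On Linux with glibc this typically ends up in sched_yield().
        std::this_thread::yield();
    }
    return true;
}
```

Without the deadline check, a flag that is never set keeps the loop running at nearly full CPU while still being interruptible with ^C, which would match what is seen here.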

ze_image_copy, ze_nano and ze_pingpong work fine. ze_bandwidth gets slower and slower, and I did not wait for it to complete.
