Date: Wed, 24 Jan 2024 07:19:03 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: matthew.brost@...el.com, ltuikov89@...il.com, 
	Alex Deucher <alexdeucher@...il.com>, 
	Christian König <ckoenig.leichtzumerken@...il.com>, 
	dri-devel <dri-devel@...ts.freedesktop.org>, 
	amd-gfx list <amd-gfx@...ts.freedesktop.org>, 
	Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: regression/bisected/6.8 commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7
 leads to GPU hang when I open GNOME activities

Hi,
I noticed that somewhere between commits 70d201a40823 and 052d534373b7 my
GPU started hanging randomly when I open the GNOME Shell Activities overview.
I found a reliable way to reproduce it:
- Launch the game Elden Ring
- Continue the game (the game world should be loaded)
- Press the Start (Windows) key
At this point the GPU hangs with 99% probability; if it does not hang,
press the Start key several more times to make sure.

The bisect identified the following bad commit:
f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7 is the first bad commit
commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7
Author: Matthew Brost <matthew.brost@...el.com>
Date:   Mon Oct 30 20:24:37 2023 -0700

    drm/sched: Split free_job into own work item

    Rather than call free_job and run_job in same work item have a dedicated
    work item for each. This aligns with the design and intended use of work
    queues.

    v2:
       - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
         timestamp in free_job() work item (Danilo)
    v3:
      - Drop forward dec of drm_sched_select_entity (Boris)
      - Return in drm_sched_run_job_work if entity NULL (Boris)
    v4:
      - Replace dequeue with peek and invert logic (Luben)
      - Wrap to 100 lines (Luben)
      - Update comments for *_queue / *_queue_if_ready functions (Luben)
    v5:
      - Drop peek argument, blindly reinit idle (Luben)
      - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
      - Update work_run_job & work_free_job kernel doc (Luben)
    v6:
      - Do not move drm_sched_select_entity in file (Luben)

    Signed-off-by: Matthew Brost <matthew.brost@...el.com>
    Link: https://lore.kernel.org/r/20231031032439.1558703-4-matthew.brost@intel.com
    Reviewed-by: Luben Tuikov <ltuikov89@...il.com>
    Signed-off-by: Luben Tuikov <ltuikov89@...il.com>

 drivers/gpu/drm/scheduler/sched_main.c | 146 ++++++++++++++++++++++-----------
 include/drm/gpu_scheduler.h            |   4 +-
 2 files changed, 101 insertions(+), 49 deletions(-)

Unfortunately, the GPU hangs still occur even on 6.8-rc1, which is why I
am writing this bug report.

GPU: Radeon 7900XTX
CPU: Ryzen 7950X
Full hardware specs are here: https://linux-hardware.org/?probe=9e5edb123e
I have also attached the full bisect log and the kernel logs from each
bisect step as archives.

Could someone look into this, please?

-- 
Best Regards,
Mike Gavrilov.

Download attachment "bisect-GPU-hang-issue-log.zip" of type "application/zip" (1278 bytes)

Download attachment "kernel-logs.zip" of type "application/zip" (696348 bytes)
