Message-ID: <CADGDV=WJjcLds5T1uAst7ctOMbApnLR6ixH8wvgvKvF-YS6kog@mail.gmail.com>
Date: Mon, 2 Jun 2025 12:25:28 +0200
From: Philipp Reisner <philipp.reisner@...bit.com>
To: Christopher Snowhill <chris@...e54.net>
Cc: Christian König <christian.koenig@....com>, 
	Philipp Stanner <pstanner@...hat.com>, dri-devel@...ts.freedesktop.org, 
	linux-kernel@...r.kernel.org, Simona Vetter <simona@...ll.ch>, 
	Danilo Krummrich <dakr@...nel.org>, Philipp Stanner <phasta@...nel.org>, 
	dri-devel <dri-devel-bounces@...ts.freedesktop.org>
Subject: Re: [PATCH] drm/sched: Fix amdgpu crash upon suspend/resume

Hi Christopher,

Thanks for following up. The bug still annoys me from time to time.
It last triggered on May 8, May 12, and May 18.
The crash on May 18 was already with the 6.14.5 kernel.

> Could this sleep wake issue also be caused by a similar thing to the
> panics and SMU hangs I was experiencing with my own issue? It's an issue
> known to have the same workaround for both 6000 and 7000 series users. A
> specific kernel commit seems to affect it as well.
>

I posted the stack trace earlier in the thread. The question is, what was the
stack trace of the issue you are referring to?

>
> If you could test whether you can still reproduce the error after
> disabling GFXOFF states with the following kernel commandline override:
>
> amdgpu.ppfeaturemask=0xfff73fff
>

That mask disables PP_OVERDRIVE_MASK, PP_GFXOFF_MASK,
and PP_GFX_DCS_MASK.
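
For anyone who wants to double-check that decoding, here is a minimal sketch
in Python. The bit values are an assumption: they are what I recall from the
PP_FEATURE_MASK enum in drivers/gpu/drm/amd/include/amd_shared.h and should be
verified against the source of the kernel you are running. The helper names
are made up for illustration.

```python
# Assumed bit values, recalled from the PP_FEATURE_MASK enum in
# drivers/gpu/drm/amd/include/amd_shared.h. Verify against your kernel tree.
PP_FEATURE_BITS = {
    "PP_OVERDRIVE_MASK": 0x4000,
    "PP_GFXOFF_MASK": 0x8000,
    "PP_GFX_DCS_MASK": 0x80000,
}


def cleared_features(mask, bits=PP_FEATURE_BITS):
    """Return the names of the listed features whose bit is 0 in mask."""
    return sorted(name for name, bit in bits.items() if not mask & bit)


def unnamed_cleared_bits(mask, bits=PP_FEATURE_BITS):
    """Return the cleared bits of a 32-bit mask that are not in the table."""
    known = 0
    for bit in bits.values():
        known |= bit
    return ~mask & 0xFFFFFFFF & ~known


if __name__ == "__main__":
    mask = 0xFFF73FFF  # value suggested for amdgpu.ppfeaturemask
    print(cleared_features(mask))
    print(hex(unnamed_cleared_bits(mask)))
```

If the second value printed is 0x0, the mask clears nothing beyond the three
features in the table.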

IMHO, that looks like a mitigation for something different than the non-ready
compute schedulers that seem to be the root cause for the NULL pointer derefs
in my case.

Anyhow, I will give it a try, and will report back if my workstation does not
deref NULL pointers for more than three weeks with that amdgpu.ppfeaturemask set.

Best regards,
 Philipp
