Date:   Tue, 15 Feb 2022 03:07:12 +0000
From:   "Quan, Evan" <Evan.Quan@....com>
To:     Salvatore Bonaccorso <carnil@...ian.org>,
        "Deucher, Alexander" <Alexander.Deucher@....com>
CC:     Dominique Dumont <dod@...ian.org>,
        "1005005@...s.debian.org" <1005005@...s.debian.org>,
        "Tuikov, Luben" <Luben.Tuikov@....com>,
        Sasha Levin <sashal@...nel.org>,
        "Koenig, Christian" <Christian.Koenig@....com>,
        "Pan, Xinhui" <Xinhui.Pan@....com>,
        David Airlie <airlied@...ux.ie>,
        Daniel Vetter <daniel@...ll.ch>,
        "amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>,
        "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic
 in suspend (v2)") on suspend?

[AMD Official Use Only]



> -----Original Message-----
> From: Salvatore Bonaccorso <salvatore.bonaccorso@...il.com> On Behalf
> Of Salvatore Bonaccorso
> Sent: Sunday, February 13, 2022 2:24 AM
> To: Deucher, Alexander <Alexander.Deucher@....com>
> Cc: Dominique Dumont <dod@...ian.org>; 1005005@...s.debian.org;
> Tuikov, Luben <Luben.Tuikov@....com>; Quan, Evan
> <Evan.Quan@....com>; Sasha Levin <sashal@...nel.org>; Koenig, Christian
> <Christian.Koenig@....com>; Pan, Xinhui <Xinhui.Pan@....com>; David
> Airlie <airlied@...ux.ie>; Daniel Vetter <daniel@...ll.ch>; amd-
> gfx@...ts.freedesktop.org; dri-devel@...ts.freedesktop.org; linux-
> kernel@...r.kernel.org
> Subject: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic
> in suspend (v2)") on suspend?
> 
> Hi Alex, hi all
> 
> In Debian we got a regression report from Dominique Dumont, CC'ed in
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs
> .debian.org%2F1005005&data=04%7C01%7Cevan.quan%40amd.com%7
> C735917b6e3f44fc8fda808d9ee54cbc0%7C3dd8961fe4884e608e11a82d994e1
> 83d%7C0%7C0%7C637802870862664095%7CUnknown%7CTWFpbGZsb3d8eyJ
> WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%
> 7C3000&sdata=6xECB3MmvNYuOn41ZOEDPyWUjklY%2Bfxumz7lf8fijwA
> %3D&reserved=0 that after an update to a 5.15.15-based kernel, his
> machine no longer suspends correctly: after the screen goes black as usual, it
> comes back. The Debian bug above contains a trace.
> 
> Dominique confirmed that this issue persisted after updating to 5.16.7;
> furthermore, he bisected the issue and found
> 
> 	3c196f05666610912645c7c5d9107706003f67c3 is the first bad commit
> 	commit 3c196f05666610912645c7c5d9107706003f67c3
> 	Author: Alex Deucher <alexander.deucher@....com>
> 	Date:   Fri Nov 12 11:25:30 2021 -0500
> 
> 	    drm/amdgpu: always reset the asic in suspend (v2)
> 
> 	    [ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ]
> 
> 	    If the platform suspend happens to fail and the power rail
> 	    is not turned off, the GPU will be in an unknown state on
> 	    resume, so reset the asic so that it will be in a known
> 	    good state on resume even if the platform suspend failed.
> 
> 	    v2: handle s0ix
> 
> 	    Acked-by: Luben Tuikov <luben.tuikov@....com>
> 	    Acked-by: Evan Quan <evan.quan@....com>
> 	    Signed-off-by: Alex Deucher <alexander.deucher@....com>
> 	    Signed-off-by: Sasha Levin <sashal@...nel.org>
> 
> 	 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
> 	 1 file changed, 4 insertions(+), 1 deletion(-)
> 
> to be the first bad commit, see
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs
> .debian.org%2F1005005%2334&data=04%7C01%7Cevan.quan%40amd.c
> om%7C735917b6e3f44fc8fda808d9ee54cbc0%7C3dd8961fe4884e608e11a82d
> 994e183d%7C0%7C0%7C637802870862664095%7CUnknown%7CTWFpbGZsb3
> d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0
> %3D%7C3000&sdata=CV%2FKmpYT8WOVJnrTiU91godaFDJMpjih%2FAV
> NAcw5qaI%3D&reserved=0 .
I checked the backtrace posted there (excerpt below). It seems the error occurred during amdgpu_device_suspend().
That means Alex's patch should not be related, as it only affects the logic that runs after amdgpu_device_suspend().
So we might have got the wrong regression point here.
[  257.842851]  ? vi_common_set_clockgating_state+0x229/0x2f0 [amdgpu]
[  257.843356]  amdgpu_device_ip_suspend_phase1+0x5e/0xc0 [amdgpu]
[  257.843771]  amdgpu_device_suspend+0x62/0xc0 [amdgpu]
[  257.844184]  amdgpu_pmops_suspend+0x36/0x70 [amdgpu]
[  257.844631]  pci_pm_suspend+0x71/0x160
[  257.844643]  ? pci_pm_freeze+0xb0/0xb0
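
For anyone who wants to check the reading of the excerpt above mechanically, here is a minimal Python sketch. It is not part of the kernel or of any existing tool, and the helper names parse_frames and reliable_chain are made up for illustration. It splits each trace line into timestamp, function and module, and drops the entries prefixed with "?", which the kernel unwinder marks as unreliable. The remaining chain shows that the failure sits inside amdgpu_device_suspend(), called from amdgpu_pmops_suspend().

```python
import re

# Trace lines copied verbatim from the excerpt above.
TRACE = """\
[  257.842851]  ? vi_common_set_clockgating_state+0x229/0x2f0 [amdgpu]
[  257.843356]  amdgpu_device_ip_suspend_phase1+0x5e/0xc0 [amdgpu]
[  257.843771]  amdgpu_device_suspend+0x62/0xc0 [amdgpu]
[  257.844184]  amdgpu_pmops_suspend+0x36/0x70 [amdgpu]
[  257.844631]  pci_pm_suspend+0x71/0x160
[  257.844643]  ? pci_pm_freeze+0xb0/0xb0
"""

# One stack frame: "[ts]  [? ]func+0xoff/0xsize [module]".
FRAME = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<unreliable>\? )?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?$"
)


def parse_frames(trace):
    """Return one dict per recognised frame, innermost frame first."""
    frames = []
    for line in trace.splitlines():
        m = FRAME.match(line.strip())
        if m:
            frames.append({
                "ts": float(m.group("ts")),
                "func": m.group("func"),
                "module": m.group("module"),          # None for core kernel code
                "unreliable": m.group("unreliable") is not None,
            })
    return frames


def reliable_chain(frames):
    """Function names of the frames not marked '?', innermost first."""
    return [f["func"] for f in frames if not f["unreliable"]]


if __name__ == "__main__":
    chain = reliable_chain(parse_frames(TRACE))
    # Print outermost caller first to read the call path top-down.
    print(" -> ".join(reversed(chain)))
```

The sketch only covers the six lines quoted here; the full trace in the Debian bug has more frames and other line shapes that this pattern does not try to handle.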

BR
Evan
> 
> Does this ring any bell? Any idea on the problem?
> 
> Regards,
> Salvatore
