Open Source and information security mailing list archives
 
Message-ID:
 <DS7PR12MB6005CE495A063B43939F6CE8FB292@DS7PR12MB6005.namprd12.prod.outlook.com>
Date: Thu, 14 Mar 2024 03:00:41 +0000
From: "Liang, Prike" <Prike.Liang@....com>
To: Alex Deucher <alexdeucher@...il.com>, "Kuehling, Felix"
	<Felix.Kuehling@....com>
CC: Sasha Levin <sashal@...nel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "stable@...r.kernel.org"
	<stable@...r.kernel.org>, "Deucher, Alexander" <Alexander.Deucher@....com>,
	"Koenig, Christian" <Christian.Koenig@....com>, "Pan, Xinhui"
	<Xinhui.Pan@....com>, "airlied@...il.com" <airlied@...il.com>,
	"daniel@...ll.ch" <daniel@...ll.ch>, "Zhang, Hawking"
	<Hawking.Zhang@....com>, "Lazar, Lijo" <Lijo.Lazar@....com>, "Ma, Le"
	<Le.Ma@....com>, "Zhu, James" <James.Zhu@....com>, "Xiao, Shane"
	<shane.xiao@....com>, "Jiang, Sonny" <Sonny.Jiang@....com>,
	"amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>,
	"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>
Subject: RE: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3
 abort cases on Raven series

[AMD Official Use Only - General]

> From: Alex Deucher <alexdeucher@...il.com>
> Sent: Thursday, March 14, 2024 4:46 AM
> To: Kuehling, Felix <Felix.Kuehling@....com>
> Cc: Sasha Levin <sashal@...nel.org>; linux-kernel@...r.kernel.org;
> stable@...r.kernel.org; Liang, Prike <Prike.Liang@....com>; Deucher,
> Alexander <Alexander.Deucher@....com>; Koenig, Christian
> <Christian.Koenig@....com>; Pan, Xinhui <Xinhui.Pan@....com>;
> airlied@...il.com; daniel@...ll.ch; Zhang, Hawking
> <Hawking.Zhang@....com>; Lazar, Lijo <Lijo.Lazar@....com>; Ma, Le
> <Le.Ma@....com>; Zhu, James <James.Zhu@....com>; Xiao, Shane
> <shane.xiao@....com>; Jiang, Sonny <Sonny.Jiang@....com>;
> amd-gfx@...ts.freedesktop.org; dri-devel@...ts.freedesktop.org
> Subject: Re: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3
> abort cases on Raven series
>
> On Wed, Mar 13, 2024 at 4:12 PM Felix Kuehling <felix.kuehling@....com> wrote:
> >
> > On 2024-03-11 11:14, Sasha Levin wrote:
> > > From: Prike Liang <Prike.Liang@....com>
> > >
> > > [ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]
> > >
> > > GPU resets can now be performed successfully on the Raven series,
> > > and a GPU reset is required for the S3 suspend abort case. So enable
> > > GPU reset for S3 abort cases on the Raven series.
> >
> > This looks suspicious to me. I'm not sure what conditions made the GPU
> > reset successful. But unless all the changes involved were also
> > backported, this should probably not be applied to older kernel
> > branches. I'm speculating it may be related to the removal of AMD
> > IOMMUv2.
> >
>
> We should get confirmation from Prike, but I think he tested this on older
> kernels as well.
>
> Alex
>
> > Regards,
> >    Felix
> >

The Raven/Raven2 series GPU reset function was enabled in some older kernel versions, such as 5.5, but was filtered out in more recent kernel driver versions. This patch therefore applies only to the latest kernel version. Because it enables the Raven GPU reset only for the S3 suspend abort case, it should be safe and should not affect other cases. The Chrome kernel log shows that the AMD IOMMUv2 driver is loaded, and with this patch triggering a GPU reset before the AMDGPU device is reinitialized, the S3 suspend abort resume problem on the Raven series is handled effectively.

Was the Raven GPU reset previously disabled because of the AMD IOMMUv2 driver? If so, the Chromebook verification results suggest that the Raven series GPU reset could probably also be enabled with IOMMUv2 for other cases.

Thanks,
Prike
> >
> > >
> > > Signed-off-by: Prike Liang <Prike.Liang@....com>
> > > Acked-by: Alex Deucher <alexander.deucher@....com>
> > > Signed-off-by: Alex Deucher <alexander.deucher@....com>
> > > Signed-off-by: Sasha Levin <sashal@...nel.org>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/soc15.c | 45 +++++++++++++++++-------------
> > >   1 file changed, 25 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > index 6a3486f52d698..ef5b3eedc8615 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > @@ -605,11 +605,34 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
> > >               return AMD_RESET_METHOD_MODE1;
> > >   }
> > >
> > > +static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> > > +{
> > > +     u32 sol_reg;
> > > +
> > > +     sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > > +
> > > +     /* Will reset for the following suspend abort cases.
> > > +      * 1) Only reset limit on APU side, dGPU hasn't checked yet.
> > > +      * 2) S3 suspend abort and TOS already launched.
> > > +      */
> > > +     if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > > +                     !adev->suspend_complete &&
> > > +                     sol_reg)
> > > +             return true;
> > > +
> > > +     return false;
> > > +}
> > > +
> > >   static int soc15_asic_reset(struct amdgpu_device *adev)
> > >   {
> > >       /* original raven doesn't have full asic reset */
> > > -     if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > > -         (adev->apu_flags & AMD_APU_IS_RAVEN2))
> > > +     /* On the latest Raven, the GPU reset can be performed
> > > +      * successfully. So now, temporarily enable it for the
> > > +      * S3 suspend abort case.
> > > +      */
> > > +     if (((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > > +         (adev->apu_flags & AMD_APU_IS_RAVEN2)) &&
> > > +             !soc15_need_reset_on_resume(adev))
> > >               return 0;
> > >
> > >       switch (soc15_asic_reset_method(adev)) {
> > > @@ -1490,24 +1513,6 @@ static int soc15_common_suspend(void *handle)
> > >       return soc15_common_hw_fini(adev);
> > >   }
> > >
> > > -static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> > > -{
> > > -     u32 sol_reg;
> > > -
> > > -     sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > > -
> > > -     /* Will reset for the following suspend abort cases.
> > > -      * 1) Only reset limit on APU side, dGPU hasn't checked yet.
> > > -      * 2) S3 suspend abort and TOS already launched.
> > > -      */
> > > -     if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > > -                     !adev->suspend_complete &&
> > > -                     sol_reg)
> > > -             return true;
> > > -
> > > -     return false;
> > > -}
> > > -
> > >   static int soc15_common_resume(void *handle)
> > >   {
> > >       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
