Message-ID: <BN6PR12MB18094BA9F4011E6698A56742F72E0@BN6PR12MB1809.namprd12.prod.outlook.com>
Date:   Wed, 10 Apr 2019 15:59:57 +0000
From:   "Deucher, Alexander" <Alexander.Deucher@....com>
To:     Bjorn Helgaas <helgaas@...nel.org>,
        Nikolai Kostrigin <nickel@...linux.org>,
        "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>,
        "Lendacky, Thomas" <Thomas.Lendacky@....com>,
        "Kuehling, Felix" <Felix.Kuehling@....com>,
        "Koenig, Christian" <Christian.Koenig@....com>
CC:     "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jroedel@...e.de" <jroedel@...e.de>
Subject: RE: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon R7
 GPUs

> -----Original Message-----
> From: Deucher, Alexander
> Sent: Wednesday, April 10, 2019 10:47 AM
> To: Bjorn Helgaas <helgaas@...nel.org>; Nikolai Kostrigin
> <nickel@...linux.org>; Suthikulpanit, Suravee
> (Suravee.Suthikulpanit@....com) <Suravee.Suthikulpanit@....com>;
> Lendacky, Thomas <Thomas.Lendacky@....com>; Kuehling, Felix
> (Felix.Kuehling@....com) <Felix.Kuehling@....com>; Koenig, Christian
> (Christian.Koenig@....com) <Christian.Koenig@....com>
> Cc: linux-pci@...r.kernel.org; linux-kernel@...r.kernel.org;
> jroedel@...e.de
> Subject: RE: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon
> R7 GPUs
> 
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@...nel.org>
> > Sent: Tuesday, April 9, 2019 5:59 PM
> > To: Nikolai Kostrigin <nickel@...linux.org>
> > Cc: linux-pci@...r.kernel.org; linux-kernel@...r.kernel.org;
> > jroedel@...e.de; Deucher, Alexander <Alexander.Deucher@....com>
> > Subject: Re: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon R7 GPUs
> >
> > [+cc Alex]
> >
> > This claims to be a resend, but I don't see a previous posting.
> >
> > There *was* discussion when the quirk was added two years ago for a
> > different device.  As part of that, Alex thought only that device
> > would be affected and ATS was validated on other GPUs:
> >
> > https://lore.kernel.org/lkml/BN6PR12MB165278346BE8A76B1E4412AFF7EA0@BN6PR12MB1652.namprd12.prod.outlook.com/
> >
> > On Mon, Apr 08, 2019 at 01:37:25PM +0300, Nikolai Kostrigin wrote:
> > > ATS is broken on this hardware (at least on a Stoney Ridge-based
> > > laptop) and causes IOMMU stalls and system failure. Disable ATS on
> > > these devices to make them usable again with the IOMMU enabled.
> > > Thanks to Joerg Roedel <jroedel@...e.de> for help.
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=194521
> > >
> 
> + a few AMD people
> 
> Seeing this bug makes it clearer.  I don't think this is a problem with the
> GPU; I think it's a problem with either the SBIOS or the IOMMU.  I think the
> original quirk added for Stoney (0x98e4) is probably wrong as well.  I
> suspect we need a quirk for a particular laptop or for particular SBIOS
> versions.  We validated ATS extensively with Carrizo-based systems (the
> system in the bug report above is Carrizo-based), since Carrizo is the basis
> of our ROCm support on APUs.  We have also been involved in many Linux OEM
> preloads with both Carrizo- and Stoney-based APUs in combination with TOPAZ
> dGPUs (0x6900) and haven't seen this issue in those programs.  We also have
> TOPAZ dGPUs in OEM programs with Intel chipsets and haven't seen the issue
> there either.  I suspect that, since Windows does not use the IOMMU by
> default, the SBIOS settings may not be well validated on certain
> Windows-only SKUs.  I'd rather make these DMI matches (or something similar)
> for the platform, or at the very least match the SSIDs as well.
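A minimal sketch of the narrower match suggested above, written as a self-contained userspace C model rather than kernel code: each quirk entry carries subsystem IDs (SSIDs) in addition to the vendor/device pair, so only boards with a specific SSID lose ATS. The names `struct pci_ids`, `struct ats_quirk` and `ats_quirk_matches()`, the subsystem values 0xaaaa/0xbbbb, and the convention that 0 means "any subsystem" are all hypothetical placeholders, not real affected hardware or kernel APIs. 0x1002 is the value of PCI_VENDOR_ID_ATI and 0x6900 is the TOPAZ device ID from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ATI 0x1002 /* value of PCI_VENDOR_ID_ATI */

/* Hypothetical: the identity fields of one PCI function. */
struct pci_ids {
	uint16_t vendor;
	uint16_t device;
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
};

/* Hypothetical quirk entry; 0 in a subsystem field means "any" in this sketch. */
struct ats_quirk {
	uint16_t vendor;
	uint16_t device;
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
};

/* Placeholder SSIDs standing in for one affected laptop model (not real data). */
static const struct ats_quirk ats_quirks[] = {
	{ VENDOR_ATI, 0x6900, 0xaaaa, 0xbbbb },
};

/* Returns true if this function matches an entry, i.e. ATS should be disabled. */
static bool ats_quirk_matches(const struct pci_ids *id)
{
	assert(id != NULL);
	for (size_t i = 0; i < sizeof(ats_quirks) / sizeof(ats_quirks[0]); i++) {
		const struct ats_quirk *q = &ats_quirks[i];

		/* The vendor/device pair must match exactly. */
		if (q->vendor != id->vendor || q->device != id->device)
			continue;
		/* A nonzero subsystem field must also match exactly. */
		if (q->subsystem_vendor && q->subsystem_vendor != id->subsystem_vendor)
			continue;
		if (q->subsystem_device && q->subsystem_device != id->subsystem_device)
			continue;
		return true;
	}
	return false;
}
```

With this table, a TOPAZ dGPU carrying SSID aaaa:bbbb matches and loses ATS, while the same 0x6900 part on another OEM's board (a different SSID) keeps it. The DMI-based alternative would key on firmware-reported system identification instead; it is not modeled here.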

Reading through these bugs again, it seems to be an issue with Stoney APUs, not the dGPU specifically.  I think it would be better to disable ATS in general if a Stoney-based platform is detected, rather than adding ATS quirks for every device that someone may put in a Stoney-based platform.  It also seems to be related to runtime PM on the dGPU: disabling runtime PM also seems to fix the issue.  On these systems, runtime PM for the dGPU is controlled via ACPI (either ATPX or _PR3, depending on the platform).  Maybe something doesn't get restored properly on runtime resume, which causes the ATS issues?

Alex
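The platform-wide alternative described above (disable ATS everywhere once a Stoney platform is detected) can be sketched as a userspace model under stated assumptions: "Stoney detected" is approximated here as "the Stoney integrated GPU, device 0x98e4 from the existing quirk, is present on the bus". `struct sim_dev`, `platform_is_stoney()` and `apply_platform_ats_policy()` are hypothetical names; where such a check would live in the kernel is not addressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ATI 0x1002          /* value of PCI_VENDOR_ID_ATI */
#define STONEY_IGPU_DEVICE 0x98e4  /* "AMD Stoney platform GPU" in the existing quirk */

/* Hypothetical simplified device: IDs plus a flag for whether ATS may be used. */
struct sim_dev {
	uint16_t vendor;
	uint16_t device;
	bool ats_enabled;
};

/* Assumption: a platform counts as Stoney if its integrated GPU is present. */
static bool platform_is_stoney(const struct sim_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].vendor == VENDOR_ATI && devs[i].device == STONEY_IGPU_DEVICE)
			return true;
	return false;
}

/*
 * On a Stoney platform, turn ATS off for every device regardless of its ID.
 * Returns the number of devices whose ATS flag was cleared.
 */
static size_t apply_platform_ats_policy(struct sim_dev *devs, size_t n)
{
	size_t changed = 0;

	assert(devs != NULL || n == 0);
	if (!platform_is_stoney(devs, n))
		return 0;
	for (size_t i = 0; i < n; i++) {
		if (devs[i].ats_enabled) {
			devs[i].ats_enabled = false;
			changed++;
		}
	}
	return changed;
}
```

The trade-off is that any dGPU placed in a Stoney system (0x6900 or a future part) is covered without a per-device table entry, at the cost of losing ATS on every device of such platforms. The sketch does not capture the runtime PM dependence also noted above.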

> 
> Alex
> 
> > > Signed-off-by: Nikolai Kostrigin <nickel@...linux.org>
> >
> > Joerg, I'm happy to merge this if you would review or ack it.  I don't
> > know enough to conclude that this is the root cause.  It'd be nice to
> > have an actual AMD erratum.  Maybe it would even have a list of
> > affected devices so we could handle them all at once and people
> > wouldn't have to trip over them one by one.
> >
> > > ---
> > >  drivers/pci/quirks.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 4700d24e5d55..abb2532e16bf 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -4876,6 +4876,7 @@ static void quirk_no_ats(struct pci_dev *pdev)
> > >
> > >  /* AMD Stoney platform GPU */
> > >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats);
> > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats);
> > >  #endif /* CONFIG_PCI_ATS */
> > >
> > >  /* Freescale PCIe doesn't support MSI in RC mode */
> > > --
> > > 2.21.0
> > >
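To make the mechanism in the diff concrete, here is a simplified userspace model of a device-ID fixup table. In the kernel, DECLARE_PCI_FIXUP_FINAL() places each entry in a dedicated linker section that is walked for each device during enumeration, and entries may also use wildcards such as PCI_ANY_ID; this sketch keeps only exact vendor/device matching. `struct sim_pci_dev`, `sim_quirk_no_ats()` and `sim_run_final_fixups()` are hypothetical stand-ins, and since the body of the real quirk_no_ats() is not part of this diff, the stand-in simply clears a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ATI 0x1002 /* value of PCI_VENDOR_ID_ATI */

/* Hypothetical simplified stand-in for struct pci_dev. */
struct sim_pci_dev {
	uint16_t vendor;
	uint16_t device;
	bool ats_enabled;
};

/* One fixup entry: which device it applies to and the hook to run. */
struct sim_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct sim_pci_dev *dev);
};

/* Stand-in for quirk_no_ats(): mark ATS as unusable on this device. */
static void sim_quirk_no_ats(struct sim_pci_dev *dev)
{
	dev->ats_enabled = false;
}

/* Mirrors the patch: the existing 0x98e4 entry and the proposed 0x6900 entry. */
static const struct sim_fixup final_fixups[] = {
	{ VENDOR_ATI, 0x98e4, sim_quirk_no_ats },
	{ VENDOR_ATI, 0x6900, sim_quirk_no_ats },
};

/* Run every fixup whose vendor/device pair exactly matches this device. */
static void sim_run_final_fixups(struct sim_pci_dev *dev)
{
	assert(dev != NULL);
	for (size_t i = 0; i < sizeof(final_fixups) / sizeof(final_fixups[0]); i++) {
		const struct sim_fixup *f = &final_fixups[i];

		if (f->vendor == dev->vendor && f->device == dev->device)
			f->hook(dev);
	}
}
```

Because matching is keyed on the vendor/device ID alone, each newly affected part needs its own line, which is how 0x6900 ends up next to 0x98e4 and why a complete list of affected devices would be useful.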
