Message-ID: <BN6PR12MB1809C529684D59E8DF3D1F0FF72E0@BN6PR12MB1809.namprd12.prod.outlook.com>
Date: Wed, 10 Apr 2019 14:46:37 +0000
From: "Deucher, Alexander" <Alexander.Deucher@....com>
To: Bjorn Helgaas <helgaas@...nel.org>,
Nikolai Kostrigin <nickel@...linux.org>,
"Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>,
"Lendacky, Thomas" <Thomas.Lendacky@....com>,
"Kuehling, Felix" <Felix.Kuehling@....com>,
"Koenig, Christian" <Christian.Koenig@....com>
CC: "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jroedel@...e.de" <jroedel@...e.de>
Subject: RE: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon R7
GPUs
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@...nel.org>
> Sent: Tuesday, April 9, 2019 5:59 PM
> To: Nikolai Kostrigin <nickel@...linux.org>
> Cc: linux-pci@...r.kernel.org; linux-kernel@...r.kernel.org;
> jroedel@...e.de; Deucher, Alexander <Alexander.Deucher@....com>
> Subject: Re: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon
> R7 GPUs
>
> [+cc Alex]
>
> This claims to be a resend, but I don't see a previous posting.
>
> There *was* discussion when the quirk was added two years ago for a
> different device. As part of that, Alex thought only that device would be
> affected and ATS was validated on other GPUs:
>
> https://lore.kernel.org/lkml/BN6PR12MB165278346BE8A76B1E4412AFF7EA0@BN6PR12MB1652.namprd12.prod.outlook.com/
>
> On Mon, Apr 08, 2019 at 01:37:25PM +0300, Nikolai Kostrigin wrote:
> > ATS is broken on this hardware (at least for Stoney Ridge based
> > laptop) and causes IOMMU stalls and system failure. Disable ATS on
> > these devices to make them usable again with IOMMU enabled.
> > Thanks to Joerg Roedel <jroedel@...e.de> for help.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=194521
> >
+ a few AMD people
Seeing this bug makes it clearer. I don't think this is a problem with the GPU; I think it's a problem with either the SBIOS or the IOMMU. I think the original quirk added for Stoney (0x98e4) is probably wrong as well. I suspect we need a quirk for a particular laptop or particular SBIOS versions.

We validated ATS extensively on Carrizo-based systems (the system in the bug report above is Carrizo-based), since Carrizo is the basis of our ROCm support on APUs. We have also been involved in many Linux OEM preloads with both Carrizo- and Stoney-based APUs in combination with Topaz dGPUs (0x6900) and haven't seen this issue in those programs. We also have Topaz dGPUs in OEM programs with Intel chipsets and haven't seen the issue there either.

Since Windows does not use the IOMMU by default, I suspect the SBIOS settings may not be well validated on certain Windows-only SKUs. I'd rather make these DMI matches (or something similar) for the affected platform, or at the very least match the SSIDs as well.
Alex
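Alex's narrower alternative (key the quirk on subsystem IDs rather than on the GPU device ID alone) can be sketched as a small user-space C model. Everything in it is hypothetical: `struct model_ssid_dev`, `struct ats_denylist_entry`, `MODEL_ANY_ID` and `model_quirk_no_ats_ssid()` are illustrative names, not kernel API, and any real table would need subsystem IDs taken from the affected machines. In the kernel itself, DECLARE_PCI_FIXUP_FINAL() matches only vendor and device ID, so an SSID check would have to live inside the quirk function, for example by testing `pdev->subsystem_vendor` and `pdev->subsystem_device`; a DMI match via `dmi_check_system()` is the other route Alex mentions and is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the struct pci_dev fields used here. */
struct model_ssid_dev {
	uint16_t vendor;
	uint16_t device;
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
	int ats_cap;                 /* nonzero: ATS capability was found */
};

/* Wildcard value, in the spirit of the kernel's PCI_ANY_ID. */
#define MODEL_ANY_ID 0xffffu

/* One entry of a hypothetical "ATS broken on this platform" list. */
struct ats_denylist_entry {
	uint16_t vendor, device;
	uint16_t subvendor, subdevice;   /* MODEL_ANY_ID matches anything */
};

static bool id_matches(uint16_t want, uint16_t have)
{
	return want == MODEL_ANY_ID || want == have;
}

/* All four IDs must match, not just vendor/device. */
static bool entry_matches(const struct ats_denylist_entry *e,
			  const struct model_ssid_dev *dev)
{
	return id_matches(e->vendor, dev->vendor) &&
	       id_matches(e->device, dev->device) &&
	       id_matches(e->subvendor, dev->subsystem_vendor) &&
	       id_matches(e->subdevice, dev->subsystem_device);
}

/*
 * Models an SSID-aware variant of quirk_no_ats(): forget the ATS
 * capability only for boards on the list. Returns true if ATS was
 * disabled for this device.
 */
static bool model_quirk_no_ats_ssid(struct model_ssid_dev *dev,
				    const struct ats_denylist_entry *list,
				    size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (entry_matches(&list[i], dev)) {
			dev->ats_cap = 0;
			return true;
		}
	}
	return false;
}
```

With a single entry `{ 0x1002, 0x6900, <subvendor>, <subdevice> }` (0x1002 being the value of PCI_VENDOR_ID_ATI), a Topaz (0x6900) board carrying those subsystem IDs loses its ATS capability, while the same GPU behind any other SSID keeps it.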
> > Signed-off-by: Nikolai Kostrigin <nickel@...linux.org>
>
> Joerg, I'm happy to merge this if you would review or ack it. I don't know
> enough to conclude that this is the root cause. It'd be nice to have an actual
> AMD erratum. Maybe it would even have a list of affected devices so we
> could get them all at once so people wouldn't have to trip over them one by
> one.
>
> > ---
> > drivers/pci/quirks.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 4700d24e5d55..abb2532e16bf 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -4876,6 +4876,7 @@ static void quirk_no_ats(struct pci_dev *pdev)
> >
> > /* AMD Stoney platform GPU */
> > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats);
> > #endif /* CONFIG_PCI_ATS */
> >
> > /* Freescale PCIe doesn't support MSI in RC mode */
> > --
> > 2.21.0
> >
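The effect of the patch as posted can be modeled in user space as below. This is a hedged sketch: the `model_*` names are hypothetical stand-ins for a DECLARE_PCI_FIXUP_FINAL() entry, the PCI core's fixup walk and pci_enable_ats(), and the model ignores the PCI_ANY_ID and class matching the real fixup code also supports. It assumes, as in kernels of this period, that quirk_no_ats() clears `pdev->ats_cap` and that pci_enable_ats() returns -EINVAL when `ats_cap` is zero.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_VENDOR_ID_ATI 0x1002   /* value of the kernel's PCI_VENDOR_ID_ATI */

/* Hypothetical stand-in for the struct pci_dev fields used here. */
struct model_pci_dev {
	uint16_t vendor;
	uint16_t device;
	int ats_cap;       /* nonzero: ATS capability was found */
	int ats_enabled;
};

/* Hypothetical stand-in for one DECLARE_PCI_FIXUP_FINAL() entry. */
struct model_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct model_pci_dev *dev);
};

/* Mirrors the effect of quirk_no_ats(): forget the ATS capability. */
static void model_quirk_no_ats(struct model_pci_dev *dev)
{
	dev->ats_cap = 0;
}

/* Device-ID-only table as in the patch: 0x98e4 (existing) and 0x6900 (added). */
static const struct model_fixup model_final_fixups[] = {
	{ MODEL_VENDOR_ID_ATI, 0x98e4, model_quirk_no_ats },
	{ MODEL_VENDOR_ID_ATI, 0x6900, model_quirk_no_ats },
};

/* Run every hook whose vendor/device pair matches this device. */
static void model_apply_final_fixups(struct model_pci_dev *dev)
{
	size_t n = sizeof(model_final_fixups) / sizeof(model_final_fixups[0]);

	for (size_t i = 0; i < n; i++) {
		const struct model_fixup *f = &model_final_fixups[i];

		if (f->vendor == dev->vendor && f->device == dev->device)
			f->hook(dev);
	}
}

/* Like pci_enable_ats(): refuse when no ATS capability is recorded. */
static int model_enable_ats(struct model_pci_dev *dev)
{
	if (!dev->ats_cap)
		return -EINVAL;
	dev->ats_enabled = 1;
	return 0;
}
```

Because the table is keyed only on vendor and device ID, every 0x6900 device on every platform takes the quirk, including the OEM systems Alex reports as working with ATS.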