Date:   Wed, 5 May 2021 14:15:01 +0200
From:   Pali Rohár <pali@...nel.org>
To:     Shanker R Donthineni <sdonthineni@...dia.com>
Cc:     Bjorn Helgaas <helgaas@...nel.org>,
        Alex Williamson <alex.williamson@...hat.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org, Sinan Kaya <okaya@...nel.org>,
        Vikram Sethi <vsethi@...dia.com>,
        Amey Narkhede <ameynarkhede03@...il.com>
Subject: Re: [PATCH v4 2/2] PCI: Enable NO_BUS_RESET quirk for Nvidia GPUs

On Friday 30 April 2021 17:11:23 Shanker R Donthineni wrote:
> Thanks Bjorn for reviewing patch.
> 
> On 4/30/21 12:01 PM, Bjorn Helgaas wrote:
> >
> > On Wed, Apr 28, 2021 at 07:49:07PM -0500, Shanker Donthineni wrote:
> >> On select platforms, some Nvidia GPU devices do not work with SBR.
> >> Triggering SBR would leave the device inoperable for the current
> >> system boot. It requires a system hard-reboot to get the GPU device
> >> back to normal operating condition post-SBR. For the affected
> >> devices, enable NO_BUS_RESET quirk to fix the issue.
> > Since 1/2 adds _RST support, should I infer that _RST works on these
> > Nvidia GPUs even though SBR does not?  If so, how does _RST do the
> > reset?
> Yes, _RST method works but not SBR. The _RST method in DSDT-AML uses
> platform-specific initialization steps outside of the GPU BARs for resetting
> the GPU device.

Hello! If I understood this "reset" issue correctly, it means that the
affected PCIe GPU device cannot be reset via PCI Secondary Bus Reset
(PCIe Hot Reset) and that some special, platform-specific reset type
needs to be issued instead.

And the code for this platform-specific reset is included in the ACPI
DSDT table.

But because the ACPI DSDT table is part of the BIOS/firmware and not part
of the PCIe GPU device itself, this kind of reset is available to the
Linux kernel only when the motherboard vendor (or whoever burns the
BIOS/firmware into the motherboard EEPROM) ships this specific code in
that firmware. Am I right?

So if this PCIe GPU device is connected to another motherboard or another
system, this special platform reset in the ACPI DSDT is not available.

What does the default ACPI _RST() method do on motherboards without this
special platform reset hook? It probably would not be able to reset
these PCIe GPU devices if a standard SBR cannot reset them.
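
To make the concern concrete, here is a minimal user-space sketch of the
situation as I understand it. It is not kernel code: struct model_dev,
pick_reset_method() and the ordering "firmware _RST first, then SBR" are
all hypothetical names and assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one PCIe function; NOT a kernel structure. */
struct model_dev {
    bool fw_has_rst;   /* platform firmware (DSDT) provides a working _RST */
    bool no_bus_reset; /* device is quirked so that SBR must not be used */
};

enum reset_method { RESET_NONE, RESET_ACPI_RST, RESET_SBR };

/*
 * Assumed policy: prefer the firmware-provided _RST, fall back to SBR
 * unless the device carries the NO_BUS_RESET quirk.
 */
enum reset_method pick_reset_method(const struct model_dev *dev)
{
    assert(dev != NULL);

    if (dev->fw_has_rst)
        return RESET_ACPI_RST;
    if (!dev->no_bus_reset)
        return RESET_SBR;
    return RESET_NONE; /* quirked device on a board without _RST */
}
```

In this model a quirked GPU on a board whose firmware lacks the special
_RST code ends up with RESET_NONE, which is exactly the case I am asking
about.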

Would it not be better to include "native" Linux code for resetting
these PCIe devices?

Please correct me if my assumption is wrong or if I have misunderstood
this issue.

> > Do you have a root cause for why SBR doesn't work?  
> It is a hardware implementation specific issue. GPU end-point device
> is inoperative after receiving SBR from the RP/SwitchPort. This quirk is
> to prevent SBR.
> 
> > I'm not super
> > confident that we perform resets correctly in general, and if the
> > problem is an issue in Linux, it'd be nice to fix that.
> We have not seen any issue with Linux SBR implementation.
> >
> >> This issue will be fixed in the next generation of hardware.
