Date:   Fri, 1 Oct 2021 14:29:08 -0600
From:   Logan Gunthorpe <logang@...tatee.com>
To:     Bjorn Helgaas <helgaas@...nel.org>, kelvin.cao@...rochip.com
Cc:     kurt.schwemmer@...rosemi.com, bhelgaas@...gle.com,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        kelvincao@...look.com
Subject: Re: [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS
 access



On 2021-10-01 2:18 p.m., Bjorn Helgaas wrote:
> On Fri, Sep 24, 2021 at 11:08:38AM +0000, kelvin.cao@...rochip.com wrote:
>> From: Kelvin Cao <kelvin.cao@...rochip.com>
>>
>> After a firmware hard reset, MRPC command executions, which are based
>> on the PCI BAR (which Microchip refers to as GAS) read/write, will hang
>> indefinitely. This is because after a reset, the host will fail all GAS
>> reads (get all 1s), in which case the driver won't get a valid MRPC
>> status.
> 
> Trying to write a merge commit log for this, but having a hard time
> summarizing it.  It sounds like it covers both Switchtec-specific
> (firmware and MRPC commands) and generic PCIe behavior (MMIO read
> failures).
> 
> This has something to do with a firmware hard reset.  What is that?
> Is that like a firmware reboot?  A device reset, e.g., FLR or
> secondary bus reset, that causes a firmware reboot?  A device reset
> initiated by firmware?
> 
> Anyway, apparently when that happens, MMIO reads to the switch fail
> (timeout or error completion on PCIe) for a while.  If a device reset
> is involved, that much is standard PCIe behavior.  And the driver sees
> ~0 data from those failed reads.  That's not part of the PCIe spec,
> but is typical root complex behavior.
> 
> But you said the MRPC commands hang indefinitely.  Presumably MMIO
> reads would start succeeding eventually when the device becomes ready,
> so I don't know how that translates to "indefinitely."

I suspect Kelvin can expand on this and fix the issue below. In my
experience, though, MMIO reads return ~0 forever after a firmware reset,
until the system is rebooted. Systems with good hotplug support are
presumably supposed to recover, though I've never seen that happen.

The MMIO read that signals the MRPC status always returns ~0 and the
userspace request will eventually time out.
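
To illustrate the difference between waiting for the timeout and erroring
out early, here is a minimal userspace sketch in C. It is not the driver
code: the status values, poll_mrpc_status(), and the fake device are all
hypothetical names made up for this example, and it assumes a 32-bit
status register where an all-ones value can never be a valid status.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch, not the driver's values. */
enum poll_result {
	POLL_DONE      =  0,
	POLL_TIMEOUT   = -1,
	POLL_NO_ACCESS = -2,	/* read returned all 1s: no GAS access */
};

/* Hypothetical status encodings, chosen only for illustration. */
#define STATUS_INPROGRESS	1u
#define STATUS_DONE		2u

/* What a failed 32-bit MMIO read typically looks like to the driver. */
#define ALL_ONES_32		UINT32_MAX

typedef uint32_t (*read_status_fn)(void *ctx);

/*
 * Poll the status up to max_polls times. An all-ones read is treated
 * as "device not reachable" and reported immediately, instead of being
 * treated as "still in progress" until the timeout expires.
 */
static int poll_mrpc_status(read_status_fn read_status, void *ctx,
			    unsigned int max_polls)
{
	unsigned int i;

	assert(read_status != NULL);

	for (i = 0; i < max_polls; i++) {
		uint32_t status = read_status(ctx);

		if (status == ALL_ONES_32)
			return POLL_NO_ACCESS;
		if (status == STATUS_DONE)
			return POLL_DONE;
	}

	return POLL_TIMEOUT;
}

/* Simulated device standing in for the real MMIO read. */
struct fake_dev {
	int dead;			/* nonzero: every read fails */
	unsigned int polls_until_done;	/* in-progress reads before done */
};

static uint32_t fake_read_status(void *ctx)
{
	struct fake_dev *dev = ctx;

	if (dev->dead)
		return ALL_ONES_32;
	if (dev->polls_until_done == 0)
		return STATUS_DONE;

	dev->polls_until_done--;
	return STATUS_INPROGRESS;
}
```

With a "dead" fake device the poll returns POLL_NO_ACCESS on the first
read rather than spinning until POLL_TIMEOUT, which is roughly the
behavior the patch subject describes.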

> Weird to refer to a PCI BAR as "GAS".  Maybe expanding the acronym
> would help it make sense.

GAS is the term used by the firmware developers and is in all their
documentation. It stands for Global Address Space.

> What does "host" refer to?  I guess it's the switch (the
> switchtec_dev), since you say it fails MMIO reads?

Yes, a bit confusing. The firmware is dead or not set up right, so MMIO
reads are not succeeding and the root complex is returning ~0 to the
driver on reads.

Logan
