linux-kernel - Re: [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e7654b51f2a28e033200c6de9c0a2d9c53c646d3.camel@microchip.com>
Date:   Wed, 6 Oct 2021 00:37:02 +0000
From:   <Kelvin.Cao@...rochip.com>
To:     <helgaas@...nel.org>
CC:     <kurt.schwemmer@...rosemi.com>, <bhelgaas@...gle.com>,
        <linux-pci@...r.kernel.org>, <kelvincao@...look.com>,
        <logang@...tatee.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS
 access

On Tue, 2021-10-05 at 15:11 -0500, Bjorn Helgaas wrote:
> On Mon, Oct 04, 2021 at 08:51:06PM +0000, Kelvin.Cao@...rochip.com
> wrote:
> > On Sat, 2021-10-02 at 10:11 -0500, Bjorn Helgaas wrote:
> > > On Fri, Oct 01, 2021 at 11:49:18PM +0000, 
> > > Kelvin.Cao@...rochip.com
> > > wrote:
> > > > On Fri, 2021-10-01 at 14:29 -0600, Logan Gunthorpe wrote:
> > > > > EXTERNAL EMAIL: Do not click links or open attachments unless
> > > > > you
> > > > > know the content is safe
> > > > > 
> > > > > On 2021-10-01 2:18 p.m., Bjorn Helgaas wrote:
> > > > > > On Fri, Sep 24, 2021 at 11:08:38AM +0000,
> > > > > > kelvin.cao@...rochip.com
> > > > > > wrote:
> > > > > > > From: Kelvin Cao <kelvin.cao@...rochip.com>
> > > > > > > 
> > > > > > > After a firmware hard reset, MRPC command executions,
> > > > > > > which are based on the PCI BAR (which Microchip refers to
> > > > > > > as GAS) read/write, will hang indefinitely. This is
> > > > > > > because after a reset, the host will fail all GAS reads
> > > > > > > (get all 1s), in which case the driver won't get a valid
> > > > > > > MRPC status.
> > > > > > 
> > > > > > Trying to write a merge commit log for this, but having a
> > > > > > hard time summarizing it.  It sounds like it covers both
> > > > > > Switchtec-specific (firmware and MRPC commands) and generic
> > > > > > PCIe behavior (MMIO read failures).
> > > > > > 
> > > > > > This has something to do with a firmware hard reset.  What
> > > > > > is that?  Is that like a firmware reboot?  A device reset,
> > > > > > e.g., FLR or secondary bus reset, that causes a firmware
> > > > > > reboot?  A device reset initiated by firmware?
> > > > 
> > > > A firmware reset can be triggered by a reset command issued to
> > > > the firmware to reboot it.
> > > 
> > > So I guess this reset command was issued by the driver?
> > 
> > Yes, the reset command can be issued by a userspace utility to the
> > firmware via the driver. In some other cases, user can also issue
> > the reset command, via a sideband interface (like UART), to the
> > firmware.
> 
> OK, thanks.  That means CRS is not a factor here because this is not
> an FLR or similar reset.
> 
> > > > > > Anyway, apparently when that happens, MMIO reads to the
> > > > > > switch
> > > > > > fail (timeout or error completion on PCIe) for a while.  If
> > > > > > a
> > > > > > device reset is involved, that much is standard PCIe
> > > > > > behavior.
> > > > > > And the driver sees ~0 data from those failed
> > > > > > reads.  That's
> > > > > > not
> > > > > > part of the PCIe spec, but is typical root complex
> > > > > > behavior.
> > > > > > 
> > > > > > But you said the MRPC commands hang
> > > > > > indefinitely.  Presumably
> > > > > > MMIO reads would start succeeding eventually when the
> > > > > > device
> > > > > > becomes ready, so I don't know how that translates to
> > > > > > "indefinitely."
> > > > > 
> > > > > I suspect Kelvin can expand on this and fix the issue below.
> > > > > But
> > > > > in my experience, the MMIO will read ~0 forever after a
> > > > > firmware
> > > > > reset, until the system is rebooted. Presumably on systems
> > > > > that
> > > > > have good hot plug support they are supposed to recover.
> > > > > Though
> > > > > I've never seen that.
> > > > 
> > > > This is also my observation, all MMIO read will fail (~0
> > > > returned)
> > > > until the system is rebooted or a PCI rescan is performed.
> > > 
> > > This made sense until you said MMIO reads would start working
> > > after a
> > > PCI rescan.  A rescan doesn't really do anything special other
> > > than
> > > doing config accesses to the device.  Two things come to mind:
> > > 
> > > 1) Rescan does a config read of the Vendor ID, and devices may
> > > respond with "Configuration Request Retry Status" if they are not
> > > ready.  In this event, Linux retries this for a while.  This
> > > scenario
> > > doesn't quite fit because it sounds like this is a device-
> > > specific
> > > reset initiated by the driver, and CRS is not permited in this
> > > case.
> > > PCIe r5.0, sec 2.3.1, says:
> > > 
> > >   A device Function is explicitly not permitted to return CRS
> > >   following a software-initiated reset (other than an FLR) of the
> > >   device, e.g., by the device's software driver writing to a
> > >   device-specific reset bit.
> > > 
> > > 2) The device may lose its bus and device number configuration
> > > after a reset, so it must capture bus and device numbers from
> > > config writes.  I don't think Linux does this explicitly, but a
> > > rescan does do config writes, which could accidentally fix
> > > something (PCIe r5.0, sec 2.2.9).
> > 
> > Thanks Bjorn. It makes perfect sense!
> > > > > The MMIO read that signals the MRPC status always returns ~0
> > > > > and the userspace request will eventually time out.
> > > > 
> > > > The problem in this case is that, in DMA MRPC mode, the status
> > > > (in host memory) is always initialized to 'in progress', and
> > > > it's up to the firmware to update it to 'done' after the
> > > > command
> > > > is executed in the firmware. After a firmware reset is
> > > > performed, the firmware cannot be triggered to start a MRPC
> > > > command, therefore the status in host memory remains 'in
> > > > progress' in the driver, which prevents a MRPC from timing out.
> > > > I should have included this in the message.
> > > 
> > > I *thought* the problem was that the PCIe Memory Read failed and
> > > the Root Complex fabricated ~0 data to complete the CPU
> > > read.  But
> > > now I'm not sure, because it sounds like it might be that the
> > > PCIe
> > > transaction succeeds, but it reads data that hasn't been updated
> > > by the firmware, i.e., it reads 'in progress' because firmware
> > > hasn't updated it to 'done'.
> > 
> > The original message was sort of misleading. After a firmware
> > reset,
> > CPU getting ~0 for the PCIe Memory Read doesn't explain the hang.
> > In
> > a MRPC execution (DMA MRPC mode), the MRPC status which is located
> > in the host memory, gets initialized by the CPU and
> > updated/finalized by the firmware. In the situation of a firmware
> > reset, any MRPC initiated afterwards will not get the status
> > updated
> > by the firmware per the reason you pointed out above (or similar,
> > to
> > my understanding, firmware can no longer DMA data to host memory in
> > such cases), therefore the MRPC execution will never end.
> 
> I'm glad this makes sense to you, because it still doesn't to me.
> 
> check_access() does an MMIO read to something in BAR0.  If that read
> returns ~0, it means either the PCIe Memory Read was successful and
> the Switchtec device supplied ~0 data (maybe because firmware has not
> initialized that part of the BAR) or the PCIe Memory Read failed and
> the root complex fabricated the ~0 data.
> 
> I'd like to know which one is happening so we can clarify the commit
> log text about "MRPC command executions hang indefinitely" and "host
> wil fail all GAS reads."  It's not clear whether these are PCIe
> protocol issues or driver/firmware interaction issues.
> 
> Bjorn

I think it's the latter case, the ~0 data was fabricated by the root
complex, as the MMIO read in check_access() always returns ~0 until a
reboot or a rescan happens. 

Kelvin