linux-kernel - Re: [PATCH v5 5/5] PCI: Work around PCIe link training failures

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221109131632.6a059bd9.alex.williamson@redhat.com>
Date:   Wed, 9 Nov 2022 13:16:32 -0700
From:   Alex Williamson <alex.williamson@...hat.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     "Maciej W. Rozycki" <macro@...am.me.uk>,
        Pali Rohár <pali@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>, Stefan Roese <sr@...x.de>,
        Jim Wilson <wilson@...iptree.org>,
        David Abdurachmanov <david.abdurachmanov@...il.com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 5/5] PCI: Work around PCIe link training failures

On Tue, 8 Nov 2022 23:04:18 -0600
Bjorn Helgaas <helgaas@...nel.org> wrote:

> [+cc Alex, in case he has any reset-related comments]
> 
> On Wed, Nov 09, 2022 at 02:57:57AM +0000, Maciej W. Rozycki wrote:
> > Hi Bjorn,
> > 
> >  Thank you for coming back to this patch series.  I'll try to address your 
> > concerns, but it may take a little.  The reason is I'm currently on site 
> > at my lab until the end of the week and barring my day job, etc. I want to 
> > focus on items to do (and I do have a bunch) that require local hardware 
> > access.  The issue concerned with this patch series does not, so I'll get 
> > to looking into it in more depth hopefully from next week.  For the time 
> > being however please see below.
> > 
> > On Thu, 3 Nov 2022, Bjorn Helgaas wrote:
> >   
> > > > > > Also check for a 2.5GT/s speed restriction the firmware may have already 
> > > > > > arranged and lift it too with ports of devices known to continue working 
> > > > > > afterwards, currently the ASM2824 only, that already report their data 
> > > > > > link being up.  
> > > > > 
> > > > > This quirk is run at boot-time and resume-time.  What happens after a
> > > > > Secondary Bus Reset, as is done by pci_reset_secondary_bus()?  
> > > > 
> > > > Flipping SBR bit can be done on any PCI-to-PCI bridge device and in this
> > > > topology there are following: PCIe Root Port, ASMedia PCIe Switch
> > > > Upstream Port, ASMedia PCIe Switch Downstream Port, Pericom PCIe Switch
> > > > Upstream Port, Pericom PCIe Switch Downstream Port.
> > > > (Maciej, I hope that this is whole topology and there is not some other
> > > > device of PCI-to-PCI bridge type in your setup; please correct me)  
> > 
> >  There is actually a PCIe-to-PCI bridge device further downstream (device 
> > 0000:08:00.0 in the listings below; bus 09 is conventional PCI), but it 
> > doesn't matter for the issue concerned; the issue triggers whether the 
> > bridge module has been plugged or not.
> >   
> > > > Bjorn, to make it clear, on which device you mean to issue secondary bus
> > > > reset?  
> > > 
> > > IIUC, the problem is observed on the link between the ASM2824
> > > downstream port and the PI7C9X2G304 upstream port, so my question is
> > > about asserting SBR on the ASM2824 downstream port.  I think that
> > > should cause the link between ASM2824 and PI7C9X2G304 to go down and
> > > back up.  
> > 
> >  That would be my expectation as well.  Is there a reliable way to request
> > that however without actually writing a piece of code to do so from inside 
> > the kernel?  Sadly our documentation is vague on the matter, in particular 
> > Documentation/ABI/testing/sysfs-bus-pci, but here's what I have obtained:
> > 
> > # lspci -t
> > -[0000:00]---00.0-[01-0b]----00.0-[02-0b]--+-00.0-[03]--
> >                                            +-02.0-[04]----00.0
> >                                            +-03.0-[05-09]----00.0-[06-09]--+-01.0-[07]--+-00.0
> >                                            |                               |            \-00.3
> >                                            |                               \-02.0-[08-09]----00.0-[09]--+-01.0
> >                                            |                                                            \-02.0
> >                                            +-04.0-[0a]----00.0
> >                                            \-08.0-[0b]--+-00.0
> >                                                         \-00.1
> > # for name in /sys/bus/pci/devices/0000\:??\:??.?/reset_method; do echo "$(basename $(dirname $name)): $(cat $name)"; done
> > 0000:01:00.0: pm bus
> > 0000:02:00.0: pm bus
> > 0000:02:02.0: pm
> > 0000:02:03.0: pm
> > 0000:02:04.0: pm
> > 0000:02:08.0: pm
> > 0000:04:00.0: bus
> > 0000:05:00.0: bus
> > 0000:06:01.0: bus
> > 0000:07:00.0: bus
> > 0000:08:00.0: bus
> > 0000:09:01.0: pm bus
> > 0000:0a:00.0: flr bus
> > 0000:0b:00.0: pm bus
> > 0000:0b:00.1: pm
> > # 
> > 
> > (mind that the problematic link is between 0000:02:03.0 and 0000:05:00.0), 
> > and then:
> > 
> > # echo 1 >/sys/bus/pci/devices/0000\:05\:00.0/reset
> > -sh: echo: write error: Inappropriate ioctl for device
> > # 
> > 
> > (which I gather is supposed to poke at 0000:02:03.0's SBR) so it doesn't 
> > seem to be effective.    
> 
> 05:00.0 supports the "bus" method, i.e., pci_reset_bus_function(),
> which tries pci_dev_reset_slot_function() followed by
> pci_parent_bus_reset().  Both of them return -ENOTTY if the device
> (05:00.0) has a secondary bus ("dev->subordinate"), so I think nothing
> happens here.

Right, the pci-sysfs reset attribute is only meant for a reset scope
limited to the device, we'd need something to call pci_reset_bus() to
commit to the whole hierarchy, which is not something we typically do.
vfio-pci will only bind to endpoint devices, so it shouldn't provide an
interface to inject a bus reset here either.

Based on the fact that there's a pericom switch in play here, I'll just
note that I think this is the same device with other link speed issues
as well:

https://lore.kernel.org/all/20161026180140.23495.27388.stgit@gimli.home/

This fell off my plate some time ago, but as noted there, enabling ACS
when the upstream and downstream ports run at different link rates
exposes errata where packets are queued and not delivered within the
switch.

Could enabling ACS on this device be contributing to the issue here,
for example triggering the Asmedia downstream port to get into this
link reseting issue?  A test with
pci=disable_acs_redir=0000:06:01.0;0000:06:02.0 could be interesting
assuming this occurs on an platform that has an IOMMU, ie. calls
pci_request_acs().  Thanks,

Alex