linux-kernel - Re: [PATCH v2] PCI: Fix no-op wait after secondary bus reset


Message-ID: <CAGdb+H00q3xhCfw-x+DG624sMuuKqaRwRpPWDJCYs2iLsBCyVw@mail.gmail.com>
Date:   Fri, 20 May 2022 11:00:42 +0800
From:   windy Bi <windy.bi.enflame@...il.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Bjorn Helgaas <helgaas@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Lukas Wunner <lukas@...ner.de>, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] PCI: Fix no-op wait after secondary bus reset

On Fri, May 20, 2022 at 1:06 AM Alex Williamson
<alex.williamson@...hat.com> wrote:
>
> On Wed, 18 May 2022 19:54:32 +0800
> Sheng Bi <windy.bi.enflame@...il.com> wrote:
>
> > pci_bridge_secondary_bus_reset() triggers an SBR followed by a 1 second
> > sleep, and then uses pci_dev_wait() to wait for the device to become
> > ready. The dev parameter passed to the wait function is currently the
> > bridge itself, not the device being reset.
> >
> > If we call pci_bridge_secondary_bus_reset() to trigger an SBR on a device,
> > there is a 1 second sleep but no wait for the device to become ready,
> > since the bridge is always ready while its downstream devices are being
> > reset, so pci_dev_wait() here is effectively a no-op. This is risky when
> > the device takes more than 1 second to become ready, especially with
> > hotplug enabled: a hotplug event arriving after the 1 second sleep
> > causes the hotplug module to remove and re-insert the device.
> >
> > Instead of waiting for the bridge itself, wait for all downstream
> > devices to become ready after SBR, with a timeout of
> > PCIE_RESET_READY_POLL_MS, since every downstream device is affected by
> > the SBR. If any device does not reappear within the timeout, return
> > -ENOTTY to indicate that the SBR did not complete successfully.
> >
> > Fixes: 6b2f1351af56 ("PCI: Wait for device to become ready after secondary bus reset")
> > Signed-off-by: Sheng Bi <windy.bi.enflame@...il.com>
> > ---
> >  drivers/pci/pci.c | 30 +++++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index eb7c0a08ff57..32b7a5c1fa3a 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5049,6 +5049,34 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
> >       }
> >  }
> >
> > +static int pci_bridge_secondary_bus_wait(struct pci_dev *bridge, int timeout)
> > +{
> > +     struct pci_dev *dev;
> > +     int delay = 0;
> > +
> > +     if (!bridge->subordinate || list_empty(&bridge->subordinate->devices))
> > +             return 0;
> > +
> > +     list_for_each_entry(dev, &bridge->subordinate->devices, bus_list) {
> > +             while (!pci_device_is_present(dev)) {
> > +                     if (delay > timeout) {
> > +                             pci_warn(dev, "not ready %dms after secondary bus reset; giving up\n",
> > +                                     delay);
> > +                             return -ENOTTY;
> > +                     }
> > +
> > +                     msleep(20);
> > +                     delay += 20;
>
> Your previous version used the same exponential back-off as used in
> pci_dev_wait(), why the change here to poll at 20ms intervals?  Thanks,
>
> Alex

Many thanks for your time. The change is to make the timeout more
accurate, in line with the earlier statement "we shouldn't incur any
extra delay once timeout has passed". The previous binary exponential
back-off could overshoot the timeout noticeably: with a 60,000 ms
timeout the wait actually lasted 65,535 ms, and the gap could get worse
with other timeout settings. Thanks,

windy
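
P.S. To make the arithmetic above concrete, here is a small user-space
sketch (not kernel code) that models the two polling loops for a device
that never becomes ready. The function names backoff_elapsed_ms() and
fixed_elapsed_ms() are made up for illustration, and the back-off model
assumes the loop sleeps 1, 2, 4, ... ms and gives up once the next sleep
length exceeds the timeout.

```c
#include <assert.h>

/*
 * Model of a doubling back-off loop: sleep 1, 2, 4, ... ms and give up
 * once the next sleep length would exceed timeout_ms. Returns the total
 * time "slept" before giving up. No real sleeping is done here.
 */
static long backoff_elapsed_ms(long timeout_ms)
{
	long delay = 1;
	long elapsed = 0;

	assert(timeout_ms >= 0);
	while (delay <= timeout_ms) {
		elapsed += delay;	/* stands in for msleep(delay) */
		delay *= 2;
	}
	return elapsed;
}

/*
 * Model of the fixed-interval loop in the patch: accumulate interval_ms
 * per poll and give up once the accumulated delay exceeds timeout_ms.
 * Returns the accumulated delay at the point of giving up.
 */
static long fixed_elapsed_ms(long timeout_ms, long interval_ms)
{
	long delay = 0;

	assert(timeout_ms >= 0);
	assert(interval_ms > 0);
	while (delay <= timeout_ms)
		delay += interval_ms;	/* stands in for msleep(interval_ms) */
	return delay;
}
```

Under these assumptions, backoff_elapsed_ms(60000) gives 65535 while
fixed_elapsed_ms(60000, 20) gives 60020, so the fixed interval overshoots
the timeout by at most one polling interval, whereas the doubling loop
can overshoot by almost the full timeout in the worst case.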

>
> > +             }
> > +
> > +             if (delay > 1000)
> > +                     pci_info(dev, "ready %dms after secondary bus reset\n",
> > +                             delay);
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> >  void pci_reset_secondary_bus(struct pci_dev *dev)
> >  {
> >       u16 ctrl;
> > @@ -5092,7 +5120,7 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
> >  {
> >       pcibios_reset_secondary_bus(dev);
> >
> > -     return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS);
> > +     return pci_bridge_secondary_bus_wait(dev, PCIE_RESET_READY_POLL_MS);
> >  }
> >  EXPORT_SYMBOL_GPL(pci_bridge_secondary_bus_reset);
> >
> >
> > base-commit: 617c8a1e527fadaaec3ba5bafceae7a922ebef7e
>