lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180420140049.GP28657@bhelgaas-glaptop.roam.corp.google.com>
Date:   Fri, 20 Apr 2018 09:00:49 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Sinan Kaya <okaya@...eaurora.org>
Cc:     Jason Gunthorpe <jgg@...pe.ca>,
        Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
        sulrich@...eaurora.org, timur@...eaurora.org,
        linux-arm-msm@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        Mike Marciniszyn <mike.marciniszyn@...el.com>,
        Dennis Dalessandro <dennis.dalessandro@...el.com>,
        Doug Ledford <dledford@...hat.com>,
        "open list:HFI1 DRIVER" <linux-rdma@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        Alex Deucher <alexander.deucher@....com>,
        Rajat Jain <rajatja@...gle.com>,
        Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [PATCH 1/2] IB/hfi1: Try slot reset before secondary bus reset

[+cc Rajat, Alex because of their interest in the reset/hotplug issue]

For context, Sinan's patch is this:

> diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
> index 83d66e8..75f49e3 100644
> --- a/drivers/infiniband/hw/hfi1/pcie.c
> +++ b/drivers/infiniband/hw/hfi1/pcie.c
> @@ -908,7 +908,8 @@ static int trigger_sbr(struct hfi1_devdata *dd)
>          * delay after a reset is required.  Per spec requirements,
>          * the link is either working or not after that point.
>          */
> -       pci_reset_bridge_secondary_bus(dev->bus->self);
> +       if (pci_reset_slot(dev->slot))
> +               pci_reset_bridge_secondary_bus(dev->bus->self);

On Thu, Apr 19, 2018 at 06:19:32PM -0400, Sinan Kaya wrote:
> On 4/19/2018 5:47 PM, Bjorn Helgaas wrote:
> >> Bjorn, would be appropriate to export pci_parent_bus_reset() or some
> >> variation therin??
> > I agree it would be really nice if the PCI core could help out somehow
> > so we could get some of this code out of individual drivers.

What I was really thinking here was about the whole Gen3 transition
thing, not the reset thing by itself.

> I can create a function called pci_reset_link() and move both slot and
> secondary bus reset inside.

What exactly is your patch fixing?  Is it the following?

  If the HFI link is not operating at 8GT/s, the driver's .probe()
  method tries to transition it to 8GT/s, which involves resetting the
  HFI device with pci_reset_bridge_secondary_bus().  If the HFI device
  is in a hotplug slot, the reset causes a "Link Down" event, which
  causes the pciehp driver to remove the HFI device and re-enumerate
  it when the link comes back up.

  When pciehp removes the device, it calls the HFI .remove() method,
  which is a problem because the .probe() method is still active.

  It looks like this should deadlock because __device_attach() holds
  the device_lock while calling .probe() and the
  device_release_driver() path tries to acquire it.

Your patch uses pci_reset_slot(), which connects with Rajat's work
(06a8d89af551 ("PCI: pciehp: Disable link notification across slot
reset")) to avoid hotplug events for intentional resets.

So I think I just reverse-engineered the whole rationale for your
patch :)  Sorry about the long detour.

I'm having a hard time articulating my thoughts here.  I think my
concern is that knowledge about this reset/link down/hotplug issue is
leaking out and we'll end up with different reset interfaces that may
or may not result in hotplug events.  This seems like a confusing API
because it's hard to explain which interface a driver should use.

Bjorn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ