Message-ID: <20180221232040.GA52685@bhelgaas-glaptop.roam.corp.google.com>
Date:   Wed, 21 Feb 2018 17:20:40 -0600
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     George Cherian <gcherian@...iumnetworks.com>
Cc:     Lukas Wunner <lukas@...ner.de>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        bhelgaas@...gle.com, Jayachandran.Nair@...ium.com,
        Robert.Richter@...ium.com,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Huang Ying <ying.huang@...el.com>
Subject: Re: [PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173

On Wed, Feb 21, 2018 at 04:25:08PM +0530, George Cherian wrote:
> On 02/21/2018 03:24 PM, Lukas Wunner wrote:
> > On Wed, Feb 21, 2018 at 02:58:13PM +0530, George Cherian wrote:
> > > I will explain the setup used
> > > To the Cavium ThunderX RC the following PLX device is connected.
> > > PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s)
> > > Switch
> > > There is no device connected downstream to the PLX switch.
> > > 
> > > AFAIU the pcie_port driver probes PLX and enters autosuspend after 100ms
> > > since pci_bridge_d3_possible() returns true.
> > > 
> > > And later pci_sysfs_init() ends up doing a config access of PLX which fails
> > > with a "synchronous external abort"

Thanks for the details!

This one *should* be fixed by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=bf6c089ee2ac67eb22c0ff0ac9cc7f9ccd619d90

Any chance you could try that out?

> > Then you're missing a pci_config_pm_runtime_get() in pci_sysfs_init() or
> > further down in the call stack, rather than a quirk which just papers
> > over the issue.
> 
> I have found another configuration where this fails.
> Following is the configuration
> 1) Connected a PCIe Intel i40 card under the root port.
> 2) unbind the i40 driver and bind with vfio-pci driver.
> 3) Run lspci in a loop. "lspci -s xx:xx.xx -vvv"
> 
> I get the same synchronous external abort.
> In this case, when the vfio-pci driver probes, it moves the device
> (i40) to D3hot, provided disable_idle_d3 is not set. lspci then tries
> a config access, which fails with a synchronous external abort when
> the root port transitions to D3hot.

This one sounds like we're missing something in this path:

  pci_read_config
    pci_config_pm_runtime_get
      if (parent)
        pm_runtime_get_sync
          __pm_runtime_resume(dev, RPM_GET_PUT)
            rpm_resume

It *looks* like rpm_resume() should resume parent devices, i.e., the
root port, but I don't know that code at all.  Maybe Rafael or Lukas
could confirm that?

pci_config_pm_runtime_get() knows that config space is always
accessible unless the device is in D3cold, so if the target device is
in D3hot, it will leave it there.  I assume that if/when rpm_resume()
resumes the parent bridges, it will resume them all the way to D0.
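
To make the reasoning above concrete, here is a toy user-space model in
C.  It is not kernel code: the toy_* names and the struct are made up
for illustration, and it simply assumes what the paragraph above
guesses, namely that resuming a parent walks the whole bridge chain up
to D0 while a target in D3hot is left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy power states; the names mirror PCI terminology only. */
enum toy_pstate { TOY_D0, TOY_D3HOT, TOY_D3COLD };

struct toy_dev {
	struct toy_dev *parent;	/* upstream bridge, NULL above a root port */
	enum toy_pstate state;
};

/* Assumed behavior: resume every bridge above dev, then dev itself, to D0. */
static void toy_rpm_resume(struct toy_dev *dev)
{
	if (!dev)
		return;
	toy_rpm_resume(dev->parent);
	dev->state = TOY_D0;
}

/*
 * Model of the config-access helper described in the mail: always wake
 * the parent chain, but wake the target only if it is in D3cold.
 */
static void toy_config_pm_get(struct toy_dev *dev)
{
	toy_rpm_resume(dev->parent);
	if (dev->state == TOY_D3COLD)
		dev->state = TOY_D0;
}

/* Config space is reachable if the target is not in D3cold and all parents are in D0. */
static bool toy_config_accessible(const struct toy_dev *dev)
{
	if (dev->state == TOY_D3COLD)
		return false;
	for (const struct toy_dev *p = dev->parent; p; p = p->parent)
		if (p->state != TOY_D0)
			return false;
	return true;
}
```

With a root port and an endpoint both in D3hot, calling
toy_config_pm_get() on the endpoint brings the root port to D0 and
leaves the endpoint in D3hot, after which toy_config_accessible()
returns true.  If the real rpm_resume() does not behave like
toy_rpm_resume() for the root port, that would match the abort reported
in this thread.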

I'm *really* glad you're finding these issues, because on most
platforms we would just silently read invalid data (all ones) and the
caller would have no idea what's going wrong.
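
As a small illustration of the "all ones" case, a caller can at least
detect it for the first config dword, because a Vendor ID of 0xffff is
not a valid ID.  The helper name below is hypothetical and this is a
user-space sketch, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * dword0 is the 32-bit value read from config offset 0
 * (Vendor ID in the low 16 bits, Device ID in the high 16 bits).
 * Returns true if the Vendor ID field reads back as all ones.
 */
static bool toy_vendor_id_invalid(uint32_t dword0)
{
	return (dword0 & 0xffffu) == 0xffffu;
}
```

This only helps for offset 0; for other registers all ones can be a
legitimate value, which is why a silent failure is so hard for callers
to notice.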

Bjorn
