lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 21 Feb 2018 14:58:13 +0530
From:   George Cherian <gcherian@...iumnetworks.com>
To:     Bjorn Helgaas <helgaas@...nel.org>, Lukas Wunner <lukas@...ner.de>
Cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        George Cherian <george.cherian@...ium.com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        bhelgaas@...gle.com, Jayachandran.Nair@...ium.com,
        Robert.Richter@...ium.com,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Huang Ying <ying.huang@...el.com>
Subject: Re: [PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173

Hi Bjorn,


On 02/21/2018 12:30 AM, Bjorn Helgaas wrote:
> [+cc Huang]
> 
> On Tue, Feb 20, 2018 at 02:54:33AM +0100, Lukas Wunner wrote:
>> On Mon, Feb 19, 2018 at 12:21:56PM +0100, Rafael J. Wysocki wrote:
>>> On Friday, February 16, 2018 9:34:34 PM CET Bjorn Helgaas wrote:
>>>> On Fri, Feb 16, 2018 at 01:40:37PM +0100, Rafael J. Wysocki wrote:
>>>>> On Friday, February 16, 2018 12:39:00 AM CET Bjorn Helgaas wrote:
>>>>>> On Thu, Feb 15, 2018 at 10:57:25PM +0100, Rafael J. Wysocki wrote:
>>>>>>> On Wednesday, February 14, 2018 9:16:53 PM CET Bjorn Helgaas wrote:
>>>>>>>> I don't know how this runtime PM works, but maybe Rafael can help
>>>>>>>> us out.
>>
>> This has nothing to do with runtime PM AFAICS.
>>
>> The device seems to be in D3hot on boot, is that correct?
> 
> I actually don't know for sure what device caused the issue or what
> state it was in.  George, if you could clarify that, it would be
> helpful.
> 
I will explain the setup used
To the Cavium ThunderX RC the following PLX device is connected.
PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 
GT/s) Switch
There is no device connected downstream to the PLX switch.

AFAIU the pcie_port driver probes PLX and enters autosuspend after 100ms 
since pci_bridge_d3_possible() returns true.

And later pci_sysfs_init() ends up doing a config access of PLX which 
fails with a "synchronous external abort"

> My assumption (possibly incorrect) is that the issue happens when we
> read config space of an endpoint behind a PCIe bridge that is in D3.
> The PCIe portdrv binds to the bridge before the pci_sysfs_init()
> late_initcall happens, and it enables runtime PM on the bridge, so
> it's possible the bridge is in D3 before pci_sysfs_init() happens.
> 
>> The PCI core assumes that unbound devices remain in D0
>> (see comments in pci_pm_runtime_resume() / pci_pm_runtime_suspend()).
> 
> That comment was added by 967577b06241 ("PCI/PM: Keep runtime PM
> enabled for unbound PCI devices"), apparently because "some devices do
> not work again after being put into D3".  That commit mentions
> https://bugzilla.kernel.org/show_bug.cgi?id=48201 but I don't think
> it's completely convincing about the requirement for keeping things in
> D0.
> 
> Even if it's true that some devices don't work after being in D3, that
> doesn't seem like sufficient reason to keep *all* devices on all
> systems in D0.  We would normally use quirks to address device issues
> like this.
> 
>>>>>>> I'm not sure what the question is to be honest.
>>>>>>
>>>>>> My questions are basically "What does the PCI core need to do to make
>>>>>> sure a device is in D0 before it operates on it?  And where do we need
>>>>>> to do that?"
>>
>> When scanning the bus and discovering the device is not in D0,
>> call pci_power_up().  This could probably go into pci_init_pm().
>> Once a driver binds to it, it may choose to runtime suspend it to
>> D3hot again.
> 
> s/pci_init_pm/pci_pm_init/
> 
> That's an interesting idea.  I hadn't thought of the case where a
> device is not in D0 when we enumerate it.  We *do* make sure bridges
> are in D0 before scanning behind them (pci_scan_bridge_extend() calls
> pm_runtime_get_sync()).
> 
> Since enumeration should only perform config accesses, and devices
> must respond to config accesses even when in D3hot, I guess I'm not
> convinced that we need to power up everything (other than bridges)
> during enumeration.
> 
> Bjorn
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ