Message-ID: <302b3ac9-b34f-c061-8c59-6b3b0c56b1f7@molgen.mpg.de>
Date:   Tue, 8 May 2018 15:22:54 +0200
From:   Paul Menzel <pmenzel+linux-pci@...gen.mpg.de>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     okaya@...eaurora.org, Bjorn Helgaas <bhelgaas@...gle.com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        Lukas Wunner <lukas@...ner.de>
Subject: Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038
 (issued 65284 msec ago)

Dear Bjorn,


On 08.05.2018 at 14:34, Bjorn Helgaas wrote:
> On Tue, May 08, 2018 at 08:59:34AM +0200, Paul Menzel wrote:

>> On 07.05.2018 at 23:33, Bjorn Helgaas wrote:
>>> On Fri, May 04, 2018 at 08:33:27AM -0500, Bjorn Helgaas wrote:
>>>> commit b0d6f2230e12c85ae3b65a854a53c67c7c1f6406
>>>> Author: Bjorn Helgaas <bhelgaas@...gle.com>
>>>> Date:   Thu May 3 18:39:38 2018 -0500
>>>>
>>>>       PCI: pciehp: Add quirk for Intel Command Completed erratum
>>>>       The Intel CF118 erratum means the controller does not set the Command
>>>>       Completed bit unless writes to the Slot Command register change "Control"
>>>>       bits.  Command Completed is never set for writes that only change software
>>>>       notification "Enable" bits.  This results in timeouts like this:
>>>>         pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
>>>>       When this erratum is present, avoid these timeouts by marking commands
>>>>       "completed" immediately unless they change the "Control" bits.
>>>>       Here's the text of the erratum from the Intel document:
>>>>         CF118        PCIe Slot Status Register Command Completed bit not always
>>>>                      updated on any configuration write to the Slot Control
>>>>                      Register
>>>>         Problem:     For PCIe root ports (devices 0 - 10) supporting hot-plug,
>>>>                      the Slot Status Register (offset AAh) Command Completed
>>>>                      (bit[4]) status is updated under the following condition:
>>>>                      IOH will set Command Completed bit after delivering the new
>>>>                      commands written in the Slot Controller register (offset
>>>>                      A8h) to VPP. The IOH detects new commands written in Slot
>>>>                      Control register by checking the change of value for Power
>>>>                      Controller Control (bit[10]), Power Indicator Control
>>>>                      (bits[9:8]), Attention Indicator Control (bits[7:6]), or
>>>>                      Electromechanical Interlock Control (bit[11]) fields. Any
>>>>                      other configuration writes to the Slot Control register
>>>>                      without changing the values of these fields will not cause
>>>>                      Command Completed bit to be set.
>>>>                      The PCIe Base Specification Revision 2.0 or later describes
>>>>                      the “Slot Control Register” in section 7.8.10, as follows
>>>>                      (Reference section 7.8.10, Slot Control Register, Offset
>>>>                      18h). In hot-plug capable Downstream Ports, a write to the
>>>>                      Slot Control register must cause a hot-plug command to be
>>>>                      generated (see Section 6.7.3.2 for details on hot-plug
>>>>                      commands). A write to the Slot Control register in a
>>>>                      Downstream Port that is not hotplug capable must not cause a
>>>>                      hot-plug command to be executed.
>>>>                      The PCIe Spec intended that every write to the Slot Control
>>>>                      Register is a command and expected a command complete status
>>>>                      to abstract the VPP implementation specific nuances from the
>>>>                      OS software. IOH PCIe Slot Control Register implementation
>>>>                      is not fully conforming to the PCIe Specification in this
>>>>                      respect.
>>>>         Implication: Software checking on the Command Completed status after
>>>>                      writing to the Slot Control register may time out.
>>>>         Workaround:  Software can read the Slot Control register and compare the
>>>>                      existing and new values to determine if it should check the
>>>>                      Command Completed status after writing to the Slot Control
>>>>                      register.
>>>>       Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
>>>>       Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de
>>>>       Reported-by: Paul Menzel <pmenzel+linux-pci@...gen.mpg.de>
>>>>       Signed-off-by: Bjorn Helgaas <bhelgaas@...gle.com>
>>>
>>> I applied this with Paul's tested-by on pci/hotplug for v4.18.
>>
>> Thank you very much. Will this also be picked up by the stable Linux kernel
>> series?
> 
> I did not tag it for stable because I didn't think it was a serious enough
> problem, based on this from Documentation/process/stable-kernel-rules.rst:
> 
>   - It must fix a problem that causes a build error (but not for things
>     marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
>     security issue, or some "oh, that's not good" issue.  In short, something
>     critical.
> 
> I know I'm on the conservative end of the stable-tagging spectrum, so maybe
> I could be convinced to add a stable tag.
> 
> My impression was that this bug caused annoying messages and annoying
> delays of a couple seconds during shutdown and resume.  Is it more serious
> than that?

No, not more than that. But “oh, that’s not good” fits, in my opinion. My
impression was that this is how most stable patches get in.


Kind regards,

Paul
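
For reference, the workaround described in the quoted commit message and
erratum text can be sketched roughly as below. This is only an
illustration, not the code from the commit: the macro names and the
helper must_wait_for_cmd_complete() are hypothetical, and the bit
positions are taken from the erratum text quoted above (bit[10],
bits[9:8], bits[7:6], bit[11] of the Slot Control register).

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Control "Control" fields listed in the CF118 erratum text.
 * Names are illustrative; the bit positions come from the erratum. */
#define SLTCTL_AIC 0x00c0u /* Attention Indicator Control, bits[7:6] */
#define SLTCTL_PIC 0x0300u /* Power Indicator Control, bits[9:8] */
#define SLTCTL_PCC 0x0400u /* Power Controller Control, bit[10] */
#define SLTCTL_EIC 0x0800u /* Electromechanical Interlock Control, bit[11] */
#define CC_ERRATUM_MASK (SLTCTL_AIC | SLTCTL_PIC | SLTCTL_PCC | SLTCTL_EIC)

/*
 * Hypothetical helper: returns true if a Slot Control write changes any
 * "Control" field, i.e. an affected controller is expected to set
 * Command Completed. If only "Enable" bits change, the caller would
 * treat the command as completed immediately instead of waiting.
 */
static bool must_wait_for_cmd_complete(uint16_t old_ctrl, uint16_t new_ctrl)
{
    return ((old_ctrl ^ new_ctrl) & CC_ERRATUM_MASK) != 0;
}
```

As an example with an assumed prior register value of 0x0000, writing
0x1038 touches none of the masked fields, so the helper returns false
and no wait (and hence no timeout) would occur.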
