netdev - RE: [net-next 5/5] PCI: disable FLR for 82579 device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <630A6B92B7EDEB45A87E20D3D286660153E14465@hasmsx109.ger.corp.intel.com>
Date:   Wed, 28 Sep 2016 15:33:52 +0000
From:   "Neftin, Sasha" <sasha.neftin@...el.com>
To:     Alexander Duyck <alexander.duyck@...il.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        "Duyck, Alexander H" <alexander.h.duyck@...el.com>,
        "Ruinskiy, Dima" <dima.ruinskiy@...el.com>,
        "Avargil, Raanan" <raanan.avargil@...el.com>,
        "Neftin, Sasha" <sasha.neftin@...el.com>
CC:     Bjorn Helgaas <helgaas@...nel.org>,
        "Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        "Bjorn Helgaas" <bhelgaas@...gle.com>,
        Netdev <netdev@...r.kernel.org>,
        Neil Horman <nhorman@...hat.com>,
        "sassmann@...hat.com" <sassmann@...hat.com>,
        "John Greene" <jogreene@...hat.com>,
        "guru.anbalagane@...cle.com" <guru.anbalagane@...cle.com>
Subject: RE: [net-next 5/5] PCI: disable FLR for 82579 device


Since I worked with Sasha on this I will provide a bit of information from what I understand of this bug as well.

On Tue, Sep 27, 2016 at 12:13 PM, Alex Williamson <alex.williamson@...hat.com> wrote:
> On Tue, 27 Sep 2016 13:17:02 -0500
> Bjorn Helgaas <helgaas@...nel.org> wrote:
>
>> On Sun, Sep 25, 2016 at 10:02:43AM +0300, Neftin, Sasha wrote:
>> > On 9/24/2016 12:05 AM, Jeff Kirsher wrote:
>> > >On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:
>> > >>On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:
>> > >>>From: Sasha Neftin <sasha.neftin@...el.com>
>> > >>>
>> > >>>82579 has a problem reattaching itself after the device is detached.
>> > >>>The bug was reported by Redhat. The suggested fix is to disable 
>> > >>>FLR capability in PCIe configuration space.
>> > >>>
>> > >>>Reproduction:
>> > >>>Attach the device to a VM, then detach and try to attach again.
>> > >>>
>> > >>>Fix:
>> > >>>Disable FLR capability to prevent the 82579 from hanging.
>> > >>Is there a bugzilla or other reference URL to include here?  
>> > >>Should this be marked for stable?
>> > >So the author is in Israel, meaning it is their weekend now.  I do 
>> > >not believe Sasha monitors email over the weekend, so a response 
>> > >to your questions won't happen for a few days.
>> > >
>> > >I tried searching my archives for more information, but had no 
>> > >luck finding any additional information.
>> > >

I agree that we do probably need to update the patch description since it isn't exactly clear what this is fixing or what was actually broken.

>> > >>>Signed-off-by: Sasha Neftin <sasha.neftin@...el.com>
>> > >>>Tested-by: Aaron Brown <aaron.f.brown@...el.com>
>> > >>>Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@...el.com>
>> > >>>---
>> > >>>  drivers/pci/quirks.c | 21 +++++++++++++++++++++
>> > >>>  1 file changed, 21 insertions(+)
>> > >>>
>> > >>>diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 
>> > >>>44e0ff3..59fba6e 100644
>> > >>>--- a/drivers/pci/quirks.c
>> > >>>+++ b/drivers/pci/quirks.c
>> > >>>@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct 
>> > >>>pci_dev *pdev)
>> > >>>         }
>> > >>>  }
>> > >>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, 
>> > >>>quirk_intel_qat_vf_cap);
>> > >>>+/*
>> > >>>+ * Workaround FLR issues for 82579
>> > >>>+ * This code disables the FLR (Function Level Reset) via PCIe, 
>> > >>>+in
>> > >>>order
>> > >>>+ * to workaround a bug found while using device passthrough, 
>> > >>>+ where the
>> > >>>+ * interface would become non-responsive.
>> > >>>+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, 
>> > >>>+ so if
>> > >>>+ * the BIOS or kernel writes this register * then this 
>> > >>>+ workaround will
>> > >>>+ * not work.
>> > >>This doesn't sound like a root cause.  Is the issue a hardware 
>> > >>erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?
>> > >>
>> > >>The changelog suggests that the problem only affects passthrough, 
>> > >>which suggests some sort of kernel bug related to how passthrough 
>> > >>is implemented.
>>
>> If this bug affects all scenarios, not just passthrough, the 
>> changelog should not mention passthrough.
>>
>> > >>>+ */
>> > >>>+static void quirk_intel_flr_cap_dis(struct pci_dev *dev) {
>> > >>>+        int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
>> > >>>+        if (pos) {
>> > >>>+                u8 cap;
>> > >>>+                pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
>> > >>>+                cap = cap & (~PCI_AF_CAP_FLR);
>> > >>>+                pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
>> > >>>+        }
>> > >>>+}
>> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
>> > >>>quirk_intel_flr_cap_dis);
>> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
>> > >>>quirk_intel_flr_cap_dis);
>> > >>>--
>> > >>>2.7.4
>> > >>>
>> > >>>--
>> > >>>To unsubscribe from this list: send the line "unsubscribe 
>> > >>>linux-pci" in the body of a message to majordomo@...r.kernel.org 
>> > >>>More majordomo info at  
>> > >>>http://vger.kernel.org/majordomo-info.html
>> >
>> > Hello,
>> >
>> > Original bugzilla thread could be found here:
>> > https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=966840
>>
>> That bugzilla is private and I can't read it.
>
> Hmm, I can, but I don't see anything in it that supports this.  Is 
> that really the right bz?  It's the right hardware, but has all sorts 
> of FUD about the version of various other components in the stack.

It looks like we had a local copy of the bugzilla saved here, though it only goes up to comment 13 which is where I think we started working this on our side.  I believe what this patch is attempting to resolve is related to comment 8 where the driver returned "probe of
0000:00:19.0 failed with error ‐2" instead of correctly probing the interface.

So the bug as reported was that e1000e had a problem reattaching itself to the PHY after it was attached to a VM.  Sasha, please feel free to correct me if I have this bit wrong.  I had been told the problem was the FLR functionality wasn't fully implemented so that was why they were wanting to defeature it.  I had assumed that there was support for a reset on D0->D3->D0, but I just realized that probably isn't the case since it looks like my local system sets the NoSoftRST bit.

>> > This is our HW bug, exist only in 82579 devices. More new devices 
>> > have no such problem. We have found root cause and suggested this 
>> > solution.
>>
>> Is there an erratum you can reference?
>>
>> > This solution should work for a 95% of cases, so I do not think 
>> > that this is fragile. For another cases possible solution is get up 
>> > working system and manually disable FLR, before VM start use our 
>> > adapter.
>>
>> I don't think a 95% solution is sufficient.  Can you use the
>> pci_dev_specific_reset() framework to make a 100% solution?

I can try working with Sasha on this to see what we can do.

> Right, plus when this does work I suspect it removes the one mechanism 
> we have to reset the device, which depending on how obscure the 
> failure scenario is, isn't a clear cut improvement for device assignment.
> Thanks,
>
> Alex

I'll work with Sasha to see what we can do.  Odds are there is some sort of problem between the MAC/PHY that needs to be resolved when we perform the function level reset so we will probably need to add code so that we reset the part and re-establish the link with the PHY after the reset.

- Alex


Hello,
I would like clarify a few points.

1. 82579 client devices do not support functional level reset (FLR). Unfortunately, it advertises FLR capability -  this is a bug. So, when VM software even accidentally tries to access FLR, that causes our device to hang. To eliminate this problem we decided to disable FLR via drivers/pci/quirks.c 
In previous e-mail I wrote that the solution should work for a 95% of cases. Cases where it might not work if BIOS writes FLR capability before the OS (In 82579 device FLR capability is RWO - read write once). But this is a very weird scenario, so probably the workaround will work close to 100%. Also, I would like to add that we have run lots of tests over 82579 in our lab and ensured that all flows work properly.

2. D0->D3-D0 flow is not affected and is in fact supported by 82579.

3. I will look for an erratum explaining this.

Thanks,
Sasha