[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180502080631.sriwquzcky6t24c2@localhost>
Date: Wed, 2 May 2018 10:06:31 +0200
From: Domenico Andreoli <domenico.andreoli@...ux.com>
To: Mathias Nyman <mathias.nyman@...el.com>,
Bjorn Helgaas <bhelgaas@...gle.com>
Cc: Ard Biesheuvel <ard.biesheuvel@...aro.org>,
Marc Zyngier <marc.zyngier@....com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
linux-kernel@...r.kernel.org
Subject: Regression due to "Workaround for uPD72020x USB3 chips"
Dear all,
my home machine stopped to boot starting from kernel version 4.12.7.
The last message I read is about resetting some USB3 bus. It's 100%
reproducible also with any recent kernel up to 4.17.0-rc3.
I bisected down to the following commit:
commit 0e1f0eaed6c20db41ff61e024b361ee3ec9d686c (tag: my_broken_xhci)
Author: Marc Zyngier <marc.zyngier@....com>
Date: Tue Aug 1 20:11:08 2017 -0500
xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
commit 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 upstream.
The Renesas uPD72020x XHCI controller seems to suffer from a really
annoying bug, where it may retain some of its DMA programming across a XHCI
reset, and despite the driver correctly programming new DMA addresses.
This is visible if the device has been using 64-bit DMA addresses, and is
then switched to using 32-bit DMA addresses. The top 32 bits of the
address (now zero) are ignored are replaced by the 32 bits from the
*previous* programming. Sticking with 64-bit DMA always works, but doesn't
seem very appropriate.
A PCI reset of the device restores the normal functionality, which is done
at probe time. Unfortunately, this has to be done before any quirk has
been discovered, hence the intrusive nature of the fix.
Tested-by: Ard Biesheuvel <ard.biesheuvel@...aro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@....com>
Signed-off-by: Bjorn Helgaas <bhelgaas@...gle.com>
Acked-by: Mathias Nyman <mathias.nyman@...ux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
diff --git drivers/usb/host/pci-quirks.c drivers/usb/host/pci-quirks.c
index 5f4ca7890435..c8f38649f749 100644
--- drivers/usb/host/pci-quirks.c
+++ drivers/usb/host/pci-quirks.c
@@ -1157,3 +1157,23 @@ static void quirk_usb_early_handoff(struct pci_dev *pdev)
}
DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID,
PCI_CLASS_SERIAL_USB, 8, quirk_usb_early_handoff);
+
+bool usb_xhci_needs_pci_reset(struct pci_dev *pdev)
+{
+ /*
+ * Our dear uPD72020{1,2} friend only partially resets when
+ * asked to via the XHCI interface, and may end up doing DMA
+ * at the wrong addresses, as it keeps the top 32bit of some
+ * addresses from its previous programming under obscure
+ * circumstances.
+ * Give it a good wack at probe time. Unfortunately, this
+ * needs to happen before we've had a chance to discover any
+ * quirk, or the system will be in a rather bad state.
+ */
+ if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
+ (pdev->device == 0x0014 || pdev->device == 0x0015))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(usb_xhci_needs_pci_reset);
diff --git drivers/usb/host/pci-quirks.h drivers/usb/host/pci-quirks.h
index 655994480198..5582cbafecd4 100644
--- drivers/usb/host/pci-quirks.h
+++ drivers/usb/host/pci-quirks.h
@@ -15,6 +15,7 @@ void usb_asmedia_modifyflowcontrol(struct pci_dev *pdev);
void usb_enable_intel_xhci_ports(struct pci_dev *xhci_pdev);
void usb_disable_xhci_ports(struct pci_dev *xhci_pdev);
void sb800_prefetch(struct device *dev, int on);
+bool usb_xhci_needs_pci_reset(struct pci_dev *pdev);
#else
struct pci_dev;
static inline void usb_amd_quirk_pll_disable(void) {}
diff --git drivers/usb/host/xhci-pci.c drivers/usb/host/xhci-pci.c
index 1ef622ededfd..cefa223f9f08 100644
--- drivers/usb/host/xhci-pci.c
+++ drivers/usb/host/xhci-pci.c
@@ -285,6 +285,13 @@ static int xhci_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
driver = (struct hc_driver *)id->driver_data;
+ /* For some HW implementation, a XHCI reset is just not enough... */
+ if (usb_xhci_needs_pci_reset(dev)) {
+ dev_info(&dev->dev, "Resetting\n");
+ if (pci_reset_function_locked(dev))
+ dev_warn(&dev->dev, "Reset failed");
+ }
+
/* Prevent runtime suspending between USB-2 and USB-3 initialization */
pm_runtime_get_noresume(&dev->dev);
---
Commenting out the call to pci_reset_function_locked() makes 4.17.0-rc3
boot again so I think this bug qualifies as regression.
I investigated a bit, the culprit is pci_reset_secondary_bus().
pci_reset_function_locked ->
__pci_reset_function_locked ->
pci_reset_bridge_secondary_bus ->
pcibios_reset_secondary_bus ->
pci_reset_secondary_bus ->>>>>> here the system dies/hangs/stops
I can read the printk I put right before PCI_BRIDGE_CTL_BUS_RESET is
written in PCI_BRIDGE_CONTROL. I cannot read the printk I put right
after.
It seems my system doesn't like that PCI reset at all. I cannot swear
that it is completely frozen, some disk activity might be happening
shortly after, but my only option is to power cycle.
I cannot debug easily. I tried to boot with a patched module and just
attempted to work at it on a live machine but as soon I unload the
module I'm left without keyboard/mouse (apparently all the accessible
USB ports are going through the xhci-pci module) and at the moment I
cannot go via network.
This is the output of lspci -vt:
-[0000:00]-+-00.0 Intel Corporation 4th Gen Core Processor DRAM Controller
+-01.0-[01]----00.0 NVIDIA Corporation GM108M [GeForce 840M]
+-02.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller
+-03.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
+-14.0 Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI
+-16.0 Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1
+-1a.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2
+-1b.0 Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller
+-1c.0-[02]--
+-1c.2-[03]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
+-1c.3-[04]----00.0 Realtek Semiconductor Co., Ltd. RTS5229 PCI Express Card Reader
+-1c.4-[05]----00.0 Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter
+-1c.5-[06]----00.0 Renesas Technology Corp. uPD720202 USB 3.0 Host Controller
+-1d.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1
+-1f.0 Intel Corporation C220 Series Chipset Family H81 Express LPC Controller
+-1f.2 Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
\-1f.3 Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller
The call to pci_reset_secondary_bus() is correctly applied to -1c.5-.
I suspect that -1c.5- might benefit from having PCI_DEV_FLAGS_NO_BUS_RESET
in its dev->dev_flags but I'm not sure that it's the proper fix and how
exactly it could end up there.
Any suggestion on how to proceed further?
Please ask more info if needed.
Kind regards,
Domenico
--
3B10 0CA1 8674 ACBA B4FE FCD2 CE5B CF17 9960 DE13
Powered by blists - more mailing lists