Message-ID: <20111003151158.GA21955@myri.com>
Date: Mon, 3 Oct 2011 10:12:01 -0500
From: Jon Mason <mason@...i.com>
To: Avi Kivity <avi@...hat.com>
Cc: Sven Schnelle <svens@...ckframe.org>, Simon Kirby <sim@...tway.ca>,
Eric Dumazet <eric.dumazet@...il.com>,
Niels Ole Salscheider <niels_ole@...scheider-online.de>,
Jesse Barnes <jbarnes@...tuousgeek.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
Ben Hutchings <bhutchings@...arflare.com>
Subject: Re: Workaround for Intel MPS errata
On Mon, Oct 03, 2011 at 12:11:53PM +0200, Avi Kivity wrote:
> On 10/03/2011 06:58 AM, Jon Mason wrote:
> >On Sun, Oct 02, 2011 at 11:26:12AM +0200, Avi Kivity wrote:
> >> On 09/30/2011 03:16 AM, Jon Mason wrote:
> >> >Hey Avi,
> >> >Can you try this patch? It should resolve the issue you are seeing.
> >>
> >> It doesn't; the fixup: label is not reached (though I do have an
> >> 0x25d4 device).
> >>
> >> --
> >> error compiling committee.c: too many arguments to function
> >>
> >
> >I found a system with a 5000X Memory controller (which should have the
> >same errata). It doesn't have the faulty bit (perhaps better BIOS). I
> >was able to find out why the code in the previous patch wasn't working,
> >but wasn't able to cause the crash by setting the bit from the errata.
> >The reworked version of the previous patch found below should resolve
> >the issue. Please test it if you can.
>
> Will be happy to test, but patch appears to be against a different tree?
>
> $ git apply -C2 .git/rebase-apply/patch
> .git/rebase-apply/patch:75: trailing whitespace, shock horror.
> *
> Context reduced to (2/2) to apply fragment at 1362
> Context reduced to (2/2) to apply fragment at 1475
> error: patch failed: drivers/pci/probe.c:1433
> error: drivers/pci/probe.c: patch does not apply
Sorry, I had the patch on top of the 3 patches I just sent to Linus.
I've rebased it and inserted it below.
Thanks,
Jon
PCI: Workaround for Intel MPS errata
Intel 5000 and 5100 series memory controllers have a known issue if read
completion coalescing is enabled (the default setting) and the PCI-E
Maximum Payload Size is set to 256B. To work around this issue, disable
read completion coalescing if the MPS is 256B.
Note that there is no function to re-enable read completion coalescing
once it has been disabled, so its performance benefit is lost if the MPS
is later lowered from 256B to 128B. This can only happen when the only
256B MPS device in the system is removed via hotplug (leaving the other
devices in the system with degraded performance and without the benefit
of any 256B transfers). Therefore, this trade-off is acceptable.
http://www.intel.com/content/dam/doc/specification-update/5000-chipset-memory-controller-hub-specification-update.pdf
http://www.intel.com/content/dam/doc/specification-update/5100-memory-controller-hub-chipset-specification-update.pdf
Thanks to Jesse Brandeburg and Ben Hutchings for providing insight into
the problem.
Reported-by: Avi Kivity <avi@...hat.com>
Signed-off-by: Jon Mason <mason@...i.com>
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f3f94a5..1dd11a5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1361,6 +1361,90 @@ static int pcie_find_smpss(struct pci_dev *dev, void *data)
return 0;
}
+static void pcie_errata_check(int mps)
+{
+ static bool done = false;
+ struct pci_bus *bus;
+ u16 val;
+
+ if (done)
+ return;
+
+ /* pci_get_device cannot be used for these, as there are no pci_dev's
+ * created for the memory controllers. We'll have to get nasty here and
+ * check PCI config space ourselves.
+ */
+ bus = pci_find_bus(0, 0);
+ if (!bus)
+ return;
+
+ /* Intel 5000 and 5100 Memory controllers have an erratum with read
+ * completion coalescing (which is enabled by default) and MPS of 256B.
+ */
+ pci_bus_read_config_word(bus, 0, PCI_VENDOR_ID, &val);
+ if (val != PCI_VENDOR_ID_INTEL) {
+ done = true;
+ return;
+ }
+
+ pci_bus_read_config_word(bus, 0, PCI_DEVICE_ID, &val);
+ switch (val) {
+ case 0x25C0: /* 5000X Chipset Memory Controller Hub */
+ case 0x25D0: /* 5000Z Chipset Memory Controller Hub */
+ case 0x25D4: /* 5000V Chipset Memory Controller Hub */
+ case 0x25D8: /* 5000P Chipset Memory Controller Hub */
+ case 0x65C0: /* 5100 Chipset Memory Controller Hub */
+ break;
+ default:
+ done = true;
+ return;
+ }
+
+ /* Disable read completion coalescing to allow an MPS of 256.
+ *
+ * It is worth noting that there is no function to undo the disable of
+ * read completion coalescing, and the performance benefit of read
+ * completion coalescing will be lost if the MPS is set from 256B to
+ * 128B. It is only possible to have this issue via hotplug removing
+ * the only 256B MPS device in the system (thus making all of the other
+ * devices in the system have a performance degradation without the
+ * benefit of any 256B transfers). Therefore, this trade off is
+ * acceptable.
+ */
+ if (mps == 256) {
+ int err;
+
+ /* Intel errata specifies bits to change but does not say what
+ * they are. Keeping them magical until such time as the
+ * registers and values can be explained.
+ */
+ err = pci_bus_read_config_word(bus, 0, 0x48, &val);
+ if (err) {
+ dev_err(&bus->dev, "Error attempting to read the read "
+ "completion coalescing register.\n");
+ return;
+ }
+
+ if (!(val & (1 << 10))) {
+ done = true;
+ return;
+ }
+
+ val &= ~(1 << 10);
+ err = pci_bus_write_config_word(bus, 0, 0x48, val);
+ if (err) {
+ dev_err(&bus->dev, "Error attempting to write the read "
+ "completion coalescing register.\n");
+ return;
+ }
+
+ dev_info(&bus->dev, "Read completion coalescing disabled due "
+ "to hardware errata relating to 256B MPS.\n");
+
+ done = true;
+ }
+}
+
static void pcie_write_mps(struct pci_dev *dev, int mps)
{
int rc, dev_mpss;
@@ -1390,6 +1474,8 @@ static void pcie_write_mps(struct pci_dev *dev, int mps)
dev->pcie_mpss = ffs(mps) - 8;
}
+ pcie_errata_check(mps);
+
rc = pcie_set_mps(dev, mps);
if (rc)
dev_err(&dev->dev, "Failed attempting to set the MPS\n");
@@ -1452,7 +1538,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
return 0;
}
-/* pcie_bus_configure_mps requires that pci_walk_bus work in a top-down,
+/* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
* parents then children fashion. If this changes, then this code will not
* work as designed.
*/