Date:	Sun, 2 Oct 2011 23:58:24 -0500
From:	Jon Mason <mason@...i.com>
To:	Avi Kivity <avi@...hat.com>
Cc:	Sven Schnelle <svens@...ckframe.org>, Simon Kirby <sim@...tway.ca>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Niels Ole Salscheider <niels_ole@...scheider-online.de>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	Ben Hutchings <bhutchings@...arflare.com>
Subject: Re: Workaround for Intel MPS errata

On Sun, Oct 02, 2011 at 11:26:12AM +0200, Avi Kivity wrote:
> On 09/30/2011 03:16 AM, Jon Mason wrote:
> >Hey Avi,
> >Can you try this patch?  It should resolve the issue you are seeing.
> 
> It doesn't; the fixup: label is not reached (though I do have an
> 0x25d4 device).
> 
> -- 
> error compiling committee.c: too many arguments to function
> 

I found a system with a 5000X memory controller (which should have the
same errata).  It doesn't have the faulty bit set (perhaps a better BIOS).
I was able to find out why the code in the previous patch wasn't working,
but wasn't able to cause the crash by setting the bit from the errata.
The reworked version of the previous patch below should resolve the
issue.  Please test it if you can.

Thanks,
Jon

---

    PCI: Workaround for Intel MPS errata
    
    Intel 5000 and 5100 series memory controllers have a known issue if read
    completion coalescing is enabled (the default setting) and the PCI-E
    Maximum Payload Size is set to 256B.  To work around this issue, disable
    read completion coalescing if the MPS is 256B.
    
    Note that read completion coalescing is never re-enabled once it has
    been disabled, so its performance benefit is lost if the MPS later
    drops from 256B to 128B.  That can only happen when the only 256B MPS
    device in the system is hot-unplugged, which leaves every other device
    in the system with degraded performance and without the benefit of any
    256B transfers.  This trade-off is acceptable.
    
    http://www.intel.com/content/dam/doc/specification-update/5000-chipset-memory-controller-hub-specification-update.pdf
    http://www.intel.com/content/dam/doc/specification-update/5100-memory-controller-hub-chipset-specification-update.pdf
    
    Thanks to Jesse Brandeburg and Ben Hutchings for providing insight into
    the problem.
    
    Reported-by: Avi Kivity <avi@...hat.com>
    Signed-off-by: Jon Mason <mason@...i.com>

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a919db2..8f6725f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1361,6 +1361,90 @@ static int pcie_find_smpss(struct pci_dev *dev, void *data)
 	return 0;
 }
 
+static void pcie_errata_check(int mps)
+{
+	static bool done = false;
+	struct pci_bus *bus;
+	u16 val;
+
+	if (done)
+		return;
+
+	/* pci_get_device cannot be used for these, as there are no pci_dev's
+	 * created for the memory controllers.  We'll have to get nasty here and
+	 * check PCI config space ourselves.
+	 */
+	bus = pci_find_bus(0, 0);
+	if (!bus)
+		return;
+
+	/* Intel 5000 and 5100 Memory controllers have an errata with read
+	 * completion coalescing (which is enabled by default) and MPS of 256B.
+	 */
+	pci_bus_read_config_word(bus, 0, PCI_VENDOR_ID, &val);
+	if (val != PCI_VENDOR_ID_INTEL) {
+		done = true;
+		return;
+	}
+
+	pci_bus_read_config_word(bus, 0, PCI_DEVICE_ID, &val);
+	switch (val) {
+	case 0x25C0:	/* 5000X Chipset Memory Controller Hub */
+	case 0x25D0:	/* 5000Z Chipset Memory Controller Hub */
+	case 0x25D4:	/* 5000V Chipset Memory Controller Hub */
+	case 0x25D8:	/* 5000P Chipset Memory Controller Hub */
+	case 0x65C0:	/* 5100 Chipset Memory Controller Hub */
+		break;
+	default:
+		done = true;
+		return;
+	}
+
+	/* Disable read completion coalescing to allow an MPS of 256.
+	 *
+	 * It is worth noting that there is no function to undo the disable of
+	 * read completion coalescing, and the performance benefit of read
+	 * completion coalescing will be lost if the MPS is set from 256B to
+	 * 128B.  It is only possible to have this issue via hotplug removing
+	 * the only 256B MPS device in the system (thus making all of the other
+	 * devices in the system have a performance degradation without the
+	 * benefit of any 256B transfers).  Therefore, this trade off is
+	 * acceptable.
+	 */
+	if (mps == 256) {
+		int err;
+
+		/* Intel errata specifies bits to change but does not say what
+		 * they are.  Keeping them magical until such time as the
+		 * registers and values can be explained.
+		 */
+		err = pci_bus_read_config_word(bus, 0, 0x48, &val);
+		if (err) {
+			dev_err(&bus->dev, "Error attempting to read the read "
+				"completion coalescing register.\n");
+			return;
+		}
+
+		if (!(val & (1 << 10))) {
+			done = true;
+			return;
+		}
+
+		val &= ~(1 << 10);
+		err = pci_bus_write_config_word(bus, 0, 0x48, val);
+		if (err) {
+			dev_err(&bus->dev, "Error attempting to write the read "
+				"completion coalescing register.\n");
+			return;
+		}
+
+		dev_info(&bus->dev, "Read completion coalescing disabled due "
+			 "to hardware errata relating to 256B MPS.\n");
+
+		done = true;
+	}
+}
+
 static void pcie_write_mps(struct pci_dev *dev, int mps)
 {
 	int rc;
@@ -1384,6 +1468,8 @@ static void pcie_write_mps(struct pci_dev *dev, int mps)
 			mps = min(mps, pcie_get_mps(dev->bus->self));
 	}
 
+	pcie_errata_check(mps);
+
 	rc = pcie_set_mps(dev, mps);
 	if (rc)
 		dev_err(&dev->dev, "Failed attempting to set the MPS\n");
@@ -1433,19 +1519,19 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
 	if (!pci_is_pcie(dev))
 		return 0;
 
-	dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+	dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
 		 pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));
 
 	pcie_write_mps(dev, mps);
 	pcie_write_mrrs(dev);
 
-	dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+	dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
 		 pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));
 
 	return 0;
 }
 
-/* pcie_bus_configure_mps requires that pci_walk_bus work in a top-down,
+/* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
  * parents then children fashion.  If this changes, then this code will not
  * work as designed.
  */

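For readers following the register handling in pcie_errata_check(), here is a minimal user-space sketch of the read-modify-write that the commit message describes.  It assumes that bit 10 of the config word at offset 0x48 means "read completion coalescing enabled" and has to be cleared to disable it; the patch itself notes that the errata does not explain these bits, so that reading is an inference from the early-return check.  The names RCC_REG_OFFSET, RCC_ENABLE_BIT, struct fake_cfg and the helper functions are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: the patch uses the raw values 0x48 and (1 << 10). */
#define RCC_REG_OFFSET	0x48
#define RCC_ENABLE_BIT	(1u << 10)

/* Stand-in for the first 256 bytes of config space of device 0 on bus 0. */
struct fake_cfg {
	uint8_t bytes[256];
};

/* Read a little-endian 16-bit word, as PCI config space is laid out. */
static uint16_t cfg_read_word(const struct fake_cfg *cfg, unsigned int off)
{
	assert(off + 1 < sizeof(cfg->bytes));
	return (uint16_t)(cfg->bytes[off] | (cfg->bytes[off + 1] << 8));
}

/* Write a little-endian 16-bit word. */
static void cfg_write_word(struct fake_cfg *cfg, unsigned int off, uint16_t val)
{
	assert(off + 1 < sizeof(cfg->bytes));
	cfg->bytes[off] = (uint8_t)(val & 0xff);
	cfg->bytes[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Clear the assumed "coalescing enabled" bit when the MPS is 256B.
 * Returns true only if the register was actually modified.
 */
static bool disable_read_coalescing(struct fake_cfg *cfg, int mps)
{
	uint16_t val;

	if (mps != 256)
		return false;	/* the errata only applies to a 256B MPS */

	val = cfg_read_word(cfg, RCC_REG_OFFSET);
	if (!(val & RCC_ENABLE_BIT))
		return false;	/* already disabled, nothing to do */

	val &= (uint16_t)~RCC_ENABLE_BIT;	/* clear only bit 10 */
	cfg_write_word(cfg, RCC_REG_OFFSET, val);
	return true;
}
```

Calling disable_read_coalescing() with an MPS of 128 leaves the register untouched, while an MPS of 256 clears bit 10 and preserves every other bit in the word.  A second call with 256 is a no-op, which mirrors the role of the "done" flag in the patch.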