[<prev] [next>] [day] [month] [year] [list]
Message-Id: <20250422130207.3124-1-ilpo.jarvinen@linux.intel.com>
Date: Tue, 22 Apr 2025 16:02:07 +0300
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Dan Williams <dan.j.williams@...el.com>,
Lukas Wunner <lukas@...ner.de>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Subject: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
Tag Requester (note: there is currently no 10-Bit Tag support in the
kernel). While those can be configured to the recommended values by FW,
kernel may decide to overwrite the initial values.
Unfortunately, there is no mechanism for FW to indicate OS which parts
of PCIe configuration should not be altered. Thus, the only option is
to add such logic into the kernel as quirks.
There is a pre-existing quirk flag to disable Extended Tags. Depending
on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the
kernel thinks is the best for performance (the largest supported
value), resulting in performance degradation instead with these Root
Ports. (There would have been a pre-existing quirk to disallow
increasing MRRS but it is not identical to rejecting >128B MRRS.)
Add a quirk that disallows enabling Extended Tags and setting MRRS
larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
bifurcated to x2. Reject >128B MRRS only when it is going to be written
by the kernel (this assumes FW configured a good initial value for MRRS
in case the kernel is not touching MRRS at all).
It was first attempted to always write MRRS when the quirk is needed
(always overwrite the initial value). That turned out to be quite
invasive change, however, given the complexity of the initial setup
callchain and various stages returning early when they decide no changes
are necessary, requiring override each. As such, the initial value for
MRRS is now left into the hands of FW.
Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
---
v2:
- Explain in changelog why FW cannot solve this on its own
- Moved the quirk under arch/x86/pci/
- Don't NULL check value from pci_find_host_bridge()
- Added comment above the quirk about the performance degradation
- Removed all setup chain 128B quirk overrides expect for MRRS write
itself (assumes a sane initial value is set by FW)
arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++
drivers/pci/pci.c | 15 ++++++++-------
include/linux/pci.h | 1 +
3 files changed, 39 insertions(+), 7 deletions(-)
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index efefeb82ab61..aa9617bc4b55 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PB1, pcie_r
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC, pcie_rootport_aspm_quirk);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_rootport_aspm_quirk);
+/*
+ * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower
+ * performance with Extended Tags and MRRS > 128B. Workaround the performance
+ * problems by disabling Extended Tags and limiting MRRS to 128B.
+ *
+ * https://cdrdv2.intel.com/v1/dl/getContent/837176
+ */
+static void quirk_pcie2x_no_tags_no_mrrs(struct pci_dev *pdev)
+{
+ struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
+ u32 linkcap;
+
+ pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap);
+ if (FIELD_GET(PCI_EXP_LNKCAP_MLW, linkcap) != 0x2)
+ return;
+
+ bridge->no_ext_tags = 1;
+ bridge->only_128b_mrrs = 1;
+ pci_info(pdev, "Disabling Extended Tags and limiting MRRS to 128B (performance reasons due to 2x PCIe link)\n");
+}
+
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db0, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db1, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db2, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db3, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db6, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db7, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db8, quirk_pcie2x_no_tags_no_mrrs);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0db9, quirk_pcie2x_no_tags_no_mrrs);
+
/*
* Fixup to mark boot BIOS video selected by BIOS before it changes
*
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4d7c9f64ea24..2ca9cb30fbd3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5941,7 +5941,7 @@ EXPORT_SYMBOL(pcie_get_readrq);
int pcie_set_readrq(struct pci_dev *dev, int rq)
{
u16 v;
- int ret;
+ int ret, max_mrrs = 4096;
struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
if (rq < 128 || rq > 4096 || !is_power_of_2(rq))
@@ -5961,13 +5961,14 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
v = FIELD_PREP(PCI_EXP_DEVCTL_READRQ, ffs(rq) - 8);
- if (bridge->no_inc_mrrs) {
- int max_mrrs = pcie_get_readrq(dev);
+ if (bridge->no_inc_mrrs)
+ max_mrrs = pcie_get_readrq(dev);
+ if (bridge->only_128b_mrrs)
+ max_mrrs = 128;
- if (rq > max_mrrs) {
- pci_info(dev, "can't set Max_Read_Request_Size to %d; max is %d\n", rq, max_mrrs);
- return -EINVAL;
- }
+ if (rq > max_mrrs) {
+ pci_info(dev, "can't set Max_Read_Request_Size to %d; max is %d\n", rq, max_mrrs);
+ return -EINVAL;
}
ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0e8e3fd77e96..6dc7a05f4d4b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -603,6 +603,7 @@ struct pci_host_bridge {
unsigned int ignore_reset_delay:1; /* For entire hierarchy */
unsigned int no_ext_tags:1; /* No Extended Tags */
unsigned int no_inc_mrrs:1; /* No Increase MRRS */
+ unsigned int only_128b_mrrs:1; /* Only 128B MRRS */
unsigned int native_aer:1; /* OS may use PCIe AER */
unsigned int native_pcie_hotplug:1; /* OS may use PCIe hotplug */
unsigned int native_shpc_hotplug:1; /* OS may use SHPC hotplug */
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
--
2.39.5
Powered by blists - more mailing lists