[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6t4ahnarwfa6xzhmdnhv2tzwrd6w7lincrjfwwwpr7xcpvityd@g7ypm4olmk7n>
Date: Mon, 2 Jun 2025 11:59:09 +0530
From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: Dan Williams <dan.j.williams@...el.com>,
Lukas Wunner <lukas@...ner.de>, Bjorn Helgaas <bhelgaas@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
linux-pci@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 1/1] PCI: Add Extended Tag + MRRS quirk for Xeon 6
On Mon, Jun 02, 2025 at 08:10:42AM +0300, Ilpo Järvinen wrote:
> On Sun, 1 Jun 2025, Manivannan Sadhasivam wrote:
>
> > On Tue, Apr 22, 2025 at 04:02:07PM +0300, Ilpo Järvinen wrote:
> > > When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the
> > > configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit
> > > Tag Requester (note: there is currently no 10-Bit Tag support in the
> > > kernel). While those can be configured to the recommended values by FW,
> > > kernel may decide to overwrite the initial values.
> > >
> > > Unfortunately, there is no mechanism for FW to indicate OS which parts
> > > of PCIe configuration should not be altered. Thus, the only option is
> > > to add such logic into the kernel as quirks.
> > >
> > > There is a pre-existing quirk flag to disable Extended Tags. Depending
> > > on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the
> > > kernel thinks is the best for performance (the largest supported
> > > value), resulting in performance degradation instead with these Root
> > > Ports. (There would have been a pre-existing quirk to disallow
> > > increasing MRRS but it is not identical to rejecting >128B MRRS.)
> > >
> > > Add a quirk that disallows enabling Extended Tags and setting MRRS
> > > larger than 128B for devices under Xeon 6 Root Ports if the Root Port is
> > > bifurcated to x2. Reject >128B MRRS only when it is going to be written
> > > by the kernel (this assumes FW configured a good initial value for MRRS
> > > in case the kernel is not touching MRRS at all).
> > >
> > > It was first attempted to always write MRRS when the quirk is needed
> > > (always overwrite the initial value). That turned out to be quite
> > > invasive change, however, given the complexity of the initial setup
> > > callchain and various stages returning early when they decide no changes
> > > are necessary, requiring override each. As such, the initial value for
> > > MRRS is now left into the hands of FW.
> > >
> > > Link: https://cdrdv2.intel.com/v1/dl/getContent/837176
> > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
> > > ---
> > >
> > > v2:
> > > - Explain in changelog why FW cannot solve this on its own
> > > - Moved the quirk under arch/x86/pci/
> > > - Don't NULL check value from pci_find_host_bridge()
> > > - Added comment above the quirk about the performance degradation
> > > - Removed all setup chain 128B quirk overrides expect for MRRS write
> > > itself (assumes a sane initial value is set by FW)
> > >
> > > arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++
> > > drivers/pci/pci.c | 15 ++++++++-------
> > > include/linux/pci.h | 1 +
> > > 3 files changed, 39 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > > index efefeb82ab61..aa9617bc4b55 100644
> > > --- a/arch/x86/pci/fixup.c
> > > +++ b/arch/x86/pci/fixup.c
> > > @@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PB1, pcie_r
> > > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC, pcie_rootport_aspm_quirk);
> > > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_rootport_aspm_quirk);
> > >
> > > +/*
> > > + * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower
> > > + * performance with Extended Tags and MRRS > 128B. Workaround the performance
> > > + * problems by disabling Extended Tags and limiting MRRS to 128B.
> > > + *
> > > + * https://cdrdv2.intel.com/v1/dl/getContent/837176
> > > + */
> > > +static void quirk_pcie2x_no_tags_no_mrrs(struct pci_dev *pdev)
> > > +{
> > > + struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
> > > + u32 linkcap;
> > > +
> > > + pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap);
> > > + if (FIELD_GET(PCI_EXP_LNKCAP_MLW, linkcap) != 0x2)
> > > + return;
> > > +
> > > + bridge->no_ext_tags = 1;
> > > + bridge->only_128b_mrrs = 1;
> >
> > My 2 cents here. Wouldn't it work if you hardcode MRRS to 128 in PCI_EXP_DEVCTL
> > here and then set pci_host_bridge::no_inc_mrrs to 1? This would avoid
> > introducing an extra flag and also serve the same purpose.
>
> Hi Mani,
>
> Thanks for the suggestion but it won't work because this is the Root Port.
> The devices underneath it need this setting so we cannot set them to 128B
> reliable here (is there anything that guarantees those devices have been
> enumerated at this point?).
>
I was under assumption that the kernel would honor the Root Port MRRS value
while setting the device MRRS, but I was wrong. Kernel just takes the MPS
value (but that considers the Root Port's MPS) for MRRS and scales it down if
the hardware doesn't support it.
So yes, setting the Root Port's MRRS would have no effect in limiting the device
MRRS. And I believe, Xeon6 Root Ports doesn't suffer from same 128B MRRS
limitation for MPS, otherwise, you could just set Root Port MPS to 128B and it
will make sure that both MPS and MRRS of endpoint devices will be capped to this
value.
> I've v3 already prepared which uses the enable device hook as suggested by
> Lukas. I'll send it soon.
>
The callback is supposed to be used for performing enablement steps required by
the *Host Bridge* devices, not Root Ports. So I do have a concern to use it to
address the Root Port quirks. That's why I thought of limiting it to the PCI
quirks.
But feel free to submit the patch. Let's see what Bjorn and others feel about
it.
- Mani
--
மணிவண்ணன் சதாசிவம்
Powered by blists - more mailing lists