Message-ID: <20251216100332.6610-1-zhangz@hygon.cn>
Date: Tue, 16 Dec 2025 18:03:32 +0800
From: Yang Zhang <zhangz@...on.cn>
To: <tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>, <hpa@...or.com>, <bhelgaas@...gle.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<linux-pci@...r.kernel.org>, Yang Zhang <zhangz@...on.cn>
Subject: [PATCH] X86/PCI: Prioritize MMCFG access to hardware registers
As CPU performance demands increase, some internal CPU registers need to
be configured dynamically at run time, for example to adjust memory
controller policies within specific time windows. Such use cases place
high demands on the efficiency of the configuration accesses themselves:
the instructions must retire and take effect as quickly as possible.
However, the current kernel code forces the I/O port method for PCI
config accesses with domain 0 and an offset below 256. The I/O port
method is largely a historical legacy, and it performs worse than the
MMCFG method. We ran comparative tests on AMD and Hygon CPUs. Even
ignoring the cost of indirect access (the I/O port method uses 0xCF8 and
0xCFC), we simply compared the following two statements, both of which
access the same register:
1) outl(0x400702, 0xCFC);
2) mmio_config_writel(data_addr, 0x400702);
The results show that the MMCFG method (400+ cycles per access)
outperforms the I/O port method (1000+ cycles per access) by roughly a
factor of two.
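For readers unfamiliar with the two mechanisms, here is a minimal sketch
of how each one addresses a config register. It is not taken from the
kernel: conf1_address() and ecam_offset() are hypothetical helper names,
and the bit layouts are the standard PCI configuration mechanism #1 and
ECAM layouts. The I/O port method needs a write to 0xCF8 to select the
register before 0xCFC can be accessed, whereas MMCFG reaches the register
with a single memory access at base + offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: value written to port 0xCF8 to select a register
 * with configuration mechanism #1. Only offsets 0..255 are reachable, and
 * the low two bits are dropped because the register is selected as an
 * aligned dword behind the data port 0xCFC.
 */
static uint32_t conf1_address(unsigned bus, unsigned dev, unsigned fn,
                              unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) |
           (reg & 0xFCu);
}

/*
 * Hypothetical helper: byte offset of a register from the MMCFG (ECAM)
 * base of its segment. Each function owns a 4 KiB window, so offsets
 * 0..4095 are reachable with one load or store.
 */
static uint64_t ecam_offset(unsigned bus, unsigned dev, unsigned fn,
                            unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return ((uint64_t)bus << 20) | (dev << 15) | (fn << 12) | reg;
}
```

For example, register 0x40 of device 0x18, function 0 on bus 0 is
selected by writing conf1_address(0, 0x18, 0, 0x40) to 0xCF8, or reached
directly at the MMCFG base plus ecam_offset(0, 0x18, 0, 0x40).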
Using PMC/PMU event statistics on the AMD/Hygon microarchitecture, we
found that I/O port accesses cause more stalls in the CPU's internal
dispatch module, mainly because the front end cannot decode the
corresponding uops in time. The main reason for the performance
difference between the two access methods is therefore that the in/out
instructions used for I/O port access are microcoded, so they decode
less efficiently than the mov instructions used for MMCFG access.
On CPUs that support both MMCFG and I/O port access, a hardware register
that only supports I/O port access could be accessed illegally with this
change. We believe that registers accessible through I/O ports also have
corresponding MMCFG addresses. Although we tested several AMD/Hygon CPUs
with this patch and found no problems, we cannot rule out problems on
other CPUs, especially older ones. To address this risk, we add a new
Kconfig option, PREFER_MMCONFIG, that lets users choose whether to
enable this behavior.
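The resulting selection order can be modelled in user space as follows.
This is only an illustrative sketch, not kernel code: pick_backend() and
the backend enum are hypothetical names, have_raw and have_ext stand for
raw_pci_ops and raw_pci_ext_ops being non-NULL, and it assumes that
raw_pci_ext_ops is the MMCFG backend on the system in question.

```c
#include <stdbool.h>

enum backend { BACKEND_NONE, BACKEND_IOPORT, BACKEND_MMCFG };

/*
 * Hypothetical model of the backend choice made by raw_pci_read() and
 * raw_pci_write(). prefer_mmcfg stands for CONFIG_PREFER_MMCONFIG=y.
 */
static enum backend pick_backend(bool prefer_mmcfg, bool have_raw,
                                 bool have_ext, unsigned domain, int reg)
{
    /* The I/O port method only covers domain 0 and offsets below 256. */
    bool ioport_ok = (domain == 0 && reg < 256 && have_raw);

    if (prefer_mmcfg) {
        if (have_ext)
            return BACKEND_MMCFG;   /* tried first with the new option */
        if (ioport_ok)
            return BACKEND_IOPORT;  /* fallback when MMCFG is absent */
    } else {
        if (ioport_ok)
            return BACKEND_IOPORT;  /* existing behaviour */
        if (have_ext)
            return BACKEND_MMCFG;
    }
    return BACKEND_NONE;            /* the kernel returns -EINVAL here */
}
```

In this model the only accesses that change backend are those with
domain 0 and an offset below 256 on systems where both backends are
available; all other cases pick the same backend as before.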
Signed-off-by: Yang Zhang <zhangz@...on.cn>
---
arch/x86/Kconfig | 15 +++++++++++++++
arch/x86/pci/common.c | 14 ++++++++++++++
2 files changed, 29 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80527299f..037d56690 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2932,6 +2932,21 @@ config PCI_MMCONFIG
Say Y otherwise.
+config PREFER_MMCONFIG
+ bool "Prefer to use mmconfig over IO Port"
+ depends on PCI_MMCONFIG
+ help
+ This setting prioritizes MMCFG, which performs better than I/O port
+ access, mainly for the following reasons: 1) I/O port access is
+ indirect; 2) I/O port instructions are decoded by microcode, which
+ makes the CPU more likely to become front-end bound than the mov
+ instructions used by MMCFG.
+
+ For CPUs that support both MMCFG and IO Port access methods, if a
+ hardware register only supports IO Port access, this configuration
+ may lead to illegal access. Therefore, users must ensure that the
+ configuration will not cause any exceptions before enabling it.
+
config PCI_OLPC
def_bool y
depends on PCI && OLPC && (PCI_GOOLPC || PCI_GOANY)
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index ddb798603..8bde5d1df 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -40,20 +40,34 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *val)
{
+#ifdef CONFIG_PREFER_MMCONFIG
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
+ if (domain == 0 && reg < 256 && raw_pci_ops)
+ return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
+#else
if (domain == 0 && reg < 256 && raw_pci_ops)
return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
if (raw_pci_ext_ops)
return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
+#endif
return -EINVAL;
}
int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 val)
{
+#ifdef CONFIG_PREFER_MMCONFIG
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
+ if (domain == 0 && reg < 256 && raw_pci_ops)
+ return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
+#else
if (domain == 0 && reg < 256 && raw_pci_ops)
return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
if (raw_pci_ext_ops)
return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
+#endif
return -EINVAL;
}
--
2.34.1