Message-ID: <07428f84-5fa3-713f-caac-f69c0e92c779@linux.intel.com>
Date: Tue, 16 Dec 2025 12:08:48 +0200 (EET)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Yang Zhang <zhangz@...on.cn>
cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, bhelgaas@...gle.com,
x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
linux-pci@...r.kernel.org
Subject: Re: [PATCH] X86/PCI: Prioritize MMCFG access to hardware registers
On Tue, 16 Dec 2025, Yang Zhang wrote:
> As CPU performance demands grow, some internal CPU registers need to be
> reconfigured dynamically at run time, for example to adjust memory
> controller strategies within specific time windows. These configurations
> place high demands on the efficiency of the configuration accesses
> themselves: they must retire and take effect as quickly as possible.
>
> However, the current kernel code forces the use of the IO port method for
> PCI accesses with domain=0 and offset less than 256. The IO port method is
> largely a legacy mechanism, and its performance is lower than that of the
> MMCFG method. We ran comparative tests on AMD and Hygon CPUs; even without
> considering the overhead of indirect access (IO ports use 0xCF8 and 0xCFC),
> we simply compared the performance of the following two code sequences:
>
> 1)outl(0x400702,0xCFC);
>
> 2)mmio_config_writel(data_addr,0x400702);
>
> where both sequences access the same register. The results show that the
> MMCFG method (400+ cycles per access) outperforms the IO port method
> (1000+ cycles per access) by more than a factor of two.
>
> Through PMC/PMU event statistics on the AMD/Hygon microarchitecture, we
> found that IO port accesses cause more stalls in the CPU's internal
> dispatch module, and that these stalls are mainly due to the front end's
> inability to decode the corresponding uops in time. The main reason for
> the performance difference between the two access methods is therefore
> that the in/out instructions used for IO port access are microcoded, so
> their decoding efficiency is lower than that of the plain mov used by
> MMCFG.
>
> For CPUs that support both the MMCFG and IO port access methods, this
> change could lead to illegal accesses if a hardware register only
> supports IO port access. We believe, however, that registers reachable
> via IO port access also have corresponding MMCFG addresses. Although we
> tested several AMD/Hygon CPUs with this patch and found no problems, we
> cannot rule out problems on other CPUs, especially older ones. To
> address this risk, we have added a new config option, PREFER_MMCONFIG,
> letting users choose whether to enable this behavior.
>
> Signed-off-by: Yang Zhang <zhangz@...on.cn>
> ---
> arch/x86/Kconfig | 15 +++++++++++++++
> arch/x86/pci/common.c | 14 ++++++++++++++
> 2 files changed, 29 insertions(+)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 80527299f..037d56690 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2932,6 +2932,21 @@ config PCI_MMCONFIG
>
> Say Y otherwise.
>
> +config PREFER_MMCONFIG
> + bool "Perfer to use mmconfig over IO Port"
Prefer
--
i.
> + depends on PCI_MMCONFIG
> + help
> + This option prioritizes mmcfg accesses over IO port accesses. mmcfg
> + performs better than the IO port method, mainly because: 1) IO port
> + access is indirect; 2) the in/out instructions are decoded by
> + microcode, which makes the CPU front end more likely to stall than
> + with the mov instructions used by mmcfg.
> +
> + For CPUs that support both MMCFG and IO Port access methods, if a
> + hardware register only supports IO Port access, this configuration
> + may lead to illegal access. Therefore, users must ensure that the
> + configuration will not cause any exceptions before enabling it.
> +
> config PCI_OLPC
> def_bool y
> depends on PCI && OLPC && (PCI_GOOLPC || PCI_GOANY)
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index ddb798603..8bde5d1df 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -40,20 +40,34 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
> int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
> int reg, int len, u32 *val)
> {
> +#ifdef CONFIG_PREFER_MMCONFIG
> + if (raw_pci_ext_ops)
> + return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> + if (domain == 0 && reg < 256 && raw_pci_ops)
> + return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
> +#else
> if (domain == 0 && reg < 256 && raw_pci_ops)
> return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
> if (raw_pci_ext_ops)
> return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> +#endif
> return -EINVAL;
> }
>
> int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
> int reg, int len, u32 val)
> {
> +#ifdef CONFIG_PREFER_MMCONFIG
> + if (raw_pci_ext_ops)
> + return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
> + if (domain == 0 && reg < 256 && raw_pci_ops)
> + return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
> +#else
> if (domain == 0 && reg < 256 && raw_pci_ops)
> return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
> if (raw_pci_ext_ops)
> return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
> +#endif
> return -EINVAL;
> }
>
>