Message-ID: <38072ab3-379d-4e7d-85c8-de1d4f4960b4@amd.com>
Date: Wed, 25 Jun 2025 11:58:01 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org, linux-kernel@...r.kernel.org
Cc: "H. Peter Anvin" <hpa@...or.com>, Naveen rao <naveen.rao@....com>,
Sairaj Kodilkar <sarunkod@....com>,
Mario Limonciello <mario.limonciello@....com>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
"Gautham R. Shenoy" <gautham.shenoy@....com>, Babu Moger
<babu.moger@....com>, Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: Re: [PATCH 0/2] x86/cpu/topology: Work around the nuances of
virtualization on AMD/Hygon
On 6/12/2025 12:59 PM, K Prateek Nayak wrote:
> When running an AMD guest on QEMU with > 255 cores, the following FW_BUG
> was noticed with recent kernels:
>
> [Firmware Bug]: CPU 512: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0200
>
> Naveen, Sairaj debugged the cause to commit c749ce393b8f ("x86/cpu: Use
> common topology code for AMD") where, after the rework, the initial
> APICID was set using the CPUID leaf 0x8000001e EAX[31:0] as opposed to
> the value from CPUID leaf 0xb EDX[31:0] previously.
>
> This led us down a rabbit hole of XTOPOEXT vs TOPOEXT support, the preferred
> order of their parsing, and QEMU nuances like [1], where QEMU zeroes out
> CPUID leaf 0x8000001e on CPUs whose Core ID crosses 255, fearing a
> Core ID collision in the 8-bit field, which leads to the reported FW_BUG.
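
[Illustration of the collision concern mentioned above: a minimal sketch, assuming the Core ID is simply truncated to an 8-bit field. The helper names core_id_8bit() and core_ids_collide() are hypothetical and exist neither in QEMU nor in the kernel.]

```c
#include <stdint.h>

/* Core ID as it would appear if squeezed into an 8-bit field. */
static uint8_t core_id_8bit(uint32_t core_id)
{
	return (uint8_t)(core_id & 0xff);
}

/* Non-zero when two distinct cores become indistinguishable in 8 bits. */
static int core_ids_collide(uint32_t a, uint32_t b)
{
	return a != b && core_id_8bit(a) == core_id_8bit(b);
}
```

Under this assumption, core 256 aliases core 0 because 256 & 0xff is 0, whereas cores 0 through 255 remain distinct.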
>
> The following were the major observations during debugging, which the
> two patches address respectively:
>
> 1. The support for CPUID leaf 0xb is independent of the TOPOEXT feature
> and is rather linked to x2APIC enablement. On baremetal, this has
> not been a problem since TOPOEXT support (Fam 0x15 and above)
> predates the support for CPUID leaf 0xb (Fam 0x17[Zen2] and above).
> In a virtualized environment, however, x2APIC support can be
> enabled independently of topoext, and QEMU expects the guest to parse
> the topology and the APICID from CPUID leaf 0xb.
>
> 2. CPUID leaf 0x8000001e cannot represent Core ID without
> collision for guests with > 255 cores, and QEMU zeroes out the entire
> leaf when Core ID crosses 255. Hence, prefer the initial APICID read
> from the XTOPOEXT leaf before falling back to the APICID from 0x8000001e,
> which is still better than the 8-bit APICID from leaf 0x1 EBX[31:24].
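
[Illustration of the preference order in point 2: a rough sketch, not the code in topology_amd.c. The struct, its field names, and pick_initial_apicid() are hypothetical; the only details taken from the text above are the register fields (EDX[31:0] of leaf 0xb, EAX[31:0] of leaf 0x8000001e, EBX[31:24] of leaf 0x1) and the order in which they are tried.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the three candidate APIC ID sources. */
struct apicid_sources {
	bool     ext_leaf_valid;  /* extended topology leaf (e.g. 0xb) parsed */
	uint32_t ext_leaf_edx;    /* 32-bit APIC ID from EDX[31:0] */
	bool     topoext_valid;   /* leaf 0x8000001e is usable */
	uint32_t topoext_eax;     /* APIC ID from EAX[31:0] */
	uint32_t leaf1_ebx;       /* raw EBX of leaf 0x1 */
};

/* Try the widest source first and the 8-bit legacy field last. */
static uint32_t pick_initial_apicid(const struct apicid_sources *s)
{
	if (s->ext_leaf_valid)
		return s->ext_leaf_edx;
	if (s->topoext_valid)
		return s->topoext_eax;
	return (s->leaf1_ebx >> 24) & 0xff;	/* EBX[31:24] */
}
```

With this ordering, a guest whose leaf 0x8000001e has been zeroed still gets its full APIC ID as long as the extended topology leaf is available.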
>
> More details are enclosed in the commit logs.
>
> Ideally, these changes should not affect baremetal AMD/Hygon platforms
> as they have supported TOPOEXT long before the support for CPUID leaf
> 0xb and the extended CPUID leaf 0x80000026 (famous last words).
>
> This series has been tested on baremetal Zen1 (contains topoext but not
> 0xb leaf), Zen3 (contains both topoext and 0xb leaf), and Zen4 (contains
> topoext, 0xb leaf, and 0x80000026 leaf) servers with no changes
> observed in "/sys/kernel/debug/x86/topo/" directory.
>
> The series was also tested on 255 and 512 vCPU EPYC-Genoa guests (each
> vCPU is an individual core in the QEMU topology being passed) with
> and without x2apic and topoext enabled, and it resolves the FW_BUG
> seen on guests with > 255 vCPUs. No changes were observed in
> "/sys/kernel/debug/x86/topo/" for all the other cases, which did not
> hit the warning. The 0xb leaf is provided unconditionally on these
> guests (with or without topoext, even with x2apic disabled on guests
> with <= 255 vCPUs).
>
> Relevant bits of QEMU cmdline used during testing are as follows:
>
> qemu-system-x86_64 \
> -enable-kvm -m 32G -smp cpus=255,cores=255 \
> -cpu EPYC-Genoa,x2apic=on,kvm-msi-ext-dest-id=on,+kvm-pv-unhalt,kvm-pv-tlb-flush,kvm-pv-ipi,kvm-pv-sched-yield,[-topoext] \
> -machine q35,kernel_irqchip=split \
> -global kvm-pit.lost_tick_policy=discard
> ...
>
> References:
>
> [1] https://github.com/qemu/qemu/commit/35ac5dfbcaa4b
>
> Series is based on tip:x86/cpu at tag v6.15-rc6.
>
> ---
> K Prateek Nayak (2):
> x86/cpu/topology: Always try cpu_parse_topology_ext() on AMD/Hygon
> x86/cpu/topology: Use initial APICID from XTOPOEXT on AMD/HYGON
>
> arch/x86/kernel/cpu/topology_amd.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
>
>
> base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
Gentle ping!
--
Thanks and Regards,
Prateek