Message-ID: <e3a8e247-0ced-4354-b7cf-25ee7beb9987@amd.com>
Date: Tue, 19 Aug 2025 19:58:52 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Borislav Petkov <bp@...en8.de>
CC: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Dave
Hansen <dave.hansen@...ux.intel.com>, Sean Christopherson
<seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, <x86@...nel.org>,
Naveen rao <naveen.rao@....com>, Sairaj Kodilkar <sarunkod@....com>, "H.
Peter Anvin" <hpa@...or.com>, "Peter Zijlstra (Intel)"
<peterz@...radead.org>, "Xin Li (Intel)" <xin@...or.com>, Pawan Gupta
<pawan.kumar.gupta@...ux.intel.com>, <linux-kernel@...r.kernel.org>,
<kvm@...r.kernel.org>, Mario Limonciello <mario.limonciello@....com>,
"Gautham R. Shenoy" <gautham.shenoy@....com>, Babu Moger
<babu.moger@....com>, Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: Re: [PATCH v3 0/4] x86/cpu/topology: Work around the nuances of
virtualization on AMD/Hygon
Hello Boris,
On 8/19/2025 5:04 PM, Borislav Petkov wrote:
> Lemme try to make some sense of this because the wild use of names and things
> is making my head spin...
>
> On Mon, Aug 18, 2025 at 06:04:31AM +0000, K Prateek Nayak wrote:
>> When running an AMD guest on QEMU with > 255 cores, the following FW_BUG
>> was noticed with recent kernels:
>>
>> [Firmware Bug]: CPU 512: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0200
>>
>> Naveen, Sairaj debugged the cause to commit c749ce393b8f ("x86/cpu: Use
>> common topology code for AMD") where, after the rework, the initial
>> APICID was set using the CPUID leaf 0x8000001e EAX[31:0] as opposed to
>
> That's
>
> CPUID_Fn8000001E_ECX [Node Identifiers] (Core::X86::Cpuid::NodeId)
Small correction here: this is actually
CPUID_Fn8000001E_EAX [Extended APIC ID] (Core::X86::Cpuid::ExtApicId)
>
>> the value from CPUID leaf 0xb EDX[31:0] previously.
>
> That's
>
> CPUID_Fn0000000B_EDX [Extended Topology Enumeration]
> (Core::X86::Cpuid::ExtTopEnumEdx)
>
>> This led us down a rabbit hole of XTOPOEXT vs TOPOEXT support, preferred
>
> What is XTOPOEXT?
>
> CPUID_Fn0000000B_EDX?
>
> Please define all your things properly so that we can have common base when
> reading this text.
Sorry about that! This should actually be "X86_FEATURE_XTOPOLOGY", which
is a synthetic feature that is set when topology parsing via one of the
following CPUID leaves is successful:
  - 0x1f
      V2 Extended Topology Enumeration Leaf
      (Intel only)
  - 0x80000026
      CPUID_Fn80000026_E[A,B,C]X_x0[0...3] [Extended CPU Topology]
      Core::X86::Cpuid::ExCpuTopologyE[a,b,c]x[0..3]
      (AMD only)
  - 0xb
      CPUID_Fn0000000B_E[A,B,C]X_x0[0..2] [Extended Topology Enumeration]
      Core::X86::Cpuid::ExtTopEnumE[a,b,c]x[0..2]
      (Both Intel and AMD)
The leaves are tried in the order listed above.
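To make the ordering concrete, here is a minimal sketch (not the kernel's
actual code) of "try the leaves in order and stop at the first one that
parses". The names pick_topology_leaf(), topo_leaves and parse_ok are
hypothetical, and whether a leaf parses is modelled as plain input instead
of executing CPUID:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };

/* One entry per extended topology leaf, in the order they are tried. */
struct topo_leaf {
    uint32_t leaf;
    bool intel;   /* leaf is tried on Intel */
    bool amd;     /* leaf is tried on AMD */
};

static const struct topo_leaf topo_leaves[] = {
    { 0x1f,       true,  false },
    { 0x80000026, false, true  },
    { 0xb,        true,  true  },
};

#define NR_TOPO_LEAVES (sizeof(topo_leaves) / sizeof(topo_leaves[0]))

/*
 * parse_ok[i] says whether parsing topo_leaves[i] would succeed
 * (assumption: supplied by the caller for illustration).
 * Returns the first usable leaf, or 0 if none parses, in which case
 * X86_FEATURE_XTOPOLOGY would not be set.
 */
uint32_t pick_topology_leaf(enum cpu_vendor vendor, const bool *parse_ok)
{
    for (size_t i = 0; i < NR_TOPO_LEAVES; i++) {
        bool tried = (vendor == VENDOR_INTEL) ? topo_leaves[i].intel
                                              : topo_leaves[i].amd;

        if (tried && parse_ok[i])
            return topo_leaves[i].leaf;
    }
    return 0;
}
```

With all three leaves parseable, an AMD CPU ends up on 0x80000026 and an
Intel CPU on 0x1f; leaf 0xb is only the fallback for both.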
>
> TOPOEXT is, I presume:
>
> #define X86_FEATURE_TOPOEXT ( 6*32+22) /* "topoext" Topology extensions CPUID leafs */
>
> Our PPR says:
>
> CPUID_Fn80000001_ECX [Feature Identifiers] (Core::X86::Cpuid::FeatureExtIdEcx)
>
> "22 TopologyExtensions: topology extensions support. Read-only. Reset:
> Fixed,1. 1=Indicates support for Core::X86::Cpuid::CachePropEax0 and
> Core::X86::Cpuid::ExtApicId."
>
> Those leafs are:
>
> CPUID_Fn8000001D_EAX_x00 [Cache Properties (DC)] (Core::X86::Cpuid::CachePropEax0)
>
> DC topology info. Probably not important for this here.
>
> and
>
> CPUID_Fn8000001E_EAX [Extended APIC ID] (Core::X86::Cpuid::ExtApicId)
>
> the extended APIC ID is there.
>
> How is this APIC ID different from the extended APIC ID in
>
> CPUID_Fn0000000B_EDX [Extended Topology Enumeration] (Core::X86::Cpuid::ExtTopEnumEdx)
>
> ?
On baremetal, they are the same. On QEMU, when we launch a guest with
a topology that contains more than 256 cores on a single socket, QEMU
zeroes out all the bits in CPUID_Fn8000001E [1] since it fears a
collision in the "CoreId[7:0]" field of
"CPUID_Fn8000001E_EBX [Core Identifiers] (Core::X86::Cpuid::CoreId)".

Since the "LogProcAtThisLevel[15:0]" field of
"CPUID_Fn0000000B_EBX_x01 [Extended Topology Enumeration]" can describe
a domain with up to 2^16 cores, the Core ID can always be derived
correctly from it even when the number of cores in the guest topology
crosses 256.
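The field-width argument can be checked with a few lines of arithmetic.
This is only an illustrative sketch; field_max() and field_store() are
hypothetical helpers, and the only facts taken from the leaves are the
widths (8 bits for CoreId, 16 bits for LogProcAtThisLevel):

```c
#include <stdint.h>

/* Largest value an unsigned bit field of the given width (1..31) can hold. */
uint32_t field_max(unsigned int width)
{
    return (UINT32_C(1) << width) - 1;
}

/* Value that actually lands in the field: bits above the width are dropped. */
uint32_t field_store(uint32_t value, unsigned int width)
{
    return value & field_max(width);
}
```

Storing Core ID 256 in an 8-bit field yields 0, which is indistinguishable
from core 0, while a 16-bit field keeps the value intact.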
>
>> order of their parsing, and QEMU nuances like [1] where QEMU 0's out the
>> CPUID leaf 0x8000001e on CPUs where Core ID crosses 255 fearing a
>> Core ID collision in the 8 bit field which leads to the reported FW_BUG.
>
> Is that what the hw does though?
We don't have baremetal systems with more than 256 cores per socket
today. When that happens, I believe the expectation from H/W is to just
use the CPUID_Fn80000026 leaf or the CPUID_Fn0000000B leaf.
>
> Has this been verified instead of willy nilly clearing CPUID leafs in qemu?
>
>> Following were major observations during the debug which the two
>> patches address respectively:
>>
>> 1. The support for CPUID leaf 0xb is independent of the TOPOEXT feature
>
> Yes, PPR says so.
>
>> and is rather linked to the x2APIC enablement.
>
> Because the SDM says:
>
> "Bits 31-00: x2APIC ID of the current logical processor."
>
> ?
SDM Vol. 3A Sec. 11.12.8 "CPUID Extensions And Topology Enumeration"
reads:

    For Intel 64 and IA-32 processors that support x2APIC, a value of 1
    reported by CPUID.01H:ECX[21] indicates that the processor supports
    x2APIC and the extended topology enumeration leaf (CPUID.0BH).

    The extended topology enumeration leaf can be accessed by executing
    CPUID with EAX = 0BH. Processors that do not support x2APIC may
    support CPUID leaf 0BH. Software can detect the availability of the
    extended topology enumeration leaf (0BH) by performing two steps:

    1. Check maximum input value for basic CPUID information by executing
       CPUID with EAX= 0. If CPUID.0H:EAX is greater than or equal or 11
       (0BH), then proceed to next step
    2. Check CPUID.EAX=0BH, ECX=0H:EBX is non-zero.

    If both of the above conditions are true, extended topology
    enumeration leaf is available.
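The two detection steps quoted above translate to roughly the following
sketch. leaf_0b_available() is a hypothetical helper, and the register
values are passed in as parameters instead of being read with CPUID so the
logic can be followed on any machine:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * max_basic_leaf: CPUID.0H:EAX (maximum basic leaf)
 * leaf_0b_ebx:    CPUID.(EAX=0BH, ECX=0H):EBX
 *
 * CPUID.01H:ECX[21] (x2APIC) is deliberately not consulted: per the
 * quoted text, processors without x2APIC may still support leaf 0BH.
 */
bool leaf_0b_available(uint32_t max_basic_leaf, uint32_t leaf_0b_ebx)
{
    if (max_basic_leaf < 0xb)   /* step 1: leaf must be in range */
        return false;

    return leaf_0b_ebx != 0;    /* step 2: EBX of subleaf 0 is non-zero */
}
```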
>
> Is our version not containing the x2APIC ID?
We too have the Extended APIC ID in both CPUID_Fn0000000B and
CPUID_Fn8000001E_EAX, and they match on baremetal. The problem only
affects virtualized guests whose topology contains more than 256 cores
per socket, because of [1].
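To illustrate with the numbers from the reported FW_BUG (CPUID: 0x0000,
APIC: 0x0200), here is a small sketch of the comparison. It is not the
kernel's check; the struct and both helpers are hypothetical, and all
register values are plain inputs:

```c
#include <stdbool.h>
#include <stdint.h>

/* APIC ID as seen through the three sources discussed in this thread. */
struct apicid_sources {
    uint32_t leaf_0b_edx;         /* CPUID_Fn0000000B_EDX */
    uint32_t leaf_8000001e_eax;   /* CPUID_Fn8000001E_EAX */
    uint32_t apic_id;             /* ID read from the local APIC */
};

/* Mismatch when the initial APIC ID is taken from leaf 0x8000001e EAX. */
bool mismatch_via_8000001e(const struct apicid_sources *s)
{
    return s->leaf_8000001e_eax != s->apic_id;
}

/* Mismatch when the initial APIC ID is taken from leaf 0xb EDX. */
bool mismatch_via_0b(const struct apicid_sources *s)
{
    return s->leaf_0b_edx != s->apic_id;
}
```

For the guest CPU 512 case, leaf 0x8000001e is zeroed, so the first helper
reports a mismatch while the second does not; on baremetal, where both
leaves carry the same ID, neither does.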
>
>> On baremetal, this has
>> not been a problem since TOPOEXT support (Fam 0x15 and above)
>> predates the support for CPUID leaf 0xb (Fam 0x17[Zen2] and above)
>> however, in virtualized environment, the support for x2APIC can be
>> enabled independent of topoext where QEMU expects the guest to parse
>> the topology and the APICID from CPUID leaf 0xb.
>
> So we're fixing a qemu bug?
>
> Why isn't qemu force-enabling TOPOEXT support when one requests x2APIC?
>
> My initial reaction: fix qemu.
This is possible; however, what would be the right thing to do for
CPUID_Fn8000001E_EBX [Core Identifiers] (Core::X86::Cpuid::CoreId)?
Should QEMU just wrap and start counting the Core Identifiers again
from 0?
Or should QEMU go ahead and populate just the
CPUID_Fn8000001E_EAX [Extended APIC ID] (Core::X86::Cpuid::ExtApicId)
fields and continue to zero out EBX and ECX when CoreID > 255?
[1] https://github.com/qemu/qemu/commit/35ac5dfbcaa4b
--
Thanks and Regards,
Prateek