Message-ID: <e3a8e247-0ced-4354-b7cf-25ee7beb9987@amd.com>
Date: Tue, 19 Aug 2025 19:58:52 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Borislav Petkov <bp@...en8.de>
CC: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Dave
 Hansen <dave.hansen@...ux.intel.com>, Sean Christopherson
	<seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, <x86@...nel.org>,
	Naveen rao <naveen.rao@....com>, Sairaj Kodilkar <sarunkod@....com>, "H.
 Peter Anvin" <hpa@...or.com>, "Peter Zijlstra (Intel)"
	<peterz@...radead.org>, "Xin Li (Intel)" <xin@...or.com>, Pawan Gupta
	<pawan.kumar.gupta@...ux.intel.com>, <linux-kernel@...r.kernel.org>,
	<kvm@...r.kernel.org>, Mario Limonciello <mario.limonciello@....com>,
	"Gautham R. Shenoy" <gautham.shenoy@....com>, Babu Moger
	<babu.moger@....com>, Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: Re: [PATCH v3 0/4] x86/cpu/topology: Work around the nuances of
 virtualization on AMD/Hygon

Hello Boris,

On 8/19/2025 5:04 PM, Borislav Petkov wrote:
> Lemme try to make some sense of this because the wild use of names and things
> is making my head spin...
> 
> On Mon, Aug 18, 2025 at 06:04:31AM +0000, K Prateek Nayak wrote:
>> When running an AMD guest on QEMU with > 255 cores, the following FW_BUG
>> was noticed with recent kernels:
>>
>>     [Firmware Bug]: CPU 512: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0200
>>
>> Naveen, Sairaj debugged the cause to commit c749ce393b8f ("x86/cpu: Use
>> common topology code for AMD") where, after the rework, the initial
>> APICID was set using the CPUID leaf 0x8000001e EAX[31:0] as opposed to
> 
> That's
> 
> CPUID_Fn8000001E_ECX [Node Identifiers] (Core::X86::Cpuid::NodeId)

Small correction here, this is actually:

CPUID_Fn8000001E_EAX [Extended APIC ID] (Core::X86::Cpuid::ExtApicId)

> 
>> the value from CPUID leaf 0xb EDX[31:0] previously.
> 
> That's
> 
> CPUID_Fn0000000B_EDX [Extended Topology Enumeration]
> (Core::X86::Cpuid::ExtTopEnumEdx)
> 
>> This led us down a rabbit hole of XTOPOEXT vs TOPOEXT support, preferred
> 
> What is XTOPOEXT? 
> 
> CPUID_Fn0000000B_EDX?
> 
> Please define all your things properly so that we can have common base when
> reading this text.

Sorry about that! This should actually be "X86_FEATURE_XTOPOLOGY", which
is a synthetic feature that is set when topology parsing via one of the
following CPUID leaves succeeds:

- 0x1f
  V2 Extended Topology Enumeration Leaf
  (Intel only)

- 0x80000026
  CPUID_Fn80000026_E[A,B,C]X_x0[0..3] [Extended CPU Topology]
  Core::X86::Cpuid::ExCpuTopologyE[a,b,c]x[0..3]
  (AMD only)

- 0xb
  CPUID_Fn0000000B_E[A,B,C]X_x0[0..2] [Extended Topology Enumeration]
  Core::X86::Cpuid::ExtTopEnumE[a,b,c]x[0..2]
  (Both Intel and AMD)

The parsing of the leaves is tried in the same order as above.
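
To make the ordering concrete, here is a minimal sketch of "try each
leaf in turn, stop at the first that parses". It is not the kernel's
code: pick_topology_leaf() and the parse_*() stand-ins are hypothetical
names used only for illustration, and the callback just reports whether
a given leaf parsed successfully.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns true if the given leaf parsed fine. */
typedef bool (*leaf_parser_t)(uint32_t leaf);

/* Leaves in the order listed above. */
static const uint32_t topo_leaves[] = { 0x1f, 0x80000026, 0xb };

/*
 * Return the first leaf that parses successfully, or 0 if none does
 * (in which case X86_FEATURE_XTOPOLOGY would stay unset).
 */
uint32_t pick_topology_leaf(leaf_parser_t parse)
{
	size_t n = sizeof(topo_leaves) / sizeof(topo_leaves[0]);

	for (size_t i = 0; i < n; i++) {
		if (parse(topo_leaves[i]))
			return topo_leaves[i];
	}
	return 0;
}

/* Illustrative stand-ins for "which leaves does this CPU expose". */
bool parse_only_0xb(uint32_t leaf)
{
	return leaf == 0xb;
}

bool parse_0x80000026_and_0xb(uint32_t leaf)
{
	return leaf == 0x80000026 || leaf == 0xb;
}

bool parse_none(uint32_t leaf)
{
	(void)leaf;
	return false;
}
```

With these stand-ins, a CPU exposing both 0x80000026 and 0xb ends up
using 0x80000026, since it is tried before 0xb.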

> 
> TOPOEXT is, I presume:
> 
> #define X86_FEATURE_TOPOEXT		( 6*32+22) /* "topoext" Topology extensions CPUID leafs */
> 
> Our PPR says:
> 
> CPUID_Fn80000001_ECX [Feature Identifiers] (Core::X86::Cpuid::FeatureExtIdEcx)
> 
> "22 TopologyExtensions: topology extensions support. Read-only. Reset:
> Fixed,1. 1=Indicates support for Core::X86::Cpuid::CachePropEax0 and
> Core::X86::Cpuid::ExtApicId."
> 
> Those leafs are:
> 
> CPUID_Fn8000001D_EAX_x00 [Cache Properties (DC)] (Core::X86::Cpuid::CachePropEax0)
> 
> DC topology info. Probably not important for this here.
> 
> and
> 
> CPUID_Fn8000001E_EAX [Extended APIC ID] (Core::X86::Cpuid::ExtApicId)
> 
> the extended APIC ID is there.
> 
> How is this APIC ID different from the extended APIC ID in
> 
> CPUID_Fn0000000B_EDX [Extended Topology Enumeration] (Core::X86::Cpuid::ExtTopEnumEdx)
> 
> ?

On baremetal, they are the same. On QEMU, when we launch a guest with
a topology that contains more than 256 cores on a single socket, QEMU
zeroes out all the bits in CPUID_Fn8000001E [1] since it fears a
collision in the "CoreId[7:0]" field of
"CPUID_Fn8000001E_EBX [Core Identifiers] (Core::X86::Cpuid::CoreId)".

Since the "LogProcAtThisLevel[15:0]" field of
"CPUID_Fn0000000B_EBX_x01 [Extended Topology Enumeration]" can describe
a domain with up to 2^16 cores, the Core ID can always be derived
correctly from this leaf even when the number of cores in the guest
topology crosses 256.
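
A rough sketch of the width argument follows. The helper names and the
shift values in the usage note are assumptions made for illustration;
only the 8-bit CoreId[7:0] width and the idea of deriving the Core ID
from the leaf 0xb data come from the text above. The shifts stand for
the per-level "bits to shift the x2APIC ID right" values that leaf 0xb
reports in EAX[4:0].

```c
#include <stdint.h>

/* CoreId[7:0] in CPUID_Fn8000001E_EBX: only the low 8 bits fit. */
uint32_t coreid_field_8bit(uint32_t core_id)
{
	return core_id & 0xff;
}

/*
 * Derive the Core ID from the x2APIC ID using leaf 0xb style shifts.
 * smt_shift:  bits occupied by the thread ID (assumed input)
 * pkg_shift:  bits occupied by thread + core ID (assumed input)
 * The sketch assumes 0 < pkg_shift - smt_shift < 32.
 */
uint32_t core_id_from_x2apic(uint32_t x2apic_id, unsigned int smt_shift,
			     unsigned int pkg_shift)
{
	uint32_t core_bits = pkg_shift - smt_shift;

	return (x2apic_id >> smt_shift) & ((1u << core_bits) - 1);
}
```

For example, assuming smt_shift = 1 and pkg_shift = 10, an x2APIC ID of
0x200 gives Core ID 256 via core_id_from_x2apic(), whereas the same
Core ID squeezed through coreid_field_8bit() wraps to 0 and collides
with core 0.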

> 
>> order of their parsing, and QEMU nuances like [1] where QEMU 0's out the
>> CPUID leaf 0x8000001e on CPUs where Core ID crosses 255 fearing a
>> Core ID collision in the 8 bit field which leads to the reported FW_BUG.
> 
> Is that what the hw does though?

We don't have baremetal systems with more than 256 cores per socket yet.
When that happens, I believe the expectation from H/W is to just use the
CPUID_Fn80000026 leaf or the CPUID_Fn0000000B leaf.

> 
> Has this been verified instead of willy nilly clearing CPUID leafs in qemu?
> 
>> Following were major observations during the debug which the two
>> patches address respectively:
>>
>> 1. The support for CPUID leaf 0xb is independent of the TOPOEXT feature
> 
> Yes, PPR says so.
> 
>>    and is rather linked to the x2APIC enablement.
> 
> Because the SDM says:
> 
> "Bits 31-00: x2APIC ID of the current logical processor."
> 
> ?

SDM Vol. 3A Sec. 11.12.8 "CPUID Extensions And Topology Enumeration"
reads:

  For Intel 64 and IA-32 processors that support x2APIC, a value of 1
  reported by CPUID.01H:ECX[21] indicates that the processor supports
  x2APIC and the extended topology enumeration leaf (CPUID.0BH).

  The extended topology enumeration leaf can be accessed by executing
  CPUID with EAX = 0BH. Processors that do not support x2APIC may
  support CPUID leaf 0BH. Software can detect the availability of the
  extended topology enumeration leaf (0BH) by performing two steps:

  1. Check maximum input value for basic CPUID information by executing
     CPUID with EAX= 0. If CPUID.0H:EAX is greater than or equal or 11
     (0BH), then proceed to next step

  2. Check CPUID.EAX=0BH, ECX=0H:EBX is non-zero.

  If both of the above conditions are true, extended topology
  enumeration leaf is available.
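
The two steps quoted above boil down to a small check. The sketch below
takes the register values as plain arguments so it does not execute
CPUID itself; leaf_0xb_available() is a hypothetical name. On real
hardware the inputs could come from, e.g., __get_cpuid_max() and
__get_cpuid_count() in GCC/Clang's <cpuid.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * max_basic_leaf: CPUID.0H:EAX
 * subleaf0_ebx:   CPUID.(EAX=0BH, ECX=0H):EBX
 */
bool leaf_0xb_available(uint32_t max_basic_leaf, uint32_t subleaf0_ebx)
{
	/* Step 1: the maximum basic leaf must be at least 0BH. */
	if (max_basic_leaf < 0x0b)
		return false;

	/* Step 2: EBX of subleaf 0 must be non-zero. */
	return subleaf0_ebx != 0;
}
```

Note that neither step looks at the x2APIC bit or at TOPOEXT.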

> 
> Is our version not containing the x2APIC ID?

We too have the Extended APIC ID in both CPUID_Fn0000000B and
CPUID_Fn8000001E_EAX, and they match on baremetal. The problem only
affects virtualized guests whose topology contains more than 256
cores per socket, because of [1].

> 
>> On baremetal, this has
>>    not been a problem since TOPOEXT support (Fam 0x15 and above)
>>    predates the support for CPUID leaf 0xb (Fam 0x17[Zen2] and above)
>>    however, in virtualized environment, the support for x2APIC can be
>>    enabled independent of topoext where QEMU expects the guest to parse
>>    the topology and the APICID from CPUID leaf 0xb.
> 
> So we're fixing a qemu bug?
> 
> Why isn't qemu force-enabling TOPOEXT support when one requests x2APIC?
> 
> My initial reaction: fix qemu.

This is possible; however, what would be the right thing to do for
CPUID_Fn8000001E_EBX [Core Identifiers] (Core::X86::Cpuid::CoreId)?

Should QEMU just wrap and start counting the Core Identifiers again
from 0?

Or should QEMU go ahead and populate just the
CPUID_Fn8000001E_EAX [Extended APIC ID] (Core::X86::Cpuid::ExtApicId)
fields and continue to zero out EBX and ECX when CoreID > 255?

[1] https://github.com/qemu/qemu/commit/35ac5dfbcaa4b

-- 
Thanks and Regards,
Prateek

