Message-ID: <243b5260-3dca-4575-a8dd-d9e774de311a@amd.com>
Date: Mon, 27 Oct 2025 11:18:38 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Yazen Ghannam <yazen.ghannam@....com>,
 Michal Pecio <michal.pecio@...il.com>
Cc: x86@...nel.org, regressions@...ts.linux.dev,
 Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
 Borislav Petkov <bp@...en8.de>, Eric DeVolder <eric.devolder@...cle.com>,
 linux-kernel@...r.kernel.org
Subject: Re: AMD topology broken on various 754/AM2+/AM3/AM3+ systems causes
 NB/EDAC/GART regression since 6.14

On 10/24/25 4:32 PM, Yazen Ghannam wrote:
> On Fri, Oct 24, 2025 at 08:46:58PM +0200, Michal Pecio wrote:
>> Hi,
>>
>> This report is related to discussion here:
>> https://lore.kernel.org/all/20251022011610.60d0ba6e.michal.pecio@gmail.com/
>>
>> Commit bc7b2e629e0c ("x86/amd_nb: Use topology info to get AMD node
>> count") bails out if it can't find the NB of each node reported by
>> topology. Then NB features like EDAC or GART IOMMU aren't available.
>>
>> Which was maybe not a bad idea, nobody expects those things to work
>> on selected nodes only. (I think?) But it relies on the optimistic
>> assumption that topology knows the true number of nodes.
>>
>> Today I tested 5 older AMD64 systems with socket 754/AM2+/AM3/AM3+
>> on MSI/ASUS motherboards. *All* of them report more than one node if
>> the CPU has fewer cores than supported by the BIOS.
>>
>> (I also have one AM4 system which is OK, but can't speak for others).
>>
>> This is due to a peculiarity of their MADT tables - they report as many
>> LAPICs as the BIOS can support and excess LAPICs are simply disabled.
>> FWIW, it's also a pattern that disabled APIC IDs have 0x80 bit set.
>>
>> The kernel counts this as "hotpluggable CPUs", since supposedly it's
>> indistinguishable from actual multi-socket systems before ACPI 6.3,
>> where the "online capable" flag was added to disambiguate hotplug and
>> nonexistent but theoretically supported CPUs.
>>
>> Or at least that's what commit fed8d8773b8e ("x86/acpi/boot: Correct
>> acpi_is_processor_usable() check") seems to imply.
>>
>> On pre-ACPI 6.3 systems those disabled LAPICs inflate topology size
>> and result in breakage on recent kernels. A few examples below give
>> an idea what those MADTs look like and how the kernel reads them.
>>
>> Regards,
>> Michal
>>
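
To make the pre-6.3 ambiguity concrete, here is a rough userspace model of
the flag handling described above. It is only a sketch: the function name
and the has_online_capable parameter are made up for illustration, and it
is not meant to mirror the kernel function line for line. The two flag bits
are the ones the ACPI spec defines for the Local APIC MADT entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local APIC flag bits from the ACPI spec (MADT Processor Local APIC entry). */
#define LAPIC_ENABLED        (1u << 0)
#define LAPIC_ONLINE_CAPABLE (1u << 1)	/* only defined since ACPI 6.3 */

/*
 * Hypothetical model, not kernel code.
 * has_online_capable: true when the firmware's MADT is new enough for
 * the "online capable" bit to carry meaning.
 */
bool lapic_counts_as_usable(uint32_t flags, bool has_online_capable)
{
	if (flags & LAPIC_ENABLED)
		return true;

	/* Old MADT: a disabled entry looks the same as a hotplug slot. */
	if (!has_online_capable)
		return true;

	return (flags & LAPIC_ONLINE_CAPABLE) != 0;
}

/* Flag values 0x1 and 0x0 are the ones seen in the dumps in this report. */
void self_test_flags(void)
{
	assert(lapic_counts_as_usable(0x1, false));	/* enabled */
	assert(lapic_counts_as_usable(0x0, false));	/* disabled, old MADT */
	assert(!lapic_counts_as_usable(0x0, true));	/* disabled, new MADT */
}
```

With an old MADT, the disabled entries end up counted as possible CPUs,
which is the inflation being reported.
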
>>
>> Athlon 3000+ on S754:
>>
>> [02Fh 0047 001h]               Local Apic ID : 00
>> [030h 0048 004h]       Flags (decoded below) : 00000001	# enabled
>> --
>> [037h 0055 001h]               Local Apic ID : 81
>> [038h 0056 004h]       Flags (decoded below) : 00000000
>>
>> [    0.027690] CPU topo: Max. logical packages:   2
>> [    0.027691] CPU topo: Max. logical dies:       2
>> [    0.027692] CPU topo: Max. dies per package:   1
>> [    0.027703] CPU topo: Max. threads per core:   1
>> [    0.027704] CPU topo: Num. cores per package:     1
>> [    0.027705] CPU topo: Num. threads per package:   1
>> [    0.027706] CPU topo: Allowing 1 present CPUs plus 1 hotplug CPUs
>>
>> Athlon II X2 250 on AM3+:
>>
>> [02Fh 0047 001h]               Local Apic ID : 00
>> [030h 0048 004h]       Flags (decoded below) : 00000001 # enabled
>> --
>> [037h 0055 001h]               Local Apic ID : 01
>> [038h 0056 004h]       Flags (decoded below) : 00000001 # enabled
>> --
>> [03Fh 0063 001h]               Local Apic ID : 82
>> [040h 0064 004h]       Flags (decoded below) : 00000000
>> --
>> [047h 0071 001h]               Local Apic ID : 83
>> [048h 0072 004h]       Flags (decoded below) : 00000000
>> --
>> [04Fh 0079 001h]               Local Apic ID : 84
>> [050h 0080 004h]       Flags (decoded below) : 00000000
>> --
>> [057h 0087 001h]               Local Apic ID : 85
>> [058h 0088 004h]       Flags (decoded below) : 00000000
>> --
>> [05Fh 0095 001h]               Local Apic ID : 86
>> [060h 0096 004h]       Flags (decoded below) : 00000000
>> --
>> [067h 0103 001h]               Local Apic ID : 87
>> [068h 0104 004h]       Flags (decoded below) : 00000000
>>
>> [    0.147372] CPU topo: Max. logical packages:   3 # not sure why not 4
>> [    0.147372] CPU topo: Max. logical dies:       3
>> [    0.147373] CPU topo: Max. dies per package:   1
>> [    0.147379] CPU topo: Max. threads per core:   1
>> [    0.147379] CPU topo: Num. cores per package:     2
>> [    0.147380] CPU topo: Num. threads per package:   2
>> [    0.147381] CPU topo: Allowing 2 present CPUs plus 6 hotplug CPUs
>>
>> Phenom II X4 965 on AM3:
>>
>> [02Fh 0047   1]                Local Apic ID : 00
>> [030h 0048   4]        Flags (decoded below) : 00000001 # enabled
>> --
>> [037h 0055   1]                Local Apic ID : 01
>> [038h 0056   4]        Flags (decoded below) : 00000001 # enabled
>> --
>> [03Fh 0063   1]                Local Apic ID : 02
>> [040h 0064   4]        Flags (decoded below) : 00000001 # enabled
>> --
>> [047h 0071   1]                Local Apic ID : 03
>> [048h 0072   4]        Flags (decoded below) : 00000001 # enabled
>> --
>> [04Fh 0079   1]                Local Apic ID : 84
>> [050h 0080   4]        Flags (decoded below) : 00000000
>> --
>> [057h 0087   1]                Local Apic ID : 85
>> [058h 0088   4]        Flags (decoded below) : 00000000
>>
>> [    0.072112] CPU topo: Max. logical packages:   2
>> [    0.072112] CPU topo: Max. logical dies:       2
>> [    0.072113] CPU topo: Max. dies per package:   1
>> [    0.072118] CPU topo: Max. threads per core:   1
>> [    0.072118] CPU topo: Num. cores per package:     4
>> [    0.072119] CPU topo: Num. threads per package:   4
>> [    0.072120] CPU topo: Allowing 4 present CPUs plus 2 hotplug CPUs
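
As a sanity check on the package counts in those logs, here is a small
userspace sketch that counts distinct package numbers among the APIC IDs
listed in each MADT dump, disabled entries included. The helper name is
made up, and the shift values are assumptions on my part: nothing in the
report shows the real package shift on these systems. With an assumed
2-bit shift the X2 250 table gives 3 packages, which would match the log,
while a 1-bit shift gives 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model: count distinct package numbers among the
 * APIC IDs listed in a MADT, enabled or not. pkg_shift is the number of
 * APIC ID bits below the package level (an assumed value, see above).
 */
unsigned int count_packages(const uint8_t *apic_ids, size_t n,
			    unsigned int pkg_shift)
{
	uint8_t seen[256] = { 0 };	/* one slot per 8-bit package number */
	unsigned int count = 0;

	assert(pkg_shift < 8);

	for (size_t i = 0; i < n; i++) {
		uint8_t pkg = apic_ids[i] >> pkg_shift;

		if (!seen[pkg]) {
			seen[pkg] = 1;
			count++;
		}
	}
	return count;
}

/* APIC IDs copied from the three MADT dumps above. */
static const uint8_t athlon_3000[]   = { 0x00, 0x81 };
static const uint8_t athlon_x2_250[] = { 0x00, 0x01, 0x82, 0x83,
					 0x84, 0x85, 0x86, 0x87 };
static const uint8_t phenom_x4_965[] = { 0x00, 0x01, 0x02, 0x03,
					 0x84, 0x85 };

void self_test_packages(void)
{
	assert(count_packages(athlon_3000, 2, 0) == 2);
	assert(count_packages(athlon_x2_250, 8, 2) == 3);
	assert(count_packages(athlon_x2_250, 8, 1) == 4);
	assert(count_packages(phenom_x4_965, 6, 2) == 2);
}
```
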
> 
> So far, I think the way to go is to add explicit quirks for known issues.
> 
> Please see the patch below.
> 
> Thanks,
> Yazen
> 
> 
>  From eeb0745e973055d8840b536cfa842d6f2bf4ac52 Mon Sep 17 00:00:00 2001
> From: Yazen Ghannam <yazen.ghannam@....com>
> Date: Fri, 24 Oct 2025 21:19:26 +0000
> Subject: [PATCH] x86/topology: Add helper to ignore bogus MADT entries
> 
> Some older Intel and AMD systems include bogus ACPI MADT entries. These
> entries show as "disabled", and it's not clear whether they are
> physically present but offline (i.e. halted) or not physically present
> at all.
> 
> Ideally, if they are not physically present, then they should not be
> listed in MADT. There doesn't seem to be any explicit x86 topology info
> that can be used to verify if the entries are bogus or not.
> 
> Add a helper function to collect vendor-specific checks to ignore bogus
> APIC IDs. Start with known quirks for an Intel SNB model and older AMD
> K10 models.
> 
> Fixes: f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package")
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> ---
>   arch/x86/kernel/cpu/topology.c | 52 ++++++++++++++++++++++++++--------
>   1 file changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 6073a16628f9..704788b92395 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -219,6 +219,45 @@ static unsigned int topo_unit_count(u32 lvlid, enum x86_topology_domains at_leve
>   	return cnt;
>   }
>   
> +/*
> + * Some older BIOSes include extra entries in MADT.
> + * Do some vendor-specific checks to ignore them.
> + */
> +static bool ignore_extra_apic_entry(u32 apic_id)
> +{
> +	u32 pkgid = topo_apicid(apic_id, TOPO_PKG_DOMAIN);
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> +	/* Allow "physically not possible" cases if in a guest. */
> +	if (!hypervisor_is_type(X86_HYPER_NATIVE))
> +		return false;
> +
> +	/* This model only supports 8 threads in a package. */
> +	if (c->x86_vendor == X86_VENDOR_INTEL &&
> +	    c->x86 == 0x6 && c->x86_model == 0x2d) {
> +		if (topo_unit_count(pkgid, TOPO_PKG_DOMAIN, phys_cpu_present_map) >= 8)
> +			goto reject;
> +	}
> +
> +	/*
> +	 * Various older models have extra entries. A common trait is that the
> +	 * package ID derived from the APIC ID would be more than was ever supported.
> +	 */
> +	if (c->x86_vendor == X86_VENDOR_AMD && c->x86 < 0x17) {

Maybe check for the lack of X86_FEATURE_ZEN instead of the family number?

> +		pkgid >>= x86_topo_system.dom_shifts[TOPO_PKG_DOMAIN - 1];
> +
> +		if (pkgid >= 8)
> +			goto reject;
> +	}
> +
> +	return false;
> +
> +reject:
> +	pr_info_once("Ignoring hot-pluggable APIC ID %x.\n", apic_id);
> +	topo_info.nr_rejected_cpus++;
> +	return true;
> +}
> +
>   static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
>   {
>   	int cpu, dom;
> @@ -240,19 +279,8 @@ static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
>   		cpuid_to_apicid[cpu] = apic_id;
>   		topo_set_cpuids(cpu, apic_id, acpi_id);
>   	} else {
> -		u32 pkgid = topo_apicid(apic_id, TOPO_PKG_DOMAIN);
> -
> -		/*
> -		 * Check for present APICs in the same package when running
> -		 * on bare metal. Allow the bogosity in a guest.
> -		 */
> -		if (hypervisor_is_type(X86_HYPER_NATIVE) &&
> -		    topo_unit_count(pkgid, TOPO_PKG_DOMAIN, phys_cpu_present_map)) {
> -			pr_info_once("Ignoring hot-pluggable APIC ID %x in present package.\n",
> -				     apic_id);
> -			topo_info.nr_rejected_cpus++;
> +		if (ignore_extra_apic_entry(apic_id))
>   			return;
> -		}
>   
>   		topo_info.nr_disabled_cpus++;
>   	}
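
To illustrate the X86_FEATURE_ZEN comment inline above, here is a userspace
model of just the AMD branch of the helper. Everything in it is a stand-in:
struct cpu_model, has_zen and pkg_shift are hypothetical names, and the
shift values in the self test are assumptions rather than numbers taken
from the affected systems. It also says nothing about whether the feature
bit is already set at the point in boot where the MADT entries get
registered, which would need checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few CPU facts the check looks at. */
struct cpu_model {
	bool is_amd;
	bool has_zen;		/* models X86_FEATURE_ZEN being set */
	unsigned int pkg_shift;	/* APIC ID bits below the package level */
};

/*
 * Userspace model of the AMD branch only: reject a non-present APIC ID
 * when its package number is 8 or higher on a pre-Zen part.
 */
bool ignore_extra_apic_entry_model(const struct cpu_model *c, uint32_t apic_id)
{
	assert(c->pkg_shift < 32);

	if (!c->is_amd || c->has_zen)
		return false;

	return (apic_id >> c->pkg_shift) >= 8;
}

void self_test_quirk(void)
{
	const struct cpu_model k8  = { .is_amd = true, .has_zen = false, .pkg_shift = 0 };
	const struct cpu_model k10 = { .is_amd = true, .has_zen = false, .pkg_shift = 2 };
	const struct cpu_model zen = { .is_amd = true, .has_zen = true,  .pkg_shift = 4 };

	assert(ignore_extra_apic_entry_model(&k8, 0x81));	/* ID from S754 dump */
	assert(ignore_extra_apic_entry_model(&k10, 0x84));	/* ID from AM3 dump */
	assert(!ignore_extra_apic_entry_model(&k10, 0x04));	/* package 1: kept */
	assert(!ignore_extra_apic_entry_model(&zen, 0x84));	/* Zen: quirk skipped */
}
```

In this model the Zen check and the family check select the same AMD parts,
so the difference would mostly be one of readability.
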

