lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251103084021.61971a89.michal.pecio@gmail.com>
Date: Mon, 3 Nov 2025 08:40:21 +0100
From: Michal Pecio <michal.pecio@...il.com>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: x86@...nel.org, regressions@...ts.linux.dev, Thomas Gleixner
 <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov
 <bp@...en8.de>, Mario Limonciello <mario.limonciello@....com>, Eric
 DeVolder <eric.devolder@...cle.com>, linux-kernel@...r.kernel.org
Subject: Re: AMD topology broken on various 754/AM2+/AM3/AM3+ systems causes
 NB/EDAC/GART regression since 6.14

On Fri, 24 Oct 2025 17:32:04 -0400, Yazen Ghannam wrote:
> So far, I think the way to go is add explicit quirk for known issues.
> 
> Please see the patch below.
> 
> Thanks,
> Yazen
> 
> 
> From eeb0745e973055d8840b536cfa842d6f2bf4ac52 Mon Sep 17 00:00:00 2001
> From: Yazen Ghannam <yazen.ghannam@....com>
> Date: Fri, 24 Oct 2025 21:19:26 +0000
> Subject: [PATCH] x86/topology: Add helper to ignore bogus MADT entries
> 
> Some older Intel and AMD systems include bogus ACPI MADT entries. These
> entries show as "disabled". And it's not clear if they are physically
> present but offline, i.e halted. Or if they are not physically present
> at all.
> 
> Ideally, if they are not physically present, then they should not be
> listed in MADT. There doesn't seem to be any explicit x86 topology info
> that can be used to verify if the entries are bogus or not.
> 
> Add a  helper function to collect vendor-specific checks to ignore bogus
> APIC IDs. Start with known quirks for an Intel SNB model and older AMD
> K10 models.
> 
> Fixes: f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package")
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> ---
>  arch/x86/kernel/cpu/topology.c | 52 ++++++++++++++++++++++++++--------
>  1 file changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 6073a16628f9..704788b92395 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -219,6 +219,45 @@ static unsigned int topo_unit_count(u32 lvlid, enum x86_topology_domains at_leve
>  	return cnt;
>  }
>  
> +/*
> + * Some older BIOSes include extra entries in MADT.
> + * Do some vendor-specific checks to ignore them.
> + */
> +static bool ignore_extra_apic_entry(u32 apic_id)
> +{
> +	u32 pkgid = topo_apicid(apic_id, TOPO_PKG_DOMAIN);
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> +	/* Allow "physically not possible" cases if in a guest. */
> +	if (!hypervisor_is_type(X86_HYPER_NATIVE))
> +	       return false;
> +
> +	/* This model only supports 8 threads in a package. */
> +	if (c->x86_vendor == X86_VENDOR_INTEL &&
> +	    c->x86 == 0x6 && c->x86_model == 0x2d) {
> +		if (topo_unit_count(pkgid, TOPO_PKG_DOMAIN, phys_cpu_present_map) >= 8)

Looks like possible functional change compared to the code it replaces,
perhaps it should be a separate patch?

> +			goto reject;
> +	}
> +
> +	/*
> +	 * Various older models have extra entries. A common trait is that the
> +	 * package ID derived from the APIC ID would be more than was ever supported.
> +	 */
> +	if (c->x86_vendor == X86_VENDOR_AMD && c->x86 < 0x17) {
> +		pkgid >>= x86_topo_system.dom_shifts[TOPO_PKG_DOMAIN - 1];
> +
> +		if (pkgid >= 8)
> +			goto reject;

Yes, this works. The excess entries are counted as "rejected" rather
than "hotpluggable", package count is correct and EDAC/GART are back.

But to be exact, the known case is (apic_id >= 0x80) rather than
(pkgid >= 8). The latter assumes not only that there are no more than
8 packages, but also that their IDs start from zero.

I have this AM4 system with some proprietary HP BIOS:

[02Fh 0047 001h]               Local Apic ID : 10
[037h 0055 001h]               Local Apic ID : 11
[03Fh 0063 001h]               Local Apic ID : 12
[047h 0071 001h]               Local Apic ID : 13

domain: Thread     shift: 0 dom_size:     1 max_threads:     1
domain: Core       shift: 4 dom_size:    16 max_threads:    16
domain: Module     shift: 4 dom_size:     1 max_threads:    16
domain: Tile       shift: 4 dom_size:     1 max_threads:    16
domain: Die        shift: 4 dom_size:     1 max_threads:    16
domain: DieGrp     shift: 4 dom_size:     1 max_threads:    16
domain: Package    shift: 4 dom_size:     1 max_threads:    16

It seems that pkgid is 0x1 here, which is not a problem because
it's single socket, but dunno if HP or somebody else couldn't do
similar things in an 8-socket system and end up with pkgid > 8.


By the way, do you know what's the reason I don't have /sys directories
for those phantom CPUs (before this patch) and should I have them if
they were legitimate hotplug CPUs?

Maybe there is already some check performed in other part of the kernel
which rejects those CPUs and which could be replicated here.

> +	}
> +
> +	return false;
> +
> +reject:
> +	pr_info_once("Ignoring hot-pluggable APIC ID %x.\n", apic_id);
> +	topo_info.nr_rejected_cpus++;
> +	return true;
> +}
> +
>  static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
>  {
>  	int cpu, dom;
> @@ -240,19 +279,8 @@ static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
>  		cpuid_to_apicid[cpu] = apic_id;
>  		topo_set_cpuids(cpu, apic_id, acpi_id);
>  	} else {
> -		u32 pkgid = topo_apicid(apic_id, TOPO_PKG_DOMAIN);
> -
> -		/*
> -		 * Check for present APICs in the same package when running
> -		 * on bare metal. Allow the bogosity in a guest.
> -		 */
> -		if (hypervisor_is_type(X86_HYPER_NATIVE) &&
> -		    topo_unit_count(pkgid, TOPO_PKG_DOMAIN, phys_cpu_present_map)) {
> -			pr_info_once("Ignoring hot-pluggable APIC ID %x in present package.\n",
> -				     apic_id);
> -			topo_info.nr_rejected_cpus++;
> +		if (ignore_extra_apic_entry(apic_id))
>  			return;
> -		}
>  
>  		topo_info.nr_disabled_cpus++;
>  	}
> -- 
> 2.51.1
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ