Message-ID: <20251022133901.GB7243@yaz-khff2.amd.com>
Date: Wed, 22 Oct 2025 09:39:01 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Michal Pecio <michal.pecio@...il.com>
Cc: Shyam-sundar.S-k@....com, bhelgaas@...gle.com, hdegoede@...hat.com,
ilpo.jarvinen@...ux.intel.com, jdelvare@...e.com,
linux-edac@...r.kernel.org, linux-hwmon@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
linux@...ck-us.net, mario.limonciello@....com,
naveenkrishna.chatradhi@....com,
platform-driver-x86@...r.kernel.org, suma.hegde@....com,
tony.luck@...el.com, x86@...nel.org
Subject: Re: [PATCH v3 06/12] x86/amd_nb: Use topology info to get AMD node count

On Wed, Oct 22, 2025 at 01:16:10AM +0200, Michal Pecio wrote:
> > Currently, the total AMD node count is determined by searching and
> > counting CPU/node devices using PCI IDs.
> >
> > However, AMD node information is already available through topology
> > CPUID/MSRs. The recent topology rework has made this info easier to
> > access.
> >
> > Replace the node counting code with a simple product of topology info.
> >
> > Every node/northbridge is expected to have a 'misc' device. Clear
> > everything out if a 'misc' device isn't found on a node.
>
> Hi,
>
> I have a weird/buggy AM3 machine (Asus M4A88TD-M EVO, Phenom 965) where
> the kernel believes there are two packages and this assumption fails.
>
> [ 0.072051] CPU topo: Max. logical packages: 2
> [ 0.072052] CPU topo: Max. logical dies: 2
> [ 0.072052] CPU topo: Max. dies per package: 1
> [ 0.072057] CPU topo: Max. threads per core: 1
> [ 0.072058] CPU topo: Num. cores per package: 4
> [ 0.072059] CPU topo: Num. threads per package: 4
>
> It's currently on v6.12 series and working OK, but I remember trying
> v6.15 at one point and finding that EDAC and GART IOMMU were broken
> because the NB driver failed to initialize. This fixed it:
>
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -496,8 +496,8 @@ void __init topology_init_possible_cpus(void)
> total_cpus = allowed;
> set_nr_cpu_ids(allowed);
>
> - cnta = domain_weight(TOPO_PKG_DOMAIN);
> - cntb = domain_weight(TOPO_DIE_DOMAIN);
> + cnta = 1;
> + cntb = 1;
> __max_logical_packages = cnta;
> __max_dies_per_package = 1U << (get_count_order(cntb) - get_count_order(cnta));
>
> It was a few weeks ago and the machine is currently back on v6.12,
> but I'm almost sure I tracked it down to this exact code:
>
> > + amd_northbridges.num = amd_num_nodes();
> > [...]
> > + /*
> > + * Each Northbridge must have a 'misc' device.
> > + * If not, then uninitialize everything.
> > + */
> > + if (!node_to_amd_nb(i)->misc) {
> > + amd_northbridges.num = 0;
> > + kfree(nb);
> > + return -ENODEV;
> > + }
>

Hi Michal,

Thanks for reporting this.

Can you please share the full output from dmesg and lspci?
Also, can you please share the raw CPUID output (cpuid -r)?

Thanks,
Yazen