Message-ID: <20251023181546.GA771720@yaz-khff2.amd.com>
Date: Thu, 23 Oct 2025 14:15:46 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Michal Pecio <michal.pecio@...il.com>
Cc: Shyam-sundar.S-k@....com, bhelgaas@...gle.com, hdegoede@...hat.com,
ilpo.jarvinen@...ux.intel.com, jdelvare@...e.com,
linux-edac@...r.kernel.org, linux-hwmon@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
linux@...ck-us.net, mario.limonciello@....com,
naveenkrishna.chatradhi@....com,
platform-driver-x86@...r.kernel.org, suma.hegde@....com,
tony.luck@...el.com, x86@...nel.org
Subject: Re: [PATCH v3 06/12] x86/amd_nb: Use topology info to get AMD node count
On Thu, Oct 23, 2025 at 06:31:54PM +0200, Michal Pecio wrote:
> On Thu, 23 Oct 2025 12:09:06 -0400, Yazen Ghannam wrote:
> > On Thu, Oct 23, 2025 at 05:01:07PM +0200, Michal Pecio wrote:
> > > On Thu, 23 Oct 2025 09:59:35 -0400, Yazen Ghannam wrote:
> > > > Thanks Michal.
> > > >
> > > > I don't see anything obviously wrong.
> > >
> > > Which code is responsible for setting up those bitmaps which
> > > are counted by topology_init_possible_cpus()?
> > >
> > > I guess I could add some printks there and reboot.
> > >
> >
> > The kernel seems to think there are 6 CPUs on your system:
> >
> > [ 0.072059] CPU topo: Allowing 4 present CPUs plus 2 hotplug CPUs
>
> I thought this is because I have NR_CPUS set to 6, as this config
> originally came from the X6 machine, but I am not sure.
>
I'm thinking we should look here: acpi_parse_lapic().
If you add printks in there, I think you'll see the extra CPUs get
registered as "not present" based on the table entries below.
> >
> > We don't see them enabled, but they may still get APIC IDs. If so, then
> > the IDs would be beyond the core shift of 2.
> >
> > APIC IDs b'0 00 -> CPU0 on logical package 0
> > b'0 01 -> CPU1 on logical package 0
> > b'0 10 -> CPU2 on logical package 0
> > b'0 11 -> CPU3 on logical package 0
> > b'1 00 -> CPU0 on logical package 1
> > b'1 01 -> CPU1 on logical package 1
> >
> >
> > Please try booting with "possible_cpus=4".
>
> OK, will try it next time I'm rebooting.
>
> > The "number of possible CPUs" comes from the ACPI Multiple APIC
> > Description Table (MADT). This has the signature "APIC".
> >
> > Can you please provide the disassembly of this table?
>
> Interesting, it looks like there are indeed 6 LAPICs there.
> BIOS bug? Attaching apic.dsl.
>
> > Can you please share the dmesg output from that system? And the ACPI
> > table too?
>
> Will try later but I don't recall any anomalies there.
> I remember checking the topology output and it made sense:
> 1 package, 1 die, 6 cores, 6 threads.
Thanks, yeah it's likely just fine since the topology matches.
[...]
>
> [04Ch 0076 1] Subtable Type : 00 [Processor Local APIC]
> [04Dh 0077 1] Length : 08
> [04Eh 0078 1] Processor ID : 05
> [04Fh 0079 1] Local Apic ID : 84
> [050h 0080 4] Flags (decoded below) : 00000000
> Processor Enabled : 0
> Runtime Online Capable : 0
>
> [054h 0084 1] Subtable Type : 00 [Processor Local APIC]
> [055h 0085 1] Length : 08
> [056h 0086 1] Processor ID : 06
> [057h 0087 1] Local Apic ID : 85
> [058h 0088 4] Flags (decoded below) : 00000000
> Processor Enabled : 0
> Runtime Online Capable : 0
>
These APIC IDs seem bogus too. I'd expect them to be sequential, but
they jump to 84 and 85. It probably doesn't matter, though we could try
to use these as some secondary indicator that the entries should be
totally ignored.
I expect the IDs on the 6-core system to be sequential, though.
I don't know if this is really a BIOS bug, because those entries are
indeed not enabled. This may have just been an optimization they used,
and it seemed to fit within the ambiguity of the ACPI spec at the time.

A quick solution would be to add a quirk for this, though maybe we can
come up with a generic solution based on what we've seen so far.
Thanks,
Yazen