[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <77LRvZMtvNz9KxSX1LGsh_VparNGAmJL2gYXBH7oY_3de_ka2avlfbuHE_BL3OgtGHyVMU34Ln2TSLrT1l1rTpBvUUtI9QUTH1je0jFFlkM=@uplinklabs.net>
Date: Thu, 04 Dec 2025 20:16:36 +0000
From: Steven Noonan <steven@...inklabs.net>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: linux-kernel@...r.kernel.org, Ariadne Conill <ariadne@...adne.space>, x86@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/2] x86/amd_node: fix integer divide by zero during init
(I apologize in advance if my email comes out formatted strangely, I haven't used ProtonMail for LKML before. I don't think it is line-wrapping properly.)
On Wednesday, December 3rd, 2025 at 12:45 PM, Yazen Ghannam <yazen.ghannam@....com> wrote:
> On Fri, Nov 14, 2025 at 07:57:35PM +0000, Steven Noonan wrote:
>
> Thanks Steven for the patch.
>
> > On a Xen dom0 boot, this feature does not behave, and we end up
> > calculating:
> >
> > num_roots = 1
> > num_nodes = 2
> > roots_per_node = 0
> >
> > This causes a divide-by-zero in the modulus inside the loop.
>
>
> Can you please share more details of the system topology?
>
> I think the list of PCI devices is a good start.
Sure, but it's running as the paravirtual control domain for Xen. The `lspci` topology output won't differ between bare-metal and dom0, but dom0's accesses to certain MSRs and PCI registers may be masked and manipulated, which is probably why this is breaking.
I've attached `lspci -nn` and a CPUID dump from CPU0 -- both of these are while running under Xen.
> > This change adds a couple of guards for invalid states where we might
> > get a divide-by-zero.
>
>
> This statement should be imperative, ex. "Add a couple of guards...".
>
> Also, the commit message should generally be in a passive voice (no
> "we"), ex. "...where a divide-by-zero may result."
Ack. I can fix these and the subsequent suggestions for version 2.
> > Signed-off-by: Steven Noonan steven@...inklabs.net
> > Signed-off-by: Ariadne Conill ariadne@...adne.space
>
>
> The Signed-off-by lines should be in the order of handling. If you are
> sending the patch, then your line should be last. If there are other
> contributors, then they should have a Co-developed-by tag in addition to
> Signed-off-by.
>
> > CC: Yazen Ghannam yazen.ghannam@....com
> > CC: x86@...r.kernel.org
> > CC: stable@...r.kernel.org
>
>
> There should be a Fixes tag along with "Cc: stable", if possible.
Ack.
> > ---
> > arch/x86/kernel/amd_node.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/arch/x86/kernel/amd_node.c b/arch/x86/kernel/amd_node.c
> > index 3d0a4768d603c..cdc6ba224d4ad 100644
> > --- a/arch/x86/kernel/amd_node.c
> > +++ b/arch/x86/kernel/amd_node.c
> > @@ -282,6 +282,17 @@ static int __init amd_smn_init(void)
> > return -ENODEV;
> >
> > num_nodes = amd_num_nodes();
> > +
> > + if (!num_nodes)
> > + return -ENODEV;
>
>
> This is generally a good check. But I think it is unnecessary in this
> case, since the minimum value is '1'. The topology init code initializes
> the factors used in amd_num_nodes() to '1' before trying to find the
> true values from CPUID, etc.
>
> > +
> > + /* Possibly a virtualized environment (e.g. Xen) where we wi
>
>
> Multi-line comments should start on the next line according to kernel
> coding style.
>
> /*
> * Comment
> * Info
> */
>
> > ll get
> > + * roots_per_node=0 if the number of roots is fewer than number of
> > + * nodes
> > + */
> > + if (num_roots < num_nodes)
> > + return -ENODEV;
>
>
> I think this is a fair check. But I'd like to understand how the
> topology looks in this case.
>
> Thanks,
> Yazen
View attachment "dom0-cpuid.txt" of type "text/plain" (4657 bytes)
View attachment "dom0-lspci.txt" of type "text/plain" (5567 bytes)
Download attachment "signature.asc" of type "application/pgp-signature" (344 bytes)
Powered by blists - more mailing lists