Message-ID: <20230329103943.GAZCQVb1n3tKlGOAWI@fat_crate.local>
Date: Wed, 29 Mar 2023 12:39:43 +0200
From: Borislav Petkov <bp@...en8.de>
To: Gabriel David <ultracoolguy@...root.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
David R <david@...olicited.net>,
Kishon Vijay Abraham I <kvijayab@....com>
Subject: Re: Panic starting 6.2.x and later 6.1.x kernels
On Tue, Mar 28, 2023 at 09:26:16PM -0400, Gabriel David wrote:
>
> On 3/28/23 1:10 PM, Borislav Petkov wrote:
> > On Tue, Mar 28, 2023 at 04:06:41PM +0100, David R wrote:
> > > Yes, that patch fixes it also. By all means add my tested by:
> > Ok, thanks for checking. That issue is still weird, tho, and we don't have
> > an idea why that happens.
> >
> > If you could test your original, failing kernel with "nointremap" on the
> > command line, that would be cool.
> >
> > Thx.
> >
> I have the same problem, and while I haven't tested the commit you mentioned
> earlier, `nointremap` on the failing kernels(6.1.x and 6.2.3) worked.
>
> So far, apart from this mail thread I've found this reddit thread with the
> issue https://reddit.com/r/archlinux/comments/11ux6uh/stuck_at_loading_initial_ramdisk/
> , and to them updating the BIOS worked. However, to me it didn't. Another
> thing is that David, that person, and me all use 1st gen Ryzen processors(in
> my case, a Ryzen 3 1200).

Yeah, this looks like something's borked with interrupt remapping and
the timer interrupt when the code looks at that online capable bit. I
guess interrupt remapping doesn't consider that bit and still remaps to
cores which are now *not* onlined, leading to the panic.

But this is all conjecture on my part, trying to connect the IO-APIC
observation to this online capable bit.
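
To make the bit in question concrete, here is a minimal sketch of how the MADT Local APIC flags could be decoded. It is only an illustration, not the kernel's actual code: the macro names, the helper lapic_can_be_onlined() and its fw_supports_online_capable parameter are hypothetical, and treating a disabled entry as a possible CPU on older firmware is an assumption about the legacy behaviour. Only the bit positions (bit 0 Enabled, bit 1 Online Capable, the latter defined since ACPI 6.3) come from the ACPI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* MADT Local APIC flag bits as laid out in the ACPI specification. */
#define LAPIC_FLAG_ENABLED        (1u << 0)
#define LAPIC_FLAG_ONLINE_CAPABLE (1u << 1) /* defined since ACPI 6.3 */

/*
 * Hypothetical helper: may this CPU ever be brought online?
 *
 * fw_supports_online_capable is an assumed input saying whether the
 * firmware tables are new enough for bit 1 to carry meaning.
 */
static bool lapic_can_be_onlined(uint32_t lapic_flags,
                                 bool fw_supports_online_capable)
{
    /* An enabled entry is usable right away. */
    if (lapic_flags & LAPIC_FLAG_ENABLED)
        return true;

    /*
     * Assumption: on older firmware bit 1 is reserved, so a disabled
     * entry is still treated as a possible (hot-pluggable) CPU.
     */
    if (!fw_supports_online_capable)
        return true;

    /* Disabled entry: usable later only if marked online capable. */
    return (lapic_flags & LAPIC_FLAG_ONLINE_CAPABLE) != 0;
}
```

If the conjecture holds, whatever picks the interrupt target would need a check along these lines, so that an entry with both bits clear never ends up as a destination.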

And, of course, I cannot trigger it:

[ 0.000000] Linux version 6.1.21 (root@...c) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT_DYNAMIC Wed Mar 29 12:00:57 CEST 2023
...
[ 0.200425] smpboot: CPU0: AMD EPYC 7251 8-Core Processor (family: 0x17, model: 0x1, stepping: 0x2)
...
[ 4.019751] AMD-Vi: Interrupt remapping enabled

So it looks like only some Zen1 client BIOSes are b0rked. Which is
swell, again. ;-\

But let's wait for tglx to look at this first.

Thx.

--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette