Message-ID: <20240115085141.MSS4LLsR@linutronix.de>
Date: Mon, 15 Jan 2024 09:51:41 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: John Ogness <john.ogness@...utronix.de>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
Francesco Dolcini <francesco@...cini.it>,
kernel test robot <oliver.sang@...el.com>
Subject: Re: [PATCH printk v3 02/14] printk: Adjust mapping for 32bit seq macros
On 2024-01-12 19:14:44 [+0100], Petr Mladek wrote:
>
> That said, I am a bit nervous that a bug like this might cause
> workqueue stall and panic() the kernel.
> At least, this is how I read
> https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com/
Well, you get workqueue stalls, and RCU stalls as well, because the CPU
spins. That is a natural consequence of the CPU making no progress (at
boot). The panic _might_ be due to panic_on_error or so.
There is no scheduler, nothing, so once one CPU is blocked the world ends…
> It looks like it caused some loop and refcount overflow or so.
> But I might be wrong.
>
> I would like to better understand this and check if we could prevent
> it somehow.
From memory: the problem was that the sign extension bug (the one that
got fixed) returned a wrong, too low sequence number. So the printk code
tried again to obtain a new sequence number, got a wrong one again, and
so on. This is what looped during boot.
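To make the failure mode above concrete, here is a simplified, hypothetical
sketch. It is NOT the actual printk code: expand_seq_ok(), expand_seq_buggy()
and wait_for_seq() are made-up names, and it only assumes that a 64-bit
sequence number is rebuilt from its low 32 bits relative to a nearby 64-bit
reference value. If the 32-bit distance is not sign-extended, a sequence
number slightly ahead of the reference collapses to a tiny value, and a loop
that waits for the number to catch up never makes progress.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not the kernel's macros): rebuild a 64-bit sequence
 * number from its low 32 bits, using a 64-bit reference `base` that is
 * assumed to lie within +/- 2^31 of the real value.
 */

/* Correct: treat the 32-bit distance as signed so that it sign-extends.
 * (The u32 -> s32 conversion wraps modulo 2^32 on GCC/Clang.) */
static uint64_t expand_seq_ok(uint64_t base, uint32_t seq32)
{
	int32_t diff = (int32_t)((uint32_t)base - seq32);

	return base - (int64_t)diff;
}

/* Buggy: the distance stays unsigned and is zero-extended, so a value
 * ahead of `base` loses its upper 32 bits and comes out far too low. */
static uint64_t expand_seq_buggy(uint64_t base, uint32_t seq32)
{
	uint32_t diff = (uint32_t)base - seq32;

	return base - diff;
}

/*
 * Hypothetical retry loop: re-read until the expanded value reaches
 * `target`. With the buggy mapping every retry yields the same wrong value,
 * so it would spin forever; `max_tries` bounds it for demonstration only.
 */
static bool wait_for_seq(uint64_t (*expand)(uint64_t, uint32_t),
			 uint64_t base, uint64_t target, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		uint64_t seq = expand(base, (uint32_t)target);

		if (seq >= target)
			return true;
		/* No progress: the next attempt computes the same value. */
	}
	return false;
}
```

For example, with base = 0x100000010 and a real sequence number of
0x100000020, expand_seq_ok() recovers 0x100000020, while expand_seq_buggy()
yields 0x20, so wait_for_seq() with the buggy mapping gives up after
max_tries instead of succeeding.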
I'm not sure whether this sort of lockup can still happen now that the
bug is fixed. I can issue an NMI backtrace on all CPUs (32) without the
sync (so they can all printk immediately rather than one after the
other), and it prints and continues…
> Best Regards,
> Petr
Sebastian