linux-kernel - Re: [PATCH printk v3 02/14] printk: Adjust mapping for 32bit seq macros


Message-ID: <20240115085141.MSS4LLsR@linutronix.de>
Date: Mon, 15 Jan 2024 09:51:41 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: John Ogness <john.ogness@...utronix.de>,
	Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Francesco Dolcini <francesco@...cini.it>,
	kernel test robot <oliver.sang@...el.com>
Subject: Re: [PATCH printk v3 02/14] printk: Adjust mapping for 32bit seq
 macros

On 2024-01-12 19:14:44 [+0100], Petr Mladek wrote:
> 
> That said, I am a bit nervous that a bug like this might cause
> workqueue stall and panic() the kernel.

> At least, this is how I read
> https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com/

Well, the workqueue stalls, and RCU does as well, because the CPU spins.
That is a natural consequence of the CPU making no progress (at boot).
The panic _might_ be due to panic_on_error or so.
There is no scheduler, nothing, so one CPU is blocked and the world ends…

> It looks like it caused some loop and refcount overflow or so.
> But I might be wrong.
> 
> I would like to better understand this and check if we could prevent
> it somehow.

From memory: the problem was that the sign extension bug (the one that
is now fixed) returned a wrong or too low sequence number. So the printk
code tried again to obtain a new sequence number, and got a wrong one
again. That is what looped during boot.
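
To illustrate how a 32-bit sequence number can map to a too low 64-bit
value, here is a minimal userspace sketch. It is not the actual printk
code: the helper names expand_seq_unsigned() and expand_seq_signed() and
the sample values are hypothetical, and the conversion to int32_t assumes
the usual two's complement behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: expand the lower 32 bits of a sequence number to
 * 64 bits relative to a 64-bit reference. The 32-bit difference is
 * treated as unsigned, so a seq32 that is slightly AHEAD of the
 * reference is mapped almost 2^32 too low.
 */
static uint64_t expand_seq_unsigned(uint64_t ref, uint32_t seq32)
{
	return ref - (uint32_t)((uint32_t)ref - seq32);
}

/*
 * Hypothetical helper: same mapping, but the 32-bit difference is
 * interpreted as signed, so values up to 2^31 on either side of the
 * reference are expanded correctly.
 */
static uint64_t expand_seq_signed(uint64_t ref, uint32_t seq32)
{
	return ref - (int32_t)((uint32_t)ref - seq32);
}

/* Small self-check with made-up values. */
static void seq_selfcheck(void)
{
	uint64_t ref = 0x100000005ULL;

	/* seq32 behind the reference: both variants agree. */
	assert(expand_seq_unsigned(ref, 3) == 0x100000003ULL);
	assert(expand_seq_signed(ref, 3) == 0x100000003ULL);

	/* seq32 two ahead of the reference: unsigned variant is too low. */
	assert(expand_seq_unsigned(ref, 7) == 0x7ULL);
	assert(expand_seq_signed(ref, 7) == 0x100000007ULL);
}
```

A caller that keeps retrying until the expanded value catches up with
the reference would never terminate with the first variant, which is the
kind of loop described above.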

I'm not sure whether this sort of lockup can still happen now that the
bug is fixed. I can issue an NMI backtrace on all CPUs (32) without the
sync (so they can all printk immediately rather than one after the
other), and it prints and continues…

> Best Regards,
> Petr

Sebastian