lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240115085141.MSS4LLsR@linutronix.de>
Date: Mon, 15 Jan 2024 09:51:41 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: John Ogness <john.ogness@...utronix.de>,
	Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Francesco Dolcini <francesco@...cini.it>,
	kernel test robot <oliver.sang@...el.com>
Subject: Re: [PATCH printk v3 02/14] printk: Adjust mapping for 32bit seq
 macros

On 2024-01-12 19:14:44 [+0100], Petr Mladek wrote:
> 
> That said, I am a bit nervous that a bug like this might cause
> workqueue stall and panic() the kernel.


> At least, this is how I read
> https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com/

well, workqueue stalls and RCU as well because the CPU spins. That is a
natural consequence because the CPU makes no progress (at boot). The
panic _might_ be due to panic_on_error or so.
There is no scheduler, nothing so one CPU is blocked and the world ends…

> It looks like it caused some loop and refcout overlow or so.
> But I might be wrong.
> 
> I would like to better understand this and check if we could prevent
> it somehow.

Based on memory: the problem is that the sign extension bug (the fixed
bug) returned the wrong or too low sequence number. So the printk code
tried again to obtain a new sequence number. And got the wrong
again. And this is what looped during boot. 

I'm not sure if this sort of lockup can happen now after the bug is
fixed. I can issue a NMI backtrace on all CPUs (32) without the sync (so
they all can printk immediately and not one after the other) and it
prints and continues…

> Best Regards,
> Petr

Sebastian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ