Message-ID: <CAHk-=wh+cxX2Sxc6RPBKkgYO67o2mdVfW6sQNMYc_x2QoP4LOQ@mail.gmail.com>
Date: Tue, 23 Jul 2024 14:07:12 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: John Ogness <john.ogness@...utronix.de>
Cc: Petr Mladek <pmladek@...e.com>, Sergey Senozhatsky <senozhatsky@...omium.org>, 
	Steven Rostedt <rostedt@...dmis.org>, Andy Shevchenko <andriy.shevchenko@...ux.intel.com>, 
	Rasmus Villemoes <linux@...musvillemoes.dk>, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Thomas Gleixner <tglx@...utronix.de>, Jan Kara <jack@...e.cz>, 
	Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] printk for 6.11

On Tue, 23 Jul 2024 at 13:41, John Ogness <john.ogness@...utronix.de> wrote:
>
> Petr's pull request provides the functionality for a CPU to call
> printk() during emergencies so that each line only goes into the
> buffer. We also include a function to perform the flush at any time. As
> the series is implemented now, that flush happens after the warning is
> completely stored into the buffer. In cases where there is lots of data
> in the warning (such as in RCU stalls or lockdep splats), the flush
> happens after significant parts of the warning.

I really think the flushing needs to be *way* more aggressive for any
oops. The "flush at end" is not even remotely sane.

Some amount of buffering can make sense, e.g. when printing out the
regular register state over a few lines, there certainly shouldn't be
anything there that can cause problems.

Let me pick a very specific example of a common thing:

   int __die(const char *str, struct pt_regs *regs, long err)

in arch/x86/kernel/dumpstack.c.

Look, do I expect problems in "__die_header()"? No.

But the *moment* you call "notify_die()", you are now calling random
debug code. The register state NEEDS TO HAVE BEEN FLUSHED before this
point.
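
To make the ordering concrete, here is a minimal user-space sketch, not
kernel code. Every name in it (emergency_store, emergency_flush,
sim_die, the hooks) is a hypothetical stand-in rather than the real
printk or notifier API, and a hook that "never returns" is modeled
simply as one that returns nonzero. It only illustrates that lines
still sitting in the buffer when the hook goes wrong are lost, while
lines flushed beforehand are not:

```c
#include <assert.h>
#include <stdio.h>

#define BUF_LINES 16
#define LINE_LEN  80

/* Hypothetical stand-ins; none of this is the kernel's real printk API. */
static char pending[BUF_LINES][LINE_LEN]; /* stored but not yet printed */
static int  n_pending;
static int  n_emitted;                    /* lines that reached the console */

/* Store one line in the buffer without printing it. */
static void emergency_store(const char *line)
{
	assert(line != NULL);
	if (n_pending < BUF_LINES) {
		snprintf(pending[n_pending], LINE_LEN, "%s", line);
		n_pending++;
	}
}

/* Print everything stored so far and empty the buffer. */
static void emergency_flush(void)
{
	for (int i = 0; i < n_pending; i++) {
		puts(pending[i]);
		n_emitted++;
	}
	n_pending = 0;
}

/* A debug hook; a nonzero return models a hook that never comes back. */
typedef int (*die_hook_t)(void);

static int buggy_hook(void) { return 1; }
static int good_hook(void)  { return 0; }

/*
 * Simulated die path. Returns how many lines reached the console.
 * With flush_before_hook == 0 the only flush is at the very end.
 */
static int sim_die(die_hook_t hook, int flush_before_hook)
{
	n_pending = 0;
	n_emitted = 0;

	emergency_store("Oops: header");
	emergency_store("RIP: ... (register state)");

	if (flush_before_hook)
		emergency_flush();

	if (hook())
		return n_emitted; /* hook failed: pending lines are lost */

	emergency_store("Modules linked in: ...");
	emergency_flush();
	return n_emitted;
}
```

In this sketch, sim_die(buggy_hook, 0) gets zero lines out, while
sim_die(buggy_hook, 1) still gets the header and register lines out
before the hook misbehaves. With good_hook both variants emit all three
lines, so the early flush costs nothing in the normal case.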

This is not something I'm willing to debate. Some of the most painful
debugging sessions I have *EVER* had have been due to "debug code that
failed".

Are these things rare? Yes they are. Very. Thankfully.

But the scars left behind by things like "buggy kgdb hook meant that
oops printout never happened at all when kgdb wasn't even enabled" and
having wasted literally *days* on something that would have been
obvious had the oops printout just happened means that I'm very much
in the "once bitten, twice shy" camp.

So that's why I absolutely *ABHOR* that code in "oops_begin()" that
stops printouts until "oops_end()". It's *EXACTLY* the wrong thing to
do if there's some problem in the middle.

And yes, those problems have happened. Again - rarely, but it's *so*
painful when they do, that I refuse to pull something that I consider
to be this broken.

And yes, I'm convinced we have many other situations where a problem
during printout will silence things (the obvious one being locking
issues with the printing itself). But I refuse to have that silence be
an integral part of the die() code.

                 Linus
