linux-kernel - Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87r1xog3hw.fsf@linutronix.de>
Date:   Thu, 19 Mar 2020 09:20:27 +0100
From:   John Ogness <john.ogness@...utronix.de>
To:     Eugeniu Rosca <erosca@...adit-jv.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        <linux-kernel@...r.kernel.org>, Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Jisheng Zhang <Jisheng.Zhang@...aptics.com>,
        Valdis Kletnieks <valdis.kletnieks@...edu>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Andrew Gabbasov <andrew_gabbasov@...tor.com>,
        Dirk Behme <dirk.behme@...bosch.com>,
        Eugeniu Rosca <roscaeugeniu@...il.com>
Subject: Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup

On 2020-03-18, Eugeniu Rosca <erosca@...adit-jv.com> wrote:
> On Tue, Mar 17, 2020 at 11:18:18AM +0900, Sergey Senozhatsky wrote:
>> On (20/03/15 18:09), Eugeniu Rosca wrote:
>>> @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>>  			}
>>>  		}
>>>  
>>> +		console_verbose_start();
>>> +
>>>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>>>  			smp_processor_id(), duration,
>>>  			current->comm, task_pid_nr(current));
>>> @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>>  		if (softlockup_panic)
>>>  			panic("softlockup: hung tasks");
>>>  		__this_cpu_write(soft_watchdog_warn, true);
>>> +
>>> +		console_verbose_end();
>>>  	} else
>>>  		__this_cpu_write(soft_watchdog_warn, false);
>> 
>> I'm afraid, as of now, this approach is not going to work the way it's
>> supposed to work in 100% of cases. Because the only thing that printk()
>> call sort of guarantees is that the message will be stored somewhere.
>> Either in the main kernel log buffer, on in one of auxiliary per-CPU
>> log buffers. It does not guarantee, generally speaking, that the message
>> will be printed on the console immediately.
>
> I take this passage as an acknowledgement of the problem being _real_,
> in spite of the fix being not perfect.
>
> One aspect I would like to emphasize is that (please, NAK this
> statement if it's not accurate) the problem reported in this patch is
> not specific to the existing printk mechanism, but also applies to the
> upcoming kthread-based printk. If that's true, then IMHO this is a
> compelling argument to join forces and try to find a working, safe and
> future-proof solution.

Let me clarify that the upcoming rework is _not_ simply to make a
kthread-based printk. There upcoming rework addresses several issues
(kthreads being only a piece):

1. allow printk-callers to get their messages immediately and locklessly
   into the log buffer from any context

2. during normal operation, printk-callers are completely decoupled from
   console printing

3. in the case of an emergency, every printk-caller will directly and
   synchronously perform console printing of its messages on consoles
   that support atomic writing

For the rework we decided that triggering an oops is what irreversibly
transitions the system from "normal operation" to "emergency".

One could interpret the current "console_verbose()" to be some sort of
equivalent to irreversibly transitioning the system to "emergency". But
that is not how it was decided to be interpreted. For the rework,
printk-callers do not have any influence on forcing immediate console
printing (unless they trigger an oops). console_verbose() is still
relevant in the rework. console_verbose() is affecting _what_ is printed
to consoles and _not when_.

Once the printk rework is complete, the mechanisms for atomic and
synchronous console printing will be in place. It would be possible that
these mechanisms could also be used in non-oops scenarios. But that
would need to be discussed in depth and clearly defined with caution for
abuse.

John Ogness