lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200318180525.GA5790@lxhi-065.adit-jv.com>
Date:   Wed, 18 Mar 2020 19:05:25 +0100
From:   Eugeniu Rosca <erosca@...adit-jv.com>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
CC:     <linux-kernel@...r.kernel.org>,
        John Ogness <john.ogness@...utronix.de>,
        Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Jisheng Zhang <Jisheng.Zhang@...aptics.com>,
        Valdis Kletnieks <valdis.kletnieks@...edu>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Andrew Gabbasov <andrew_gabbasov@...tor.com>,
        Dirk Behme <dirk.behme@...bosch.com>,
        Eugeniu Rosca <roscaeugeniu@...il.com>,
        Eugeniu Rosca <erosca@...adit-jv.com>
Subject: Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when
 reporting softlockup

Hi Sergey,

Many thanks for your feedback!

On Tue, Mar 17, 2020 at 11:18:18AM +0900, Sergey Senozhatsky wrote:
> On (20/03/15 18:09), Eugeniu Rosca wrote:
> 
> [..]
> 
> > @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >  			}
> >  		}
> >  
> > +		console_verbose_start();
> > +
> >  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> >  			smp_processor_id(), duration,
> >  			current->comm, task_pid_nr(current));
> > @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >  		if (softlockup_panic)
> >  			panic("softlockup: hung tasks");
> >  		__this_cpu_write(soft_watchdog_warn, true);
> > +
> > +		console_verbose_end();
> >  	} else
> >  		__this_cpu_write(soft_watchdog_warn, false);
> 
> 
> I'm afraid, as of now, this approach is not going to work the way it's
> supposed to work in 100% of cases. Because the only thing that printk()
> call sort of guarantees is that the message will be stored somewhere.
> Either in the main kernel log buffer, on in one of auxiliary per-CPU
> log buffers. It does not guarantee, generally speaking, that the message
> will be printed on the console immediately.

I take this passage as an acknowledgement of the problem being _real_,
in spite of the fix being not perfect.

One aspect I would like to emphasize is that (please, NAK this
statement if it's not accurate) the problem reported in this patch is
not specific to the existing printk mechanism, but also applies to the
upcoming kthread-based printk. If that's true, then IMHO this is a
compelling argument to join forces and try to find a working, safe and
future-proof solution.

> 
> Consider the following example:
> 
> 	CPU0				CPU1
> 	console_lock();
> 	schedule();
> 
> 					watchdog()
> 					 console_verbose_start();
> 					  printk()
> 					   log_store()
> 					    if (!console_trylock())
> 					      return;
> 					 console_verbose_end();
> 
> 	...
> 	console_unlock()
> 	 print logbuf messages to the consoles
> 	 we missed the console_verbose_start/end
> 	 on CPU1

This looks plausible. However, I wonder to which degree the same
scenario is a concern in the kthread-based approach?

My current standpoint is that as long as points [A-D] are met, it
should do no harm to accept a (partial) fix like seen in my series:

 - [A] the patch tackles at least a subset of problematic use-cases
 - [B] the fix is non-intrusive and easy to review
 - [C] there is hope to reuse it in the new lockless buffer based printk
 - [D] there are no regressions employing the major console knobs
       (ignore_loglevel, quiet, loglevel, etc) as it happened in
  a6ae928c25835c ("Revert "printk: make sure to print log on console."")

>From the above points, my only major concern is that current series
breaks the expectations of users who pass loglevel=0 on kernel
command line and expect the system to be totally silent. This has
already been expressed in the cover letter. I would especially
appreciate if the same view is shared (or invalidated) by others.

> 
> IIRC, we had a similar approach in the past. See commit 375899cddcbb26
> ("printk: make sure to print log on console"). And we reverted it, see
> a6ae928c25835 ("Revert "printk: make sure to print log on console.").

Thanks for this reference. It looks to me that in spite of being
relatively compact, commit 375899cddcbb26 ("printk: make sure to print
log on console.") broke criteria [D] listed above. I intend to avoid
it by testing multiple console knob values on my arm64 system.

Looking forward to your feedback on the questions posted above. TIA!

-- 
Best Regards
Eugeniu Rosca

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ