linux-kernel - Re: [printk] fbc14616f4: BUG:kernel_reboot-without-warning_in_test


Message-ID: <20170407081449.GA12859@amd>
Date:   Fri, 7 Apr 2017 10:14:49 +0200
From:   Pavel Machek <pavel@....cz>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Jan Kara <jack@...e.cz>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Ye Xiaolong <xiaolong.ye@...el.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Petr Mladek <pmladek@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Rafael J . Wysocki" <rjw@...ysocki.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jiri Slaby <jslaby@...e.com>, Len Brown <len.brown@...el.com>,
        linux-kernel@...r.kernel.org, lkp@...org
Subject: Re: [printk]  fbc14616f4:
 BUG:kernel_reboot-without-warning_in_test_stage

On Fri 2017-04-07 16:46:34, Sergey Senozhatsky wrote:
> On (04/07/17 09:15), Pavel Machek wrote:
> > On Fri 2017-04-07 13:44:40, Sergey Senozhatsky wrote:
> > > Hello,
> > > 
> > > On (04/06/17 19:33), Pavel Machek wrote:
> > > > > This patch set gives up part of the printk() reliability for bounded
> > > > > latency (at least unless we detect we are really in trouble) which is IMHO
> > > > > a good trade-off for lots of users (and others can just turn this feature
> > > > > off).
> > > > 
> > > > If they can ever realize they were bitten by this feature.
> > > > 
> > > > Can we go for a different tradeoff?
> > > > 
> > > > In console_unlock(), if you detect too much work, print "Too many
> > > > messages to print, %d bytes delayed" and wake up kernel thread.
> > > 
> > > "too many messages" is undefined. console_unlock() can be called from
> > > IRQ handler or with preemption disabled, or under spin_lock, or under
> > > RCU read lock, etc. etc. By the time we decide to wake up printk_kthread
> > > from console_unlock() it may be already too late.
> > 
> > So let's define "too many messages" as 240 characters. We know printk
> > worked rather well for us for more than 20 years. Kernel code is used
> > to printk taking a few milliseconds.
> 
> serial console can be quite slow. and port->lock, that is acquired by
> console_unlock()->call_console_drivers()->write(), is also accessible
> by serial driver's IRQ handler, and this lock may be busy long
> enough -- as long as that IRQ handler transmits/receives chars. but
> that's not the point.

Well. This is what we had for 20 years.

> [..]
> > Yeah? So you know modified printk() does not work, that's why
> > "emergency mode" exists. Unfortunately, you can't rely on the fact that
> > you can detect half-crashed machines by printk levels. You usually
> > can't.
> 
> I'm not happy with those printk_emergency_begin()/end(), sure. but that's
> the reality -- every single solution that would offload printing duty implies
> that there will be cases when offloading would not be possible. either
> PENDING_PRINTK_IPI to other CPUs, or irq_work(PENDING_OUTPUT) on a local CPU,
> or anything else (um... what it is?... softirq? tasklet? print one logbuf
> entry from every IRQ handler? dunno, anything else?). There will be cases
> when we won't be able to expect that something will take over and finish
> printing for us. Well, maybe I'm missing some other solution that would
> offload printing, eliminating lockup conditions, and at the same time work
> in 100% of the cases.

I don't have a magic solution up my sleeve. You made a good case that
spending 30 seconds in printk() is a bad idea. I agree with that. Your
solution is to introduce printk_emergency_begin()/end(). I don't agree
there.

I believe "spend at most 2 seconds in printk(), then print a warning
and offload" is a solution closer to what we had before.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
