Message-ID: <20170407081449.GA12859@amd>
Date: Fri, 7 Apr 2017 10:14:49 +0200
From: Pavel Machek <pavel@....cz>
To: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc: Jan Kara <jack@...e.cz>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Ye Xiaolong <xiaolong.ye@...el.com>,
Steven Rostedt <rostedt@...dmis.org>,
Petr Mladek <pmladek@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
"Rafael J . Wysocki" <rjw@...ysocki.net>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jslaby@...e.com>, Len Brown <len.brown@...el.com>,
linux-kernel@...r.kernel.org, lkp@...org
Subject: Re: [printk] fbc14616f4: BUG:kernel_reboot-without-warning_in_test_stage

On Fri 2017-04-07 16:46:34, Sergey Senozhatsky wrote:
> On (04/07/17 09:15), Pavel Machek wrote:
> > On Fri 2017-04-07 13:44:40, Sergey Senozhatsky wrote:
> > > Hello,
> > >
> > > On (04/06/17 19:33), Pavel Machek wrote:
> > > > > This patch set gives up part of the printk() reliability for bounded
> > > > > latency (at least unless we detect we are really in trouble) which is IMHO
> > > > > a good trade-off for lots of users (and others can just turn this feature
> > > > > off).
> > > >
> > > > If they can ever realize they were bitten by this feature.
> > > >
> > > > Can we go for different tradeoff?
> > > >
> > > > In console_unlock(), if you detect too much work, print "Too many
> > > > messages to print, %d bytes delayed" and wake up kernel thread.
> > >
> > > "too many messages" is undefined. console_unlock() can be called from
> > > IRQ handler or with preemption disabled, or under spin_lock, or under
> > > RCU read lock, etc. etc. By the time we decide to wake up printk_kthread
> > > from console_unlock() it may be already too late.
> >
> > So let's define "too many messages" as 240 characters. We know printk
> > worked rather well for us for more than 20 years. Kernel code is used
> > to printk taking a few milliseconds.
>
> serial console can be quite slow. and port->lock, that is acquired by
> console_unlock()->call_console_drivers()->write(), is also accessible
> by serial driver's IRQ handler, and this lock may be busy long
> enough -- as long as that IRQ handler transmits/receives chars. but
> that's not the point.

Well. This is what we had for 20 years.

> [..]
> > Yeah? So you know modified printk() does not work, that's why
> > "emergency mode" exists. Unfortunately, you can't rely on fact that
> > you can detect half-crashed machines by printk levels. You usually
> > can't.
>
> I'm not happy with those printk_emergency_begin()/end(), sure. but that's
> the reality -- every single solution that would offload printing duty implies
> that there will be cases when offloading would not be possible. either
> PENDING_PRINTK_IPI to other CPUs, or irq_work(PENDING_OUTPUT) on a local CPU,
> or anything else (um... what it is?... softirq? tasklet? print one logbuf
> entry from every IRQ handler? dunno, anything else?). There will be cases
> when we won't be able to expect that something will take over and finish
> printing for us. Well, maybe I'm missing some other solution that would
> offload printing, eliminating lockup conditions, and at the same time work
> in 100% of the cases.

I don't have a magic solution up my sleeve. You made a good case that
spending 30 seconds in printk() is a bad idea. I agree with that. Your
solution is to introduce printk_emergency_begin()/end(). I don't agree
there.

I believe "spend at most 2 seconds in printk(), then print a warning
and offload" is a solution closer to what we had before.
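
A minimal userspace sketch of the "bounded time, then warn and offload"
idea follows. It is not kernel code and not taken from any patch in this
thread: struct pending_log, drain_with_budget(), the clock/emit callbacks
and the simulated console are all hypothetical names made up for
illustration. Only the warning text comes from the proposal quoted above
(with %zu in place of %d because the count is a size_t here), and the
"wake up the kernel thread" step is reduced to a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel log buffer: an array of pending lines. */
struct pending_log {
	const char **msgs;
	size_t count;
	size_t next;		/* index of the first line not yet printed */
};

struct drain_result {
	size_t printed;		/* lines written synchronously */
	size_t bytes_delayed;	/* bytes left over for the offload thread */
	int offloaded;		/* nonzero if the time budget ran out */
};

typedef unsigned long long (*clock_fn)(void *ctx);	/* current time in ms */
typedef void (*emit_fn)(const char *line, void *ctx);	/* write one line */

/* Print pending lines until done or until budget_ms has elapsed. */
static struct drain_result drain_with_budget(struct pending_log *log,
					     unsigned long long budget_ms,
					     clock_fn now, emit_fn emit,
					     void *ctx)
{
	struct drain_result res = { 0, 0, 0 };
	unsigned long long start;
	char warn[80];
	size_t i;

	assert(log && now && emit);
	start = now(ctx);

	while (log->next < log->count) {
		if (now(ctx) - start >= budget_ms) {
			/* Budget spent: count what is left and warn about it. */
			for (i = log->next; i < log->count; i++)
				res.bytes_delayed += strlen(log->msgs[i]);
			snprintf(warn, sizeof(warn),
				 "Too many messages to print, %zu bytes delayed\n",
				 res.bytes_delayed);
			emit(warn, ctx);
			/* A real version would wake the printing thread here. */
			res.offloaded = 1;
			break;
		}
		emit(log->msgs[log->next++], ctx);
		res.printed++;
	}
	return res;
}

/* Simulated slow console: every emitted line costs a fixed number of ms. */
struct sim_console {
	unsigned long long now_ms;
	unsigned long long cost_ms;
	size_t lines;
};

static unsigned long long sim_now(void *ctx)
{
	return ((struct sim_console *)ctx)->now_ms;
}

static void sim_emit(const char *line, void *ctx)
{
	struct sim_console *con = ctx;

	(void)line;			/* content is irrelevant to the simulation */
	con->now_ms += con->cost_ms;	/* pretend the write took cost_ms */
	con->lines++;
}
```

With a 2000 ms budget and a simulated console that takes 1000 ms per
line, two of four pending lines go out synchronously, then the warning
is emitted and the rest is left for the offload path. The budget is
only checked between lines, so a single very slow write can still
overrun it, which is the case Sergey raises above about port->lock.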

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html