Message-ID: <20180109200620.GQ3668920@devbig577.frc2.facebook.com>
Date: Tue, 9 Jan 2018 12:06:20 -0800
From: Tejun Heo <tj@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Jan Kara <jack@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Rafael Wysocki <rjw@...ysocki.net>,
Pavel Machek <pavel@....cz>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
linux-kernel@...r.kernel.org,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Hello, Steven.
My apologies for the late reply. Was traveling and then got sick.
On Thu, Dec 21, 2017 at 11:19:32PM -0500, Steven Rostedt wrote:
> You don't think handing off printks to an offloaded thread isn't more
> complex nor can it cause even more issues (like more likely to lose
> relevant information on kernel crashes)?
Sergey's patch seems more complex (and probably handles more
requirements) but my original patch was pretty simple.
http://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com
> > static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
> > {
> > int i;
> >
> > if (READ_ONCE(in_printk))
> > for (i = 0; i < 10000; i++)
> > printk("%-80s\n", "XXX TIMER");
>
> WTF!
>
> You are printing 10,000 printk messages from an interrupt context???
> And to top it off, I ran this on my box, switching printk() to
> trace_printk() (which is extremely low overhead). And it is triggered
> on the same CPU that did the printk() itself on. Yeah, there is no hand
> off, because you are doing a shitload of printks on one CPU and nothing
> on any of the other CPUs. This isn't the problem that my patch was set
> out to solve, nor is it a very realistic problem. I added a counter to
> the printk as well, to keep track of how many printks there were:
The code might suck, but I think it does replicate what we've been
seeing regularly in the fleet. The console side is pretty slow - IPMI
faithfully emulating a serial console. I don't know whether it's doing
115200 or even slower. Please consider something like the following.
* The kernel isn't preemptible. Machine runs out of memory, hits OOM
condition. It starts printing OOM information.
* Netconsole tries to send out the OOM messages and attempts a memory
allocation, which fails, which in turn prints allocation-failure
messages. Because this happens while we're already printing, the new
messages just get queued to the buffer. This repeats.
* We're still in the middle of OOM and haven't killed anybody yet, so
memory stays short and the printk ring buffer keeps getting filled by
the above. After a while, RCU stall warnings kick in too, producing
even more messages.
What's happening is that the OOM killer is trapped flushing printk,
failing to clear the memory condition, and that leads irq / softirq
contexts to produce messages faster than they can be flushed. I don't
see how we'd be able to clear the condition without introducing an
independent context to flush the ring buffer.
Again, this is an actual problem that we've been seeing fairly
regularly in production machines.
Thanks.
--
tejun