Message-ID: <20180110183659.GN3668920@devbig577.frc2.facebook.com>
Date: Wed, 10 Jan 2018 10:36:59 -0800
From: Tejun Heo <tj@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Petr Mladek <pmladek@...e.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
akpm@...ux-foundation.org,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
linux-mm@...ck.org, Cong Wang <xiyou.wangcong@...il.com>,
Dave Hansen <dave.hansen@...el.com>,
Johannes Weiner <hannes@...xchg.org>,
Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
Peter Zijlstra <peterz@...radead.org>, Jan Kara <jack@...e.cz>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
rostedt@...e.goodmis.org, Byungchul Park <byungchul.park@....com>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Pavel Machek <pavel@....cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hello,
On Wed, Jan 10, 2018 at 01:22:55PM -0500, Steven Rostedt wrote:
> > Can you please chime in? Would you be opposed to offloading to an
> > independent context even if it were only for cases where we were
> > already punting? The thing with the current offloading is that we
> > don't know who we're offloading to. It might end up in faster or
> > slower context, or more importantly a dangerous one.
>
> And how is that different to what we have today? It could be the
> "dangerous one" that did the first printk, and 100 other CPUs in "non
> dangerous" locations are constantly calling printk and making that
> "dangerous" one NEVER STOP.
So, the dangerous one would punt to the dedicated safe one beyond a
certain point. The posted version just flushes up to the last message
that it saw on entry to flush.
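As a rough illustration of that bounded flush, here is a minimal userspace model. It is only a sketch, not the posted patch: `struct log_model`, `model_queue()`, `model_flush_bounded()`, `model_flush_dedicated()` and `noisy_emit()` are hypothetical names, and the "dedicated safe context" is assumed to be a separate drain routine that does not itself generate new messages.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a log ring; not the kernel's printk structures. */
struct log_model {
	uint64_t next_seq;    /* sequence number the next queued message gets */
	uint64_t console_seq; /* next sequence number to write to the console */
	bool punted;          /* leftover work was handed to the safe context */
};

/* Queue one message and return its sequence number. */
static uint64_t model_queue(struct log_model *m)
{
	return m->next_seq++;
}

/*
 * Flush only up to the last message that was visible on entry.
 * on_emit() may queue further messages (as a recursive warning from a
 * console driver would); those are left for the dedicated context.
 * Returns the number of messages this caller emitted.
 */
static size_t model_flush_bounded(struct log_model *m,
				  void (*on_emit)(struct log_model *, uint64_t))
{
	const uint64_t stop = m->next_seq; /* snapshot taken on entry */
	size_t emitted = 0;

	while (m->console_seq < stop) {
		uint64_t seq = m->console_seq++;

		if (on_emit)
			on_emit(m, seq);
		emitted++;
	}

	/* Anything queued while we were flushing is punted. */
	if (m->console_seq < m->next_seq)
		m->punted = true;

	return emitted;
}

/*
 * Assumed dedicated safe context: drains whatever is left. In this model
 * it emits without generating new messages, so it always terminates.
 */
static size_t model_flush_dedicated(struct log_model *m)
{
	size_t emitted = 0;

	while (m->console_seq < m->next_seq) {
		m->console_seq++;
		emitted++;
	}
	m->punted = false;

	return emitted;
}

/* Example emitter: each emitted message queues one more warning. */
static void noisy_emit(struct log_model *m, uint64_t seq)
{
	(void)seq;
	model_queue(m);
}
```

With three messages queued and `noisy_emit` as the emitter, the bounded flush emits exactly those three and marks the rest as punted, so the original caller's work is bounded by what it saw on entry no matter how many messages arrive in the meantime.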
> > The particular case that we've been seeing regularly in the fleet was
> > the following scenario.
> >
> > 1. Console is IPMI emulated serial console. Super slow. Also
> > netconsole is in use.
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> > / driver tries to allocate memory and then fails, which in turn
> > triggers allocation failure or other warning messages. printk was
> > already flushing, so the messages are queued on the ring.
>
> This looks like a bug in the netconsole, as the net console shouldn't
> print warnings if the warning is caused by it doing a print.
>
> Totally unrelated problem to my and Petr's patch set. Basically your
> argument is "I see this bug, and your patch doesn't fix it". Well maybe
> we are not solving your bug. Not to mention, it looks like printk isn't
> the bug, but net console is.
Sure, that could be the case, especially if punting to a safe context
can't be done reasonably (and there are downsides to silencing the
recursive messages too), but it'd also be really great to have printk
generally safe from bringing down a machine this way, right? I just
don't yet see why punting to a safe context is so difficult /
undesirable that we can't solve the issue in a general manner.
Thanks.
--
tejun