Message-ID: <20180117091208.ezvuhumnsarz5thh@pathway.suse.cz>
Date: Wed, 17 Jan 2018 10:12:08 +0100
From: Petr Mladek <pmladek@...e.com>
To: Tejun Heo <tj@...nel.org>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
akpm@...ux-foundation.org, linux-mm@...ck.org,
Cong Wang <xiyou.wangcong@...il.com>,
Dave Hansen <dave.hansen@...el.com>,
Johannes Weiner <hannes@...xchg.org>,
Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jan Kara <jack@...e.cz>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
rostedt@...e.goodmis.org, Byungchul Park <byungchul.park@....com>,
Pavel Machek <pavel@....cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
On Tue 2018-01-16 11:44:56, Tejun Heo wrote:
> Hello, Steven.
>
> On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote:
> > All I did was start off a work queue on each CPU, and each CPU does one
> > printk() followed by a millisecond sleep. No 10,000 printks, nothing
> > in an interrupt handler. Preemption is disabled while the printk
> > happens, but that's normal.
> >
> > This is much closer to an OOM happening all over the system, where OOMs
> > stack dumps are occurring on different CPUS.
>
> OOMs can't happen all over the system. It can only happen on a single
> CPU at a time. If you're printing from multiple CPUs, your solution
> would work great. That is the situation your patches are designed to
> address to begin with. That isn't the problem that I reported tho. I
> understand that your solution works for that class of problems and
> that is great. I really wish that it could address the other class of
> problems too tho, and it doesn't seem like it would be that difficult
> to cover both cases, right?

IMHO, the bad scenario with OOM was that any printk() called in
the OOM report became the console_lock owner and was responsible
for pushing all new messages to the console. There was a possible
livelock because the OOM killer was blocked in console_unlock() while
other CPUs repeatedly complained about failed allocations.

Even the current patch should help. It allows the console_lock
to be handed off to another CPU, so the OOM killer could eventually
continue.

Of course, it is possible that this might not be enough. For example,
there might still be too many messages to print when the memory is
freed. Then there will be no more complaints, no more hand-offs,
and the last console_lock owner might still cause a softlockup.
But that will still be better than the livelock. Of course, we
will need to address the softlockup. But let's see how this works
in practice.
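
To make the two cases above concrete, here is a toy model of the
scenario. It is only a sketch under stated assumptions, not the kernel
code: the round-based timing, the CPU count, and every name in it
(NR_CPUS, ROUNDS, simulate) are hypothetical. CPU 0 plays the OOM
killer, the other CPUs log one allocation failure per round, and the
console owner can push one line per round.

```python
NR_CPUS = 4    # assumption: CPU 0 is the OOM killer, CPUs 1..3 complain
ROUNDS = 100   # assumption: how long the allocation failures last


def simulate(handoff):
    """Return (lines pushed by CPU 0, backlog left when complaints stop)."""
    pending = 1              # the OOM report logged a line and took the console
    owner = 0                # CPU currently holding the console
    printed = [0] * NR_CPUS  # lines pushed to the console, per CPU

    for _ in range(ROUNDS):
        # The current owner pushes one line to the (slow) console.
        if pending:
            pending -= 1
            printed[owner] += 1

        # Meanwhile every other complaining CPU logs a new message.
        waiter = None
        for cpu in range(1, NR_CPUS):
            if cpu == owner:
                continue     # the owner is busy printing
            pending += 1
            if handoff and waiter is None:
                waiter = cpu  # first printk() caller waits for the console

        # With hand-off, the owner passes the console to the waiter.
        if waiter is not None:
            owner = waiter

    return printed[0], pending


without_handoff = simulate(handoff=False)
with_handoff = simulate(handoff=True)
print("no hand-off:", without_handoff)
print("hand-off:   ", with_handoff)
```

In this model, without hand-off CPU 0 stays the owner for all rounds,
which corresponds to the livelock. With hand-off it pushes a single
line and then moves on. In both cases a backlog is left when the
complaints stop, and whoever owns the console at that point has to
flush it alone, which is the remaining softlockup risk.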
Best Regards,
Petr