linux-kernel - Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

Message-ID: <20180110183055.GM3668920@devbig577.frc2.facebook.com>
Date:   Wed, 10 Jan 2018 10:30:55 -0800
From:   Tejun Heo <tj@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Petr Mladek <pmladek@...e.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        akpm@...ux-foundation.org, Steven Rostedt <rostedt@...dmis.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        linux-mm@...ck.org, Cong Wang <xiyou.wangcong@...il.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jan Kara <jack@...e.cz>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        rostedt@...e.goodmis.org, Byungchul Park <byungchul.park@....com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Pavel Machek <pavel@....cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

Hello, Peter.

On Wed, Jan 10, 2018 at 07:21:53PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >    / driver tries to allocate memory and then fails, which in turn
> >    triggers allocation failure or other warning messages.  printk was
> >    already flushing, so the messages are queued on the ring.
> > 5. OOM handler keeps flushing but 4 repeats and the queue is never
> >    shrinking.  Because OOM handler is trapped in printk flushing, it
> >    never manages to free memory and no one else can enter OOM path
> >    either, so the system is trapped in this state.
> 
> Why not kill recursive OOM (msgs) ?

Sure, we can do that too, e.g. by marking the flushing thread and
ignoring new messages from it, although that does come with its own
downsides.  The choices are:

* If we can make printk safe without much downside, that'd be the best
  option.

* If we decide that we can't do that in a reasonable way, we sure can
  try to plug the identified cases.  We might have to play a bit of
  whack-a-mole (e.g. the feedback loop might not necessarily be from
  the same context) but there likely are very few repeatable cases.
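
To make the feedback loop and the "mark the flusher and ignore its
messages" idea concrete, here is a rough user-space model.  It is only
a sketch, not kernel code: every name in it (struct ring, in_flush,
guard_enabled, log_msg, console_write, flush, simulate) is made up for
illustration, and the assumption that each console write produces
exactly one new warning is a simplification of the netconsole case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define RING_SIZE 64

struct ring {
	size_t head;     /* next message to flush */
	size_t tail;     /* next free slot */
	size_t dropped;  /* messages ignored because the flusher produced them */
};

static bool in_flush;       /* stands in for a "this thread is flushing" mark */
static bool guard_enabled;  /* toggle so both behaviours can be compared */

/* Queue one message, unless the guard says the flusher itself is logging. */
static void log_msg(struct ring *r)
{
	if (guard_enabled && in_flush) {
		r->dropped++;
		return;
	}
	if (r->tail - r->head < RING_SIZE)
		r->tail++;
}

/* Assumption: emitting a message fails an allocation and warns about it. */
static void console_write(struct ring *r)
{
	log_msg(r);
}

/* Flush for at most max_steps messages; return how many are still pending. */
static size_t flush(struct ring *r, size_t max_steps)
{
	in_flush = true;
	for (size_t i = 0; i < max_steps && r->head != r->tail; i++) {
		assert(r->tail >= r->head);
		r->head++;          /* consume one queued message */
		console_write(r);   /* which may queue another one */
	}
	in_flush = false;
	return r->tail - r->head;
}

/* Queue `initial` messages, then flush with or without the guard. */
static size_t simulate(bool guard, size_t initial, size_t max_steps)
{
	struct ring r = {0};

	guard_enabled = guard;
	in_flush = false;
	for (size_t i = 0; i < initial; i++)
		log_msg(&r);
	return flush(&r, max_steps);
}
```

In this model simulate(false, 3, 1000) still has 3 messages pending
after 1000 steps, since every flushed message is replaced by a new one,
while simulate(true, 3, 1000) drains to 0.  The price shows up in the
dropped counter: whatever the flusher logs is lost, and the guard does
nothing if the feedback comes from a different context.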

I may be missing some of the history here, but since I brought up the
case that we've been seeing, the discussion hasn't really gotten to
that point.

Thanks.

-- 
tejun