Message-ID: <20180110183055.GM3668920@devbig577.frc2.facebook.com>
Date:   Wed, 10 Jan 2018 10:30:55 -0800
From:   Tejun Heo <tj@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Petr Mladek <pmladek@...e.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        akpm@...ux-foundation.org, Steven Rostedt <rostedt@...dmis.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        linux-mm@...ck.org, Cong Wang <xiyou.wangcong@...il.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jan Kara <jack@...e.cz>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        rostedt@...e.goodmis.org, Byungchul Park <byungchul.park@....com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Pavel Machek <pavel@....cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

Hello, Peter.

On Wed, Jan 10, 2018 at 07:21:53PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >    / driver tries to allocate memory and then fail, which in turn
> >    triggers allocation failure or other warning messages.  printk was
> >    already flushing, so the messages are queued on the ring.
> > 5. OOM handler keeps flushing but 4 repeats and the queue is never
> >    shrinking.  Because OOM handler is trapped in printk flushing, it
> >    never manages to free memory and no one else can enter OOM path
> >    either, so the system is trapped in this state.
> 
> Why not kill recursive OOM (msgs) ?

Sure, we can do that too, e.g. by marking the flushing thread and
ignoring new messages from it, although that does come with its own
downsides.  The choices are:

* If we can make printk safe without much downside, that'd be the best
  option.

* If we decide that we can't do that in a reasonable way, we sure can
  try to plug the identified cases.  We might have to play a bit of
  whack-a-mole (e.g. the feedback loop might not necessarily be from
  the same context) but there likely are very few repeatable cases.
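
To make the feedback loop in steps 4-5 quoted above and the "mark the
flushing thread" idea concrete, here is a rough userspace model. It is only
a sketch under assumptions: every name in it (struct ring, ring_log,
emit_one, ring_flush, printk_model_demo) is made up for illustration and is
not a kernel API, and it is single-threaded, so "a message from the flushing
thread" simply means "a message logged while the flush loop is running".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the printk ring; not kernel code. */
struct ring {
	int pending;         /* messages queued and waiting to be flushed */
	int dropped;         /* messages ignored because the flusher logged them */
	bool flusher_active; /* set while the flush loop is emitting */
	bool drop_recursive; /* mitigation: ignore messages from the flusher */
};

/* Queue one message. Returns false if the message was dropped. */
bool ring_log(struct ring *r)
{
	if (r->drop_recursive && r->flusher_active) {
		r->dropped++;
		return false;
	}
	r->pending++;
	return true;
}

/*
 * Model of step 4: emitting a message over netconsole needs memory, the
 * allocation fails, and the failure itself logs a new warning.
 */
void emit_one(struct ring *r)
{
	ring_log(r);
}

/*
 * Flush until the ring is empty or 'budget' messages have been emitted.
 * The budget exists only so the model terminates; in the scenario from
 * the thread there is no such limit. Returns the number emitted.
 */
int ring_flush(struct ring *r, int budget)
{
	int emitted = 0;

	r->flusher_active = true;
	while (r->pending > 0 && emitted < budget) {
		r->pending--;
		emit_one(r);
		emitted++;
	}
	r->flusher_active = false;
	return emitted;
}

/* Usage: returns 0 if the model behaves as described in the comments. */
int printk_model_demo(void)
{
	struct ring stuck = { .pending = 3 };
	struct ring fixed = { .pending = 3, .drop_recursive = true };

	/* No mitigation: the whole budget is used and the queue never shrinks. */
	assert(ring_flush(&stuck, 1000) == 1000 && stuck.pending == 3);

	/* Mitigation: the 3 queued messages drain, 3 warnings are dropped. */
	assert(ring_flush(&fixed, 1000) == 3 && fixed.pending == 0 &&
	       fixed.dropped == 3);
	return 0;
}
```

The model also shows the downsides mentioned above: the dropped warnings
are lost for good, and a single per-flusher flag does nothing when the new
messages are generated from a different context.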

I may be missing some of the history here, but since I brought up the
case that we've been seeing, the discussion hasn't really gotten to
that point.

Thanks.

-- 
tejun
