linux-kernel - Re: [PATCH 2/5] kernel.h: Add non_block

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20190815151509.9ddbd1f11fb9c4c3e97a67a5@linux-foundation.org>
Date:   Thu, 15 Aug 2019 15:15:09 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Daniel Vetter <daniel.vetter@...ll.ch>,
        LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
        DRI Development <dri-devel@...ts.freedesktop.org>,
        Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Christian König <christian.koenig@....com>,
        Jérôme Glisse <jglisse@...hat.com>,
        Masahiro Yamada <yamada.masahiro@...ionext.com>,
        Wei Wang <wvw@...gle.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jann Horn <jannh@...gle.com>, Feng Tang <feng.tang@...el.com>,
        Kees Cook <keescook@...omium.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Daniel Vetter <daniel.vetter@...el.com>
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()

On Thu, 15 Aug 2019 10:44:29 +0200 Michal Hocko <mhocko@...nel.org> wrote:

> > I continue to struggle with this.  It introduces a new kernel state
> > "running preemptibly but must not call schedule()".  How does this make
> > any sense?
> > 
> > Perhaps a much, much more detailed description of the oom_reaper
> > situation would help out.
>  
> The primary point here is that there is a demand of non blockable mmu
> notifiers to be called when the oom reaper tears down the address space.
> As the oom reaper is the primary guarantee of the oom handling forward
> progress it cannot be blocked on anything that might depend on blockable
> memory allocations. These are not really easy to track because they
> might be indirect - e.g. notifier blocks on a lock which other context
> holds while allocating memory or waiting for a flusher that needs memory
> to perform its work. If such a blocking state happens that we can end up
> in a silent hang with an unusable machine.
> Now we hope for reasonable implementations of mmu notifiers (strong
> words I know ;) and this should be relatively simple and effective catch
> all tool to detect something suspicious is going on.
> 
> Does that make the situation more clear?

Yes, thanks, much.  Maybe a code comment along the lines of

  This is on behalf of the oom reaper, specifically when it is
  calling the mmu notifiers.  The problem is that if the notifier were
  to block on, for example, mutex_lock() and if the process which holds
  that mutex were to perform a sleeping memory allocation, the oom
  reaper is now blocked on completion of that memory allocation.

btw, do we need task_struct.non_block_count?  Perhaps the oom reaper
thread could set a new PF_NONBLOCK (which would be more general than
PF_OOM_REAPER).  If we run out of PF_ flags, use (current == oom_reaper_th).