Date:   Thu, 5 Apr 2018 08:13:59 -0700
From:   Matthew Wilcox <willy@...radead.org>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Joel Fernandes <joelaf@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Zhaoyang Huang <huangzhaoyang@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        kernel-patch-test@...ts.linaro.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        "open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH v1] kernel/trace:check the val against the available mem

On Thu, Apr 05, 2018 at 04:27:49PM +0200, Michal Hocko wrote:
> On Thu 05-04-18 07:22:58, Matthew Wilcox wrote:
> > On Wed, Apr 04, 2018 at 09:12:52PM -0700, Joel Fernandes wrote:
> > > On Wed, Apr 4, 2018 at 7:58 PM, Matthew Wilcox <willy@...radead.org> wrote:
> > > > I still don't get why you want RETRY_MAYFAIL.  You know that tries
> > > > *harder* to allocate memory than plain GFP_KERNEL does, right?  And
> > > > that seems like the exact opposite of what you want.

Argh.  The comment confused me.  OK, now I've read the source and
understand that GFP_KERNEL | __GFP_RETRY_MAYFAIL tries exactly as hard
as GFP_KERNEL *except* that it won't trigger the OOM killer itself.  But
any other simultaneous GFP_KERNEL allocation without __GFP_RETRY_MAYFAIL
can still trigger the OOM killer.  (And that's why we're having this
conversation.)
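To make that concrete, the allocation under discussion is per-page,
roughly like the following (a sketch, not the actual ring buffer code;
the nid choice is illustrative):

        struct page *page;

        /*
         * Tries reclaim exactly as hard as plain GFP_KERNEL, but returns
         * NULL instead of invoking the OOM killer when it fails.  Any
         * concurrent plain GFP_KERNEL allocation can still trigger OOM.
         */
        page = alloc_pages_node(numa_node_id(),
                                GFP_KERNEL | __GFP_RETRY_MAYFAIL, 0);
        if (!page)
                return -ENOMEM;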

That's a problem because we have places in the kernel that call
kv[zm]alloc(very_large_size, GFP_KERNEL).  That falls back to vmalloc,
which does the exact same thing, only it will trigger the OOM killer
all by itself (assuming the largest free chunk of address space in the
vmalloc area is larger than the amount of free memory).
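Sketch of why that happens -- a simplified rendering of the kvmalloc
fallback, not the exact mm/util.c code:

static void *kvmalloc_node_sketch(size_t size, gfp_t flags, int node)
{
        void *ret;

        /* First try kmalloc, but don't retry hard and don't warn... */
        ret = kmalloc_node(size, flags | __GFP_NOWARN | __GFP_NORETRY, node);
        if (ret || size <= PAGE_SIZE)
                return ret;

        /*
         * ...then fall back to vmalloc.  vmalloc allocates its backing
         * pages one at a time with effectively GFP_KERNEL, so a very
         * large allocation can push the system into OOM by itself.
         */
        return __vmalloc(size, flags, PAGE_KERNEL);
}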

I considered an alloc_page_array(), but that doesn't fit well with the
design of the ring buffer code.  We could have:

struct page *alloc_page_list_node(int nid, gfp_t gfp_mask, unsigned long nr);

and link the allocated pages together through page->lru.
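Purely illustrative -- no such function exists today; this is one way
the helper could chain the pages through page->lru:

struct page *alloc_page_list_node(int nid, gfp_t gfp_mask, unsigned long nr)
{
        struct page *head = NULL;
        unsigned long i;

        for (i = 0; i < nr; i++) {
                struct page *page = alloc_pages_node(nid, gfp_mask, 0);

                if (!page)
                        goto out_free;
                if (!head) {
                        /* The first page doubles as the list head. */
                        head = page;
                        INIT_LIST_HEAD(&head->lru);
                } else {
                        list_add_tail(&page->lru, &head->lru);
                }
        }
        return head;

out_free:
        if (head) {
                struct page *page, *next;

                list_for_each_entry_safe(page, next, &head->lru, lru)
                        __free_page(page);
                __free_page(head);
        }
        return NULL;
}

The caller would then walk the chain with list_for_each_entry() off
head->lru instead of dealing with a separate page array.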

We could also have a GFP flag that says to only succeed if we're further
above the existing watermark than normal.  __GFP_LOW (==ALLOC_LOW),
if you like.  That would give us the desired behaviour of trying all of
the reclaim methods that GFP_KERNEL would, but not being able to exhaust
all the memory that GFP_KERNEL allocations would take.
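Hypothetical usage, if such a flag existed (again, __GFP_LOW is only a
proposal):

        page = alloc_pages_node(nid, GFP_KERNEL | __GFP_LOW, 0);
        if (!page)
                return -ENOMEM; /* reclaim ran, but we stopped short of
                                   the memory plain GFP_KERNEL could use */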
