Message-ID: <20170116084717.GA13641@dhcp22.suse.cz>
Date:   Mon, 16 Jan 2017 09:47:17 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     John Hubbard <jhubbard@...dia.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        David Rientjes <rientjes@...gle.com>,
        Mel Gorman <mgorman@...e.de>,
        Johannes Weiner <hannes@...xchg.org>,
        Al Viro <viro@...iv.linux.org.uk>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>,
        Anatoly Stepanov <astepanov@...udlinux.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Mike Snitzer <snitzer@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Theodore Ts'o <tytso@....edu>
Subject: Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers

On Sun 15-01-17 20:34:13, John Hubbard wrote:
> 
> 
> On 01/12/2017 07:37 AM, Michal Hocko wrote:
[...]
> > diff --git a/mm/util.c b/mm/util.c
> > index 3cb2164f4099..7e0c240b5760 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -324,6 +324,48 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
> >  }
> >  EXPORT_SYMBOL(vm_mmap);
> > 
> > +/**
> > + * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
> 
> Hi Michal,
> 
> How about this wording instead:
> 
> kvmalloc_node - attempt to allocate physically contiguous memory, but upon
> failure, fall back to non-contiguous (vmalloc) allocation.

OK, why not.
 
> > + * @size: size of the request.
> > + * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
> > + * @node: numa node to allocate from
> > + *
> > + * Uses kmalloc to get the memory but if the allocation fails then falls back
> > + * to the vmalloc allocator. Use kvfree for freeing the memory.
> > + *
> > + * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
> 
> Is that "Reclaim modifiers" line still true, or is it a leftover from an
> earlier approach? I am having trouble reconciling it with rest of the
> patchset, because:
> 
> a) the flags argument below is effectively passed on to either kmalloc_node
> (possibly adding, but not removing flags), or to __vmalloc_node_flags.

The above only says those are _unsupported_ - in other words the behavior
is not defined. Even if the flags are passed down to kmalloc or vmalloc,
it doesn't mean they are honored there. Remember that vmalloc uses
some hardcoded GFP_KERNEL allocations. So while I could be really
strict about this and mask away these flags, I doubt this is worth the
additional code.
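
For illustration only, the strict variant (masking the reclaim modifiers
before they are passed down) would look roughly like the userspace sketch
below. The DEMO_* names and bit values are made up for this example and are
not the kernel's real gfp definitions; demo_strip_reclaim_modifiers() is a
hypothetical helper, not part of the patch.

```c
#include <assert.h>

/* Stand-in flag bits for illustration only; NOT the kernel's real values. */
typedef unsigned int gfp_t;
#define DEMO_GFP_KERNEL   0x01u
#define DEMO_GFP_NORETRY  0x10u
#define DEMO_GFP_REPEAT   0x20u
#define DEMO_GFP_NOFAIL   0x40u

/* The three reclaim modifiers the kernel-doc comment lists as unsupported. */
#define DEMO_RECLAIM_MODIFIERS \
	(DEMO_GFP_NORETRY | DEMO_GFP_REPEAT | DEMO_GFP_NOFAIL)

/* Hypothetical helper: clear the unsupported modifiers, keep everything else. */
static gfp_t demo_strip_reclaim_modifiers(gfp_t flags)
{
	return flags & ~DEMO_RECLAIM_MODIFIERS;
}

/* Usage: the modifier is dropped, the base mask is preserved. */
static void demo_strip_usage(void)
{
	gfp_t in = DEMO_GFP_KERNEL | DEMO_GFP_NOFAIL;

	assert(demo_strip_reclaim_modifiers(in) == DEMO_GFP_KERNEL);
}
```

It is only a handful of lines, but it would be extra code for a case that
is documented as undefined anyway.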
 
> b) In patch 6/6, you are in fact passing in __GFP_REPEAT to the wrappers
> (kvzalloc, for example), and again, only adding, not removing flags.

Patch 2 adds support for __GFP_REPEAT and updates the above line as
well.
 
> > + */
> > +void *kvmalloc_node(size_t size, gfp_t flags, int node)
> > +{
> > +	gfp_t kmalloc_flags = flags;
> > +	void *ret;
> > +
> > +	/*
> > +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> > +	 * so the given set of flags has to be compatible.
> > +	 */
> > +	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> > +
> > +	/*
> > +	 * Make sure that larger requests are not too disruptive - no OOM
> > +	 * killer and no allocation failure warnings as we have a fallback
> > +	 */
> > +	if (size > PAGE_SIZE)
> > +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> > +
> > +	ret = kmalloc_node(size, kmalloc_flags, node);
> 
> Along those lines (dealing with larger requests), is there any value in
> picking some threshold value, and going straight to vmalloc if size is
> greater than that threshold?

I am not a fan of thresholds. PAGE_ALLOC_COSTLY_ORDER, which is used
internally by the page allocator, has turned out to be a major pain.
I do not want to repeat the same mistake here. Besides that, you
could hardly find a "one size fits all" value, so it would have to be
part of the API. If we ever grow users who would really like to do
something like that, then a specialized API should be added.
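
To make the "part of the API" point concrete, here is a minimal userspace
sketch of the decision such a specialized helper would have to make. All
names are hypothetical and nothing like this exists in the patchset; the
point is only that the threshold ends up as a caller-supplied parameter
rather than a constant baked into kvmalloc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: which allocator a threshold-aware helper would try first. */
enum demo_backend {
	DEMO_KMALLOC_FIRST,	/* try kmalloc, fall back to vmalloc on failure */
	DEMO_VMALLOC_DIRECT,	/* skip kmalloc entirely */
};

/*
 * Requests strictly larger than the threshold go straight to vmalloc.
 * The threshold is an argument because no single value suits every caller.
 */
static enum demo_backend demo_pick_backend(size_t size, size_t threshold)
{
	return size > threshold ? DEMO_VMALLOC_DIRECT : DEMO_KMALLOC_FIRST;
}

/* Usage: the same request is routed differently by two callers. */
static void demo_backend_usage(void)
{
	assert(demo_pick_backend(8192, 4096) == DEMO_VMALLOC_DIRECT);
	assert(demo_pick_backend(8192, 65536) == DEMO_KMALLOC_FIRST);
}
```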

> It's less flexible and might even require
> occasional maintenance over the years, but it would save some time on *some*
> systems in some cases...OK, I think I just talked myself out of the whole
> idea. But I still want to put the question out there, because I think others
> may also ask it, and I'd like to hear a more experienced opinion.


-- 
Michal Hocko
SUSE Labs