Message-ID: <20170125132124.GS32377@dhcp22.suse.cz>
Date: Wed, 25 Jan 2017 14:21:24 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
David Rientjes <rientjes@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Johannes Weiner <hannes@...xchg.org>,
Al Viro <viro@...iv.linux.org.uk>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Anatoly Stepanov <astepanov@...udlinux.com>,
Andreas Dilger <adilger@...ger.ca>,
Andreas Dilger <andreas.dilger@...el.com>,
Anton Vorontsov <anton@...msg.org>,
Ben Skeggs <bskeggs@...hat.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Colin Cross <ccross@...roid.com>,
Dan Williams <dan.j.williams@...el.com>,
David Sterba <dsterba@...e.com>,
Eric Dumazet <edumazet@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Hariprasad S <hariprasad@...lsio.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Ilya Dryomov <idryomov@...il.com>,
Kees Cook <keescook@...omium.org>,
Kent Overstreet <kent.overstreet@...il.com>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Mike Snitzer <snitzer@...hat.com>,
Oleg Drokin <oleg.drokin@...el.com>,
Paolo Bonzini <pbonzini@...hat.com>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Santosh Raspatur <santosh@...lsio.com>,
Tariq Toukan <tariqt@...lanox.com>,
Theodore Ts'o <tytso@....edu>,
Tom Herbert <tom@...bertland.com>,
Tony Luck <tony.luck@...el.com>,
"Yan, Zheng" <zyan@...hat.com>,
Yishai Hadas <yishaih@...lanox.com>,
Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH 0/6 v3] kvmalloc
On Wed 25-01-17 14:10:06, Michal Hocko wrote:
> On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
> > On Tue, Jan 24, 2017 at 04:17:52PM +0100, Michal Hocko wrote:
> > > On Thu 12-01-17 16:37:11, Michal Hocko wrote:
> > > > Hi,
> > > > this has been previously posted as a single patch [1] but later on more
> > > > was built on top. It turned out that there are users who would like to have
> > > > __GFP_REPEAT semantics. This is currently implemented only for costly (>64kB)
> > > > requests. Doing the same for smaller requests would require redefining
> > > > the __GFP_REPEAT semantics in the page allocator, which is out of scope for
> > > > this series.
> > > >
> > > > There are many open-coded instances of kmalloc with a vmalloc fallback in
> > > > the tree. Most of them are not careful enough or simply do not care
> > > > about the underlying semantics of the kmalloc/page allocator, which means
> > > > that a) some vmalloc fallbacks are basically unreachable because the
> > > > kmalloc part will keep retrying until it succeeds, and b) the page allocator
> > > > can invoke really disruptive steps like the OOM killer to move forward,
> > > > which doesn't sound appropriate when we consider that the vmalloc
> > > > fallback is available.
> > > >
> > > > As can be seen, implementing kvmalloc requires quite intimate
> > > > knowledge of the page allocator and the memory reclaim internals, which
> > > > strongly suggests that a helper should be implemented in the memory
> > > > subsystem proper.
> > > >
> > > > Most callers I could find have been converted to use the helper instead.
> > > > This is patch 5. There are some more relying on __GFP_REPEAT in the
> > > > networking stack which I have converted as well, but considering we do
> > > > not have support for __GFP_REPEAT for requests smaller than 64kB, I
> > > > have marked it RFC.
> > >
> > > Are there any more comments? I would really appreciate hearing from
> > > networking folks before I resubmit the series.
> >
> > while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
> > which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
> > See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
> > it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
> > So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
>
> OK, will do. Thanks for the heads up.
Just for the record, I will fold the following into patch 1
---
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19b6129eab23..8697f43cf93c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
 
 void *bpf_map_area_alloc(size_t size)
 {
-	/* We definitely need __GFP_NORETRY, so OOM killer doesn't
-	 * trigger under memory pressure as we really just want to
-	 * fail instead.
-	 */
-	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
-	void *area;
-
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		area = kmalloc(size, GFP_USER | flags);
-		if (area != NULL)
-			return area;
-	}
-
-	return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
-			 PAGE_KERNEL);
+	return kvzalloc(size, GFP_USER);
 }
 
 void bpf_map_area_free(void *area)
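
For readers who want to see the control flow of the removed open-coded
logic in isolation, here is a minimal userspace sketch. It is not kernel
code and makes no claim about how kvzalloc() is implemented. Every name
in it (sketch_kvzalloc, sketch_contig_alloc, sketch_fallback_alloc,
sketch_contig_fails, SKETCH_COSTLY_LIMIT) is hypothetical, calloc() stands
in for both allocators, and the 4096 << 3 threshold only assumes 4kB pages
and a costly order of 3.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed stand-in for PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER. */
#define SKETCH_COSTLY_LIMIT ((size_t)4096 << 3)

/* Records which path satisfied the request. */
enum sketch_source { SKETCH_NONE, SKETCH_CONTIG, SKETCH_FALLBACK };

/* Knob that simulates a fragmented system: the contiguous path fails. */
bool sketch_contig_fails;

/* Models a kmalloc() call that fails fast instead of retrying hard. */
void *sketch_contig_alloc(size_t size)
{
	if (sketch_contig_fails)
		return NULL;
	return calloc(1, size);
}

/* Models the vmalloc() fallback; zeroed like the __GFP_ZERO original. */
void *sketch_fallback_alloc(size_t size)
{
	return calloc(1, size);
}

/* Try the cheap contiguous path for small sizes, then fall back. */
void *sketch_kvzalloc(size_t size, enum sketch_source *src)
{
	void *area;

	*src = SKETCH_NONE;
	if (size == 0)
		return NULL;

	if (size <= SKETCH_COSTLY_LIMIT) {
		area = sketch_contig_alloc(size);
		if (area != NULL) {
			*src = SKETCH_CONTIG;
			return area;
		}
	}

	area = sketch_fallback_alloc(size);
	if (area != NULL)
		*src = SKETCH_FALLBACK;
	return area;
}

/* One free routine for both paths, in the spirit of kvfree(). */
void sketch_kvfree(void *area)
{
	free(area);
}
```

The point of the sketch is only that the first attempt has to be allowed
to fail quickly, otherwise the fallback branch is never reached. Keeping
that decision in one helper is what lets a caller shrink to a single line.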
--
Michal Hocko
SUSE Labs