netdev - Re: [PATCH 0/6 v3] kvmalloc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5889DEA3.7040106@iogearbox.net>
Date:   Thu, 26 Jan 2017 12:33:55 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Michal Hocko <mhocko@...nel.org>
CC:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>, Mel Gorman <mgorman@...e.de>,
        Johannes Weiner <hannes@...xchg.org>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        marcelo.leitner@...il.com
Subject: Re: [PATCH 0/6 v3] kvmalloc

On 01/26/2017 11:08 AM, Michal Hocko wrote:
> On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
>> On 01/26/2017 08:43 AM, Michal Hocko wrote:
>>> On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
> [...]
>>>> I assume that kvzalloc() is still the same from [1], right? If so, then
>>>> it would unfortunately (partially) reintroduce the issue that was fixed.
>>>> If you look above at flags, they're also passed to __vmalloc() to not
>>>> trigger OOM in these situations I've experienced.
>>>
>>> Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
>>> think it would. It can still trigger the OOM killer becauset the flags
>>> are no propagated all the way down to all allocations requests (e.g.
>>> page tables). This is the same reason why GFP_NOFS is not supported in
>>> vmalloc.
>>
>> Ok, good to know, is that somewhere clearly documented (like for the
>> case with kmalloc())?
>
> I am afraid that we really suck on this front. I will add something.

Thanks for doing that, much appreciated!

>> If not, could we do that for non-mm folks, or
>> at least add a similar WARN_ON_ONCE() as you did for kvmalloc() to make
>> it obvious to users that a given flag combination is not supported all
>> the way down?
>
> I am not sure that triggering a warning that somebody has used
> __GFP_NOWARN is very helpful ;). I also do not think that covering all the
> supported flags is really feasible. Most of them will not have bad side
> effects. I have added the warning because this API is new and I wanted
> to catch new abusers. Old ones would have to die slowly.

Okay, makes sense then. Just the kdoc comment from your other
mail should help fine already.

>>>> This is effectively the
>>>> same requirement as in other networking areas f.e. that 5bad87348c70
>>>> ("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
>>>> In your comment in kvzalloc() you eventually say that some of the above
>>>> modifiers are not supported. So there would be two options, i) just leave
>>>> out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
>>>> it later (along with similar code from 5bad87348c70), or ii) implement
>>>> support for these modifiers as well to your original set. I guess it's not
>>>> too urgent, so we could also proceed with i) if that is easier for you to
>>>> proceed (I don't mind either way).
>>>
>>> Could you clarify why the oom killer in vmalloc matters actually?
>>
>> For both mentioned commits, (privileged) user space can potentially
>> create large allocation requests, where we thus switch to vmalloc()
>> flavor eventually and then OOM starts killing processes to try to
>> satisfy the allocation request. This is bad, because we want the
>> request to just fail instead as it's non-critical and f.e. not kill
>> ssh connection et al. Failing is totally fine in this case, whereas
>> triggering OOM is not.
>
> I see your intention but does it really make any real difference?
> Consider you would back off right before you would have OOMed. Any
> parallel request would just hit the OOM for you. You are (almost) never
> doing an allocation in an isolation.
>
>> In my testing, __GFP_NORETRY did satisfy this
>> just fine, but as you say it seems it's not enough.
>
> Yeah, ptes have been most probably popullated already.
>
>> Given there are
>> multiple places like these in the kernel, could we instead add an
>> option such as __GFP_NOOOM, or just make __GFP_NORETRY supported?
>
> As said above I do not really think that suppressing the OOM killer
> makes any difference because it might be just somebody else doing that
> for you. Also the OOM killer is the MM internal implementation "detail"
> users shouldn't really care. I agree that callers should have a way to
> say they do not want to try really hard and that is not that simple
> for vmalloc unfortunatelly. The main problem here is that gfp mask
> propagation is not that easy to fix without a lot of code churn as some
> of those hardcoded allocation requests are deep in call chains.

I see, that's unfortunate. I understand that there are requests
in parallel and that we might end up with OOM eventually if we're
unlucky, but having some way to tell vmalloc to just not try as
hard as usual would be nice.

> I know this sucks and it would be great to support __GFP_NORETRY to
> [k]vmalloc and maybe we will get there eventually. But for the mean time
> I really think that using kvmalloc wherever possible is much better than
> open coded variants whith expectations which do not hold sometimes.

I totally agree with you that having kvmalloc() as helper is awesome
and probably long overdue as well. :)

> If you disagree I can drop the bpf part of course...

If we could consolidate these spots with kvmalloc() eventually, I'm
all for it. But even if __GFP_NORETRY is not covered down to all
possible paths, it kind of does have an effect already of saying
'don't try too hard', so would it be harmful to still keep that for
now? If it's not, I'd personally prefer to just leave it as is until
there's some form of support by kvmalloc() and friends.

Thanks for your input, Michal!

Cheers,
Daniel