linux-kernel - Re: [PATCH bpf-next v8 06/34] bpf: prepare for memcg-based memory accounting for bpf maps

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAADnVQ+eohA6drYFbbw4ZD-H91xESf=WjZT-oPK85dpdJJAYhQ@mail.gmail.com>
Date:   Thu, 26 Nov 2020 09:12:55 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Roman Gushchin <guro@...com>
Cc:     Daniel Borkmann <daniel@...earbox.net>, bpf <bpf@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Network Development <netdev@...r.kernel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Kernel Team <kernel-team@...com>,
        Johannes Weiner <hannes@...xchg.org>, Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH bpf-next v8 06/34] bpf: prepare for memcg-based memory
 accounting for bpf maps

On Wed, Nov 25, 2020 at 6:30 PM Roman Gushchin <guro@...com> wrote:
>
> I did consider this option. There are pros and cons. In general we tend to charge the cgroup
> which actually allocates the memory, and I decided to stick with this rule. I agree, it's fairly
> easy to come with arguments why always charging the map creator is better. The opposite is
> also true: it's not clear why bpf is different here. So I'm fine with both options, if there
> is a wide consensus, I'm happy to switch to the other option. In general, I believe that
> the current scheme is more flexible.

I don't understand the 'more flexible' part.
The current_memcg or map_memcg approach makes it less predictable.
pre-alloc vs not is somewhat orthogonal.
I've grepped through the kernel where set_active_memcg() is used
and couldn't find a conditional pattern of its usage.
If memcg is known it's used. I couldn't come up with the use case where
using current memcg is the more correct thing to do.

> In general we tend to charge the cgroup which actually allocates the memory

that makes sense where allocation is driven by the user process.
Like user space doing a syscall then all kernel allocation would be
from memcg of that process.
But bpf tracing allocations are not something that the user process requested
the kernel to do. It's like another user space process tapped into another.
Arguably when bpf prog is running the two user processes are active.
One that created the map and loaded the prog and another that is being traced.
I think there will be cases where bpf prog will request the kernel to allocate
memory on behalf of the process being traced, but that memory should be
given back to the process and accessible by it in some form.
Like bpf prog could ask the kernel to grow heap of that process or
trigger readahead.
In such case current_memcg would be the right thing to charge.