Date:   Thu, 18 Aug 2022 12:20:33 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Yafang Shao <laoar.shao@...il.com>
Cc:     ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
        kafai@...com, songliubraving@...com, yhs@...com,
        john.fastabend@...il.com, kpsingh@...nel.org, sdf@...gle.com,
        haoluo@...gle.com, jolsa@...nel.org, hannes@...xchg.org,
        mhocko@...nel.org, roman.gushchin@...ux.dev, shakeelb@...gle.com,
        songmuchun@...edance.com, akpm@...ux-foundation.org,
        lizefan.x@...edance.com, cgroups@...r.kernel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH bpf-next v2 00/12] bpf: Introduce selectable memcg for
 bpf map

Hello,

On Thu, Aug 18, 2022 at 02:31:06PM +0000, Yafang Shao wrote:
> After switching to memcg-based bpf memory accounting to limit bpf
> memory, some unexpected issues jumped out at us.
> 1. Memory usage is not consistent between the first generation and
> new generations.
> 2. After the first generation is destroyed, bpf memory can't be
> limited if the bpf maps are not preallocated, because their charges
> will be reparented.
> 
> This patchset tries to resolve these issues by introducing an
> independent memcg to limit the bpf memory.

memcg folks would have better-informed opinions, but from a generic cgroup
POV I don't think this is a good direction to take. This isn't a problem
limited to bpf progs, and it doesn't make a whole lot of sense to solve it
for bpf alone.

We have the exact same problem for any resource which spans multiple
instances of a service, including page cache, tmpfs instances, and anything
else which can persist longer than process lifetime. My current opinion is
that this is best solved by introducing an extra cgroup layer to represent
the persistent entity and putting the per-instance cgroups under it.
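
To make the layering concrete, here is a minimal sketch of what setting up
such a hierarchy could look like through cgroupfs. This assumes a cgroup2
mount at /sys/fs/cgroup; the "myservice" and "instance-1" names are
hypothetical, chosen only for illustration:

  import os

  CGROOT = "/sys/fs/cgroup"  # assumed cgroup2 mount point

  # The persistent entity outlives any single instance of the service.
  persistent = os.path.join(CGROOT, "myservice")
  # Each generation of the service gets its own cgroup underneath it.
  instance = os.path.join(persistent, "instance-1")
  os.makedirs(instance, exist_ok=True)

  # A service-wide memory limit set on the persistent cgroup keeps applying
  # even after instance-1 is torn down: charges that outlive the instance
  # (e.g. non-preallocated bpf maps) are reparented to "myservice" rather
  # than escaping to an unrelated ancestor.
  with open(os.path.join(persistent, "memory.max"), "w") as f:
      f.write("1G")

  # Launch the current instance's processes inside the per-instance cgroup.
  with open(os.path.join(instance, "cgroup.procs"), "w") as f:
      f.write(str(os.getpid()))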

It does require reorganizing the hierarchy from the userspace POV, but the
end result is really desirable. We get entities accurately representing what
needs to be tracked, and control over the granularity of accounting and
control (e.g. folks who don't care about telling apart the current
instance's usage can simply not enable controllers at the persistent-entity
level).
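
That granularity choice maps directly onto cgroup.subtree_control.
Continuing the hypothetical "myservice" layout from the sketch above:

  import os

  persistent = "/sys/fs/cgroup/myservice"  # hypothetical, as above

  # Opting in: enabling the memory controller for the child cgroups gives a
  # per-instance breakdown in addition to the service-wide totals.
  with open(os.path.join(persistent, "cgroup.subtree_control"), "w") as f:
      f.write("+memory")

  # Opting out is just as simple: never write "+memory" here (or write
  # "-memory"), and all usage is accounted only at the "myservice" level.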

We can surely discuss other approaches, but my current intuition is that it'd
be really difficult to come up with a better solution than introducing
persistent service entities through layering.

So, please consider the approach nacked for the time being.

Thanks.

-- 
tejun
