Message-ID: <CALOAHbDt9NUv9qK_J1_9CU0tmW9kiJ+nig_0NfzGJgJmrSk2fw@mail.gmail.com>
Date: Fri, 19 Aug 2022 08:59:20 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Tejun Heo <tj@...nel.org>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
john fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, jolsa@...nel.org,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeelb@...gle.com>,
Muchun Song <songmuchun@...edance.com>,
Andrew Morton <akpm@...ux-foundation.org>,
lizefan.x@...edance.com, Cgroups <cgroups@...r.kernel.org>,
netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Linux MM <linux-mm@...ck.org>
Subject: Re: [PATCH bpf-next v2 00/12] bpf: Introduce selectable memcg for bpf map
On Fri, Aug 19, 2022 at 6:20 AM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Thu, Aug 18, 2022 at 02:31:06PM +0000, Yafang Shao wrote:
> > After switching to memcg-based bpf memory accounting to limit the bpf
> > memory, some unexpected issues jumped out at us.
> > 1. The memory usage is not consistent between the first generation and
> > new generations.
> > 2. After the first generation is destroyed, the bpf memory can't be
> > limited if the bpf maps are not preallocated, because they will be
> > reparented.
> >
> > This patchset tries to resolve these issues by introducing an
> > independent memcg to limit the bpf memory.
>
> memcg folks would have better informed opinions but from generic cgroup pov
> I don't think this is a good direction to take. This isn't a problem limited
> to bpf progs and it doesn't make a whole lot of sense to solve this just for bpf.
>
This change is bpf-specific; it doesn't refactor a whole lot of other code.
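
To make the intent of the series concrete: it lets the map creator pick
which memcg to charge, instead of implicitly charging the memcg of the
creating process. Roughly, map creation would look something like the
sketch below; the flag and field names here are illustrative only, not
the exact uAPI of the patches:

    /*
     * Hypothetical sketch: pass an fd of the chosen memcg directory at
     * map creation time, so the bpf map memory is charged to that memcg
     * rather than to the memcg of the creating process.
     */
    union bpf_attr attr = {
        .map_type    = BPF_MAP_TYPE_HASH,
        .key_size    = 4,
        .value_size  = 8,
        .max_entries = 1024,
        .map_flags   = BPF_F_SELECTABLE_MEMCG, /* illustrative flag name */
        .memcg_fd    = memcg_fd,               /* illustrative field name */
    };
    int map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));

With something along these lines, the charged memcg can outlive any one
instance of the service, so the accounting stays consistent across
generations of the workload.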
> We have the exact same problem for any resources which span multiple
> instances of a service including page cache, tmpfs instances and any other
> thing which can persist longer than process lifetime. My current opinion is
> that this is best solved by introducing an extra cgroup layer to represent
> the persistent entity and put the per-instance cgroup under it.
>
That is not practical on k8s: before introducing the persistent entity,
the cgroup directory is stateless; after it, the directory becomes
stateful and has to be tracked across instance lifetimes.
Please stop turning a blind eye to k8s.
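
To illustrate the objection, the layering you suggest would change the
hierarchy roughly like this (paths are only illustrative):

    today on k8s (per-instance dirs, created and removed with the pod):
        /sys/fs/cgroup/kubepods/pod-a
        /sys/fs/cgroup/kubepods/pod-b

    with a persistent entity layered in:
        /sys/fs/cgroup/kubepods/svc-x/pod-a   <- svc-x must survive pod
        /sys/fs/cgroup/kubepods/svc-x/pod-b      restarts and be tracked

The svc-x directory has to outlive every instance under it, which is
exactly the kind of state k8s does not manage today.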
> It does require reorganizing how things are organized from userspace POV but
> the end result is really desirable. We get entities accurately representing
> what needs to be tracked and control over the granularity of accounting and
> control (e.g. folks who don't care about telling apart the current
> instance's usage can simply not enable controllers at the persistent entity
> level).
>
Please also think about why k8s refuses to use cgroup2.
> We surely can discuss other approaches but my current intuition is that it'd
> be really difficult to come up with a better solution than layering to
> introduce persistent service entities.
>
> So, please consider the approach nacked for the time being.
>
It doesn't make sense to nack it.
I will explain in my reply to your other email.
--
Regards
Yafang