lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YjkBkIHde+fWHw9K@carbon.dhcp.thefacebook.com>
Date:   Mon, 21 Mar 2022 15:52:00 -0700
From:   Roman Gushchin <roman.gushchin@...ux.dev>
To:     Yafang Shao <laoar.shao@...il.com>
Cc:     ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
        kafai@...com, songliubraving@...com, yhs@...com,
        john.fastabend@...il.com, kpsingh@...nel.org, shuah@...nel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org,
        linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 00/14] bpf: Allow not to charge bpf memory

Hello, Yafang!

Thank you for continuing working on this!

On Sat, Mar 19, 2022 at 05:30:22PM +0000, Yafang Shao wrote:
> After switching to memcg-based bpf memory accounting, the bpf memory is
> charged to the loader's memcg by defaut, that causes unexpected issues for
> us. For instance, the container of the loader-which loads the bpf programs
> and pins them on bpffs-may restart after pinning the progs and maps. After
> the restart, the pinned progs and maps won't belong to the new container
> any more, while they actually belong to an offline memcg left by the
> previous generation. That inconsistent behavior will make trouble for the
> memory resource management for this container.

I'm looking at this text and increasingly feeling that it's not a bpf-specific
problem and it shouldn't be solved as one.

Is there any significant reason why the loader can't temporarily enter the
root cgroup before creating bpf maps/progs? I know it will require some changes
in the userspace code, but so do new bpf flags.

> 
> The reason why these progs and maps have to be persistent across multiple
> generations is that these progs and maps are also used by other processes
> which are not in this container. IOW, they can't be removed when this
> container is restarted. Take a specific example, bpf program for clsact
> qdisc is loaded by a agent running in a container, which not only loads
> bpf program but also processes the data generated by this program and do
> some other maintainace things.
> 
> In order to keep the charging behavior consistent, we used to consider a
> way to recharge these pinned maps and progs again after the container is
> restarted, but after the discussion[1] with Roman, we decided to go
> another direction that don't charge them to the container in the first
> place. TL;DR about the mentioned disccussion: recharging is not a generic
> solution and it may take too much risk.
> 
> This patchset is the solution of no charge. Two flags are introduced in
> union bpf_attr, one for bpf map and another for bpf prog. The user who
> doesn't want to charge to current memcg can use these two flags. These two
> flags are only permitted for sys admin as these memory will be accounted to
> the root memcg only.

If we're going with bpf-specific flags (which I'd prefer not to), let's
define them as the way to create system-wide bpf objects which are expected
to outlive the original cgroup. With expectations that they will be treated
as belonging to the root cgroup by any sort of existing or future resource
accounting (e.g. if we'll start accounting CPU used by bpf prgrams).

But then again: why just not create them in the root cgroup?

Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ