Message-ID: <CAEf4BzZYWGKA7S_1jWcGcNyPmrDJeGo8YfKXpmboRdDSeEmOZw@mail.gmail.com>
Date: Thu, 9 Jun 2022 11:37:52 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Quentin Monnet <quentin@...valent.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Harsh Modi <harshmodi@...gle.com>,
Paul Chaignon <paul@...ium.io>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next] libbpf: Improve probing for memcg-based memory accounting
On Thu, Jun 9, 2022 at 11:13 AM <sdf@...gle.com> wrote:
>
> On 06/09, Quentin Monnet wrote:
> > To ensure that memory accounting will not hinder the load of BPF
> > objects, libbpf may raise the memlock rlimit before proceeding to some
> > operations. Whether this limit needs to be raised depends on the version
> > of the kernel: newer versions use cgroup-based (memcg) memory
> > accounting, and do not require any adjustment.
>
> > There is a probe in libbpf to determine whether memcg-based accounting
> > is supported. But this probe currently relies on the availability of a
> > given BPF helper, bpf_ktime_get_coarse_ns(), which landed in the same
> > kernel version as the memory accounting change. This works in the
> > generic case, but it may fail, for example, if the helper function has
> > been backported to an older kernel. This has been observed for Google
> > Cloud's Container-Optimized OS (COS), where the helper is available but
> > rlimit is still in use. The probe succeeds, the rlimit is not raised,
> > and probing features with bpftool, for example, fails.
>
> > Here we attempt to improve this probe and to effectively rely on memory
> > accounting. Function probe_memcg_account() in libbpf is updated to set
> > the rlimit to 0, then attempt to load a BPF object, and then to reset
> > the rlimit. If the load still succeeds, then this means we're running
> > with memcg-based accounting.
>
> > This probe was inspired by the similar one from the cilium/ebpf Go
> > library [0].
>
> > [0] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
>
> > Signed-off-by: Quentin Monnet <quentin@...valent.com>
> > ---
> > tools/lib/bpf/bpf.c | 23 ++++++++++++++++++-----
> > 1 file changed, 18 insertions(+), 5 deletions(-)
>
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 240186aac8e6..781387e6f66b 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -99,31 +99,44 @@ static inline int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int
>
> > /* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to
> > * memcg-based memory accounting for BPF maps and progs. This was done in [0].
> > - * We use the support for bpf_ktime_get_coarse_ns() helper, which was added in
> > - * the same 5.11 Linux release ([1]), to detect memcg-based accounting for BPF.
> > + * To do so, we lower the soft memlock rlimit to 0 and attempt to create a BPF
> > + * object. If it succeeds, then memcg-based accounting for BPF is available.
> > *
> > * [0] https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@fb.com/
> > - * [1] d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper")
> > */
> > int probe_memcg_account(void)
> > {
> > const size_t prog_load_attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd);
> > struct bpf_insn insns[] = {
> > - BPF_EMIT_CALL(BPF_FUNC_ktime_get_coarse_ns),
> > BPF_EXIT_INSN(),
> > };
> > + struct rlimit rlim_init, rlim_cur_zero = {};
> > size_t insn_cnt = ARRAY_SIZE(insns);
> > union bpf_attr attr;
> > int prog_fd;
>
> > - /* attempt loading freplace trying to use custom BTF */
> > memset(&attr, 0, prog_load_attr_sz);
> > attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
> > attr.insns = ptr_to_u64(insns);
> > attr.insn_cnt = insn_cnt;
> > attr.license = ptr_to_u64("GPL");
>
> > + if (getrlimit(RLIMIT_MEMLOCK, &rlim_init))
> > + return -1;
> > +
> > + /* Drop the soft limit to zero. We maintain the hard limit to its
> > + * current value, because lowering it would be a permanent operation
> > + * for unprivileged users.
> > + */
> > + rlim_cur_zero.rlim_max = rlim_init.rlim_max;
> > + if (setrlimit(RLIMIT_MEMLOCK, &rlim_cur_zero))
> > + return -1;
> > +
> > prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, prog_load_attr_sz);
> > +
> > + /* reset soft rlimit as soon as possible */
> > + setrlimit(RLIMIT_MEMLOCK, &rlim_init);
>
> Isn't that adding more flakiness for other daemons running as the
> same user? Also, there might be surprises if another daemon that
> has libbpf in it starts right when we've temporarily set the limit to zero.
>
I agree, it briefly changes global process state and can introduce
some undesirable (and very non-obvious) side effects.
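
To make that concrete, here is a minimal sketch (purely illustrative,
not proposed code; create_map() is a made-up name) of how another
thread in the same process could be affected on a kernel that still
does rlimit-based accounting:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>
#include <bpf/bpf.h>

/* Runs concurrently with the probe; on an rlimit-accounted kernel this
 * can fail (EPERM) if it lands inside the "soft limit == 0" window. */
static void *create_map(void *arg)
{
	int fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL,
				sizeof(int), sizeof(long), 1, NULL);

	if (fd < 0)
		fprintf(stderr, "map creation failed: %d\n", fd);
	else
		close(fd);
	return NULL;
}

int main(void)
{
	struct rlimit init, zero = {};
	pthread_t t;

	getrlimit(RLIMIT_MEMLOCK, &init);
	zero.rlim_max = init.rlim_max;

	/* The probe's window: the soft limit is zero for the whole
	 * process, not just for the probing thread. */
	setrlimit(RLIMIT_MEMLOCK, &zero);
	pthread_create(&t, NULL, create_map, NULL);
	pthread_join(&t, NULL);
	setrlimit(RLIMIT_MEMLOCK, &init);
	return 0;
}
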
> Can we push these decisions to the users as part of libbpf 1.0 cleanup?
Quentin, at least for bpftool, I think it's totally fine to just
always bump RLIMIT_MEMLOCK. That would solve the probing issue,
right? And for end applications I think I agree with Stanislav that
applications might need to ensure the rlimit bump themselves when
they run on kernels with such backported changes.
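
For reference, a minimal sketch (not the existing bpftool code;
bump_memlock_rlimit() is a made-up name) of what an unconditional bump
could look like in a privileged tool:

#include <sys/resource.h>

/* Raise RLIMIT_MEMLOCK so BPF object loading is never limited by
 * rlimit-based accounting on older kernels. Raising the hard limit
 * needs CAP_SYS_RESOURCE (or root), which bpftool typically has. */
static int bump_memlock_rlimit(void)
{
	struct rlimit rlim = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};

	return setrlimit(RLIMIT_MEMLOCK, &rlim);
}

Called once at startup, this makes the memcg-vs-rlimit question moot
for the tool itself.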
>
> > +
> > if (prog_fd >= 0) {
> > close(prog_fd);
> > return 1;
> > --
> > 2.34.1
>