Message-ID: <CAEf4BzZYWGKA7S_1jWcGcNyPmrDJeGo8YfKXpmboRdDSeEmOZw@mail.gmail.com>
Date: Thu, 9 Jun 2022 11:37:52 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Quentin Monnet <quentin@...valent.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Harsh Modi <harshmodi@...gle.com>,
Paul Chaignon <paul@...ium.io>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next] libbpf: Improve probing for memcg-based memory accounting
On Thu, Jun 9, 2022 at 11:13 AM <sdf@...gle.com> wrote:
>
> On 06/09, Quentin Monnet wrote:
> > To ensure that memory accounting will not hinder the load of BPF
> > objects, libbpf may raise the memlock rlimit before proceeding to some
> > operations. Whether this limit needs to be raised depends on the version
> > of the kernel: newer versions use cgroup-based (memcg) memory
> > accounting, and do not require any adjustment.
>
> > There is a probe in libbpf to determine whether memcg-based accounting
> > is supported. But this probe currently relies on the availability of a
> > given BPF helper, bpf_ktime_get_coarse_ns(), which landed in the same
> > kernel version as the memory accounting change. This works in the
> > generic case, but it may fail, for example, if the helper function has
> > been backported to an older kernel. This has been observed for Google
> > Cloud's Container-Optimized OS (COS), where the helper is available but
> > rlimit is still in use. The probe succeeds, the rlimit is not raised,
> > and probing features with bpftool, for example, fails.
>
> > Here we attempt to improve this probe and to effectively rely on memory
> > accounting. Function probe_memcg_account() in libbpf is updated to set
> > the rlimit to 0, then attempt to load a BPF object, and then to reset
> > the rlimit. If the load still succeeds, then this means we're running
> > with memcg-based accounting.
>
> > This probe was inspired by the similar one from the cilium/ebpf Go
> > library [0].
>
> > [0] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
>
> > Signed-off-by: Quentin Monnet <quentin@...valent.com>
> > ---
> > tools/lib/bpf/bpf.c | 23 ++++++++++++++++++-----
> > 1 file changed, 18 insertions(+), 5 deletions(-)
>
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 240186aac8e6..781387e6f66b 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -99,31 +99,44 @@ static inline int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int
>
> > /* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to
> > * memcg-based memory accounting for BPF maps and progs. This was done in [0].
> > - * We use the support for bpf_ktime_get_coarse_ns() helper, which was added in
> > - * the same 5.11 Linux release ([1]), to detect memcg-based accounting for BPF.
> > + * To do so, we lower the soft memlock rlimit to 0 and attempt to create a BPF
> > + * object. If it succeeds, then memcg-based accounting for BPF is available.
> > *
> > * [0] https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@fb.com/
> > - * [1] d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper")
> > */
> > int probe_memcg_account(void)
> > {
> > const size_t prog_load_attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd);
> > struct bpf_insn insns[] = {
> > - BPF_EMIT_CALL(BPF_FUNC_ktime_get_coarse_ns),
> > BPF_EXIT_INSN(),
> > };
> > + struct rlimit rlim_init, rlim_cur_zero = {};
> > size_t insn_cnt = ARRAY_SIZE(insns);
> > union bpf_attr attr;
> > int prog_fd;
>
> > - /* attempt loading freplace trying to use custom BTF */
> > memset(&attr, 0, prog_load_attr_sz);
> > attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
> > attr.insns = ptr_to_u64(insns);
> > attr.insn_cnt = insn_cnt;
> > attr.license = ptr_to_u64("GPL");
>
> > + if (getrlimit(RLIMIT_MEMLOCK, &rlim_init))
> > + return -1;
> > +
> > + /* Drop the soft limit to zero. We maintain the hard limit to its
> > + * current value, because lowering it would be a permanent operation
> > + * for unprivileged users.
> > + */
> > + rlim_cur_zero.rlim_max = rlim_init.rlim_max;
> > + if (setrlimit(RLIMIT_MEMLOCK, &rlim_cur_zero))
> > + return -1;
> > +
> > prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, prog_load_attr_sz);
> > +
> > + /* reset soft rlimit as soon as possible */
> > + setrlimit(RLIMIT_MEMLOCK, &rlim_init);
>
> Isn't that adding more flakiness for other daemons running as the
> same user? Also, there might be surprises if another daemon that
> has libbpf in it starts right when we've temporarily set the limit to zero.
>
I agree, it briefly changes global process state and can introduce
some undesirable (and very non-obvious) side effects.
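
To make that concrete, here is a minimal sketch (purely illustrative,
not proposed code; create_map() is a made-up name) of how another
thread in the same process could be affected on a kernel that still
does rlimit-based accounting:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>
#include <bpf/bpf.h>

/* Runs concurrently with the probe; on an rlimit-accounted kernel this
 * can fail (EPERM) if it lands inside the "soft limit == 0" window. */
static void *create_map(void *arg)
{
	int fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL,
				sizeof(int), sizeof(long), 1, NULL);

	if (fd < 0)
		fprintf(stderr, "map creation failed: %d\n", fd);
	else
		close(fd);
	return NULL;
}

int main(void)
{
	struct rlimit init, zero = {};
	pthread_t t;

	getrlimit(RLIMIT_MEMLOCK, &init);
	zero.rlim_max = init.rlim_max;

	/* The probe's window: the soft limit is zero for the whole
	 * process, not just for the probing thread. */
	setrlimit(RLIMIT_MEMLOCK, &zero);
	pthread_create(&t, NULL, create_map, NULL);
	pthread_join(&t, NULL);
	setrlimit(RLIMIT_MEMLOCK, &init);
	return 0;
}
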
> Can we push these decisions to the users as part of libbpf 1.0 cleanup?
Quentin, at least for bpftool, I think it's totally fine to just
always bump RLIMIT_MEMLOCK. That would solve the probing issue,
right? And for end applications I think I agree with Stanislav that
applications might need to ensure the rlimit bump themselves when
they run on kernels with such backported changes.
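
For reference, a minimal sketch (not the existing bpftool code;
bump_memlock_rlimit() is a made-up name) of what an unconditional bump
could look like in a privileged tool:

#include <sys/resource.h>

/* Raise RLIMIT_MEMLOCK so BPF object loading is never limited by
 * rlimit-based accounting on older kernels. Raising the hard limit
 * needs CAP_SYS_RESOURCE (or root), which bpftool typically has. */
static int bump_memlock_rlimit(void)
{
	struct rlimit rlim = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};

	return setrlimit(RLIMIT_MEMLOCK, &rlim);
}

Called once at startup, this makes the memcg-vs-rlimit question moot
for the tool itself.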
>
> > +
> > if (prog_fd >= 0) {
> > close(prog_fd);
> > return 1;
> > --
> > 2.34.1
>