[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALOAHbCFsqy1DXk5c_NLyi2TPnJd_tP5CtU==thESmTbF6oQDA@mail.gmail.com>
Date: Wed, 29 Jun 2022 22:24:53 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Quentin Monnet <quentin@...valent.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, bpf <bpf@...r.kernel.org>,
netdev <netdev@...r.kernel.org>,
Stanislav Fomichev <sdf@...gle.com>
Subject: Re: [PATCH bpf-next v2] bpftool: Probe for memcg-based accounting
before bumping rlimit
On Wed, Jun 29, 2022 at 7:13 PM Quentin Monnet <quentin@...valent.com> wrote:
>
> Bpftool used to bump the memlock rlimit to make sure to be able to load
> BPF objects. After the kernel has switched to memcg-based memory
> accounting [0] in 5.11, bpftool has relied on libbpf to probe the system
> for memcg-based accounting support and for raising the rlimit if
> necessary [1]. But this was later reverted, because the probe would
> sometimes fail, resulting in bpftool not being able to load all required
> objects [2].
>
> Here we add a more efficient probe, in bpftool itself. We first lower
> the rlimit to 0, then we attempt to load a BPF object (and finally reset
> the rlimit): if the load succeeds, then memcg-based memory accounting is
> supported.
>
> This approach was earlier proposed for the probe in libbpf itself [3],
> but given that the library may be used in multithreaded applications,
> the probe could have undesirable consequences if one thread attempts to
> lock kernel memory while memlock rlimit is at 0. Since bpftool is
> single-threaded and the rlimit is process-based, this is fine to do in
> bpftool itself.
>
> This probe was inspired by the similar one from the cilium/ebpf Go
> library [4].
>
> v2:
> - Simply use sizeof(attr) instead of hardcoding a size via
> offsetofend().
> - Set r0 = 0 before returning in sample program.
>
> [0] commit 97306be45fbe ("Merge branch 'switch to memcg-based memory accounting'")
> [1] commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK")
> [2] commit 6b4384ff1088 ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"")
> [3] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/t/#u
> [4] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
>
> Cc: Stanislav Fomichev <sdf@...gle.com>
> Cc: Yafang Shao <laoar.shao@...il.com>
> Suggested-by: Daniel Borkmann <daniel@...earbox.net>
> Signed-off-by: Quentin Monnet <quentin@...valent.com>
LGTM
Acked-by: Yafang Shao <laoar.shao@...il.com>
> ---
> tools/bpf/bpftool/common.c | 71 ++++++++++++++++++++++++++++++++++++--
> 1 file changed, 68 insertions(+), 3 deletions(-)
>
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index a0d4acd7c54a..fc8172a4969a 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -13,14 +13,17 @@
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> -#include <linux/limits.h>
> -#include <linux/magic.h>
> #include <net/if.h>
> #include <sys/mount.h>
> #include <sys/resource.h>
> #include <sys/stat.h>
> #include <sys/vfs.h>
>
> +#include <linux/filter.h>
> +#include <linux/limits.h>
> +#include <linux/magic.h>
> +#include <linux/unistd.h>
> +
> #include <bpf/bpf.h>
> #include <bpf/hashmap.h>
> #include <bpf/libbpf.h> /* libbpf_num_possible_cpus */
> @@ -73,11 +76,73 @@ static bool is_bpffs(char *path)
> return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> }
>
> +/* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to
> + * memcg-based memory accounting for BPF maps and programs. This was done in
> + * commit 97306be45fbe ("Merge branch 'switch to memcg-based memory
> + * accounting'"), in Linux 5.11.
> + *
> + * Libbpf also offers to probe for memcg-based accounting vs rlimit, but does
> + * so by checking for the availability of a given BPF helper and this has
> + * failed on some kernels with backports in the past, see commit 6b4384ff1088
> + * ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"").
> + * Instead, we can probe by lowering the process-based rlimit to 0, trying to
> + * load a BPF object, and resetting the rlimit. If the load succeeds then
> + * memcg-based accounting is supported.
> + *
> + * This would be too dangerous to do in the library, because multithreaded
> + * applications might attempt to load items while the rlimit is at 0. Given
> + * that bpftool is single-threaded, this is fine to do here.
> + */
> +static bool known_to_need_rlimit(void)
> +{
> + struct rlimit rlim_init, rlim_cur_zero = {};
> + struct bpf_insn insns[] = {
> + BPF_MOV64_IMM(BPF_REG_0, 0),
> + BPF_EXIT_INSN(),
> + };
> + size_t insn_cnt = ARRAY_SIZE(insns);
> + union bpf_attr attr;
> + int prog_fd, err;
> +
> + memset(&attr, 0, sizeof(attr));
> + attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
> + attr.insns = ptr_to_u64(insns);
> + attr.insn_cnt = insn_cnt;
> + attr.license = ptr_to_u64("GPL");
> +
> + if (getrlimit(RLIMIT_MEMLOCK, &rlim_init))
> + return false;
> +
> + /* Drop the soft limit to zero. We maintain the hard limit to its
> + * current value, because lowering it would be a permanent operation
> + * for unprivileged users.
> + */
> + rlim_cur_zero.rlim_max = rlim_init.rlim_max;
> + if (setrlimit(RLIMIT_MEMLOCK, &rlim_cur_zero))
> + return false;
> +
> + /* Do not use bpf_prog_load() from libbpf here, because it calls
> + * bump_rlimit_memlock(), interfering with the current probe.
> + */
> + prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
> + err = errno;
> +
> + /* reset soft rlimit to its initial value */
> + setrlimit(RLIMIT_MEMLOCK, &rlim_init);
> +
> + if (prog_fd < 0)
> + return err == EPERM;
> +
> + close(prog_fd);
> + return false;
> +}
> +
> void set_max_rlimit(void)
> {
> struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>
> - setrlimit(RLIMIT_MEMLOCK, &rinf);
> + if (known_to_need_rlimit())
> + setrlimit(RLIMIT_MEMLOCK, &rinf);
> }
>
> static int
> --
> 2.34.1
>
--
Regards
Yafang
Powered by blists - more mailing lists