netdev - Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps

Message-ID: <62ba0595-8b04-3ad4-32cd-47829b503f6b@iogearbox.net>
Date:   Fri, 2 Nov 2018 10:08:20 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Quentin Monnet <quentin.monnet@...ronome.com>,
        Alexei Starovoitov <ast@...nel.org>
Cc:     netdev@...r.kernel.org, oss-drivers@...ronome.com
Subject: Re: [RFC bpf-next] libbpf: increase rlimit before trying to create
 BPF maps

On 11/01/2018 06:18 PM, Quentin Monnet wrote:
> 2018-10-30 15:23 UTC+0000 ~ Quentin Monnet <quentin.monnet@...ronome.com>
>> The limit for memory locked in the kernel by a process is usually set to
>> 64 kbytes by default. This can be an issue when creating large BPF maps.
>> A workaround is to raise this limit for the current process before
>> trying to create a new BPF map. Changing the hard limit requires the
>> CAP_SYS_RESOURCE capability and can usually only be done by the root user
>> (but then only root can create BPF maps).
> 
> Sorry, the parenthesis is not correct: non-root users can in fact create
> BPF maps as well. If a non-root user calls the function to create a map,
> setrlimit() will fail silently (but set errno), and the program will
> simply go on with its rlimit unchanged.
> 
>> As far as I know there is no API to get the current amount of memory
>> locked for a user, therefore we cannot raise the limit only when
>> required. One solution, used by bcc, is to try to create the map and, on
>> getting an EPERM error, raise the limit to infinity before giving it
>> another try. Another approach, used in iproute2, is to raise the limit in
>> all cases, before trying to create the map.
>>
>> Here we do the same as in iproute2: the rlimit is raised to infinity
>> before trying to load the map.
>>
>> I am sending this patch as an RFC to see if people would prefer the bcc
>> approach instead, or the rlimit change to be in bpftool rather than in
>> libbpf.

I'd avoid doing something like this in a generic library; it's basically an
ugly hack for the kind of accounting we're doing, and it only shows that while
this was "good enough" to start off with in the early days, we should be
doing something better today. If every application raises it to inf anyway,
then it's broken. :) It just shows that this missed its purpose. Similarly
to the jit_limit discussion on rlimit, perhaps we should be considering
switching to something else entirely from the kernel side. Could be something
like memcg, but this definitely needs some more evaluation first. (Meanwhile
I'd not change the lib but the callers instead, and once we have something
better in place we remove this type of "raising to inf" from the tree ...)
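
For illustration only, a caller-side version of the bcc-style approach
mentioned in the quoted text (try first, raise the limit only on EPERM, then
retry) might look roughly like the sketch below. The names map_create_fn,
stub_ctx, stub_create and create_with_memlock_retry are hypothetical and not
part of libbpf or bcc; stub_create merely stands in for a real map-creation
call so the sketch can be compiled and exercised without BPF privileges.

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical callback type: attempts to create a map, returning a file
 * descriptor (>= 0) on success or -1 with errno set on failure. */
typedef int (*map_create_fn)(void *ctx);

/* Illustrative stand-in for a real map-creation call (an assumption made
 * for this sketch, not an existing API). */
struct stub_ctx {
	int calls;      /* number of creation attempts made */
	int fail_errno; /* 0 means succeed, otherwise fail with this errno */
};

static int stub_create(void *ctx)
{
	struct stub_ctx *s = ctx;

	s->calls++;
	if (s->fail_errno) {
		errno = s->fail_errno;
		return -1;
	}
	return 3; /* pretend file descriptor */
}

/* Try once; only if that fails with EPERM, raise RLIMIT_MEMLOCK to
 * infinity and try a second time. */
static int create_with_memlock_retry(map_create_fn create, void *ctx)
{
	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
	int fd = create(ctx);

	if (fd >= 0 || errno != EPERM)
		return fd;

	/* Raising the hard limit needs CAP_SYS_RESOURCE. If it fails,
	 * report the original EPERM rather than retrying pointlessly. */
	if (setrlimit(RLIMIT_MEMLOCK, &rinf) != 0) {
		errno = EPERM;
		return -1;
	}
	return create(ctx);
}
```

A real caller would pass a wrapper around its own map-creation call instead
of stub_create. Keeping this in the caller leaves the library free of the
rlimit change, at the cost of each tool carrying its own copy of the retry.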

>> Signed-off-by: Quentin Monnet <quentin.monnet@...ronome.com>
>> ---
>>  tools/lib/bpf/bpf.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 03f9bcc4ef50..456a5a7b112c 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -26,6 +26,8 @@
>>  #include <unistd.h>
>>  #include <asm/unistd.h>
>>  #include <linux/bpf.h>
>> +#include <sys/resource.h>
>> +#include <sys/types.h>
>>  #include "bpf.h"
>>  #include "libbpf.h"
>>  #include <errno.h>
>> @@ -68,8 +70,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
>>  int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>>  {
>>  	__u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
>> +	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>>  	union bpf_attr attr;
>>  
>> +	setrlimit(RLIMIT_MEMLOCK, &rinf);
>> +
>>  	memset(&attr, '\0', sizeof(attr));
>>  
>>  	attr.map_type = create_attr->map_type;
>>
>
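
As noted earlier in the thread, the setrlimit() call in the patch fails
silently for non-root users, leaving only errno as a trace. A minimal sketch
of a caller that checks the result instead could look like this; the helper
name raise_memlock_rlimit is hypothetical, and printing a warning is just one
possible policy.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: try to lift RLIMIT_MEMLOCK and report failure
 * instead of ignoring it. Returns 0 on success, -errno on failure. */
static int raise_memlock_rlimit(void)
{
	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
	int err;

	if (setrlimit(RLIMIT_MEMLOCK, &rinf) == 0)
		return 0;

	err = errno; /* save before fprintf() can clobber it */
	fprintf(stderr, "warning: cannot raise RLIMIT_MEMLOCK: %s\n",
		strerror(err));
	return -err;
}
```

An unprivileged process would typically get -EPERM here and could still go on
to attempt the map creation with its rlimit unchanged.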