netdev - Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <B167C8E2-76B9-4FAF-9903-2C3C0D7CB329@fb.com>
Date:   Wed, 31 Jul 2019 07:44:44 +0000
From:   Song Liu <songliubraving@...com>
To:     Andy Lutomirski <luto@...nel.org>
CC:     Kees Cook <keescook@...omium.org>,
        "linux-security@...r.kernel.org" <linux-security@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <Kernel-team@...com>,
        Lorenz Bauer <lmb@...udflare.com>,
        Jann Horn <jannh@...gle.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Linux API <linux-api@...r.kernel.org>
Subject: Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf

Hi Andy,

> On Jul 30, 2019, at 1:20 PM, Andy Lutomirski <luto@...nel.org> wrote:
> 
> On Sat, Jul 27, 2019 at 11:20 AM Song Liu <songliubraving@...com> wrote:
>> 
>> Hi Andy,
>> 
>>>>>> 
>>>>> 
>>>>> Well, yes. sys_bpf() is pretty powerful.
>>>>> 
>>>>> The goal of /dev/bpf is to enable special users to call sys_bpf(). In
>>>>> the meanwhile, such users should not take down the whole system easily
>>>>> by accident, e.g., with rm -rf /.
>>>> 
>>>> That’s easy, though — bpftool could learn to read /etc/bpfusers before allowing ruid != 0.
>>> 
>>> This is a great idea! fscaps + /etc/bpfusers should do the trick.
>> 
>> After some discussions and more thinking on this, I have some concerns
>> with the user space only approach.
>> 
>> IIUC, your proposal for user space only approach is like:
>> 
>> 1. bpftool (and other tools) check /etc/bpfusers and only do
>>   setuid for allowed users:
>> 
>>        int main()
>>        {
>>                if (/* uid in /etc/bpfusers */)
>>                        setuid(0);
>>                sys_bpf(...);
>>        }
>> 
>> 2. bpftool (and other tools) is installed with CAP_SETUID:
>> 
>>        setcap cap_setuid=e+p /bin/bpftool
>> 
> 
> You have this a bit backwards.  You wouldn't use CAP_SETUID.  You
> would use the setuid *mode* bit, i.e. chmod 4111 (or 4100 and use ACLs
> to further lock it down).  Or you could use setcap cap_sys_admin=p,
> although the details vary.  It woks a bit like this:
> 
> First, if you are running with elevated privilege due to SUID or
> fscaps, the kernel and glibc offer you a degree of protection: you are
> protected from ptrace(), LD_PRELOAD, etc.  You are *not* protected
> from yourself.  For example, you may be running in a context in which
> an attacker has malicious values in your environment variables, CWD,
> etc.  Do you need to carefully decide whether you are willing to run
> with elevated privilege on behalf of the user, which you learn like
> this:
> 
> uid_t real_uid = getuid();
> 
> Your decision may may depend on command-line arguments as well (i.e.
> you might want to allow tracing but not filtering, say).  Once you've
> made this decision, the details vary:
> 
> For SUID, you either continue to run with euid == 0, or you drop
> privilege using something like:
> 
> if (setresuid(real_uid, real_uid, real_uid) != 0) {
> /* optionally print an error to stderr */
> exit(1);
> }
> 
> For fscaps, if you want to be privileged, you use something like
> capng_update(); capng_apply() to set CAP_SYS_ADMIN to be effective
> when you want privilege.  If you want to be unprivileged (because
> bpfusers says so, for example), you could use capng_update() to drop
> CAP_SYS_ADMIN entirely and see if the calls still work without
> privilege.  But this is a little bit awkward, since you don't directly
> know whether the user that invoked you in the first place had
> CAP_SYS_ADMIN to begin with.
> 
> In general, SUID is a bit easier to work with.

Thanks a lot for these explanations. I learned a lot today via reading
this email and Googling. 

> 
>> This approach is not ideal, because we need to trust the tool to give
>> it CAP_SETUID. A hacked tool could easily bypass /etc/bpfusers check
>> or use other root only sys calls after setuid(0).
> 
> How?  The whole SUID mechanism is designed fairly carefully to prevent
> this.  /bin/sudo is likely to be SUID on your system, but you can't
> just "hack" it to become root.

I guess "hacked" was not the right description. I should have used 
"buggy" instead. 

SUID mechanism is great for small and solid tools, like sudo, passwd,
mount, etc. The user or sys admin could trust the tool to always do the 
right thing. On the other hand, I think SUID is not ideal for complex
tools that are under heavy development. As you mentioned, SUID doesn't 
protect against oneself. Therefore, it won't protect the system from 
buggy tool that uses SUID on something it should not. 

I think SUID is good for bpftool, because it is easy to use SUID 
correctly in bpftool. But, it is not easy to use SUID correctly in 
systemd. 

systemd and similar user space daemons use sys_bpf() to manage cgroup 
bpf programs and maps (BPF_MAP_TYPE_*CGROUP*, BPF_PROG_TYPE_CGROUP_*, 
BPF_CGROUP_*). The daemon runs in the background, and handles user 
requests to attach/detach BPF programs. If the daemon uses SUID or 
similar mechanism, it has to use euid == 0 for sys_bpf(). However, 
such daemons are usually pretty complicated. It is not easy to prove 
the daemon won't misuse euid == 0. 

Does this make sense so far? I will discuss more about cgroup in the 
next email. 

Thanks,
Song