[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrU7NbBnXXsw1B+DvTkfTVRBFWXuJ8cZERCCNvdFG6KqRw@mail.gmail.com>
Date: Sun, 4 Aug 2019 15:16:46 -0700
From: Andy Lutomirski <luto@...nel.org>
To: Song Liu <songliubraving@...com>
Cc: Andy Lutomirski <luto@...nel.org>,
Kees Cook <keescook@...omium.org>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <Kernel-team@...com>,
Lorenz Bauer <lmb@...udflare.com>,
Jann Horn <jannh@...gle.com>,
Greg KH <gregkh@...uxfoundation.org>,
Linux API <linux-api@...r.kernel.org>,
LSM List <linux-security-module@...r.kernel.org>
Subject: Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
On Fri, Aug 2, 2019 at 12:22 AM Song Liu <songliubraving@...com> wrote:
>
> Hi Andy,
>
>> I actually agree CAP_BPF_ADMIN makes sense. The hard part is to make
> >> existing tools (setcap, getcap, etc.) and libraries aware of the new CAP.
> >
> > It's been done before -- it's not that hard. IMO the main tricky bit
> > would be try be somewhat careful about defining exactly what
> > CAP_BPF_ADMIN does.
>
> Agreed. I think defining CAP_BPF_ADMIN could be a good topic for the
> Plumbers conference.
>
> OTOH, I don't think we have to wait for CAP_BPF_ADMIN to allow daemons
> like systemd to do sys_bpf() without root.
I don't understand the use case here. Are you talking about systemd
--user? As far as I know, a user is expected to be able to fully
control their systemd --user process, so giving it unrestricted bpf
access is very close to giving it superuser access, and this doesn't
sound like a good idea. I think that, if systemd --user needs bpf(),
it either needs real unprivileged bpf() or it needs a privileged
helper (SUID or a daemon) to intermediate this access.
>
> >
> >>> I don't see why you need to invent a whole new mechanism for this.
> >>> The entire cgroup ecosystem outside bpf() does just fine using the
> >>> write permission on files in cgroupfs to control access. Why can't
> >>> bpf() do the same thing?
> >>
> >> It is easier to use write permission for BPF_PROG_ATTACH. But it is
> >> not easy to do the same for other bpf commands: BPF_PROG_LOAD and
> >> BPF_MAP_*. A lot of these commands don't have target concept. Maybe
> >> we should have target concept for all these commands. But that is a
> >> much bigger project. OTOH, "all or nothing" model allows all these
> >> commands at once.
> >
> > For BPF_PROG_LOAD, I admit I've never understood why permission is
> > required at all. I think that CAP_SYS_ADMIN or similar should be
> > needed to get is_priv in the verifier, but I think that should mainly
> > be useful for tracing, and that requires lots of privilege anyway.
> > BPF_MAP_* is probably the trickiest part. One solution would be some
> > kind of bpffs, but I'm sure other solutions are possible.
>
> Improving permission management of cgroup_bpf is another good topic to
> discuss. However, it is also an overkill for current use case.
>
I looked at the code some more, and I don't think this is so hard
after all. As I understand it, all of the map..by_id stuff is, to
some extent, deprecated in favor of persistent maps. As I see it, the
map..by_id calls should require privilege forever, although I can
imagine ways to scope that privilege to a namespace if the maps
themselves were to be scoped to a namespace.
Instead, unprivileged tools would use the persistent map interface
roughly like this:
$ bpftool map create /sys/fs/bpf/my_dir/filename type hash key 8 value
8 entries 64 name mapname
This would require that the caller have either CAP_DAC_OVERRIDE or
that the caller have permission to create files in /sys/fs/bpf/my_dir
(using the same rules as for any filesystem), and the resulting map
would end up owned by the creating user and have mode 0600 (or maybe
0666, or maybe a new bpf_attr parameter) modified by umask. Then all
the various capable() checks that are currently involved in accessing
a persistent map would instead check FMODE_READ or FMODE_WRITE on the
map file as appropriate.
Half of this stuff already works. I just set my system up like this:
$ ls -l /sys/fs/bpf
total 0
drwxr-xr-x. 3 luto luto 0 Aug 4 15:10 luto
$ mkdir /sys/fs/bpf/luto/test
$ ls -l /sys/fs/bpf/luto
total 0
drwxrwxr-x. 2 luto luto 0 Aug 4 15:10 test
I bet that making the bpf() syscalls work appropriately in this
context without privilege would only be a couple of hours of work.
The hard work, creating bpffs and making it function, is already done
:)
P.S. The docs for bpftool create are less than fantastic. The
complete lack of any error message at all when the syscall returns
-EACCES is also not fantastic.
Powered by blists - more mailing lists