linux-kernel - Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4Bza_vM8HE5g+4ANW3NAAt8=+cn7Lw+DSkH42gimqzYxPdw@mail.gmail.com>
Date:   Thu, 6 Apr 2023 13:22:26 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Yafang Shao <laoar.shao@...il.com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Song Liu <song@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
        bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace

On Wed, Apr 5, 2023 at 10:44 PM Yafang Shao <laoar.shao@...il.com> wrote:
>
> On Thu, Apr 6, 2023 at 12:24 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Wed, Apr 5, 2023 at 8:22 PM Yafang Shao <laoar.shao@...il.com> wrote:
> > >
> > > On Thu, Apr 6, 2023 at 11:06 AM Alexei Starovoitov
> > > <alexei.starovoitov@...il.com> wrote:
> > > >
> > > > On Wed, Apr 5, 2023 at 7:55 PM Yafang Shao <laoar.shao@...il.com> wrote:
> > > > >
> > > > > It seems that I didn't describe the issue clearly.
> > > > > The container doesn't have CAP_SYS_ADMIN, but the CAP_SYS_ADMIN is
> > > > > required to run bpftool,  so the bpftool running in the container
> > > > > can't get the ID of bpf objects or convert IDs to FDs.
> > > > > Is there something that I missed ?
> > > >
> > > > Nothing. This is by design. bpftool needs sudo. That's all.
> > > >
> > >
> > > Hmm, what I'm trying to do is make bpftool run without sudo.
> >
> > This is not a task that is worth solving.
> >
>
> Then the container with CAP_BPF enabled can't even iterate its bpf progs ...

I'll leave the BPF namespace discussion aside (I agree that it needs
way more thought).

I am a bit surprised that we require CAP_SYS_ADMIN for GET_NEXT_ID
operations. GET_FD_BY_ID is definitely CAP_SYS_ADMIN, as they allow
you to take over someone else's link and stuff like this. But just
iterating IDs seems like a pretty innocent functionality, so maybe we
should remove CAP_SYS_ADMIN for GET_NEXT_ID?

By itself GET_NEXT_ID is relatively useless without capabilities, but
we've been floating the idea of providing GET_INFO_BY_ID (not by FD)
for a while now, and that seems useful in itself, as it would indeed
help tools like bpftool to get *some* information even without
privileges. Whether those GET_INFO_BY_ID operations should return same
full bpf_{prog,map,link,btf}_info or some trimmed down version of them
would be up to discussion, but I think getting some info without
creating an FD seems useful in itself.

Would it be worth discussing and solving this separately from
namespacing issues?

>
> > > > > Some questions,
> > > > > - What if the process exits after attaching the bpf prog and the prog
> > > > > is not auto-detachable?
> > > > >   For example, the reuserport bpf prog is not auto-detachable. After
> > > > > pins the reuserport bpf prog, a task can attach it through the pinned
> > > > > bpf file, but if the task forgets to detach it and the pinned file is
> > > > > removed, then it seems there's no way to figure out which task or
> > > > > cgroup this prog belongs to...
> > > >
> > > > you're saying that there is a bpf prog in the kernel without
> > > > corresponding user space ?
> > >
> > > No, it is corresponding to user space. For example, it may be
> > > corresponding to a socket fd, or a cgroup fd.
> > >
> > > > Meaning no user space process has an FD
> > > > that points to this prog or FD to a map that this prog is using?
> > > > In such a case this is truly kernel bpf prog. It doesn't belong to cgroup.
> > > >
> > >
> > > Even if it is kernel bpf prog, it is created by a process. The user
> > > needs to know which one created it.
> >
> > In some situations it's certainly interesting to know which process
> > loaded a particular program.
> > In many other situations it's irrelevant.
> > For example, the process that loaded a prog could have been moved to a
> > different cgroup.
> > If you want to track the loading you need to install bpf_lsm
> > that monitors prog_load hook and collect that info.
> > It's not the job of the kernel to do it.
> >
>
> Agreed with you that we can add lots of hooks to track every detail of
> the operations.
> But it is not free. More hooks, more overhead.
> If we can change the kernel to make it lightweight, why not...
>
> > > > > - Could you pls. explain in detail how to get comm, pid, or cgroup
> > > > > from a pinned bpffs file?
> > > >
> > > > pinned bpf prog and no user space holds FD to it?
> > > > It's not part of any cgroup. Nothing to print.
> > >
> > > As I explained above, even if it holds nothing, the user needs to know
> > > the information from it. For example, if it is expected, which one
> > > created it?
> >
> > See the answer above. The kernel has enough hooks already to provide
> > this information to user space. No kernel changes necessary.
>
>
>
> --
> Regards
> Yafang