linux-kernel - Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace

Message-ID: <ZCMgpRtT6ywmtALi@google.com>
Date:   Tue, 28 Mar 2023 10:15:17 -0700
From:   Stanislav Fomichev <sdf@...gle.com>
To:     Yafang Shao <laoar.shao@...il.com>
Cc:     ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
        kafai@...com, songliubraving@...com, yhs@...com,
        john.fastabend@...il.com, kpsingh@...nel.org, haoluo@...gle.com,
        jolsa@...nel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace

On 03/28, Yafang Shao wrote:
> On Tue, Mar 28, 2023 at 1:28 AM Stanislav Fomichev <sdf@...gle.com> wrote:
> >
> > On 03/26, Yafang Shao wrote:
> > > Currently only CAP_SYS_ADMIN can iterate BPF object IDs and convert IDs
> > > to FDs, which is intended by BPF's security model[1]. Not only does it
> > > prevent non-privileged users from getting other users' bpf programs, but
> > > it also prevents a user from iterating his own bpf objects.
> >
> > > In container environments, some users want to run bpf programs in their
> > > containers. These users can run their bpf programs with CAP_BPF and
> > > some other specific CAPs, but they can't inspect their bpf programs in a
> > > generic way. For example, bpftool can't be used, as it requires
> > > CAP_SYS_ADMIN. That is very inconvenient.
> >
> > > Without CAP_SYS_ADMIN, the only way to get information about a bpf object
> > > that was not created by the process itself is SCM_RIGHTS, which requires
> > > every process that creates a bpf object to implement a unix domain socket
> > > to share the object's fd with other processes. That is tedious and
> > > troublesome.
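For comparison, the SCM_RIGHTS workaround means each owning process has to carry something like the following. This is a minimal sketch: send_fd() and recv_fd() are hypothetical helper names, and the descriptor passed can be any fd, a BPF prog or map fd included. As is common practice, one byte of ordinary data accompanies the control message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                      /* one byte of ordinary payload */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                               /* correctly aligned cmsg buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;

    memset(u.buf, 0, sizeof(u.buf));
    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);                 /* buffer is large enough by construction */
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;         /* kernel installs a copy of fd in the peer */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd in this process, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;
    int fd;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Every process that owns a bpf object would need to expose a socket and run this exchange, which is the boilerplate the cover letter objects to.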
> >
> > > Hence we need a better mechanism to get bpf object info without
> > > CAP_SYS_ADMIN.
> >
> > [..]
> >
> > > A BPF namespace is introduced in this patchset in an attempt to remove
> > > the CAP_SYS_ADMIN requirement. A user can create bpf maps, progs and
> > > links in a specific bpf namespace; these bpf objects are then not
> > > visible to users in a different bpf namespace. They are, however,
> > > visible to the parent bpf namespace, so the sysadmin can still
> > > iterate and inspect them.
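The visibility rule described here (an object is visible in its own bpf namespace and in every ancestor, but not in sibling namespaces) can be modelled in a few lines. This is a toy sketch of the stated semantics only; struct toy_bpf_ns and toy_obj_visible() are hypothetical and are not taken from the patchset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each namespace only knows its parent (NULL for the root). */
struct toy_bpf_ns {
    const struct toy_bpf_ns *parent;
};

/*
 * An object created in obj_ns is visible from viewer if viewer is obj_ns
 * itself or one of its ancestors, i.e. walking up from obj_ns reaches viewer.
 */
static bool toy_obj_visible(const struct toy_bpf_ns *viewer,
                            const struct toy_bpf_ns *obj_ns)
{
    for (const struct toy_bpf_ns *ns = obj_ns; ns != NULL; ns = ns->parent)
        if (ns == viewer)
            return true;
    return false;
}
```

With a root namespace and two children a and b, the root sees objects from both, a sees its own, and a sees neither b's objects nor objects created directly in the root.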
> >
> > Does it essentially mean unpriv bpf?

> Right. With CAP_BPF and some other CAPs enabled.

> > Can I, as a non-root, create
> > a new bpf namespace and start loading/attaching progs?

> No, you can't create a new bpf namespace as non-root; see
> copy_namespaces().
> In container environments, new namespaces are always created by
> containerd, which is started by root.

Are you talking about the "if (!ns_capable(user_ns, CAP_SYS_ADMIN))" part
of copy_namespaces()? Isn't that trivially bypassed with a new user
namespace?

IIUC, I can create a new user namespace which gives me CAP_SYS_ADMIN
in this particular user-ns. Then I can go on and create a new bpf
namespace (with CAP_BPF) and go wild? I won't see anything from the
other namespaces, but I'll be able to load/attach bpf programs?
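The bypass described above can be checked from userspace. The unshare(2) path applies the same ns_capable(..., CAP_SYS_ADMIN) test, but when CLONE_NEWUSER is also set it is evaluated against the newly created user namespace, which the caller owns, so combining CLONE_NEWUSER with another namespace flag lets an unprivileged process pass it. This is a minimal sketch: try_unshare() is a hypothetical helper, CLONE_NEWNET stands in for the proposed bpf namespace flag (whose name is not given here), and it assumes the system permits unprivileged user namespaces, which some distributions disable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run unshare(flags) in a forked child so the caller's namespaces stay intact.
 * Returns 0 if the child's unshare() succeeded, the positive errno it failed
 * with otherwise, or -1 if fork()/waitpid() itself failed.
 */
static int try_unshare(int flags)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0)                          /* child: single-threaded, as CLONE_NEWUSER requires */
        _exit(unshare(flags) == 0 ? 0 : errno);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On a typical distribution, an ordinary user would see try_unshare(CLONE_NEWNET) return EPERM, while try_unshare(CLONE_NEWUSER | CLONE_NEWNET) returns 0.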

> > Maybe add a paragraph about now vs whatever you're proposing.

> What I'm proposing in this patchset is to put bpf objects (map, prog,
> link, and btf) into the bpf namespace. As a next step I will put bpffs
> into the bpf namespace as well.
> In other words, I'm trying to put all the objects created by bpf into
> the bpf namespace. Below is a simple table to illustrate it.

> For an unprivileged user with CAP_BPF enabled:
>
> Operation                                 | Now  | Future
> ------------------------------------------+------+-------
> Iterate his own BPF IDs                   | N    | Y
> Iterate others' BPF IDs                   | N    | N
> Convert his own BPF IDs to FDs            | N    | Y
> Convert others' BPF IDs to FDs            | N    | N
> Get others' object info from pinned file  | Y(*) | N

> (*) This can be mitigated by:
>       1) giving different containers different bpffs instances;
>       2) setting file permissions.
>       Neither is perfect: for example, if a single user runs two bpf
> instances, we may not want them to inspect each other.

I think the question here is: what happens to the existing
capable(CAP_BPF) checks? Do they eventually become ns_capable(..., CAP_BPF)?

If not, I don't think this integrates well with user namespaces.
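The distinction can be illustrated from userspace: after unshare(CLONE_NEWUSER), an unprivileged process reports a full effective capability set, including CAP_SYS_ADMIN and (on Linux 5.8 and later) CAP_BPF. Those bits count for an ns_capable(ns, CAP_BPF) check against the new namespace, but not for capable(CAP_BPF), which is evaluated against the initial user namespace. This is a minimal sketch; read_cap_eff() and cap_in_new_userns() are hypothetical helpers that parse CapEff from /proc/self/status, and the bit numbers are copied from <linux/capability.h>.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CAP_SYS_ADMIN_BIT 21   /* CAP_SYS_ADMIN in <linux/capability.h> */
#define CAP_BPF_BIT       39   /* CAP_BPF, Linux 5.8+ */

/* Parse the calling process's CapEff mask from /proc/self/status. 0 or -1. */
static int read_cap_eff(unsigned long long *mask)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    int found = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "CapEff: %llx", mask) == 1) {
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/*
 * In a forked child, enter a new user namespace and report whether the given
 * capability bit is in the effective set there: 1 (present), 0 (absent), or
 * -1 if fork(), unshare() or /proc parsing failed.
 */
static int cap_in_new_userns(int bit)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        unsigned long long mask;
        if (unshare(CLONE_NEWUSER) != 0 || read_cap_eff(&mask) != 0)
            _exit(2);                      /* 2 = could not determine */
        _exit(((mask >> bit) & 1ULL) ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}
```

Where unprivileged user namespaces are allowed, cap_in_new_userns(CAP_SYS_ADMIN_BIT) returns 1 even for an ordinary user, which is why it matters whether the bpf checks stay capable() or move to ns_capable().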

> > Otherwise it's not very clear what the security story is.
> > (haven't looked at the whole series, so maybe it's answered
> > somewhere else?)
> >


> --
> Regards
> Yafang