linux-kernel - Re: [PATCH RFC 1/4] bpf: add cgroup device guard to flag a cgroup device prog

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAHC9VhRJ8iw7EXDqLdWEaj+EZSwHS0ytSWy+W-BoKVvw0EWA5Q@mail.gmail.com>
Date:   Mon, 11 Sep 2023 15:20:35 -0400
From:   Paul Moore <paul@...l-moore.com>
To:     Alexander Mikhalitsyn <alexander@...alicyn.com>,
        Christian Brauner <brauner@...nel.org>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Michael Weiß <michael.weiss@...ec.fraunhofer.de>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
        Quentin Monnet <quentin@...valent.com>,
        Alexander Viro <viro@...iv.linux.org.uk>, bpf@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        gyroidos@...ec.fraunhofer.de, Miklos Szeredi <miklos@...redi.hu>,
        Amir Goldstein <amir73il@...il.com>
Subject: Re: [PATCH RFC 1/4] bpf: add cgroup device guard to flag a cgroup
 device prog

On Mon, Sep 4, 2023 at 7:44 AM Christian Brauner <brauner@...nel.org> wrote:
> On Tue, Aug 29, 2023 at 03:35:46PM +0200, Alexander Mikhalitsyn wrote:
> > On Fri, Aug 18, 2023 at 12:11 AM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > >
> > > On Tue, Aug 15, 2023 at 10:59:22AM +0200, Christian Brauner wrote:
> > > > On Mon, Aug 14, 2023 at 04:26:09PM +0200, Michael Weiß wrote:
> > > > > Introduce the BPF_F_CGROUP_DEVICE_GUARD flag for BPF_PROG_LOAD
> > > > > which allows to set a cgroup device program to be a device guard.
> > > >
> > > > Currently we block access to devices unconditionally in may_open_dev().
> > > > Anything that's mounted by an unprivileged containers will get
> > > > SB_I_NODEV set in s_i_flags.
> > > >
> > > > Then we currently mediate device access in:
> > > >
> > > > * inode_permission()
> > > >   -> devcgroup_inode_permission()
> > > > * vfs_mknod()
> > > >   -> devcgroup_inode_mknod()
> > > > * blkdev_get_by_dev() // sget()/sget_fc(), other ways to open block devices and friends
> > > >   -> devcgroup_check_permission()
> > > > * drivers/gpu/drm/amd/amdkfd // weird restrictions on showing gpu info afaict
> > > >   -> devcgroup_check_permission()
> > > >
> > > > All your new flag does is to bypass that SB_I_NODEV check afaict and let
> > > > it proceed to the devcgroup_*() checks for the vfs layer.
> > > >
> > > > But I don't get the semantics yet.
> > > > Is that a flag which is set on BPF_PROG_TYPE_CGROUP_DEVICE programs or
> > > > is that a flag on random bpf programs? It looks like it would be the
> > > > latter but design-wise I would expect this to be a property of the
> > > > device program itself.
> > >
> > > Looks like patch 4 attemps to bypass usual permission checks with:
> > > @@ -3976,9 +3979,19 @@ int vfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
> > >         if (error)
> > >                 return error;
> > >
> > > -       if ((S_ISCHR(mode) || S_ISBLK(mode)) && !is_whiteout &&
> > > -           !capable(CAP_MKNOD))
> > > -               return -EPERM;
> > > +       /*
> > > +        * In case of a device cgroup restirction allow mknod in user
> > > +        * namespace. Otherwise just check global capability; thus,
> > > +        * mknod is also disabled for user namespace other than the
> > > +        * initial one.
> > > +        */
> > > +       if ((S_ISCHR(mode) || S_ISBLK(mode)) && !is_whiteout) {
> > > +               if (devcgroup_task_is_guarded(current)) {
> > > +                       if (!ns_capable(current_user_ns(), CAP_MKNOD))
> > > +                               return -EPERM;
> > > +               } else if (!capable(CAP_MKNOD))
> > > +                       return -EPERM;
> > > +       }
> > >
> >
> > Dear colleagues,
> >
> > > which pretty much sounds like authoritative LSM that was brought up in the past
> > > and LSM folks didn't like it.
> >
> > Thanks for pointing this out, Alexei!
> > I've searched through the LKML archives and found a thread about this:
> > https://lore.kernel.org/all/CAEf4BzaBt0W3sWh_L4RRXEFYdBotzVEnQdqC7BO+PNWtD7eSUA@mail.gmail.com/
> >
> > As far as I understand, disagreement here is about a practice of
> > skipping kernel-built capability checks based
> > on LSM hooks, right?
> >
> > +CC Paul Moore <paul@...l-moore.com>
> >
> > >
> > > If vfs folks are ok with this special bypass of permissions in vfs_mknod()
> > > we can talk about kernel->bpf api details.
> > > The way it's done with BPF_F_CGROUP_DEVICE_GUARD flag is definitely no go,
> > > but no point going into bpf details now until agreement on bypass is made.
>
> Afaiu the original concern was specifically about an LSM allowing to
> bypass other LSMs or DAC permissions. But this wouldn't be the case
> here. The general inode access LSM permission mediation is separate from
> specific device access management: the security_inode_permission() LSM
> hook would still be called and thus LSMs restrictions would continue to
> apply exactly as they do now.
>
> For cgroup v1 device access management was a cgroup controller with
> management interface through files. It then was ported to an eBPF
> program attachable to cgroups for cgroup v2. Arguably, it should
> probably have been ported to an LSM hook or a separate LSM and untied
> from cgroups completely. The confusion here seems to indicate that that
> would have been the right way to go.
>
> Because right now device access management seems its own form of
> mandatory access control.

My apologies, I lost this thread in my inbox and I'm just reading it now.

Historically we haven't any real LSM controls around
cgroups/resource-management, but that is mostly because everything
that I recall being proposed has been very piecemeal and didn't
provide a comprehensive access control solution for resource
management.  If someone wanted to propose a proper set of access
control hooks for cgroups I think that is something we would be happy
to review, merge, etc.

Somewhat relatedly, we've been working on some docs around guidelines
for new LSM hooks; eventually the guidelines will work their way into
the kernel docs, but here they are now:

* https://github.com/LinuxSecurityModule/kernel/blob/main/README.md#new-lsm-hook-guidelines

-- 
paul-moore.com