linux-kernel - Re: [PATCH RFC 1/4] bpf: add cgroup device guard to flag a cgroup device prog

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230817221102.6hexih3uki3jf6w3@macbook-pro-8.dhcp.thefacebook.com>
Date:   Thu, 17 Aug 2023 15:11:02 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Christian Brauner <brauner@...nel.org>
Cc:     Michael Weiß <michael.weiss@...ec.fraunhofer.de>,
        Alexander Mikhalitsyn <alexander@...alicyn.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
        Quentin Monnet <quentin@...valent.com>,
        Alexander Viro <viro@...iv.linux.org.uk>, bpf@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        gyroidos@...ec.fraunhofer.de
Subject: Re: [PATCH RFC 1/4] bpf: add cgroup device guard to flag a cgroup
 device prog

On Tue, Aug 15, 2023 at 10:59:22AM +0200, Christian Brauner wrote:
> On Mon, Aug 14, 2023 at 04:26:09PM +0200, Michael Weiß wrote:
> > Introduce the BPF_F_CGROUP_DEVICE_GUARD flag for BPF_PROG_LOAD
> > which allows to set a cgroup device program to be a device guard.
> 
> Currently we block access to devices unconditionally in may_open_dev().
> Anything that's mounted by an unprivileged containers will get
> SB_I_NODEV set in s_i_flags.
> 
> Then we currently mediate device access in:
> 
> * inode_permission()
>   -> devcgroup_inode_permission()
> * vfs_mknod()
>   -> devcgroup_inode_mknod()
> * blkdev_get_by_dev() // sget()/sget_fc(), other ways to open block devices and friends
>   -> devcgroup_check_permission()
> * drivers/gpu/drm/amd/amdkfd // weird restrictions on showing gpu info afaict
>   -> devcgroup_check_permission()
> 
> All your new flag does is to bypass that SB_I_NODEV check afaict and let
> it proceed to the devcgroup_*() checks for the vfs layer.
> 
> But I don't get the semantics yet.
> Is that a flag which is set on BPF_PROG_TYPE_CGROUP_DEVICE programs or
> is that a flag on random bpf programs? It looks like it would be the
> latter but design-wise I would expect this to be a property of the
> device program itself.

Looks like patch 4 attemps to bypass usual permission checks with:
@@ -3976,9 +3979,19 @@ int vfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
        if (error)
                return error;

-       if ((S_ISCHR(mode) || S_ISBLK(mode)) && !is_whiteout &&
-           !capable(CAP_MKNOD))
-               return -EPERM;
+       /*
+        * In case of a device cgroup restirction allow mknod in user
+        * namespace. Otherwise just check global capability; thus,
+        * mknod is also disabled for user namespace other than the
+        * initial one.
+        */
+       if ((S_ISCHR(mode) || S_ISBLK(mode)) && !is_whiteout) {
+               if (devcgroup_task_is_guarded(current)) {
+                       if (!ns_capable(current_user_ns(), CAP_MKNOD))
+                               return -EPERM;
+               } else if (!capable(CAP_MKNOD))
+                       return -EPERM;
+       }

which pretty much sounds like authoritative LSM that was brought up in the past
and LSM folks didn't like it.

If vfs folks are ok with this special bypass of permissions in vfs_mknod()
we can talk about kernel->bpf api details.
The way it's done with BPF_F_CGROUP_DEVICE_GUARD flag is definitely no go,
but no point going into bpf details now until agreement on bypass is made.