linux-kernel - Re: [PATCH bpf-next v1 1/9] bpf: Add mkdir, rmdir, unlink syscalls for prog_bpf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+khW7g+T2sAkP1aycmts_82JKWgYk5Y0ZJp+EvjFUyNY8W_5w@mail.gmail.com>
Date:   Mon, 14 Mar 2022 10:07:31 -0700
From:   Hao Luo <haoluo@...gle.com>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Joe Burton <jevburton.kernel@...il.com>,
        Tejun Heo <tj@...nel.org>, joshdon@...gle.com, sdf@...gle.com,
        bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v1 1/9] bpf: Add mkdir, rmdir, unlink syscalls
 for prog_bpf_syscall

Hello Al,

On Fri, Mar 11, 2022 at 7:46 PM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> On Fri, Feb 25, 2022 at 03:43:31PM -0800, Hao Luo wrote:
> > This patch allows bpf_syscall prog to perform some basic filesystem
> > operations: create, remove directories and unlink files. Three bpf
> > helpers are added for this purpose. When combined with the following
> > patches that allow pinning and getting bpf objects from bpf prog,
> > this feature can be used to create directory hierarchy in bpffs that
> > help manage bpf objects purely using bpf progs.
> >
> > The added helpers subject to the same permission checks as their syscall
> > version. For example, one can not write to a read-only file system;
> > The identity of the current process is checked to see whether it has
> > sufficient permission to perform the operations.
> >
> > Only directories and files in bpffs can be created or removed by these
> > helpers. But it won't be too hard to allow these helpers to operate
> > on files in other filesystems, if we want.
>
> In which contexts can those be called?
>

In a sleepable context. The plan is to introduce a certain tracepoints
as sleepable, a program that attaches to sleepable tracepoints is
allowed to call these functions. In particular, the first sleepable
tracepoint introduced in this patchset is one at the end of
cgroup_mkdir(). Do you have any advices?

> > +BPF_CALL_2(bpf_rmdir, const char *, pathname, int, pathname_sz)
> > +{
> > +     struct user_namespace *mnt_userns;
> > +     struct path parent;
> > +     struct dentry *dentry;
> > +     int err;
> > +
> > +     if (pathname_sz <= 1 || pathname[pathname_sz - 1])
> > +             return -EINVAL;
> > +
> > +     err = kern_path(pathname, 0, &parent);
> > +     if (err)
> > +             return err;
> > +
> > +     if (!bpf_path_is_bpf_dir(&parent)) {
> > +             err = -EPERM;
> > +             goto exit1;
> > +     }
> > +
> > +     err = mnt_want_write(parent.mnt);
> > +     if (err)
> > +             goto exit1;
> > +
> > +     dentry = kern_path_locked(pathname, &parent);
>
> This can't be right.  Ever.  There is no promise whatsoever
> that these two lookups will resolve to the same place.
>
> > +BPF_CALL_2(bpf_unlink, const char *, pathname, int, pathname_sz)
> > +{
> > +     struct user_namespace *mnt_userns;
> > +     struct path parent;
> > +     struct dentry *dentry;
> > +     struct inode *inode = NULL;
> > +     int err;
> > +
> > +     if (pathname_sz <= 1 || pathname[pathname_sz - 1])
> > +             return -EINVAL;
> > +
> > +     err = kern_path(pathname, 0, &parent);
> > +     if (err)
> > +             return err;
> > +
> > +     err = mnt_want_write(parent.mnt);
> > +     if (err)
> > +             goto exit1;
> > +
> > +     dentry = kern_path_locked(pathname, &parent);
> > +     if (IS_ERR(dentry)) {
> > +             err = PTR_ERR(dentry);
> > +             goto exit2;
> > +     }
>
> Ditto.  NAK; if you want to poke into fs/namei.c guts, do it right.
> Or at least discuss that on fsdevel.  As it is, it's completely broken.
> It's racy *and* it blatantly leaks both vfsmount and dentry references.
>
> NAKed-by: Al Viro <viro@...iv.linux.org.uk>

Thanks Al for taking a look. Actually, there is a simpler approach:
can we export two functions in namei.c that wrap call to do_mkdirat
and do_unlinkat, but take a kernel string as pathname? Then these two
bpf helpers can use them, don't have to be this complicated. Does this
sound good to you?

Thanks!