linux-kernel - Re: [RFC PATCH v3 08/37] fuse: Add fuse-bpf, a stacked fs extension for FUSE

Message-ID: <CAOQ4uxi3WXb2MKx+YUnsCad2jUDtUuafFzuqJi0uo4us7xmfuA@mail.gmail.com>
Date:   Wed, 3 May 2023 06:45:17 +0300
From:   Amir Goldstein <amir73il@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Daniel Rosenberg <drosen@...gle.com>,
        Miklos Szeredi <miklos@...redi.hu>, bpf@...r.kernel.org,
        Alexei Starovoitov <ast@...nel.org>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-unionfs@...r.kernel.org,
        Daniel Borkmann <daniel@...earbox.net>,
        John Fastabend <john.fastabend@...il.com>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
        Shuah Khan <shuah@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Joanne Koong <joannelkoong@...il.com>,
        Mykola Lysenko <mykolal@...com>, kernel-team@...roid.com,
        Paul Lawrence <paullawrence@...gle.com>,
        Alessio Balsini <balsini@...gle.com>
Subject: Re: [RFC PATCH v3 08/37] fuse: Add fuse-bpf, a stacked fs extension
 for FUSE

On Tue, May 2, 2023 at 6:38 AM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Mon, Apr 17, 2023 at 06:40:08PM -0700, Daniel Rosenberg wrote:
> > Fuse-bpf provides a short circuit path for Fuse implementations that act
> > as a stacked filesystem. For cases that are directly unchanged,
> > operations are passed directly to the backing filesystem. Small
> > adjustments can be handled by bpf prefilters or postfilters, with the
> > option to fall back to userspace as needed.
>
> Here is my understanding of fuse-bpf design:
> - bpf progs can mostly read-only access fuse_args before and after proper vfs
>   operation on a backing path/file/inode.
> - args are unconditionally prepared for bpf prog consumption, but progs won't
>   be doing anything with them most of the time.
> - progs unfortunately cannot do any real work. they're nothing but simple filters.
>   They can give 'green light' for a fuse_FOO op to be delegated to proper vfs_FOO
>   in backing file. The logic in this patch keeps track of backing_path/file/inode.
> - in other words bpf side is "dumb", but it's telling kernel what to do with
>   real things like path/file/inode and the kernel is doing real work and calling vfs_*.
>
> This design adds non-negligible overhead to fuse when CONFIG_FUSE_BPF is set.
> Compared to a trip to user space it's close to zero, but the cost of
> initialize_in/out + backing + finalize is not free.
> The patch 33 is especially odd.
> fuse has a traditional mechanism to upcall to user space with fuse_simple_request.
> The patch 33 allows bpf prog to return special return value and trigger two more
> fuse_bpf_simple_request-s to user space. Not clear why.
> It seems to me that the main assumption of the fuse bpf design is that bpf prog
> has to stay short and simple. It cannot do much other than reading and comparing
> strings with the help of dynptr.
> How about we allow bpf attach to fuse_simple_request and nothing else?
> All fuse ops call it anyway and cmd is already encoded in the args.
> Then let bpf prog read fuse_args as-is (without converting them to bpf_fuse_args)
> and avoid doing actual fuse_req to user space.
> Also allow bpf prog acquire and remember path/file/inode.
> The verifier is already smart enough to track that the prog is doing it safely
> without leaking references and what not.
> And, of course, allow bpf prog call vfs_* via kfuncs.
> In other words, instead of hard coding
>  +#define bpf_fuse_backing(inode, io, out,                             \
>  +                      initialize_in, initialize_out,                 \
>  +                      backing, finalize, args...)                    \
> one for each fuse_ops in the kernel let bpf prog do the same but on demand.
> The biggest advantage is that this patch set instead of 95% on fuse side and 5% on bpf
> will become 5% addition to fuse code. All the logic will be handled purely by bpf.
> Right now you're limiting it to one backing_file per fuse_file.
> With bpf prog driving it the prog can keep multiple backing_files and shuffle
> access to them as prog decides.
> Instead of doing 'return BPF_FUSE_CONTINUE' the bpf progs will
> pass 'path' to kfunc bpf_vfs_open, then stash 'struct bpf_file*', etc.
> Probably will be easier to white board this idea during lsfmmbpf.
>
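
To make the flow in the quoted message concrete, here is a minimal user-space
sketch of the "filter gives the green light, kernel does the real work" idea.
It is a toy model, not code from the patch set: every name in it
(toy_request, filter_verdict, hidden_name_filter, dispatch) is hypothetical,
and the assumption that an op with no filter attached goes to the daemon is
mine, not something the thread states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts; these names do not mirror the patch set. */
enum filter_verdict {
    VERDICT_BACKING,   /* let the kernel run the op on the backing file */
    VERDICT_USERSPACE, /* fall back to the FUSE daemon in user space */
};

/* Who ended up servicing the request in this toy model. */
enum handled_by {
    HANDLED_BY_BACKING,
    HANDLED_BY_DAEMON,
};

/* Stand-in for the arguments a filter would be allowed to read. */
struct toy_request {
    int opcode;
    const char *name;
};

/* A filter only inspects the request and returns a verdict. */
typedef enum filter_verdict (*prefilter_fn)(const struct toy_request *req);

/* Example filter: only names beginning with '.' need the daemon. */
static enum filter_verdict hidden_name_filter(const struct toy_request *req)
{
    return req->name[0] == '.' ? VERDICT_USERSPACE : VERDICT_BACKING;
}

/* Route a request: short-circuit to the backing file when the filter
 * allows it, otherwise take the classic round trip to user space. */
static enum handled_by dispatch(const struct toy_request *req,
                                prefilter_fn filter)
{
    assert(req != NULL && req->name != NULL);

    /* Assumption: with no filter attached, behave like plain FUSE. */
    if (filter == NULL)
        return HANDLED_BY_DAEMON;

    if (filter(req) == VERDICT_BACKING)
        return HANDLED_BY_BACKING;

    return HANDLED_BY_DAEMON;
}
```

Calling dispatch() with hidden_name_filter on a request named "notes.txt"
takes the backing path, while ".secret" is sent to the daemon. The filter
itself never touches a file; it only decides, which is the "dumb bpf side"
the quoted message describes.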

I have to admit that sounds a bit challenging, but I'm up for sitting
in front of that whiteboard :)

BTW, thanks Daniel (Borkmann) for sorting out the cross-track
sessions for FS-BPF.
We have another FS-only session on FUSE-BPF, but I feel there is plenty
to discuss on the FUSE-bypass part, as well as on the BPF part.
The same goes for the session on BPF iterators for filesystems.

Thanks,
Amir.