linux-kernel - Re: [RFC PATCH v3 08/37] fuse: Add fuse-bpf, a stacked fs extension for FUSE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230502033825.ofcxttuquoanhe7b@dhcp-172-26-102-232.dhcp.thefacebook.com>
Date:   Mon, 1 May 2023 20:38:25 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Daniel Rosenberg <drosen@...gle.com>
Cc:     Miklos Szeredi <miklos@...redi.hu>, bpf@...r.kernel.org,
        Alexei Starovoitov <ast@...nel.org>,
        Amir Goldstein <amir73il@...il.com>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-unionfs@...r.kernel.org,
        Daniel Borkmann <daniel@...earbox.net>,
        John Fastabend <john.fastabend@...il.com>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
        Shuah Khan <shuah@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Joanne Koong <joannelkoong@...il.com>,
        Mykola Lysenko <mykolal@...com>, kernel-team@...roid.com,
        Paul Lawrence <paullawrence@...gle.com>,
        Alessio Balsini <balsini@...gle.com>
Subject: Re: [RFC PATCH v3 08/37] fuse: Add fuse-bpf, a stacked fs extension
 for FUSE

On Mon, Apr 17, 2023 at 06:40:08PM -0700, Daniel Rosenberg wrote:
> Fuse-bpf provides a short circuit path for Fuse implementations that act
> as a stacked filesystem. For cases that are directly unchanged,
> operations are passed directly to the backing filesystem. Small
> adjustments can be handled by bpf prefilters or postfilters, with the
> option to fall back to userspace as needed.

Here is my understanding of fuse-bpf design:
- bpf progs can mostly read-only access fuse_args before and after proper vfs
  operation on a backing path/file/inode.
- args are unconditionally prepared for bpf prog consumption, but progs won't
  be doing anything with them most of the time.
- progs unfortunately cannot do any real work. they're nothing but simple filters.
  They can give 'green light' for a fuse_FOO op to be delegated to proper vfs_FOO
  in backing file. The logic in this patch keeps track of backing_path/file/inode.
- in other words bpf side is "dumb", but it's telling kernel what to do with
  real things like path/file/inode and the kernel is doing real work and calling vfs_*.

This design adds non-negligible overhead to fuse when CONFIG_FUSE_BPF is set.
Comparing to trip to user space it's close to zero, but the cost of
initialize_in/out + backing + finalize is not free.
The patch 33 is especially odd.
fuse has a traditional mechanism to upcall to user space with fuse_simple_request.
The patch 33 allows bpf prog to return special return value and trigger two more
fuse_bpf_simple_request-s to user space. Not clear why.
It seems to me that the main assumption of the fuse bpf design is that bpf prog
has to stay short and simple. It cannot do much other than reading and comparing
strings with the help of dynptr.
How about we allow bpf attach to fuse_simple_request and nothing else?
All fuse ops call it anyway and cmd is already encoded in the args.
Then let bpf prog read fuse_args as-is (without converting them to bpf_fuse_args)
and avoid doing actual fuse_req to user space.
Also allow bpf prog acquire and remember path/file/inode.
The verifier is already smart enough to track that the prog is doing it safely
without leaking references and what not.
And, of course, allow bpf prog call vfs_* via kfuncs.
In other words, instead of hard coding
 +#define bpf_fuse_backing(inode, io, out,                             \
 +                      initialize_in, initialize_out,                 \
 +                      backing, finalize, args...)                    \
one for each fuse_ops in the kernel let bpf prog do the same but on demand.
The biggest advantage is that this patch set instead of 95% on fuse side and 5% on bpf
will become 5% addition to fuse code. All the logic will be handled purely by bpf.
Right now you're limiting it to one backing_file per fuse_file.
With bpf prog driving it the prog can keep multiple backing_files and shuffle
access to them as prog decides.
Instead of doing 'return BPF_FUSE_CONTINUE' the bpf progs will
pass 'path' to kfunc bpf_vfs_open, than stash 'struct bpf_file*', etc.
Probably will be easier to white board this idea during lsfmmbpf.

First 3 patches look fine. Thank you for resending them separately.