Message-ID: <caea44dd-10a8-accb-7dec-868fb8f2f061@linux.alibaba.com>
Date: Wed, 17 May 2023 15:19:28 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Amir Goldstein <amir73il@...il.com>
Cc: Daniel Rosenberg <drosen@...gle.com>,
Miklos Szeredi <miklos@...redi.hu>, bpf@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-unionfs@...r.kernel.org,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Shuah Khan <shuah@...nel.org>,
Jonathan Corbet <corbet@....net>,
Joanne Koong <joannelkoong@...il.com>,
Mykola Lysenko <mykolal@...com>, kernel-team@...roid.com
Subject: Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem
Extension for FUSE
On 2023/5/17 00:05, Gao Xiang wrote:
> Hi Amir,
>
> On 2023/5/17 23:51, Amir Goldstein wrote:
>> On Wed, May 17, 2023 at 5:50 AM Gao Xiang <hsiangkao@...ux.alibaba.com> wrote:
>>>
>>>
>>>
>>> On 2023/5/2 17:07, Daniel Rosenberg wrote:
>>>> On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@...redi.hu> wrote:
>>>>>
>>>>>
>>>>> The security model needs to be thought about and documented. Think
>>>>> about this: the fuse server now delegates operations it would itself
>>>>> perform to the passthrough code in fuse. The permissions that would
>>>>> have been checked in the context of the fuse server are now checked in
>>>>> the context of the task performing the operation. The server may be
>>>>> able to bypass seccomp restrictions. Files that are open on the
>>>>> backing filesystem are now hidden (e.g. lsof won't find these), which
>>>>> allows the server to obfuscate accesses to backing files. Etc.
>>>>>
>>>>> These are not particularly worrying if the server is privileged, but
>>>>> fuse comes with the history of supporting unprivileged servers, so we
>>>>> should look at supporting passthrough with unprivileged servers as
>>>>> well.
>>>>>
>>>>
>>>> This is on my todo list. My current plan is to grab the creds that the
>>>> daemon uses to respond to FUSE_INIT. That should keep behavior fairly
>>>> similar. I'm not sure if there are cases where the fuse server is
>>>> operating under multiple contexts.
>>>> I don't currently have a plan for exposing open files via lsof. Every
>>>> such file should relate to one that will show up though. I haven't dug
>>>> into how that's set up, but I'm open to suggestions.
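
[Side note: a minimal sketch of the cred-capture idea above, using only the
existing kernel cred helpers (get_current_cred/override_creds/revert_creds).
fuse_capture_init_creds(), fuse_passthrough_read() and the
fc->passthrough_cred field are made-up names for illustration; this is not
code from the series.]

    #include <linux/cred.h>
    #include <linux/fs.h>
    #include <linux/uio.h>
    #include "fuse_i.h"   /* struct fuse_conn; the sketch pretends it grew a
                           * "const struct cred *passthrough_cred" member */

    /* Runs in the context of the daemon thread that replies to FUSE_INIT. */
    static void fuse_capture_init_creds(struct fuse_conn *fc)
    {
            fc->passthrough_cred = get_current_cred();  /* takes a reference */
    }

    /* Runs later, in the context of the task doing the passthrough I/O. */
    static ssize_t fuse_passthrough_read(struct kiocb *iocb, struct iov_iter *to,
                                         struct fuse_conn *fc,
                                         struct file *backing_file)
    {
            const struct cred *old;
            ssize_t ret;

            old = override_creds(fc->passthrough_cred); /* act as the daemon */
            ret = vfs_iter_read(backing_file, to, &iocb->ki_pos, 0);
            revert_creds(old);                          /* back to the caller */

            return ret;
    }

    /* On connection teardown. */
    static void fuse_release_init_creds(struct fuse_conn *fc)
    {
            put_cred(fc->passthrough_cred);
    }

The same override_creds()/revert_creds() pair would wrap any other
passthrough operation.
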
>>>>
>>>>> My other generic comment is that you should add justification for
>>>>> doing this in the first place. I guess it's mainly performance. So
>>>>> how performance can be won in real life cases? It would also be good
>>>>> to measure the contribution of individual ops to that win. Is there
>>>>> another reason for this besides performance?
>>>>>
>>>>> Thanks,
>>>>> Miklos
>>>>
>>>> Our main concern with it is performance. We have some preliminary
>>>> numbers looking at the pure passthrough case. We've been testing using
>>>> a ramdrive on a somewhat slow machine, as that should highlight
>>>> differences more. We ran fio for sequential reads, and random
>>>> read/write. For sequential reads, we were seeing libfuse's
>>>> passthrough_hp take about a 50% hit, with fuse-bpf not being
>>>> detectably slower. For random read/write, we were seeing a roughly 90%
>>>> drop in performance from passthrough_hp, while fuse-bpf has about a 7%
>>>> drop in read and write speed. When we use a bpf that traces every
>>>> opcode, the fuse-bpf hit grows to roughly a 1% drop in sequential
>>>> read performance, and a 20% drop in both read and write speed for
>>>> random read/write. We plan to make more complex bpf examples, with
>>>> fuse daemon equivalents to compare against.
>>>>
>>>> We have not looked closely at the impact of individual opcodes yet.
>>>>
>>>> There's also a potential ease-of-use benefit to fuse-bpf. If you're
>>>> implementing a fuse daemon that is largely mirroring a backing
>>>> filesystem, you only need to write code for the differences in
>>>> behavior. For instance, say you want to remove image metadata like
>>>> location. You could give bpf information on what range of data is
>>>> metadata, and zero out that section without having to handle any other
>>>> operations.
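
[For illustration only: the per-read filtering such a metadata-stripping
filter boils down to could look roughly like the helper below. Only the
range arithmetic is sketched; the function name is made up, and the actual
FUSE-BPF hook/UAPI in this series will look different.]

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /*
     * buf holds the data a read just returned for the file range
     * [read_off, read_off + read_len); zero whatever part of the metadata
     * range [meta_off, meta_off + meta_len) falls inside it.
     */
    static void zero_metadata_range(char *buf, uint64_t read_off, size_t read_len,
                                    uint64_t meta_off, uint64_t meta_len)
    {
            uint64_t start = meta_off > read_off ? meta_off : read_off;
            uint64_t end = meta_off + meta_len < read_off + read_len ?
                           meta_off + meta_len : read_off + read_len;

            if (start < end)    /* the read overlaps the metadata range */
                    memset(buf + (start - read_off), 0, end - start);
    }

A bpf program would apply the same arithmetic to the buffer of each
FUSE_READ reply it filters, and leave every other operation alone.
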
>>>
>>> A bit off topic (I haven't looked closely into the FUSE BPF internals):
>>> after roughly listening to this topic in the FS track last week, I'm
>>> wondering whether (at least in the long term) it might be better for
>>> eBPF-based filter/redirect functionality to land in the VFS, or in some
>>> stackable fs, so that we could redirect/filter any sub-fstree in
>>> principle. It's just an open question and I have no strong preference,
>>> but do we really need BPF-filter functionality in each individual fs?
>>
>> I think that is a valid question, but the answer is that even if it makes sense,
>> doing something like this in vfs would be a much bigger project with larger
>> consequences on performance and security and whatnot, so even if
>> (and a very big if) this ever happens, using FUSE-BPF as a playground for
>> this sort of stuff would be a good idea.
>
> My current observation is that the total Fuse-BPF LoC is already beyond the
^ sorry, I double-checked just now and I was wrong here, please forget about it.
> whole FUSE itself. In addition, it hooks almost all fs operations, which
> is somewhat concerning to me.
>
>>
>> This reminds me of union mounts - it made sense to have union mount
>> functionality in vfs, but after a long winding road, a stacked fs (overlayfs)
>> turned out to be a much more practical solution.
>
> Yeah, I agree. So it was just a pure hint on my side.
>
>>
>>>
>>> It sounds much like
>>> https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers
>>>
>>
>> Nice reference.
>> I must admit that I found it hard to understand what Windows filter drivers
>> can do compared to the FUSE-BPF design.
>> It'd be nice to get some comparison with what is planned for FUSE-BPF.
>
> At least, doing some investigation/analysis first might be better for the
> long-term development.
>
>>
>> Interesting to note that there is a "legacy" Windows filter driver API,
>> so Windows didn't get everything right for the first API - that is especially
>> interesting to look at as repeating other people's mistakes would be a shame.
>
> I'm not familiar with those details either, but I saw that they have a
> filesystem filter subsystem, so I mentioned it here.
>
> Thanks,
> Gao Xiang
>
>>
>> Thanks,
>> Amir.