Message-ID: <20251119180449.GS196358@frogsfrogsfrogs>
Date: Wed, 19 Nov 2025 10:04:49 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Demi Marie Obenour <demiobenour@...il.com>
Cc: bernd@...ernd.com, joannelkoong@...il.com, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org, miklos@...redi.hu, neal@...pa.dev,
linux-bcachefs@...r.kernel.org, linux-btrfs@...r.kernel.org,
zfs-devel@...t.zfsonlinux.org
Subject: Re: [PATCHSET v6 4/8] fuse: allow servers to use iomap for better
file IO performance
On Wed, Nov 19, 2025 at 04:19:36AM -0500, Demi Marie Obenour wrote:
> > By keeping the I/O path mostly within the kernel, we can dramatically
> > increase the speed of disk-based filesystems.
>
> ZFS, BTRFS, and bcachefs all support compression, checksumming,
> and RAID. ZFS and bcachefs also support encryption, and f2fs and
> ext4 support fscrypt.
>
> Will this patchset be able to improve FUSE implementations of these
> filesystems? I'd rather not be in the situation where one can have
> a FUSE filesystem that is fast, but only if it doesn't support modern
> data integrity or security features.
Not on its own, no.
> I'm not a filesystem developer, but here are some ideas (that you
> can take or leave):
>
> 1. Keep the compression, checksumming, and/or encryption in-kernel,
> and have userspace tell the kernel what algorithm and/or encryption
> key to use. These algorithms are generally well-known and secure
> against malicious input. It might be necessary to make an extra
> data copy, but ideally that copy could just stay within the
> CPU caches.
I think this is easily doable for fscrypt and compression since (IIRC)
the kernel filesystems already know how to transform data for I/O, and
nowadays iomap allows hooking of bios before submission and/or after
endio. Obviously you'd have to store encryption keys in the kernel
somewhere.
Checksumming is harder though, since the checksum information has to be
persisted in the metadata somewhere and AFAICT each checksumming fs does
things differently. For that, I think the fuse server would have to
convey to the kernel (a) a description of the checksum geometry and (b)
a buffer for storing the checksums. On write the kernel would compute
the checksum and write it to the buffer for the fs to persist as part of
the ioend; and for read the fuse server would have to read the checksums
into the buffer and pass that to the kernel.
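To make the (a)/(b) split concrete, here is a rough userspace sketch of the idea, not a proposed interface. Everything in it is an assumption made for illustration: the `ChecksumGeometry` name, its fields, and the use of CRC32 from zlib as a stand-in for whatever algorithm a given filesystem really uses. The "kernel" side fills a checksum buffer on write and checks data against a server-supplied buffer on read.

```python
import zlib
from dataclasses import dataclass


@dataclass
class ChecksumGeometry:
    """Hypothetical description a fuse server might convey to the kernel."""
    block_size: int      # bytes of file data covered by one checksum
    csum_size: int = 4   # bytes per stored checksum (CRC32 in this sketch)


def compute_checksums(geo: ChecksumGeometry, data: bytes) -> bytes:
    """Write path: compute one checksum per block and pack them into a
    buffer that the server would persist in its own metadata format."""
    buf = bytearray()
    for off in range(0, len(data), geo.block_size):
        block = data[off:off + geo.block_size]
        buf += zlib.crc32(block).to_bytes(geo.csum_size, "little")
    return bytes(buf)


def verify_checksums(geo: ChecksumGeometry, data: bytes, csum_buf: bytes) -> list:
    """Read path: the server has loaded csum_buf from its metadata; return
    the indices of the blocks whose contents do not match."""
    bad = []
    for i, off in enumerate(range(0, len(data), geo.block_size)):
        block = data[off:off + geo.block_size]
        stored = csum_buf[i * geo.csum_size:(i + 1) * geo.csum_size]
        if zlib.crc32(block).to_bytes(geo.csum_size, "little") != stored:
            bad.append(i)
    return bad
```

The point of the split is that only the geometry and the opaque buffer cross the boundary; where and how the checksums are stored on disk stays entirely the server's business.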
(Note that fsverity won't have this problem because all current
implementations stuff the Merkle tree in post-EOF data blocks; the
fsverity code only wants filesystems to read it into the pagecache and
pass it the page.)
> 2. Somehow integrate with the blk-crypto framework. This has the
> advantage that it supports inline encryption hardware, which
> I suspect is needed for this to be usable on mobile devices.
> After all, the keys on these systems are often not even visible
> to the kernel, let alone to userspace.
Yes, that would be even easier than messing around with bounce buffers.
> 3. Figure out a way to make a userspace data path fast enough.
> To prevent data corruption by unprivileged users of the FS,
> it's necessary to make a copy before checksumming, compression,
> or authenticated encryption. If this copy is done in the kernel,
> the server doesn't have to perform its own copy. By using large
> ring buffers, it might be possible to amortize the context switch
> cost away.
>
> Authenticated encryption also needs a copy in the *other* direction:
> if the (untrusted) client can see unauthenticated plaintext, it's
> a security vulnerability. That needs another copy from server
> buffers to client buffers, and the kernel can do that as well.
>
> 4. Make context switches much faster. L4-style IPC is incredibly fast,
> at least if one doesn't have to worry about Spectre. Unfortunately,
> nowadays one *does* need to worry about Spectre.
I don't think context switching overhead is going down.
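As for the copy mentioned in point 3, the reason it is needed can be shown with a toy example. This is only a sketch: the function names are made up, a `bytearray` stands in for a buffer that an unprivileged client can still modify, and CRC32 stands in for the real checksum. If the checksum is computed over the shared buffer and the client changes it before the data is written, the stored checksum no longer matches the stored data.

```python
import zlib


def write_without_copy(shared: bytearray, mutate):
    """Checksum the shared buffer in place, then 'write' it out.
    mutate() models the client scribbling on the buffer in between."""
    csum = zlib.crc32(shared)
    mutate(shared)
    return bytes(shared), csum   # data written, checksum persisted


def write_with_copy(shared: bytearray, mutate):
    """Take a private snapshot first; checksum and write the snapshot."""
    stable = bytes(shared)       # the extra copy
    csum = zlib.crc32(stable)
    mutate(shared)               # too late to affect what gets written
    return stable, csum
```

Whoever makes that snapshot, kernel or server, it only has to be made once, which is why doing it in the kernel would spare the server a second copy.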
> Obviously, none of these will be as fast as doing DMA directly to user
> buffers. However, all of these features (except for encryption using
> inline encryption hardware) come at a performance penalty already.
> I just don't want a FUSE server to have to pay a much larger penalty
> than a kernel filesystem would.
>
> I'm CCing the bcachefs, BTRFS, and ZFS-on-Linux mailing lists.
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)