Message-ID: <175279460935.715479.15460687085573767955.stgit@frogsfrogsfrogs>
Date: Thu, 17 Jul 2025 16:25:48 -0700
From: "Darrick J. Wong" <djwong@...nel.org>
To: tytso@....edu
Cc: joannelkoong@...il.com, miklos@...redi.hu, John@...ves.net,
linux-fsdevel@...r.kernel.org, bernd@...ernd.com, linux-ext4@...r.kernel.org,
neal@...pa.dev
Subject: [PATCHSET RFC v3 1/3] fuse2fs: use fuse iomap data paths for better
file I/O performance
Hi all,
Switch fuse2fs to use the new iomap file data IO paths instead of
pushing file data very slowly through the /dev/fuse connection. For local
filesystems, all we have to do is respond to requests for file to device
mappings; the rest of the IO hot path stays within the kernel. This
means that we can get rid of all file data block processing within
fuse2fs.
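To make "respond to requests for file to device mappings" a bit more concrete, here is a minimal sketch of what such a lookup amounts to. It is not the fuse2fs code and not the kernel's iomap structures: every type and function name below (demo_extent, demo_mapping, demo_map_lookup) is hypothetical, and it assumes a sorted, non-overlapping in-memory extent list. Given a file offset and a length, it reports either a mapped range on the device or a hole, and never returns more than one contiguous piece.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not fuse2fs or kernel structures. */
enum demo_map_type { DEMO_MAP_HOLE, DEMO_MAP_MAPPED };

struct demo_extent {
	uint64_t file_off;	/* byte offset within the file */
	uint64_t dev_off;	/* byte offset on the block device */
	uint64_t len;		/* length in bytes */
};

struct demo_mapping {
	enum demo_map_type type;
	uint64_t file_off;	/* start of the range this mapping describes */
	uint64_t dev_off;	/* only meaningful for DEMO_MAP_MAPPED */
	uint64_t len;		/* at most the requested count */
};

/*
 * Describe what backs the file range starting at pos, for at most count
 * bytes.  ext[] must be sorted by file_off and must not overlap.
 */
struct demo_mapping demo_map_lookup(const struct demo_extent *ext, size_t nr,
				    uint64_t pos, uint64_t count)
{
	struct demo_mapping m = { DEMO_MAP_HOLE, pos, 0, count };
	size_t i;

	assert(ext != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		uint64_t end = ext[i].file_off + ext[i].len;

		if (pos >= end)
			continue;	/* extent lies entirely before pos */

		if (pos < ext[i].file_off) {
			/* Hole that ends where this extent begins. */
			uint64_t gap = ext[i].file_off - pos;

			if (gap < count)
				m.len = gap;
			return m;
		}

		/* pos falls inside this extent. */
		m.type = DEMO_MAP_MAPPED;
		m.dev_off = ext[i].dev_off + (pos - ext[i].file_off);
		m.len = (end - pos < count) ? end - pos : count;
		return m;
	}

	/* Past the last extent: everything requested is a hole. */
	return m;
}
```

The point of the sketch is only that the answer is a few integers per request, whereas the data it describes can be megabytes.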
Because we're not pinning dirty pages through a potentially slow network
connection, we don't need the heavy BDI throttling for which most fuse
servers have become infamous. Yes, mapping lookups for writeback can
stall, but mappings are small compared to data and this situation
exists for all kernel filesystems as well.
The performance of this new data path is quite stunning: on a warm
system, streaming reads and writes through the pagecache go from
60-90MB/s to 2-2.5GB/s. Direct IO reads and writes improve from the
same baseline to 2.5-8GB/s. FIEMAP and SEEK_DATA/SEEK_HOLE now work
too. The kernel ext4 driver can manage about 1.6GB/s for pagecache IO
and about 2.6-8.5GB/s for direct IO, which means that fuse2fs is about as
fast as the kernel for streaming file IO.
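For anyone who wants to poke at the SEEK_DATA/SEEK_HOLE support from userspace, a small sketch along these lines should do. It uses only the standard lseek(2) whence values; the function name demo_list_data_segments is made up for this example, and the fallback #defines assume the Linux values of the two constants in case the headers were included without _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only if the libc headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: print every data segment of an open file as a
 * half-open byte range.  Returns the number of segments found, or -1 on
 * error.  The file position of fd is changed as a side effect.
 */
int demo_list_data_segments(int fd)
{
	off_t pos = 0;
	int nr = 0;

	assert(fd >= 0);

	for (;;) {
		off_t data, hole;

		/* Find the next byte at or after pos that holds data. */
		data = lseek(fd, pos, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				break;	/* no data at or after pos */
			return -1;
		}

		/* Find where that run of data ends; EOF counts as a hole. */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
		nr++;
		pos = hole;
	}

	return nr;
}
```

Call it with a file descriptor opened on the fuse2fs mount. A filesystem that does not track holes will typically report the whole file as a single data segment.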
Random 4k buffered IO is not so good: plain fuse2fs pokes along at
25-50MB/s, whereas fuse2fs with iomap manages 90-1300MB/s. The kernel
can do 900-1300MB/s. Random directio is worse: plain fuse2fs does
20-30MB/s, fuse-iomap does about 30-35MB/s, and the kernel does
40-55MB/s. I suspect that metadata-heavy workloads do not perform well
on fuse2fs because libext2fs wasn't designed for that and it doesn't
even have a journal to absorb all the fsync writes. We also probably
need iomap caching really badly.
These performance numbers are slanted: my machine is 12 years old, and
fuse2fs is VERY poorly optimized for performance. It contains a single
Big Filesystem Lock which nukes multi-threaded scalability. There's no
inode cache nor is there a proper buffer cache, which means that fuse2fs
reads metadata in from disk and checksums it on EVERY ACCESS. Sad!
Despite these gaps, this RFC demonstrates that it's feasible to run the
metadata parsing parts of a filesystem in userspace while not
sacrificing much performance. We now have a vehicle to move the
filesystems out of the kernel, where they can be containerized so that
malicious filesystems can be contained, somewhat.
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
Comments and questions are, as always, welcome.
e2fsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/e2fsprogs.git/log/?h=fuse2fs-iomap
---
Commits in this patchset:
* fuse2fs: implement bare minimum iomap for file mapping reporting
* fuse2fs: add iomap= mount option
* fuse2fs: implement iomap configuration
* fuse2fs: register block devices for use with iomap
* fuse2fs: always use directio disk reads with fuse2fs
* fuse2fs: implement directio file reads
* fuse2fs: use tagged block IO for zeroing sub-block regions
* fuse2fs: only flush the cache for the file under directio read
* fuse2fs: add extent dump function for debugging
* fuse2fs: implement direct write support
* fuse2fs: turn on iomap for pagecache IO
* fuse2fs: improve tracing for fallocate
* fuse2fs: don't zero bytes in punch hole
* fuse2fs: don't do file data block IO when iomap is enabled
* fuse2fs: disable most io channel flush/invalidate in iomap pagecache mode
* fuse2fs: re-enable the block device pagecache for metadata IO
* fuse2fs: avoid fuseblk mode if fuse-iomap support is likely
* fuse2fs: don't allow hardlinks for now
* fuse2fs: enable file IO to inline data files
* fuse2fs: set iomap-related inode flags
* fuse2fs: add strictatime/lazytime mount options
* fuse2fs: configure block device block size
---
 configure       |   47 ++
 configure.ac    |   32 +
 lib/config.h.in |    3
 misc/fuse2fs.c  | 1567 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 1628 insertions(+), 21 deletions(-)