Message-ID: <20250521235837.GB9688@frogsfrogsfrogs>
Date: Wed, 21 May 2025 16:58:37 -0700
From: "Darrick J. Wong" <djwong@...nel.org>
To: linux-fsdevel <linux-fsdevel@...r.kernel.org>
Cc: John@...ves.net, bernd@...ernd.com, miklos@...redi.hu,
	joannelkoong@...il.com, Josef Bacik <josef@...icpanda.com>,
	linux-ext4 <linux-ext4@...r.kernel.org>,
	Theodore Ts'o <tytso@....edu>
Subject: [RFC[RAP]] fuse: use fs-iomap for better performance so we can
 containerize ext4

Hi everyone,

DO NOT MERGE THIS.

This is the very first request for comments of a prototype to connect
the Linux fuse driver to fs-iomap for regular file IO operations to and
from files whose contents persist to locally attached storage devices.

Why would you want to do that?  Most filesystem drivers are seriously
vulnerable to metadata parsing attacks, as syzbot has shown repeatedly
over almost a decade of its existence.  Faulty code can lead to total
kernel compromise, and I think there's a very strong incentive to move
all that parsing out to userspace where we can containerize the fuse
server process.

willy's folios conversion project (and to a certain degree RH's new
mount API) have also demonstrated that treewide changes to the core
mm/pagecache/fs code are very very difficult to pull off and take years
because you have to understand every filesystem's bespoke use of that
core code.  Eeeugh.

The fuse command plumbing is very simple -- the ->iomap_begin,
->iomap_end, and iomap ioend calls within iomap are turned into upcalls
to the fuse server via a trio of new fuse commands.  This is suitable
for very simple filesystems that don't do tricky things with mappings
(e.g. FAT/HFS) during writeback.  This isn't quite adequate for ext4,
but solving that is for the next sprint.
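
To make the shape of the ->iomap_begin upcall concrete, here is a toy
sketch of the kind of lookup a fuse server would perform: given a file
position and a byte count, return one mapping that covers the position.
Everything in it (toy_iomap_begin, Mapping, the extent tuple layout) is
made up for illustration and is not the wire format or the code from
the patches.

```python
from bisect import bisect_right
from dataclasses import dataclass

HOLE = "hole"      # no backing storage for this range
MAPPED = "mapped"  # range is backed by blocks on the device


@dataclass(frozen=True)
class Mapping:
    offset: int  # file offset where the mapping starts, in bytes
    length: int  # number of bytes the mapping covers
    kind: str    # HOLE or MAPPED
    addr: int    # device byte address for MAPPED, -1 for HOLE


def toy_iomap_begin(extents, pos, count):
    """Return one mapping covering file position `pos`.

    `extents` is a hypothetical sorted, non-overlapping list of
    (file_offset, length, device_addr) tuples.
    """
    starts = [e[0] for e in extents]
    i = bisect_right(starts, pos) - 1  # last extent starting at or before pos
    if i >= 0:
        off, length, addr = extents[i]
        if pos < off + length:
            # pos falls inside extent i, so report the whole extent.
            return Mapping(off, length, MAPPED, addr)
    # pos is in a hole; it ends at the next extent or at the end of the
    # requested range, whichever comes first.
    end = pos + count
    if i + 1 < len(extents):
        end = min(end, extents[i + 1][0])
    return Mapping(pos, end - pos, HOLE, -1)
```

The caller is expected to trim the returned mapping to the range it
actually wants and to call again for whatever is left, which is
roughly how the iomap iteration loop behaves.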

With this overly simplistic RFC, I aim to show that it's possible to
build a fuse server for a real filesystem (ext4) that runs entirely in
userspace yet maintains most of its performance.  At this early stage I
get about 95% of the kernel ext4 driver's streaming direct IO
performance and 110% of its streaming buffered IO performance.
Random buffered IO suffers a 90% hit on writes due to unwritten extent
conversions.  Random direct IO is about 60% as fast as the kernel; see
the cover letter for the fuse2fs iomap changes for more details.

There are some major warts remaining:

1. The iomap cookie validation is not present, which can lead to subtle
races between pagecache zeroing and writeback on filesystems that
support unwritten and delalloc mappings.

2. Mappings ought to be cached in the kernel for more speed.

3. iomap doesn't support things like fscrypt or fsverity, and I haven't
yet figured out how inline data is supposed to work.

4. I would like to be able to turn on fuse+iomap on a per-inode basis,
which currently isn't possible because the kernel fuse driver will iget
inodes prior to calling FUSE_GETATTR to discover the properties of the
inode it just read.

5. ext4 doesn't support out of place writes so I don't know if the
out of place write path actually works correctly.

6. iomap is an inode-based service, not a file-based service.  This
means that we /must/ push ext2's inode numbers into the kernel via
FUSE_GETATTR so that it can report those same numbers back out through
the FUSE_IOMAP_* calls.  However, the kernel fuse driver uses a separate
nodeid to index its incore inode, so we have to pass those too so that
notifications work properly.
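
As a toy illustration of item 6, a fuse server ends up needing to
translate in both directions between the fuse nodeid and the on-disk
inode number.  The class and method names below are hypothetical and
only sketch the bookkeeping; the real patches may well do this
differently.

```python
class ToyInodeTable:
    """Two-way map between fuse nodeids and on-disk inode numbers."""

    def __init__(self):
        self._ino_by_nodeid = {}
        self._nodeid_by_ino = {}

    def register(self, nodeid, ino):
        # Record the pairing when the server first hands out a nodeid.
        self._ino_by_nodeid[nodeid] = ino
        self._nodeid_by_ino[ino] = nodeid

    def forget(self, nodeid):
        # Drop both directions when the kernel forgets the nodeid.
        ino = self._ino_by_nodeid.pop(nodeid)
        del self._nodeid_by_ino[ino]

    def ino_for(self, nodeid):
        # Used for ordinary fuse requests, which are keyed by nodeid.
        return self._ino_by_nodeid[nodeid]

    def nodeid_for(self, ino):
        # Used when a request or notification is keyed by inode number.
        return self._nodeid_by_ino[ino]
```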

I'll work on these in June, but for now here's an unmergeable RFC to
start some discussion.

--Darrick
