Message-ID: <20260127022235.GG5900@frogsfrogsfrogs>
Date: Mon, 26 Jan 2026 18:22:35 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Joanne Koong <joannelkoong@...il.com>
Cc: miklos@...redi.hu, bernd@...ernd.com, neal@...pa.dev,
	linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCHSET v6 4/8] fuse: allow servers to use iomap for better
 file IO performance

On Mon, Jan 26, 2026 at 04:59:16PM -0800, Joanne Koong wrote:
> On Tue, Oct 28, 2025 at 5:38 PM Darrick J. Wong <djwong@...nel.org> wrote:
> >
> > Hi all,
> >
> > This series connects fuse (the userspace filesystem layer) to fs-iomap
> > to get fuse servers out of the business of handling file I/O themselves.
> > By keeping the IO path mostly within the kernel, we can dramatically
> > improve the speed of disk-based filesystems.  This enables us to move
> > all the filesystem metadata parsing code out of the kernel and into
> > userspace, which means that we can containerize them for security
> > without losing a lot of performance.
> 
> I haven't looked through how the fuse2fs or fuse4fs servers are
> implemented yet (also, could you explain the difference between the
> two? Which one should we look at to see how it all ties together?),

fuse4fs is a lowlevel fuse server; fuse2fs is a high(?) level fuse
server.  fuse4fs is the successor to fuse2fs, at least on Linux and BSD.

> but I wonder if having bpf infrastructure hooked up to fuse would be
> especially helpful for what you're doing here with fuse iomap. afaict,
> every read/write whether it's buffered or direct will incur at least 1
> call to ->iomap_begin() to get the mapping metadata, which will be 2
> context-switches (and if the server has ->iomap_end() implemented,
> then 2 more context-switches).

Yes, I agree that's a lot of context switching for file IO...

> But it seems like the logic for retrieving mapping
> offsets/lengths/metadata should be pretty straightforward?

...but it gets very cheap if the fuse server can cache mappings in the
kernel to avoid all that.  That is, incidentally, what patchset #7
implements.

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-cache_2026-01-22
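
To put rough numbers on the point above, here is a toy userspace model
of a mapping cache that is consulted before calling out to the server.
It is only a sketch: ToyServer, CachedIomap, the 1 MiB extent size and
the linear cache scan are assumptions made for illustration, not code
from the patchset.  The cost of 2 context switches per ->iomap_begin()
upcall is taken from the quoted text.

```python
from dataclasses import dataclass, field

SWITCHES_PER_UPCALL = 2  # cost of one ->iomap_begin() call, per the quoted text


@dataclass
class Mapping:
    offset: int  # file offset in bytes
    length: int  # mapping length in bytes
    addr: int    # device address in bytes


@dataclass
class ToyServer:
    """Hypothetical stand-in for a fuse server with fixed-size extents."""
    extent_size: int = 1 << 20
    upcalls: int = 0

    def iomap_begin(self, pos: int) -> Mapping:
        self.upcalls += 1
        start = pos - (pos % self.extent_size)
        return Mapping(start, self.extent_size, start + (1 << 30))


@dataclass
class CachedIomap:
    """Hypothetical kernel-side cache: only a miss reaches the server."""
    server: ToyServer
    cache: list = field(default_factory=list)

    def lookup(self, pos: int) -> Mapping:
        for m in self.cache:  # linear scan keeps the sketch short
            if m.offset <= pos < m.offset + m.length:
                return m
        m = self.server.iomap_begin(pos)
        self.cache.append(m)
        return m


def context_switches(server: ToyServer) -> int:
    return server.upcalls * SWITCHES_PER_UPCALL
```

In this model, reading 4 MiB in 4 KiB steps through CachedIomap reaches
the server 4 times (8 context switches).  Calling iomap_begin() directly
for each of the 1024 reads would cost 2048.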

> If the extent lookups are table lookups or tree
> traversals without complex side effects, then having
> ->iomap_begin()/->iomap_end() be executed as a bpf program would avoid
> the context switches and allow all the caching logic to be moved from
> the kernel to the server-side (eg using bpf maps).

Hrmm.  Now that /is/ an interesting proposal.  Does BPF have a data
structure that supports interval mappings?  I think the existing bpf map
only does key -> value.  Also, is there an upper limit on the size of a
map?  You could have hundreds of millions of mappings for a very fragmented
regular file.
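
For illustration, this is what an interval lookup needs beyond a plain
key -> value store: a "floor" query that finds the last extent starting
at or before the file position.  The sketch is a userspace model in
Python and uses no BPF facilities.  ExtentMap and its methods are
hypothetical names, and extents are assumed not to overlap.

```python
import bisect


class ExtentMap:
    """Hypothetical interval map: extent start -> (length, device address)."""

    def __init__(self):
        self._starts = []   # extent start offsets, kept sorted
        self._extents = {}  # plain key -> value store: start -> (length, addr)

    def insert(self, start, length, addr):
        # Assumes the caller never inserts overlapping extents.
        if start not in self._extents:
            bisect.insort(self._starts, start)
        self._extents[start] = (length, addr)

    def lookup(self, pos):
        """Return (start, length, addr) of the extent covering pos, or None."""
        # Floor query: index of the last start that is <= pos.
        i = bisect.bisect_right(self._starts, pos) - 1
        if i < 0:
            return None  # pos lies before the first extent
        start = self._starts[i]
        length, addr = self._extents[start]
        if pos >= start + length:
            return None  # pos falls in a hole after this extent
        return start, length, addr
```

An exact-match lookup on the dictionary alone only succeeds when pos
happens to be an extent start.  The sorted list of starts is what turns
it into an interval lookup, and it is also the part that grows with
fragmentation.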

At one point I suggested to the famfs maintainer that it might be
easier/better to implement the interleaved mapping lookups as bpf
programs instead of being stuck with a fixed format in the fuse
userspace abi, but I don't know if he ever implemented that.

> Is this your
> assessment of it as well or do you think the server-side logic for
> iomap_begin()/iomap_end() is too complicated to make this realistic?
> Asking because I'm curious whether this direction makes sense, not
> because I think it would be a blocker for your series.

For disk-based filesystems I think it would be difficult to model a bpf
program to do mappings, since they can basically point anywhere and be
of any size.

OTOH it would be enormously hilarious to me if one could load a file
mapping predictive model into the kernel as a bpf program and use that
as a first tier before checking the in-memory btree mapping cache from
patchset 7.  Quite a few years ago now there was a FAST paper
establishing that even a stupid linear regression model could in theory
beat a disk btree lookup.
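
A minimal sketch of the general "predict, then correct" idea, under
loud assumptions: this is not the method from the paper mentioned above
and not anything in the patchsets.  It fits a least-squares line from
extent start offset to array index, records the worst prediction error
seen while fitting, and then searches only inside that error window.
LearnedExtentIndex is a hypothetical name, and the list of extent
starts must be non-empty.

```python
import bisect


class LearnedExtentIndex:
    """Hypothetical first-tier index: linear model plus bounded search."""

    def __init__(self, starts):
        self.starts = sorted(starts)  # must be non-empty
        n = len(self.starts)
        mean_x = sum(self.starts) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in self.starts)
        cov = sum((x - mean_x) * (i - mean_y)
                  for i, x in enumerate(self.starts))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Worst-case prediction error over the keys the model was fit on.
        self.max_err = max(abs(self._predict(x) - i)
                           for i, x in enumerate(self.starts))

    def _predict(self, pos):
        guess = round(self.slope * pos + self.intercept)
        return min(max(guess, 0), len(self.starts) - 1)

    def floor_index(self, pos):
        """Index of the last extent start <= pos, or -1 if there is none."""
        guess = self._predict(pos)
        # The prediction is monotonic in pos, so the answer lies within
        # max_err + 1 slots of the guess; only that window is searched.
        lo = max(guess - self.max_err - 1, 0)
        hi = min(guess + self.max_err + 2, len(self.starts))
        return bisect.bisect_right(self.starts, pos, lo, hi) - 1
```

The window shrinks as the layout gets closer to linear.  For a badly
fragmented file max_err grows and this degrades towards an ordinary
binary search, which is why it only makes sense as a first tier.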

--D

> Thanks,
> Joanne
> 
> >
> > If you're going to start using this code, I strongly recommend pulling
> > from my git trees, which are linked below.
> >
> > This has been running on the djcloud for months with no problems.  Enjoy!
> > Comments and questions are, as always, welcome.
> >
> > --D
> >
> > kernel git tree:
> > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-fileio
> > ---
> > Commits in this patchset:
> >  * fuse: implement the basic iomap mechanisms
> >  * fuse_trace: implement the basic iomap mechanisms
> >  * fuse: make debugging configurable at runtime
> >  * fuse: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> >  * fuse_trace: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> >  * fuse: flush events and send FUSE_SYNCFS and FUSE_DESTROY on unmount
> >  * fuse: create a per-inode flag for toggling iomap
> >  * fuse_trace: create a per-inode flag for toggling iomap
> >  * fuse: isolate the other regular file IO paths from iomap
> >  * fuse: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> >  * fuse_trace: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> >  * fuse: implement direct IO with iomap
> >  * fuse_trace: implement direct IO with iomap
> >  * fuse: implement buffered IO with iomap
> >  * fuse_trace: implement buffered IO with iomap
> >  * fuse: implement large folios for iomap pagecache files
> >  * fuse: use an unrestricted backing device with iomap pagecache io
> >  * fuse: advertise support for iomap
> >  * fuse: query filesystem geometry when using iomap
> >  * fuse_trace: query filesystem geometry when using iomap
> >  * fuse: implement fadvise for iomap files
> >  * fuse: invalidate ranges of block devices being used for iomap
> >  * fuse_trace: invalidate ranges of block devices being used for iomap
> >  * fuse: implement inline data file IO via iomap
> >  * fuse_trace: implement inline data file IO via iomap
> >  * fuse: allow more statx fields
> >  * fuse: support atomic writes with iomap
> >  * fuse_trace: support atomic writes with iomap
> >  * fuse: disable direct reclaim for any fuse server that uses iomap
> >  * fuse: enable swapfile activation on iomap
> >  * fuse: implement freeze and shutdowns for iomap filesystems
> > ---
> >  fs/fuse/fuse_i.h          |  161 +++
> >  fs/fuse/fuse_trace.h      |  939 +++++++++++++++++++
> >  fs/fuse/iomap_i.h         |   52 +
> >  include/uapi/linux/fuse.h |  219 ++++
> >  fs/fuse/Kconfig           |   48 +
> >  fs/fuse/Makefile          |    1
> >  fs/fuse/backing.c         |   12
> >  fs/fuse/dev.c             |   30 +
> >  fs/fuse/dir.c             |  120 ++
> >  fs/fuse/file.c            |  133 ++-
> >  fs/fuse/file_iomap.c      | 2230 +++++++++++++++++++++++++++++++++++++++++++++
> >  fs/fuse/inode.c           |  162 +++
> >  fs/fuse/iomode.c          |    2
> >  fs/fuse/trace.c           |    2
> >  14 files changed, 4056 insertions(+), 55 deletions(-)
> >  create mode 100644 fs/fuse/iomap_i.h
> >  create mode 100644 fs/fuse/file_iomap.c
> >
> 
