Message-ID: <CAJnrk1bSVy4=c=N_FfOajs1FE4o8T=Br=jFm7gBDaCGvRpgGVA@mail.gmail.com>
Date: Tue, 27 Jan 2026 11:47:31 -0800
From: Joanne Koong <joannelkoong@...il.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: miklos@...redi.hu, bernd@...ernd.com, neal@...pa.dev, 
	linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCHSET v6 4/8] fuse: allow servers to use iomap for better
 file IO performance

On Mon, Jan 26, 2026 at 6:22 PM Darrick J. Wong <djwong@...nel.org> wrote:
>
> On Mon, Jan 26, 2026 at 04:59:16PM -0800, Joanne Koong wrote:
> > On Tue, Oct 28, 2025 at 5:38 PM Darrick J. Wong <djwong@...nel.org> wrote:
> > >
> > > Hi all,
> > >
> > > This series connects fuse (the userspace filesystem layer) to fs-iomap
> > > to get fuse servers out of the business of handling file I/O themselves.
> > > By keeping the IO path mostly within the kernel, we can dramatically
> > > improve the speed of disk-based filesystems.  This enables us to move
> > > all the filesystem metadata parsing code out of the kernel and into
> > > userspace, which means that we can containerize them for security
> > > without losing a lot of performance.
> >
> > I haven't looked through how the fuse2fs or fuse4fs servers are
> > implemented yet (also, could you explain the difference between the
> > two? Which one should we look at to see how it all ties together?),
>
> fuse4fs is a lowlevel fuse server; fuse2fs is a high(?) level fuse
> server.  fuse4fs is the successor to fuse2fs, at least on Linux and BSD.

Ah I see, thanks for the explanation. In that case, I'll just look at
fuse4fs then.

>
> > but I wonder if having bpf infrastructure hooked up to fuse would be
> > especially helpful for what you're doing here with fuse iomap. afaict,
> > every read/write whether it's buffered or direct will incur at least 1
> > call to ->iomap_begin() to get the mapping metadata, which will be 2
> > context-switches (and if the server has ->iomap_end() implemented,
> > then 2 more context-switches).
>
> Yes, I agree that's a lot of context switching for file IO...
>
> > But it seems like the logic for retrieving mapping
> > offsets/lengths/metadata should be pretty straightforward?
>
> ...but it gets very cheap if the fuse server can cache mappings in the
> kernel to avoid all that.  That is, incidentally, what patchset #7
> implements.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-cache_2026-01-22
>
> > If the extent lookups are table lookups or tree
> > traversals without complex side effects, then having
> > ->iomap_begin()/->iomap_end() be executed as a bpf program would avoid
> > the context switches and allow all the caching logic to be moved from
> > the kernel to the server-side (eg using bpf maps).
>
> Hrmm.  Now that /is/ an interesting proposal.  Does BPF have a data
> structure that supports interval mappings?  I think the existing bpf map

Not yet, but I don't see why a B+-tree-like data structure couldn't be
added. Maybe one workaround in the meantime would be to keep the mappings
in a sorted array map and do a binary search on that, until interval
mappings can be natively supported?
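
To make the sorted-array idea concrete, here is a rough sketch in plain
userspace C. It is not BPF code and not taken from the patchset: the
struct layout and the names extent and extent_lookup are made up for
illustration, and it assumes the array is sorted by file offset with
non-overlapping ranges. A real BPF program would presumably also need
the loop bounded in a way the verifier accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping record: a file byte range and where it lives. */
struct extent {
	uint64_t file_off;	/* first file byte covered */
	uint64_t len;		/* bytes covered, assumed > 0 */
	uint64_t dev_off;	/* device byte address of file_off */
};

/*
 * Return the extent containing pos, or NULL if pos falls in a hole.
 * Assumes exts[] is sorted by file_off and ranges do not overlap.
 */
static const struct extent *
extent_lookup(const struct extent *exts, size_t n, uint64_t pos)
{
	size_t lo = 0, hi = n;

	assert(exts != NULL || n == 0);

	/* Upper bound: find the first extent that starts after pos. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (exts[mid].file_off <= pos)
			lo = mid + 1;
		else
			hi = mid;
	}

	if (lo == 0)
		return NULL;	/* pos precedes every extent */

	/* The candidate is the last extent starting at or before pos. */
	const struct extent *e = &exts[lo - 1];

	/* No underflow here: e->file_off <= pos by construction. */
	return (pos - e->file_off < e->len) ? e : NULL;
}
```

The lookup is O(log n) per call, but inserting or splitting a mapping
means shifting array entries, which is the part a native interval
structure would handle better.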

> only does key -> value.  Also, is there an upper limit on the size of a
> map?  You could have hundreds of millions of maps for a very fragmented
> regular file.

If I'm remembering correctly, there's an upper limit on the number of
map entries, which is bounded by u32.
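
A back-of-the-envelope check, as a small C sketch: the 24-byte record
size (three 64-bit fields) and the helper names are assumptions for
illustration only, and per-map overhead is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of one mapping record: three 64-bit fields. */
#define EXTENT_REC_BYTES 24u

/* Nonzero if nr_extents can be expressed as a u32 entry count. */
static int fits_u32_entries(uint64_t nr_extents)
{
	return nr_extents <= UINT32_MAX;
}

/* Raw bytes needed for nr_extents records, ignoring map overhead. */
static uint64_t extent_bytes(uint64_t nr_extents)
{
	/* Guard against overflow of the multiplication below. */
	assert(nr_extents <= UINT64_MAX / EXTENT_REC_BYTES);
	return nr_extents * EXTENT_REC_BYTES;
}
```

Under those assumptions, 300 million mappings still fit in a u32 entry
count, but extent_bytes(300000000) comes to 7,200,000,000 bytes, so
memory would likely become the limit before the entry count does.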

>
> At one point I suggested to the famfs maintainer that it might be
> easier/better to implement the interleaved mapping lookups as bpf
> programs instead of being stuck with a fixed format in the fuse
> userspace abi, but I don't know if he ever implemented that.

This seems like a good use case for it too.

>
> > Is this your
> > assessment of it as well or do you think the server-side logic for
> > iomap_begin()/iomap_end() is too complicated to make this realistic?
> > Asking because I'm curious whether this direction makes sense, not
> > because I think it would be a blocker for your series.
>
> For disk-based filesystems I think it would be difficult to model a bpf
> program to do mappings, since they can basically point anywhere and be
> of any size.

Hmm, I'm not familiar enough with disk-based filesystems to know what
"point anywhere and be of any size" means. For the mapping stuff,
doesn't it just point to a block number? Or are you saying the problem
would be that there are too many mappings, since a mapping could be any
size?

I was thinking the issue would be more that ->iomap_begin()/->iomap_end()
might need to do other work besides the mapping lookup, and that this
other logic would be out of scope for bpf. But I think I need to read
through the fuse4fs code to better understand what it's doing in those
functions.

Thanks,
Joanne

>
> OTOH it would be enormously hilarious to me if one could load a file
> mapping predictive model into the kernel as a bpf program and use that
> as a first tier before checking the in-memory btree mapping cache from
> patchset 7.  Quite a few years ago now there was a FAST paper
> establishing that even a stupid linear regression model could in theory
> beat a disk btree lookup.
>
> --D
>
> > Thanks,
> > Joanne
> >
> > >
> > > If you're going to start using this code, I strongly recommend pulling
> > > from my git trees, which are linked below.
> > >
> > > This has been running on the djcloud for months with no problems.  Enjoy!
> > > Comments and questions are, as always, welcome.
> > >
> > > --D
> > >
> > > kernel git tree:
> > > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-fileio
> > > ---
> > > Commits in this patchset:
> > >  * fuse: implement the basic iomap mechanisms
> > >  * fuse_trace: implement the basic iomap mechanisms
> > >  * fuse: make debugging configurable at runtime
> > >  * fuse: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> > >  * fuse_trace: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> > >  * fuse: flush events and send FUSE_SYNCFS and FUSE_DESTROY on unmount
> > >  * fuse: create a per-inode flag for toggling iomap
> > >  * fuse_trace: create a per-inode flag for toggling iomap
> > >  * fuse: isolate the other regular file IO paths from iomap
> > >  * fuse: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> > >  * fuse_trace: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> > >  * fuse: implement direct IO with iomap
> > >  * fuse_trace: implement direct IO with iomap
> > >  * fuse: implement buffered IO with iomap
> > >  * fuse_trace: implement buffered IO with iomap
> > >  * fuse: implement large folios for iomap pagecache files
> > >  * fuse: use an unrestricted backing device with iomap pagecache io
> > >  * fuse: advertise support for iomap
> > >  * fuse: query filesystem geometry when using iomap
> > >  * fuse_trace: query filesystem geometry when using iomap
> > >  * fuse: implement fadvise for iomap files
> > >  * fuse: invalidate ranges of block devices being used for iomap
> > >  * fuse_trace: invalidate ranges of block devices being used for iomap
> > >  * fuse: implement inline data file IO via iomap
> > >  * fuse_trace: implement inline data file IO via iomap
> > >  * fuse: allow more statx fields
> > >  * fuse: support atomic writes with iomap
> > >  * fuse_trace: support atomic writes with iomap
> > >  * fuse: disable direct reclaim for any fuse server that uses iomap
> > >  * fuse: enable swapfile activation on iomap
> > >  * fuse: implement freeze and shutdowns for iomap filesystems
> > > ---
> > >  fs/fuse/fuse_i.h          |  161 +++
> > >  fs/fuse/fuse_trace.h      |  939 +++++++++++++++++++
> > >  fs/fuse/iomap_i.h         |   52 +
> > >  include/uapi/linux/fuse.h |  219 ++++
> > >  fs/fuse/Kconfig           |   48 +
> > >  fs/fuse/Makefile          |    1
> > >  fs/fuse/backing.c         |   12
> > >  fs/fuse/dev.c             |   30 +
> > >  fs/fuse/dir.c             |  120 ++
> > >  fs/fuse/file.c            |  133 ++-
> > >  fs/fuse/file_iomap.c      | 2230 +++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/fuse/inode.c           |  162 +++
> > >  fs/fuse/iomode.c          |    2
> > >  fs/fuse/trace.c           |    2
> > >  14 files changed, 4056 insertions(+), 55 deletions(-)
> > >  create mode 100644 fs/fuse/iomap_i.h
> > >  create mode 100644 fs/fuse/file_iomap.c
> > >
> >
