Message-ID: <CAJnrk1bxhw2u0qwjw0dJPGdmxEXbcEyKn-=iFrszqof2c8wGCA@mail.gmail.com>
Date: Tue, 27 Jan 2026 16:10:43 -0800
From: Joanne Koong <joannelkoong@...il.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: miklos@...redi.hu, bernd@...ernd.com, neal@...pa.dev, 
	linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCHSET v6 4/8] fuse: allow servers to use iomap for better
 file IO performance

On Tue, Jan 27, 2026 at 3:21 PM Darrick J. Wong <djwong@...nel.org> wrote:
>
> On Tue, Jan 27, 2026 at 11:47:31AM -0800, Joanne Koong wrote:
> > On Mon, Jan 26, 2026 at 6:22 PM Darrick J. Wong <djwong@...nel.org> wrote:
> > >
> > > On Mon, Jan 26, 2026 at 04:59:16PM -0800, Joanne Koong wrote:
> > > > On Tue, Oct 28, 2025 at 5:38 PM Darrick J. Wong <djwong@...nel.org> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This series connects fuse (the userspace filesystem layer) to fs-iomap
> > > > > to get fuse servers out of the business of handling file I/O themselves.
> > > > > By keeping the IO path mostly within the kernel, we can dramatically
> > > > > improve the speed of disk-based filesystems.  This enables us to move
> > > > > all the filesystem metadata parsing code out of the kernel and into
> > > > > userspace, which means that we can containerize them for security
> > > > > without losing a lot of performance.
> > > >
> > > > I haven't looked through how the fuse2fs or fuse4fs servers are
> > > > implemented yet (also, could you explain the difference between the
> > > > two? Which one should we look at to see how it all ties together?),
> > >
> > > fuse4fs is a lowlevel fuse server; fuse2fs is a high(?) level fuse
> > > server.  fuse4fs is the successor to fuse2fs, at least on Linux and BSD.
> >
> > Ah I see, thanks for the explanation. In that case, I'll just look at
> > fuse4fs then.
> >
> > >
> > > > but I wonder if having bpf infrastructure hooked up to fuse would be
> > > > especially helpful for what you're doing here with fuse iomap. afaict,
> > > > every read/write whether it's buffered or direct will incur at least 1
> > > > call to ->iomap_begin() to get the mapping metadata, which will be 2
> > > > context-switches (and if the server has ->iomap_end() implemented,
> > > > then 2 more context-switches).
> > >
> > > Yes, I agree that's a lot of context switching for file IO...
> > >
> > > > But it seems like the logic for retrieving mapping
> > > > offsets/lengths/metadata should be pretty straightforward?
> > >
> > > ...but it gets very cheap if the fuse server can cache mappings in the
> > > kernel to avoid all that.  That is, incidentally, what patchset #7
> > > implements.
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-cache_2026-01-22
> > >
> > > > If the extent lookups are table lookups or tree
> > > > traversals without complex side effects, then having
> > > > ->iomap_begin()/->iomap_end() be executed as a bpf program would avoid
> > > > the context switches and allow all the caching logic to be moved from
> > > > the kernel to the server-side (eg using bpf maps).
> > >
> > > Hrmm.  Now that /is/ an interesting proposal.  Does BPF have a data
> > > structure that supports interval mappings?  I think the existing bpf map
> >
> > Not yet, but I don't see why a B+-tree-like data structure couldn't be added.
> > Maybe one workaround in the meantime that could work is using a sorted
> > array map and doing binary search on that, until interval mappings can
> > be natively supported?
>
> I guess, though I already had a C structure to borrow from xfs ;)
>
> > > only does key -> value.  Also, is there an upper limit on the size of a
> > > map?  You could have hundreds of millions of mappings for a very fragmented
> > > regular file.
> >
> > If I'm remembering correctly, there's an upper limit on the number of
> > map entries, which is bounded by u32
>
> That's problematic, since files can have 64-bit logical block numbers.

The key size supports 64-bits. The u32 bound would be the limit on the
number of extents for the file.
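
As a rough sketch of the sorted-array workaround mentioned above, the
lookup could look something like the following. This is plain C, not a
verified bpf program, and every name in it (struct extent, struct
extent_table, extent_lookup) is hypothetical and not taken from the
patchset or from any existing bpf map type. It only illustrates that the
keys can be 64-bit logical block numbers while the entry count stays
within u32, so the search needs at most 32 iterations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record; the names here are illustrative only. */
struct extent {
	uint64_t logical;   /* first logical block covered by this extent */
	uint64_t physical;  /* first physical block it maps to */
	uint64_t length;    /* number of blocks in the extent */
};

/* Hypothetical table: entries sorted by logical block, non-overlapping. */
struct extent_table {
	const struct extent *extents;
	uint32_t count;     /* number of entries, bounded by u32 */
};

/*
 * Binary search for the extent containing logical block lblk.
 * Returns a pointer to the extent, or NULL if lblk falls in a hole.
 */
static const struct extent *
extent_lookup(const struct extent_table *tab, uint64_t lblk)
{
	uint32_t lo = 0, hi = tab->count;

	assert(tab->extents != NULL || tab->count == 0);

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		const struct extent *e = &tab->extents[mid];

		if (lblk < e->logical)
			hi = mid;               /* target is left of this extent */
		else if (lblk - e->logical >= e->length)
			lo = mid + 1;           /* target is right of this extent */
		else
			return e;               /* lblk lies inside this extent */
	}
	return NULL;
}
```

A caller would fill the table in sorted order and treat a NULL return as
a hole. Whether something like this would pass the bpf verifier, or scale
to a very fragmented file, is not something this sketch tries to answer.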

>
> > > At one point I suggested to the famfs maintainer that it might be
> > > easier/better to implement the interleaved mapping lookups as bpf
> > > programs instead of being stuck with a fixed format in the fuse
> > > userspace abi, but I don't know if he ever implemented that.
> >
> > This seems like a good use case for it too
> > >
> > > > Is this your
> > > > assessment of it as well or do you think the server-side logic for
> > > > iomap_begin()/iomap_end() is too complicated to make this realistic?
> > > > Asking because I'm curious whether this direction makes sense, not
> > > > because I think it would be a blocker for your series.
> > >
> > > For disk-based filesystems I think it would be difficult to model a bpf
> > > program to do mappings, since they can basically point anywhere and be
> > > of any size.
> >
> > Hmm I'm not familiar enough with disk-based filesystems to know what
> > the "point anywhere and be of any size" means. For the mapping stuff,
> > doesn't it just point to a block number? Or are you saying the problem
> > would be there's too many mappings since a mapping could be any size?
>
> The second -- mappings can be any size, and unprivileged userspace can
> control the mappings.

If I'm understanding what you're saying here, this is the same
discussion as the one above about the u32 bound, correct?

>
> > I was thinking the issue would be more that there might be other logic
> > inside ->iomap_begin()/->iomap_end() besides the mapping stuff that
> > would need to be done that would be too out-of-scope for bpf. But I
> > think I need to read through the fuse4fs stuff to understand more what
> > it's doing in those functions.

From a cursory look at the fuse4fs logic, it seems doable? Another thing
I like about offloading this to bpf is that it would allow John's famfs to
just go through your iomap plumbing as a use case of it instead of
being an entirely separate thing. Though maybe there's some other
reason for that that you guys have discussed prior. In any case, I'll
ask this on John's main famfs patchset. It kind of seems to me that
you guys are pretty much doing the exact same thing conceptually.

Thanks,
Joanne

>
> <nod>
>
> --D
>
> >
> > Thanks,
> > Joanne
> >
> > >
> > > OTOH it would be enormously hilarious to me if one could load a file
> > > mapping predictive model into the kernel as a bpf program and use that
> > > as a first tier before checking the in-memory btree mapping cache from
> > > patchset 7.  Quite a few years ago now there was a FAST paper
> > > establishing that even a stupid linear regression model could in theory
> > > beat a disk btree lookup.
> > >
> > > --D
> > >
> > > > Thanks,
> > > > Joanne
> > > >
> > > > >
> > > > > If you're going to start using this code, I strongly recommend pulling
> > > > > from my git trees, which are linked below.
> > > > >
> > > > > This has been running on the djcloud for months with no problems.  Enjoy!
> > > > > Comments and questions are, as always, welcome.
> > > > >
> > > > > --D
> > > > >
> > > > > kernel git tree:
> > > > > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-fileio
> > > > > ---
> > > > > Commits in this patchset:
> > > > >  * fuse: implement the basic iomap mechanisms
> > > > >  * fuse_trace: implement the basic iomap mechanisms
> > > > >  * fuse: make debugging configurable at runtime
> > > > >  * fuse: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> > > > >  * fuse_trace: adapt FUSE_DEV_IOC_BACKING_{OPEN,CLOSE} to add new iomap devices
> > > > >  * fuse: flush events and send FUSE_SYNCFS and FUSE_DESTROY on unmount
> > > > >  * fuse: create a per-inode flag for toggling iomap
> > > > >  * fuse_trace: create a per-inode flag for toggling iomap
> > > > >  * fuse: isolate the other regular file IO paths from iomap
> > > > >  * fuse: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> > > > >  * fuse_trace: implement basic iomap reporting such as FIEMAP and SEEK_{DATA,HOLE}
> > > > >  * fuse: implement direct IO with iomap
> > > > >  * fuse_trace: implement direct IO with iomap
> > > > >  * fuse: implement buffered IO with iomap
> > > > >  * fuse_trace: implement buffered IO with iomap
> > > > >  * fuse: implement large folios for iomap pagecache files
> > > > >  * fuse: use an unrestricted backing device with iomap pagecache io
> > > > >  * fuse: advertise support for iomap
> > > > >  * fuse: query filesystem geometry when using iomap
> > > > >  * fuse_trace: query filesystem geometry when using iomap
> > > > >  * fuse: implement fadvise for iomap files
> > > > >  * fuse: invalidate ranges of block devices being used for iomap
> > > > >  * fuse_trace: invalidate ranges of block devices being used for iomap
> > > > >  * fuse: implement inline data file IO via iomap
> > > > >  * fuse_trace: implement inline data file IO via iomap
> > > > >  * fuse: allow more statx fields
> > > > >  * fuse: support atomic writes with iomap
> > > > >  * fuse_trace: support atomic writes with iomap
> > > > >  * fuse: disable direct reclaim for any fuse server that uses iomap
> > > > >  * fuse: enable swapfile activation on iomap
> > > > >  * fuse: implement freeze and shutdowns for iomap filesystems
> > > > > ---
> > > > >  fs/fuse/fuse_i.h          |  161 +++
> > > > >  fs/fuse/fuse_trace.h      |  939 +++++++++++++++++++
> > > > >  fs/fuse/iomap_i.h         |   52 +
> > > > >  include/uapi/linux/fuse.h |  219 ++++
> > > > >  fs/fuse/Kconfig           |   48 +
> > > > >  fs/fuse/Makefile          |    1
> > > > >  fs/fuse/backing.c         |   12
> > > > >  fs/fuse/dev.c             |   30 +
> > > > >  fs/fuse/dir.c             |  120 ++
> > > > >  fs/fuse/file.c            |  133 ++-
> > > > >  fs/fuse/file_iomap.c      | 2230 +++++++++++++++++++++++++++++++++++++++++++++
> > > > >  fs/fuse/inode.c           |  162 +++
> > > > >  fs/fuse/iomode.c          |    2
> > > > >  fs/fuse/trace.c           |    2
> > > > >  14 files changed, 4056 insertions(+), 55 deletions(-)
> > > > >  create mode 100644 fs/fuse/iomap_i.h
> > > > >  create mode 100644 fs/fuse/file_iomap.c
> > > > >
> > > >
