Message-ID: <CAJnrk1Yw5Ypv8NdY_+1wW3VURgmrornp2XCQSYv5VS4rwaf6Ow@mail.gmail.com>
Date: Tue, 27 Jan 2026 10:04:28 -0800
From: Joanne Koong <joannelkoong@...il.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: miklos@...redi.hu, bernd@...ernd.com, neal@...pa.dev,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 17/31] fuse: use an unrestricted backing device with iomap pagecache io

On Mon, Jan 26, 2026 at 6:09 PM Darrick J. Wong <djwong@...nel.org> wrote:
>
> On Mon, Jan 26, 2026 at 05:35:05PM -0800, Joanne Koong wrote:
> > On Mon, Jan 26, 2026 at 3:55 PM Darrick J. Wong <djwong@...nel.org> wrote:
> > >
> > > On Mon, Jan 26, 2026 at 02:03:35PM -0800, Joanne Koong wrote:
> > > > On Tue, Oct 28, 2025 at 5:49 PM Darrick J. Wong <djwong@...nel.org> wrote:
> > > > >
> > > > > From: Darrick J. Wong <djwong@...nel.org>
> > > > >
> > > > > With iomap support turned on for the pagecache, the kernel issues
> > > > > > writeback directly to block devices and we no longer have to push all
> > > > > those pages through the fuse device to userspace. Therefore, we don't
> > > > > need the tight dirty limits (~1M) that are used for regular fuse. This
> > > > > dramatically increases the performance of fuse's pagecache IO.
> > > > >
> > > > > Signed-off-by: "Darrick J. Wong" <djwong@...nel.org>
> > > > > ---
> > > > > fs/fuse/file_iomap.c | 21 +++++++++++++++++++++
> > > > > 1 file changed, 21 insertions(+)
> > > > >
> > > > >
> > > > > diff --git a/fs/fuse/file_iomap.c b/fs/fuse/file_iomap.c
> > > > > index 0bae356045638b..a9bacaa0991afa 100644
> > > > > --- a/fs/fuse/file_iomap.c
> > > > > +++ b/fs/fuse/file_iomap.c
> > > > > @@ -713,6 +713,27 @@ const struct fuse_backing_ops fuse_iomap_backing_ops = {
> > > > > void fuse_iomap_mount(struct fuse_mount *fm)
> > > > > {
> > > > > struct fuse_conn *fc = fm->fc;
> > > > > + struct super_block *sb = fm->sb;
> > > > > + struct backing_dev_info *old_bdi = sb->s_bdi;
> > > > > + char *suffix = sb->s_bdev ? "-fuseblk" : "-fuse";
> > > > > + int res;
> > > > > +
> > > > > + /*
> > > > > + * sb->s_bdi points to the initial private bdi. However, we want to
> > > > > + * redirect it to a new private bdi with default dirty and readahead
> > > > > + * settings because iomap writeback won't be pushing a ton of dirty
> > > > > + * data through the fuse device. If this fails we fall back to the
> > > > > + * initial fuse bdi.
> > > > > + */
> > > > > + sb->s_bdi = &noop_backing_dev_info;
> > > > > + res = super_setup_bdi_name(sb, "%u:%u%s.iomap", MAJOR(fc->dev),
> > > > > + MINOR(fc->dev), suffix);
> > > > > + if (res) {
> > > > > + sb->s_bdi = old_bdi;
> > > > > + } else {
> > > > > + bdi_unregister(old_bdi);
> > > > > + bdi_put(old_bdi);
> > > > > + }
> > > >
> > > > Maybe I'm missing something here, but isn't sb->s_bdi already set to
> > > > noop_backing_dev_info when fuse_iomap_mount() is called?
> > > > fuse_fill_super() -> fuse_fill_super_common() -> fuse_bdi_init() does
> > > > this already before the fuse_iomap_mount() call, afaict.
> > >
> > > Right.
> > >
> > > > I think what we need to do is just unset BDI_CAP_STRICTLIMIT and
> > > > adjust the bdi max ratio?
> > >
> > > That's sufficient to undo the effects of fuse_bdi_init, yes. However
> > > the BDI gets created with the name "$major:$minor{-fuseblk}" and there
> > > are "management" scripts that try to tweak fuse BDIs for better
> > > performance.
> > >
> > > I don't want some dumb script to mismanage a fuse-iomap filesystem
> > > because it can't tell the difference, so I create a new bdi with the
> > > name "$major:$minor.iomap" to make it obvious. But super_setup_bdi_name
> > > gets cranky if s_bdi isn't set to noop and we don't want to fail a mount
> > > here due to ENOMEM so ... I implemented this weird switcheroo code.
> >
> > I see. It might be useful to copy/paste this into the commit message
> > just for added context. I don't see a better way of doing it than what
> > you have in this patch then since we rely on the init reply to know
> > whether iomap should be used or not...
>
> I'll do that. I will also add that as soon as any BDI is created, it
> will be exposed to userspace in sysfs. That means that running the code
> from fuse_bdi_init in reverse will not necessarily produce the same
> results as a freshly created BDI.
>
> > If the new bdi setup fails, I wonder if the mount should just fail
> > entirely then. That seems better to me than letting it succeed with
>
> Err, which new bdi setup? If fuse-iomap can't create a new BDI, it will
> set s_bdi back to the old one and move on. You'll get degraded
> performance, but that's not the end of the world.

I was thinking from the user's POV: I'd rather the whole mount fail
(which seems like it would only be a transient failure, e.g. running
out of memory) and retry, than have it succeed but with writes
potentially running 10x slower (the 10x comes from the benchmarks
Jingbo saw in [1]).

>
> > strictlimiting enforced, especially since large folios will be enabled
> > for fuse iomap. [1] has some numbers for the performance degradations
> > I saw for writes with strictlimiting on and large folios enabled.
>
> If fuse_bdi_init can't set up a bdi it will fail the mount.
>
> That said... from reading [1], if strictlimiting is enabled with large
> folios, then can we figure out what is the effective max folio size and
> lower it to that?

I'm not really sure how we'd figure that out, unless we try to do it
experimentally? The throttling logic for this is in
balance_dirty_pages().

>
> > Speaking of strictlimiting though, from a policy standpoint if we
> > think strictlimiting is needed in general in fuse (there's a thread
> > from last year [1] about removing strict limiting), then I think that
>
> (did you mean [2] here?)

Ah yes, sorry, I meant [2].

>
> > would need to apply to iomap as well, at least for unprivileged
> > servers.
>
> iomap requires a privileged server, FWIW.

Oh right, I forgot iomap only runs with privileges enabled. In that
case, I think the whole strictlimiting question gets a lot simpler.
IMO for privileged servers, we should get rid of strictlimiting
entirely, though I'm not sure how Miklos feels about that.

Thanks,
Joanne

>
> > [1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bwat_r4+pmhaWH-ThAi+zoAJFwmJG65ANj1Zv0O0s4_A@mail.gmail.com/
> > [2] https://lore.kernel.org/linux-fsdevel/20251010150113.GC6174@frogsfrogsfrogs/T/#ma34ff5ae338a83f8b2e946d7e5332ea835fa0ff6
> >
> > >
> > > > This is more of a nit, but I think it'd also be nice if we
> > > > swapped the ordering of this patch with the previous one enabling
> > > > large folios, so that large folios gets enabled only when all the bdi
> > > > stuff for it is ready.
> > >
> > > Will do, thanks for reading these patches!
> > >
> > > Also note that I've changed this part of the patchset quite a lot since
> > > this posting; iomap configuration is now a completely separate fuse
> > > command that gets triggered after the FUSE_INIT reply is received.
> >
> > Great, I'll look at your upstream tree then for this part.
>
> Ok.
>
> --D
>
> > Thanks,
> > Joanne
> >
> > >
> > > --D
> > >
> > > > Thanks,
> > > > Joanne
> > > >
> > > > >
> > > > > /*
> > > > > * Enable syncfs for iomap fuse servers so that we can send a final
> > > > >
> > > >
> >