Message-ID: <aytnzv4tmp7fdvpgxdfoe2ncu7qaxlp2svsxiskfnrvdnknhmp@uu4ifgc6aj34>
Date: Mon, 12 May 2025 14:51:45 -0500
From: John Groves <John@...ves.net>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: "Darrick J. Wong" <djwong@...nel.org>,
Dan Williams <dan.j.williams@...el.com>, Bernd Schubert <bschubert@....com>,
John Groves <jgroves@...ron.com>, Jonathan Corbet <corbet@....net>,
Vishal Verma <vishal.l.verma@...el.com>, Dave Jiang <dave.jiang@...el.com>,
Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>,
Luis Henriques <luis@...lia.com>, Randy Dunlap <rdunlap@...radead.org>,
Jeff Layton <jlayton@...nel.org>, Kent Overstreet <kent.overstreet@...ux.dev>,
Petr Vorel <pvorel@...e.cz>, Brian Foster <bfoster@...hat.com>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev, linux-cxl@...r.kernel.org,
linux-fsdevel@...r.kernel.org, Amir Goldstein <amir73il@...il.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>, Stefan Hajnoczi <shajnocz@...hat.com>,
Joanne Koong <joannelkoong@...il.com>, Josef Bacik <josef@...icpanda.com>,
Aravind Ramesh <arramesh@...ron.com>, Ajay Joshi <ajayjoshi@...ron.com>, john@...ves.net
Subject: Re: [RFC PATCH 13/19] famfs_fuse: Create files with famfs fmaps
On 25/05/06 06:56PM, Miklos Szeredi wrote:
> On Mon, 28 Apr 2025 at 21:00, Darrick J. Wong <djwong@...nel.org> wrote:
>
> > <nod> I don't know what Miklos' opinion is about having multiple
> > fusecmds that do similar things -- on the one hand keeping yours and my
> > efforts separate explodes the amount of userspace abi that everyone must
> > maintain, but on the other hand it then doesn't couple our projects
> > together, which might be a good thing if it turns out that our domain
> > models are /really/ actually quite different.
>
> Sharing the interface at least would definitely be worthwhile, as
> there does not seem to be a great deal of difference between the
> generic one and the famfs specific one. Only implementing part of the
> functionality that the generic one provides would be fine.
Agreed. I'm coming around to thinking the most practical approach would be
to share the GET_FMAP message/response, but to add a separate response
format for Darrick's use case - when the time comes. In this patch set,
that starts with 'struct fuse_famfs_fmap_header' and is followed by the
appropriate extent structures, serialized in the message. Collectively
that's an fmap in message format.
Side note: the current patch set sends back the logically-variable-sized
fmap in a fixed-size message, but V2 of the series will address that;
I got some help from Bernd there, but haven't finished it yet.
So the next version of the patch set would, say, add a more generic first
'struct fmap_header' that would indicate whether the next item would be
'struct fuse_famfs_fmap_header' (i.e. my/famfs metadata) or some other
to-be-codified metadata format. I'm going here because I'm dubious that
we even *can* do grand-unified-fmap-metadata (or that we should try).
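To make the idea concrete, here is a rough sketch of what such a generic
leading header and its dispatch might look like. Everything named sketch_*
or SKETCH_* below is hypothetical and exists only for illustration; none of
it is from the patch set, and the real field layout is still to be decided.
The only point is that a small tagged header lets the receiver decide how
to interpret the bytes that follow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values: not part of the patch set or the fuse ABI. */
#define SKETCH_FMAP_VERSION 1u

enum sketch_fmap_kind {
	SKETCH_FMAP_FAMFS = 1,	/* payload starts with the famfs fmap header */
	SKETCH_FMAP_OTHER = 2,	/* placeholder for some other metadata format */
};

/* Hypothetical generic header that would precede any fmap payload. */
struct sketch_fmap_header {
	uint32_t version;	/* version of this header layout */
	uint32_t kind;		/* one of enum sketch_fmap_kind */
	uint32_t payload_len;	/* bytes of format-specific data that follow */
	uint32_t reserved;	/* keeps the header 16 bytes; must be zero */
};

/*
 * Validate the generic header at the start of a message and return a
 * pointer to the format-specific payload. Returns 0 on success, -1 if the
 * message is too short, the version or kind is unknown, or the advertised
 * payload does not fit in the message.
 */
static int sketch_fmap_peek(const void *msg, size_t msg_len,
			    struct sketch_fmap_header *hdr,
			    const void **payload)
{
	if (msg_len < sizeof(*hdr))
		return -1;

	/* Copy out rather than cast, so alignment of msg does not matter. */
	memcpy(hdr, msg, sizeof(*hdr));

	if (hdr->version != SKETCH_FMAP_VERSION)
		return -1;
	if (hdr->kind != SKETCH_FMAP_FAMFS && hdr->kind != SKETCH_FMAP_OTHER)
		return -1;
	if (hdr->payload_len > msg_len - sizeof(*hdr))
		return -1;

	*payload = (const unsigned char *)msg + sizeof(*hdr);
	return 0;
}
```

A caller would switch on hdr->kind after a successful peek and hand the
payload to the famfs parser or to whatever other parser gets codified.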
This will require versioning the affected structures, unless we think
the fmap-in-message structure can be opaque to the rest of fuse. @miklos,
is there an example to follow regarding struct versioning in
already-existing fuse structures?
>
> > (Especially because I suspect that interleaving is the norm for memory,
> > whereas we try to avoid that for disk filesystems.)
>
> So interleaved extents are just like normal ones except they repeat,
> right? What about adding a special "repeat last N extent
> descriptions" type of extent?
It's a bit more than that. The comment at [1] makes it possible to understand
the scheme, but I'd be happy to talk through it with you on a call if that
seems helpful.
An interleaved extent stripes data across N memory devices in RAID 0
format; the space from each device is described by a single simple extent
(so it's contiguous), but it's not consumed contiguously - it's consumed in
fixed-sized chunks that precess across the devices. Notwithstanding that I
couldn't explain it very well when we talked about it at LPC, I think I
could make it pretty clear in a pretty brief call now.
In any case, you have my word that it's actually quite elegant :D
(seriously, but also with a smile...)
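For what it's worth, the striping described above can be sketched in a few
lines of C. This is an illustration of the RAID 0 style address math only,
under the assumption that every strip is one contiguous extent and chunks
rotate across the strips in order; the sketch_* names are hypothetical and
are not the structures in the patch set or in the comment at [1].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one contiguous range on one memory device. */
struct sketch_simple_ext {
	uint64_t dev_index;	/* which device backs this strip */
	uint64_t dev_offset;	/* start of the range on that device */
	uint64_t len;		/* length of the range in bytes */
};

/* Hypothetical: N strips consumed in fixed-size chunks, round robin. */
struct sketch_interleaved_ext {
	uint64_t chunk_size;	/* bytes taken from a strip per turn */
	uint64_t nstrips;	/* number of entries in strips[] */
	const struct sketch_simple_ext *strips;
};

struct sketch_location {
	uint64_t dev_index;
	uint64_t dev_offset;
};

/*
 * Translate an offset within the interleaved extent to a device and an
 * offset on that device. Returns 0 on success, -1 if the extent is
 * malformed or the offset falls beyond the end of the selected strip.
 */
static int sketch_resolve(const struct sketch_interleaved_ext *ie,
			  uint64_t file_off, struct sketch_location *out)
{
	uint64_t chunk, within, strip, stripe, strip_off;
	const struct sketch_simple_ext *se;

	if (ie->chunk_size == 0 || ie->nstrips == 0)
		return -1;

	chunk = file_off / ie->chunk_size;	/* which chunk overall */
	within = file_off % ie->chunk_size;	/* offset inside that chunk */
	strip = chunk % ie->nstrips;		/* which device gets this chunk */
	stripe = chunk / ie->nstrips;		/* how many full rotations so far */

	/* Each full rotation consumed one chunk from this strip. */
	strip_off = stripe * ie->chunk_size + within;

	se = &ie->strips[strip];
	if (strip_off >= se->len)
		return -1;

	out->dev_index = se->dev_index;
	out->dev_offset = se->dev_offset + strip_off;
	return 0;
}
```

With two strips and a 4 KiB chunk, offsets 0..4095 land on the first strip,
4096..8191 on the second, 8192..12287 back on the first strip at its second
chunk, and so on.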
>
> > > But the current implementation does not contemplate partially cached fmaps.
> > >
> > > Adding notification could address revoking them post-haste (is that why
> > > you're thinking about notifications? And if not can you elaborate on what
> > > you're after there?).
> >
> > Yeah, invalidating the mapping cache at random places. If, say, you
> > implement a clustered filesystem with iomap, the metadata server could
> > inform the fuse server on the local node that a certain range of inode X
> > has been written to, at which point you need to revoke any local leases,
> > invalidate the pagecache, and invalidate the iomapping cache to force
> > the client to requery the server.
> >
> > Or if your fuse server wants to implement its own weird operations (e.g.
> > XFS EXCHANGE-RANGE) this would make that possible without needing to
> > add a bunch of code to fs/fuse/ for the benefit of a single fuse driver.
>
> Wouldn't existing invalidation framework be sufficient?
>
> Thanks,
> Miklos
My current thinking is that Darrick's use case doesn't need GET_DAXDEV, but
famfs does. I think Darrick's use case has one backing device, and that should
be passed in at mount time. Correct me if you think that might be wrong.
Famfs doesn't necessarily have just one backing dev, which means that famfs
could pass in the *primary* backing dev at mount time, but it would still
need GET_DAXDEV to get the rest. But if I just use GET_FMAP every time, I
only need one way to do this.
I'll add a few more responses to Darrick's reply...
Thanks,
John
[1] https://github.com/cxl-micron-reskit/famfs-linux/blob/c57553c4ca91f0634f137285840ab25be8a87c30/fs/fuse/famfs_kfmap.h#L13