Message-ID: <CAJfpegv6wHOniQE6dgGymq4h1430oc2EyV3OQ2S9DqA20nZZUQ@mail.gmail.com>
Date: Thu, 14 Aug 2025 15:36:26 +0200
From: Miklos Szeredi <miklos@...redi.hu>
To: John Groves <John@...ves.net>
Cc: Dan Williams <dan.j.williams@...el.com>, Miklos Szeredi <miklos@...redb.hu>,
Bernd Schubert <bschubert@....com>, John Groves <jgroves@...ron.com>, Jonathan Corbet <corbet@....net>,
Vishal Verma <vishal.l.verma@...el.com>, Dave Jiang <dave.jiang@...el.com>,
Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>,
"Darrick J . Wong" <djwong@...nel.org>, Randy Dunlap <rdunlap@...radead.org>,
Jeff Layton <jlayton@...nel.org>, Kent Overstreet <kent.overstreet@...ux.dev>,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
nvdimm@...ts.linux.dev, linux-cxl@...r.kernel.org,
linux-fsdevel@...r.kernel.org, Amir Goldstein <amir73il@...il.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>, Stefan Hajnoczi <shajnocz@...hat.com>,
Joanne Koong <joannelkoong@...il.com>, Josef Bacik <josef@...icpanda.com>,
Aravind Ramesh <arramesh@...ron.com>, Ajay Joshi <ajayjoshi@...ron.com>
Subject: Re: [RFC V2 12/18] famfs_fuse: Plumb the GET_FMAP message/response
On Thu, 3 Jul 2025 at 20:54, John Groves <John@...ves.net> wrote:
>
> Upon completion of an OPEN, if we're in famfs-mode we do a GET_FMAP to
> retrieve and cache up the file-to-dax map in the kernel. If this
> succeeds, read/write/mmap are resolved direct-to-dax with no upcalls.
Nothing to do at this time unless you want a side project: doing this
with compound requests would save a roundtrip (OPEN + GET_FMAP in one
go).
> GET_FMAP has a variable-size response payload, and the allocated size
> is sent in the in_args[0].size field. If the fmap would overflow the
> message, the fuse server sends a reply of size 'sizeof(uint32_t)' which
> specifies the size of the fmap message. Then the kernel can realloc a
> large enough buffer and try again.
There is a better way to do this: the allocation can happen when we
get the response. Just need to add infrastructure to dev.c.
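
For reference, the two-pass size negotiation described in the quoted commit
message can be sketched in userspace roughly as follows. This is only an
illustration under stated assumptions, not the code from the patch: every
demo_* name is hypothetical, the "server" is a plain function call rather
than a fuse upcall, and a valid fmap is assumed to always be longer than
sizeof(uint32_t) so that a 4-byte reply unambiguously means "buffer too
small".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FMAP_MAX 32768 /* mirrors FAMFS_FMAP_MAX from the quoted patch */

/* Hypothetical server: copy the fmap into the reply buffer if it fits,
 * otherwise reply with only a uint32_t holding the required size.
 * Returns the number of bytes written to 'reply'. */
static size_t demo_server_get_fmap(const uint8_t *fmap, uint32_t fmap_len,
                                   uint8_t *reply, uint32_t reply_cap)
{
    assert(reply_cap >= sizeof(uint32_t));
    assert(fmap_len > sizeof(uint32_t)); /* assumption, see lead-in */

    if (fmap_len > reply_cap) {
        memcpy(reply, &fmap_len, sizeof(fmap_len));
        return sizeof(fmap_len);
    }
    memcpy(reply, fmap, fmap_len);
    return fmap_len;
}

/* Hypothetical client: try with an initial buffer and retry once with the
 * size the server asked for. Returns a malloc'd buffer holding the fmap,
 * or NULL on failure. 'round_trips' counts the requests that were sent. */
static uint8_t *demo_fetch_fmap(const uint8_t *fmap, uint32_t fmap_len,
                                uint32_t initial_cap, uint32_t *out_len,
                                int *round_trips)
{
    uint32_t cap = initial_cap;

    *round_trips = 0;
    for (int attempt = 0; attempt < 2; attempt++) {
        uint8_t *buf = malloc(cap);
        if (!buf)
            return NULL;

        size_t n = demo_server_get_fmap(fmap, fmap_len, buf, cap);
        (*round_trips)++;

        if (n != sizeof(uint32_t)) {
            *out_len = (uint32_t)n; /* the whole fmap arrived */
            return buf;
        }

        /* Short reply: it carries the size we need for the retry. */
        memcpy(&cap, buf, sizeof(cap));
        free(buf);
        if (cap > DEMO_FMAP_MAX)
            return NULL;
    }
    return NULL;
}
```

With a 100-byte fmap and a 64-byte initial buffer the sketch needs two
requests; with a 128-byte initial buffer it needs one. Allocating at
response time would presumably make the retry unnecessary.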
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index 6c384640c79b..dff5aa62543e 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -654,6 +654,10 @@ enum fuse_opcode {
> FUSE_TMPFILE = 51,
> FUSE_STATX = 52,
>
> + /* Famfs / devdax opcodes */
> + FUSE_GET_FMAP = 53,
> + FUSE_GET_DAXDEV = 54,
Introduced too early.
> +
> /* CUSE specific operations */
> CUSE_INIT = 4096,
>
> @@ -888,6 +892,16 @@ struct fuse_access_in {
> uint32_t padding;
> };
>
> +struct fuse_get_fmap_in {
> + uint32_t size;
> + uint32_t padding;
> +};
As noted, passing the size to the server really makes no sense. I'd just
omit fuse_get_fmap_in completely.
> +
> +struct fuse_get_fmap_out {
> + uint32_t size;
> + uint32_t padding;
> +};
> +
> struct fuse_init_in {
> uint32_t major;
> uint32_t minor;
> @@ -1284,4 +1298,8 @@ struct fuse_uring_cmd_req {
> uint8_t padding[6];
> };
>
> +/* Famfs fmap message components */
> +
> +#define FAMFS_FMAP_MAX 32768 /* Largest supported fmap message */
> +
Hmm, Darrick's interface gets one extent at a time. This one tries
to get the whole map in one go.

The single extent thing can be inefficient even for plain block fs, so
it would be nice to get multiple extents. The whole map has an
artificial limit that currently may seem sufficient but down the line
could cause pain.

I'm still hoping some common ground would benefit both interfaces.
Just not sure what it should be.
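
To make the trade-off concrete, here is a rough userspace sketch of the
middle ground, where each request returns up to a fixed number of extents
and the caller continues from where the last reply stopped. It is only one
possible shape, not an interface proposed in this thread: struct
demo_extent, demo_get_extents(), demo_count_requests() and DEMO_MAX_BATCH
are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_BATCH 64 /* hypothetical per-request extent limit */

/* Hypothetical extent record: a file range mapped to a device range. */
struct demo_extent {
    uint64_t file_offset;
    uint64_t dev_offset;
    uint64_t len;
};

/* Hypothetical server side: copy up to 'max' extents, starting at index
 * 'start', into 'out'. Returns the number of extents copied. */
static size_t demo_get_extents(const struct demo_extent *map, size_t map_len,
                               size_t start, struct demo_extent *out,
                               size_t max)
{
    size_t n = 0;

    while (n < max && start + n < map_len) {
        out[n] = map[start + n];
        n++;
    }
    return n;
}

/* Hypothetical client side: walk the whole map in batches and return how
 * many requests it took. A reply shorter than 'batch' marks the end. */
static size_t demo_count_requests(const struct demo_extent *map,
                                  size_t map_len, size_t batch)
{
    struct demo_extent buf[DEMO_MAX_BATCH];
    size_t start = 0, requests = 0, n;

    assert(batch >= 1 && batch <= DEMO_MAX_BATCH);
    do {
        n = demo_get_extents(map, map_len, start, buf, batch);
        requests++;
        start += n;
    } while (n == batch);

    return requests;
}
```

For a 10-extent map this sketch issues 11 requests with a batch size of 1
(ten extents plus one empty reply to detect the end), 3 with a batch size
of 4, and a single request once the batch size exceeds the map length, and
the map itself has no fixed upper bound.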
Thanks,
Miklos