lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <xkr6i7r2vntkl2tigssvmnveepgdipwxewmzdm2xptmsct2odz@eyepa76aepsl>
Date: Thu, 14 Aug 2025 19:38:36 -0500
From: John Groves <John@...ves.net>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: Dan Williams <dan.j.williams@...el.com>, 
	Miklos Szeredi <miklos@...redb.hu>, Bernd Schubert <bschubert@....com>, 
	John Groves <jgroves@...ron.com>, Jonathan Corbet <corbet@....net>, 
	Vishal Verma <vishal.l.verma@...el.com>, Dave Jiang <dave.jiang@...el.com>, 
	Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>, 
	Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, 
	"Darrick J . Wong" <djwong@...nel.org>, Randy Dunlap <rdunlap@...radead.org>, 
	Jeff Layton <jlayton@...nel.org>, Kent Overstreet <kent.overstreet@...ux.dev>, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev, 
	linux-cxl@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	Amir Goldstein <amir73il@...il.com>, Jonathan Cameron <Jonathan.Cameron@...wei.com>, 
	Stefan Hajnoczi <shajnocz@...hat.com>, Joanne Koong <joannelkoong@...il.com>, 
	Josef Bacik <josef@...icpanda.com>, Aravind Ramesh <arramesh@...ron.com>, 
	Ajay Joshi <ajayjoshi@...ron.com>
Subject: Re: [RFC V2 12/18] famfs_fuse: Plumb the GET_FMAP message/response

On 25/08/14 03:36PM, Miklos Szeredi wrote:
> On Thu, 3 Jul 2025 at 20:54, John Groves <John@...ves.net> wrote:
> >
> > Upon completion of an OPEN, if we're in famfs-mode we do a GET_FMAP to
> > retrieve and cache up the file-to-dax map in the kernel. If this
> > succeeds, read/write/mmap are resolved direct-to-dax with no upcalls.
> 
> Nothing to do at this time unless you want a side project:  doing this
> with compound requests would save a roundtrip (OPEN + GET_FMAP in one
> go).

I'm thinking that's an opportunity for improvement after the basic mechanism
is in ;)
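Roughly, once GET_FMAP has succeeded, "resolved direct-to-dax with no upcalls" amounts to a local table lookup. Below is a minimal sketch, assuming a simple extent-list fmap whose extents sit back to back in file order. The struct and function names (sketch_extent, sketch_fmap, sketch_resolve) are hypothetical and are not the ones used in the famfs patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached fmap: a short list of extents, each mapping a
 * contiguous file range onto a range of a dax device. */
struct sketch_extent {
	uint64_t dev_index;  /* which dax device backs this extent */
	uint64_t dev_offset; /* byte offset within that device */
	uint64_t len;        /* extent length in bytes */
};

struct sketch_fmap {
	size_t nextents;
	const struct sketch_extent *ext; /* in file order, back to back */
};

/* Resolve a file offset to (device, device offset) using only the cached
 * map, i.e. without an upcall to the fuse server. Returns 0 on success,
 * -1 if the offset is past the end of the mapped range. */
static int sketch_resolve(const struct sketch_fmap *map, uint64_t file_off,
			  uint64_t *dev_index, uint64_t *dev_off)
{
	uint64_t base = 0; /* file offset where the current extent starts */

	assert(map != NULL);
	for (size_t i = 0; i < map->nextents; i++) {
		const struct sketch_extent *e = &map->ext[i];

		if (file_off < base + e->len) {
			*dev_index = e->dev_index;
			*dev_off = e->dev_offset + (file_off - base);
			return 0;
		}
		base += e->len;
	}
	return -1;
}
```

A linear scan is fine for the short extent lists described later in this thread; a longer map would presumably want a binary search or an interleave description instead.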

> 
> > GET_FMAP has a variable-size response payload, and the allocated size
> > is sent in the in_args[0].size field. If the fmap would overflow the
> > message, the fuse server sends a reply of size 'sizeof(uint32_t)' which
> > specifies the size of the fmap message. Then the kernel can realloc a
> > large enough buffer and try again.
> 
> There is a better way to do this: the allocation can happen when we
> get the response.  Just need to add infrastructure to dev.c.

OK, makes sense. Will take a run at this. Might drop back and go with a hard
limit and relax it later. Famfs fmaps won't grow unbounded in the near term...
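For comparison, the V2 size-then-retry protocol described in the quoted patch text can be sketched in userspace C. The server callback, the demo server and its 100-byte fmap are hypothetical stand-ins; only the protocol shape (a reply of exactly sizeof(uint32_t) carries the needed size, the caller reallocs and retries) comes from the patch description, and the cap plays the role of FAMFS_FMAP_MAX.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical server side of the V2 GET_FMAP protocol: if the fmap fits
 * in bufsize, copy it and return its length; otherwise reply with just a
 * uint32_t carrying the size that is needed. */
typedef size_t (*sketch_server_fn)(void *buf, size_t bufsize);

#define SKETCH_DEMO_FMAP_LEN 100 /* toy fmap size for the demo server */

static size_t sketch_demo_server(void *buf, size_t bufsize)
{
	uint32_t needed = SKETCH_DEMO_FMAP_LEN;

	assert(bufsize >= sizeof(needed));
	if (bufsize < needed) {
		memcpy(buf, &needed, sizeof(needed)); /* size-only reply */
		return sizeof(needed);
	}
	memset(buf, 0xAB, needed); /* stand-in for real fmap contents */
	return needed;
}

/* Client side: try with 'initial' bytes; on a size-only reply, realloc to
 * the reported size (bounded by 'max') and retry once.
 * Returns a malloc'd buffer (caller frees) and sets *len, or NULL. */
static void *sketch_get_fmap(sketch_server_fn server, size_t initial,
			     size_t max, size_t *len)
{
	size_t size = initial;

	assert(initial >= sizeof(uint32_t));
	for (int attempt = 0; attempt < 2; attempt++) {
		void *buf = malloc(size);
		uint32_t needed;
		size_t n;

		if (!buf)
			return NULL;
		n = server(buf, size);
		if (n != sizeof(uint32_t)) {
			*len = n;
			return buf;
		}
		memcpy(&needed, buf, sizeof(needed));
		free(buf);
		if (needed <= size || needed > max)
			return NULL; /* no progress possible, or over the cap */
		size = needed;
	}
	return NULL;
}
```

Note that in this sketch a genuine 4-byte fmap would be indistinguishable from a size-only reply, and a too-small first buffer always costs a second round trip. Taking the size from the reply itself, as suggested above, would avoid both.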

> 
> > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> > index 6c384640c79b..dff5aa62543e 100644
> > --- a/include/uapi/linux/fuse.h
> > +++ b/include/uapi/linux/fuse.h
> > @@ -654,6 +654,10 @@ enum fuse_opcode {
> >         FUSE_TMPFILE            = 51,
> >         FUSE_STATX              = 52,
> >
> > +       /* Famfs / devdax opcodes */
> > +       FUSE_GET_FMAP           = 53,
> > +       FUSE_GET_DAXDEV         = 54,
> 
> Introduced too early.

You mean FUSE_GET_DAXDEV I presume (which is not used until two patches
later)? Right, will fix.

> 
> > +
> >         /* CUSE specific operations */
> >         CUSE_INIT               = 4096,
> >
> > @@ -888,6 +892,16 @@ struct fuse_access_in {
> >         uint32_t        padding;
> >  };
> >
> > +struct fuse_get_fmap_in {
> > +       uint32_t        size;
> > +       uint32_t        padding;
> > +};
> 
> As noted, passing size to server really makes no sense.  I'd just omit
> fuse_get_fmap_in completely.

OK, I think I understand; will rework in v3.

Same idea as "better way" above...
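A minimal sketch of the "allocate when the response arrives" idea, assuming the receiver can look at the reply header before choosing a buffer: the payload size is just the header's total length minus the header, so the request needs no size field and no retry is needed. The header struct below mirrors the field layout of struct fuse_out_header from <linux/fuse.h> so the sketch builds without kernel headers; the function name is hypothetical, and this is not the dev.c infrastructure itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the layout of struct fuse_out_header. */
struct sketch_out_header {
	uint32_t len;    /* total reply length, header included */
	int32_t  error;  /* 0 or negative errno */
	uint64_t unique; /* id of the request being answered */
};

/* Hypothetical receive path: size the fmap buffer from the reply header.
 * Returns a malloc'd copy of the payload (caller frees) and sets
 * *payload_len, or NULL if the reply is malformed, an error, or over max. */
static void *sketch_take_reply(const void *msg, size_t msglen, size_t max,
			       size_t *payload_len)
{
	struct sketch_out_header oh;
	size_t plen;
	void *payload;

	assert(payload_len != NULL);
	if (msglen < sizeof(oh))
		return NULL;
	memcpy(&oh, msg, sizeof(oh));
	if (oh.error != 0 || oh.len != msglen || oh.len < sizeof(oh))
		return NULL;
	plen = oh.len - sizeof(oh);
	if (plen == 0 || plen > max)
		return NULL;
	payload = malloc(plen);
	if (!payload)
		return NULL;
	memcpy(payload, (const uint8_t *)msg + sizeof(oh), plen);
	*payload_len = plen;
	return payload;
}
```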

> 
> > +
> > +struct fuse_get_fmap_out {
> > +       uint32_t        size;
> > +       uint32_t        padding;
> > +};
> > +
> >  struct fuse_init_in {
> >         uint32_t        major;
> >         uint32_t        minor;
> > @@ -1284,4 +1298,8 @@ struct fuse_uring_cmd_req {
> >         uint8_t padding[6];
> >  };
> >
> > +/* Famfs fmap message components */
> > +
> > +#define FAMFS_FMAP_MAX 32768 /* Largest supported fmap message */
> > +
> 
> Hmm, Darrick's interface gets one extent at a time.  This one tries
> to get the whole map in one go.
> 
> The single extent thing can be inefficient even for plain block fs, so
> it would be nice to get multiple extents.  The whole map has an
> artificial limit that currently may seem sufficient but down the line
> could cause pain.
> 
> I'm still hoping some common ground would benefit both interfaces.
> Just not sure what it should be.
> 
> Thanks,
> Miklos

At one point Darrick and I discussed retrieving a [file: offset, length] range
of extents (i.e. the request describes what it wants, and the reply describes
what range of the file it covers). I'm not sure it will make sense for famfs to
retrieve anything but the whole file's map, but I know it might in Darrick's 
case.

I could imagine an update to GET_FMAP (possibly with a different name) that
requests an offset range, and then receives a (possibly different) range that 
is intended to match or exceed the requested range.

It seems like we might be able to share the same command to retrieve extents, 
provided the response starts with a header that allows us to have separate 
(and presumably extensible) payload formats. No doubt Darrick will have 
thoughts on this :D
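One possible shape for that shared reply header, purely as an illustration: a format tag selects the payload layout, and a header-size field lets the header grow later without breaking older readers. None of these names or values exist in the fuse uapi; they are assumptions for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload formats for a shared extent-retrieval reply. */
enum sketch_fmap_format {
	SKETCH_FMT_SIMPLE_EXTENTS = 1, /* e.g. a short extent list */
	SKETCH_FMT_INTERLEAVED    = 2, /* e.g. an interleave description */
};

/* Hypothetical common reply header. */
struct sketch_fmap_reply_hdr {
	uint32_t format;      /* enum sketch_fmap_format */
	uint32_t hdr_size;    /* header size as sent; allows later extension */
	uint64_t file_offset; /* range of the file this reply covers */
	uint64_t file_length;
};

/* Validate the header and return the format tag, or 0 if the message is
 * malformed or uses an unknown format. On success, *payload points just
 * past the (possibly larger, newer) header. */
static uint32_t sketch_parse_reply(const void *msg, size_t len,
				   const void **payload, size_t *payload_len)
{
	struct sketch_fmap_reply_hdr h;

	assert(payload != NULL && payload_len != NULL);
	if (len < sizeof(h))
		return 0;
	memcpy(&h, msg, sizeof(h));
	if (h.hdr_size < sizeof(h) || h.hdr_size > len)
		return 0;
	switch (h.format) {
	case SKETCH_FMT_SIMPLE_EXTENTS:
	case SKETCH_FMT_INTERLEAVED:
		*payload = (const uint8_t *)msg + h.hdr_size;
		*payload_len = len - h.hdr_size;
		return h.format;
	default:
		return 0;
	}
}
```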

I don't think we can merge our "fmap" formats; famfs uses either short
extent lists or a format that is efficient for repeating interleave patterns,
and it wants to cache the entire fmap. That is not likely to match Darrick's
pattern, but we might be able to share the same retrieval message/response.
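To illustrate why an interleave format can stay small no matter how large the file is, here is a sketch of a generic round-robin stripe mapping. It is an assumption about what a "repeating interleave pattern" might look like, not the actual famfs fmap format, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_DEVS 8

/* Hypothetical interleave description: file data is striped round-robin
 * across ndevs devices in chunk_size pieces, starting at dev_base[i] on
 * device i. Its size does not depend on the file length. */
struct sketch_interleave {
	uint64_t chunk_size;                /* bytes per strip, > 0 */
	uint32_t ndevs;                     /* 1..SKETCH_MAX_DEVS */
	uint64_t dev_base[SKETCH_MAX_DEVS]; /* starting offset on each device */
};

/* Map a file offset to (device index, device offset). */
static void sketch_interleave_resolve(const struct sketch_interleave *il,
				      uint64_t file_off,
				      uint32_t *dev, uint64_t *dev_off)
{
	uint64_t chunk, within, row;

	assert(il->chunk_size > 0 && il->ndevs > 0 &&
	       il->ndevs <= SKETCH_MAX_DEVS);
	chunk = file_off / il->chunk_size;  /* global strip number */
	within = file_off % il->chunk_size; /* offset inside the strip */
	row = chunk / il->ndevs;            /* full rounds before this strip */

	*dev = (uint32_t)(chunk % il->ndevs);
	*dev_off = il->dev_base[*dev] + row * il->chunk_size + within;
}
```

With, say, a 2 MiB chunk across 4 devices, file offset 8 MiB + 7 lands on device 0 at dev_base[0] + 2 MiB + 7, computed in constant time rather than by walking a per-extent list.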

Thanks!
John

