Date: Tue, 21 May 2024 21:05:18 -0500
From: John Groves <John@...ves.net>
To: Amir Goldstein <amir73il@...il.com>
Cc: Miklos Szeredi <miklos@...redi.hu>, John Groves <jgroves@...ron.com>, 
	Jonathan Corbet <corbet@....net>, Dan Williams <dan.j.williams@...el.com>, 
	Vishal Verma <vishal.l.verma@...el.com>, Dave Jiang <dave.jiang@...el.com>, 
	Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, 
	Matthew Wilcox <willy@...radead.org>, linux-cxl@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev, 
	john@...alactic.com, Dave Chinner <david@...morbit.com>, 
	Christoph Hellwig <hch@...radead.org>, dave.hansen@...ux.intel.com, gregory.price@...verge.com, 
	Vivek Goyal <vgoyal@...hat.com>
Subject: Re: [RFC PATCH 00/20] Introduce the famfs shared-memory file system

Initial reply to both Amir and Miklos. Sorry for the delay - I took a few
days off after LSFMM and I'm just re-engaging now.

First an observation: these messages are on the famfs v1 patch set thread.
The v2 patch set is at [1]. That is also the default branch now if you clone
the famfs kernel from [2].

Among the biggest changes in v2 is dropping /dev/pmem support; only
/dev/dax (character) devices are now supported as backing devs for famfs.

On 24/05/19 08:59AM, Amir Goldstein wrote:
> On Fri, May 17, 2024 at 12:55 PM Miklos Szeredi <miklos@...redi.hu> wrote:
> >
> > On Thu, 29 Feb 2024 at 07:52, Amir Goldstein <amir73il@...il.com> wrote:
> >
> > > I'm not virtiofs expert, but I don't think that you are wrong about this.
> > > IIUC, virtiofsd could map arbitrary memory region to any fuse file mmaped
> > > by virtiofs client.
> > >
> > > So what are the gaps between virtiofs and famfs that justify a new filesystem
> > > driver and new userspace API?
> >
> > Let me try to fill in some gaps.  I've looked at the famfs driver
> > (even tried to set it up in a VM, but got stuck with the EFI stuff).

I'm happy to help with that if you'd like - just ping me. Getting a VM
running in EFI mode is not necessary if you reserve the dax memory via
memmap= or via libvirt XML.

> >
> > - famfs has an extent list per file that indicates how each page
> > within the file should be mapped onto the dax device, IOW it has the
> > following mapping:
> >
> >   [famfs file, offset] -> [offset, length]

More generally, a famfs file extent is [daxdev, offset, len]; there may
be multiple extents per file, and in the future this definitely needs to
generalize to multiple daxdevs.
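
To make the [daxdev, offset, len] idea concrete, here is a minimal
userspace sketch of resolving a file offset through a per-file extent
list. All names (sketch_extent, sketch_file, sketch_resolve) are
hypothetical and chosen for illustration; they are not the structures or
functions used in the famfs patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; not the famfs on-media or in-kernel format. */
struct sketch_extent {
	int      daxdev;  /* index of the backing dax device */
	uint64_t offset;  /* byte offset of the extent on that device */
	uint64_t len;     /* extent length in bytes */
};

struct sketch_file {
	size_t                      nr_extents;
	const struct sketch_extent *extents;  /* stored in file order */
};

/*
 * Resolve a byte offset within the file to (daxdev, device offset).
 * Returns 0 on success, -1 if file_off lies beyond the last extent.
 */
static int sketch_resolve(const struct sketch_file *f, uint64_t file_off,
			  int *daxdev, uint64_t *dev_off)
{
	uint64_t start = 0;  /* file offset at which the current extent begins */

	assert(f != NULL && daxdev != NULL && dev_off != NULL);

	for (size_t i = 0; i < f->nr_extents; i++) {
		const struct sketch_extent *e = &f->extents[i];

		/* file_off >= start holds here, so the subtraction cannot wrap */
		if (file_off - start < e->len) {
			*daxdev = e->daxdev;
			*dev_off = e->offset + (file_off - start);
			return 0;
		}
		start += e->len;
	}
	return -1;
}
```

With two extents of 0x1000 and 0x2000 bytes at device offsets 0x200000
and 0x800000, file offset 0x1800 falls 0x800 bytes into the second extent
and resolves to device offset 0x800800.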

Disclaimer: I'm still coming up to speed on fuse (slowly and ignorantly, 
I think)...

A single backing device (daxdev) will contain extents of many famfs
files (plus metadata - currently a superblock and a log). I'm not sure
it's realistic to have a backing daxdev "open" per famfs file. 

In addition there is:

- struct dax_holder_operations - to allow a notify_failure() upcall
  from dax. This provides the critical capability to shut down famfs
  if there are memory errors. This is filesystem-wide (or, technically,
  daxdev-wide).

- The pmem or devdax iomap_ops - to allow the fsdax file system (famfs,
  and [soon] famfs_fuse) to call dax_iomap_rw() and dax_iomap_fault().
  I strongly suspect that famfs_fuse can't be correct unless it uses
  this path rather than just the idea of a single backing file.
  This interface explicitly supports files that map to disjoint ranges
  of one or more dax devices.

- the dev_dax_iomap portion of the famfs patchsets adds iomap_ops to
  character devdax.

- Note that dax devices, unlike files, don't support read/write - only
  mmap(). I suspect (though I'm still pretty ignorant) that this means
  we can't just treat the dax device as an extent-based backing file.
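
For reference, the mmap()-only access model looks roughly like this from
userspace. This is a sketch under assumptions: sketch_map_dax is a
hypothetical helper, the device path is supplied by the caller, and the
offset and length must satisfy the alignment of the particular device,
which is not checked here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a byte range of a character dax device.
 * Returns the mapping, or NULL if the open or the mmap fails.
 * Data is then accessed with loads and stores, not read()/write().
 */
static void *sketch_map_dax(const char *path, off_t offset, size_t len)
{
	void *p;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
	close(fd);  /* the mapping stays valid after the fd is closed */

	return p == MAP_FAILED ? NULL : p;
}
```

A caller would release the range with munmap() when done.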


> >
> > - fuse can currently map a fuse file onto a backing file:
> >
> >   [fuse file] -> [backing file]
> >
> > The interface for the latter is
> >
> >    backing_id = ioctl(dev_fuse_fd, FUSE_DEV_IOC_BACKING_OPEN, backing_map);
> > ...
> >    fuse_open_out.flags |= FOPEN_PASSTHROUGH;
> >    fuse_open_out.backing_id = backing_id;
> 
> FYI, library and example code was recently merged to libfuse:
> https://github.com/libfuse/libfuse/pull/919
> 
> >
> > This looks suitable for doing the famfs file -> dax device mapping as
> > well.  I wouldn't extend the ioctl with extent information, since
> > famfs can just use FUSE_DEV_IOC_BACKING_OPEN once to register the dax
> > device.  The flags field could be used to tell the kernel to treat
> > this fd as a dax device instead of a regular file.

A dax device to famfs is a lot more like a backing device for a "filesystem"
than a backing file for another file. And, as previously mentioned, there
are the iomap_ops and holder_ops interfaces, which deal with multiple file
tenants on a dax device and with error notification, respectively.

Probably doable, but important distinctions...

> >
> > Later, when the file is opened the extent list could be sent in the
> > open reply together with the backing id.  The fuse_ext_header
> > mechanism seems suitable for this.
> >
> > And I think that's it as far as API's are concerned.
> >
> > Note: this is already more generic than the current famfs prototype,
> > since multiple dax devices could be used as backing for famfs files,
> > with the constraint that a single file can only map data from a single
> > dax device.
> >
> > As for implementing dax passthrough, I think that needs a separate
> > source file, the one used by virtiofs (fs/fuse/dax.c) does not appear
> > to have many commonalities with this one.  That could be renamed to
> > virtiofs_dax.c as it's pretty much virtiofs specific, AFAICT.
> >
> > Comments?
> 
> Would probably also need to decouple CONFIG_FUSE_DAX
> from CONFIG_FUSE_VIRTIO_DAX.
> 
> What about fc->dax_mode (i.e. dax= mount option)?
> 
> What about FUSE_IS_DAX()? does it apply to both dax implementations?
> 
> Sounds like a decent plan.
> John, let us know if you need help understanding the details.

I'm certain I will need some help, but I'll try to do my part. 

First question: can you suggest an example fuse file pass-through
file system that I might use as a jumping-off point? Something that
gets the basic pass-through capability from which to start hacking
in famfs/dax capabilities?

When I started on famfs, I used ramfs because it got me all the basic
file system functionality minus a backing store. Then I built the dax
functionality by referring to xfs. 

> 
> > Am I missing something significant?
> 
> Would we need to set IS_DAX() on inode init time or can we set it
> later on first file open?
> 
> Currently, iomodes enforces that all opens are either
> mapped to same backing file or none mapped to backing file:
> 
> fuse_inode_uncached_io_start()
> {
> ...
>         /* deny conflicting backing files on same fuse inode */
> 
> The iomodes rules will need to be amended to verify that:
> - IS_DAX() inode open is always mapped to backing dax device
> - All files of the same fuse inode are mapped to the same range
>   of backing file/dax device.

I'm confused by the last item. I would think there would be a fuse
inode per famfs file, and that multiple of those would map to separate
extent lists of one or more backing dax devices.

Or maybe I misunderstand the meaning of "fuse inode". Feel free to
assign reading...

> 
> Thanks,
> Amir.

Thanks Miklos and Amir,
John

[1] https://lore.kernel.org/linux-fsdevel/cover.1714409084.git.john@groves.net/T/#m3b11e8d311eca80763c7d6f27d43efd1cdba628b
[2] https://github.com/cxl-micron-reskit/famfs-linux


