Message-ID: <3jwluwrqj6rwsxdsksfvdeo5uccgmnkh7rgefaeyxf2gu75344@ybhwncywkftx>
Date: Thu, 29 Feb 2024 16:16:33 -0600
From: John Groves <John@...ves.net>
To: Amir Goldstein <amir73il@...il.com>
Cc: John Groves <jgroves@...ron.com>, Jonathan Corbet <corbet@....net>, 
	Dan Williams <dan.j.williams@...el.com>, Vishal Verma <vishal.l.verma@...el.com>, 
	Dave Jiang <dave.jiang@...el.com>, Alexander Viro <viro@...iv.linux.org.uk>, 
	Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, Matthew Wilcox <willy@...radead.org>, 
	linux-cxl@...r.kernel.org, linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev, john@...alactic.com, 
	Dave Chinner <david@...morbit.com>, Christoph Hellwig <hch@...radead.org>, 
	dave.hansen@...ux.intel.com, gregory.price@...verge.com, Miklos Szeredi <miklos@...redi.hu>, 
	Vivek Goyal <vgoyal@...hat.com>
Subject: Re: [RFC PATCH 00/20] Introduce the famfs shared-memory file system

On 24/02/29 08:52AM, Amir Goldstein wrote:
> On Fri, Feb 23, 2024 at 7:42 PM John Groves <John@...ves.net> wrote:
> >
> > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > CXL-specific in any way.
> >
> > * Famfs creates a simple access method for storing and sharing data in
> >   sharable memory. The memory is exposed and accessed as memory-mappable
> >   dax files.
> > * Famfs supports multiple hosts mounting the same file system from the
> >   same memory (something existing fs-dax file systems don't do).
> > * A famfs file system can be created on either a /dev/pmem device in fs-dax
> >   mode, or a /dev/dax device in devdax mode (the latter depending on
> >   patches 2-6 of this series).
> >
> > The famfs kernel file system is part of the famfs framework; additional
> > components in user space[2] handle metadata and direct the famfs kernel
> > module to instantiate files that map to specific memory. The famfs user
> > space has documentation and a reasonably thorough test suite.
> >
> 
> So can we say that Famfs is Fuse specialized for DAX?
> 
> I am asking because you seem to have asked it first:
> https://lore.kernel.org/linux-fsdevel/0100018b2439ebf3-a442db6f-f685-4bc4-b4b0-28dc333f6712-000000@email.amazonses.com/
> I guess that you did not get answers to your questions before or at LPC?

Thanks for paying attention, Amir. I think there is some validity to thinking
of famfs as Fuse for DAX. Administration / metadata originating in user space
is similar (but doing it this way also helps reduce RAS exposure to memory
that might have a more complex connection path).

One way it differs from fuse is that famfs is very much aimed at use
cases that require performance. *Accessing* files must run at full
memory speeds.

> 
> I did not see your question back in October.
> Let me try to answer your questions and we can discuss later if a new dedicated
> kernel driver + userspace API is really needed, or if FUSE could be used as is
> or extended for your needs.
> 
> You wrote:
> "...My naive reading of the existence of some sort of fuse/dax support
> for virtiofs
> suggested that there might be a way of doing this - but I may be wrong
> about that."
> 
> I'm not a virtiofs expert, but I don't think that you are wrong about this.
> IIUC, virtiofsd could map an arbitrary memory region to any fuse file mmapped
> by a virtiofs client.
> 
> So what are the gaps between virtiofs and famfs that justify a new filesystem
> driver and new userspace API?

I have a lot of thoughts here, and an actual conversation might be good
sooner rather than later. I hope to be at LSFMM to discuss this - if you agree,
put in a vote for my topic ;). But if you want to talk sooner than that, I'm
interested.

I think one piece of evidence that this isn't possible with Fuse today is that
I had to plumb the iomap interface for /dev/dax in this patch set. That is the
way that fs-dax file systems communicate with the dax layer for fault
resolution. If fuse/virtiofs handles dax somehow without the iomap interface,
I suspect it's doing something simpler, /and/ that might need to get
reconciled with the fs-dax methodology. Or maybe I don't know what I'm talking
about (in which case, please help :D).

I think one thing that might make sense would be to bring up this functionality
as a standalone file system, and then consider merging it into fuse if and
when the time seems right.

Famfs doesn't currently have any up-calls. User space plays the log and tells
the kmod to instantiate files with extent lists that map to dax memory. Access
happens with zero user space involvement.
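
To illustrate the flow described above, here is a minimal sketch in Python of
user space replaying a metadata log and asking for each file to be
instantiated with its extent list. This is not famfs code: the log entry
layout, the names (Extent, LogEntry, play_log) and the instantiate callback
are all hypothetical stand-ins, and the real interface to the kernel module
is not shown in this message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    dax_offset: int  # byte offset into the dax device (assumed layout)
    length: int      # extent length in bytes


@dataclass
class LogEntry:
    path: str        # file name relative to the mount point
    size: int        # file size in bytes
    extents: list    # list of Extent, in file order


def play_log(log, instantiate):
    """Replay log entries in order and instantiate one file per entry.

    `instantiate` is a hypothetical callback standing in for whatever
    call hands (path, size, extents) to the kernel module.
    Returns the number of files instantiated.
    """
    count = 0
    for entry in log:
        covered = sum(e.length for e in entry.extents)
        if covered < entry.size:
            raise ValueError(f"{entry.path}: extents cover {covered} of {entry.size} bytes")
        instantiate(entry.path, entry.size, tuple(entry.extents))
        count += 1
    return count
```

The point of the sketch is that all policy (parsing, validation, ordering)
stays in user space, and the kernel side only ever receives a finished
extent list per file.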

The important thing, the thing I'm currently paid for, is making it
practical to use disaggregated shared memory - it's ultimately not important 
which mechanism is used to enable a filesystem access method for memory.

But caching metadata in the kernel for efficient fault handling is the
only way to get it to perform at "memory speeds", so that appears critical.
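
As a rough illustration of why cached metadata keeps fault handling cheap,
the sketch below (Python, for readability only) resolves a file offset to a
device offset from an in-memory extent list with a binary search and no call
out to user space. The class name CachedFileMap and the (dax_offset, length)
tuple layout are assumptions made for this example, not the famfs data
structures.

```python
from bisect import bisect_right
from itertools import accumulate


class CachedFileMap:
    """Per-file extent list cached for offset translation (hypothetical)."""

    def __init__(self, extents):
        # extents: iterable of (dax_offset, length) tuples in file order
        self.extents = list(extents)
        lengths = [length for _, length in self.extents]
        # starts[i] is the file offset at which extent i begins
        self.starts = [0] + list(accumulate(lengths))[:-1]
        self.size = sum(lengths)

    def resolve(self, file_offset):
        """Return (dax_offset, bytes_left_in_extent) for a file offset."""
        if not 0 <= file_offset < self.size:
            raise ValueError("offset outside file")
        # Find the last extent whose start is <= file_offset
        i = bisect_right(self.starts, file_offset) - 1
        dax_offset, length = self.extents[i]
        delta = file_offset - self.starts[i]
        return dax_offset + delta, length - delta
```

With the extent list already resident, each lookup is a search over a small
array, which is the kind of work that can plausibly sit in a fault path.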

One final observation: famfs has significantly more code in user space than
in kernel space, and it's the user side that is likely to grow over time.
That logic is at least theoretically independent of the kernel ABI.

> 
> Thanks,
> Amir.

Thanks!
John

