Message-Id: <20250421013346.32530-1-john@groves.net>
Date: Sun, 20 Apr 2025 20:33:27 -0500
From: John Groves <John@...ves.net>
To: John Groves <John@...ves.net>,
	Dan Williams <dan.j.williams@...el.com>,
	Miklos Szeredi <miklos@...redb.hu>,
	Bernd Schubert <bschubert@....com>
Cc: John Groves <jgroves@...ron.com>,
	Jonathan Corbet <corbet@....net>,
	Vishal Verma <vishal.l.verma@...el.com>,
	Dave Jiang <dave.jiang@...el.com>,
	Matthew Wilcox <willy@...radead.org>,
	Jan Kara <jack@...e.cz>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Christian Brauner <brauner@...nel.org>,
	"Darrick J . Wong" <djwong@...nel.org>,
	Luis Henriques <luis@...lia.com>,
	Randy Dunlap <rdunlap@...radead.org>,
	Jeff Layton <jlayton@...nel.org>,
	Kent Overstreet <kent.overstreet@...ux.dev>,
	Petr Vorel <pvorel@...e.cz>,
	Brian Foster <bfoster@...hat.com>,
	linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	nvdimm@...ts.linux.dev,
	linux-cxl@...r.kernel.org,
	linux-fsdevel@...r.kernel.org,
	Amir Goldstein <amir73il@...il.com>,
	Jonathan Cameron <Jonathan.Cameron@...wei.com>,
	Stefan Hajnoczi <shajnocz@...hat.com>,
	Joanne Koong <joannelkoong@...il.com>,
	Josef Bacik <josef@...icpanda.com>,
	Aravind Ramesh <arramesh@...ron.com>,
	Ajay Joshi <ajayjoshi@...ron.com>,
	John Groves <john@...ves.net>
Subject: [RFC PATCH 00/19] famfs: port into fuse

This is the initial RFC for the fabric-attached memory file system (famfs)
integration into fuse. In order to function, this requires a related patch
to libfuse [1] and the famfs user space [2]. 

This RFC is mainly intended to socialize the approach and get feedback from
the fuse developers and maintainers. There is some dax work that needs to
be done before this should be merged (see the "poisoned page|folio problem"
below).

This patch set fully works with Linux 6.14 -- passing all existing famfs
smoke and unit tests -- and I encourage existing famfs users to test it.

This is really two patch sets mashed up:

* The patches with the dev_dax_iomap: prefix fill in missing functionality for
  devdax to host an fs-dax file system.
* The famfs_fuse: patches add famfs into fs/fuse/. These are effectively
  unchanged since last year.

Because this is not ready to merge yet, I have felt free to leave some debug
prints in place because we still find them useful; those will be cleaned up
in a subsequent revision.

Famfs Overview

Famfs exposes shared memory as a file system. Famfs consumes shared memory
from dax devices, and provides memory-mappable files that map directly to
the memory - no page cache involvement. Famfs differs from conventional
file systems in fs-dax mode, in that it handles in-memory metadata in a
sharable way (which begins with never caching dirty shared metadata).

Famfs started as a standalone file system [3,4], but the consensus at LSFMM
2024 [5] was that it should be ported into fuse - and this RFC is the first
public evidence that I've been working on that.

The key performance requirement is that famfs must resolve mapping faults
without upcalls. This is achieved by fully caching the file-to-devdax
metadata for all active files. This is done via two fuse client/server
message/response pairs: GET_FMAP and GET_DAXDEV.
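
Here is why no upcall is needed at fault time. Once the whole fmap is cached, translating a file offset into a daxdev offset is a pure in-memory lookup. The C sketch below is a simplified userspace model that assumes a flat extent list in file-offset order. `struct fmap_extent`, `struct fmap` and `fmap_resolve()` are hypothetical names and do not reflect the actual layout in fs/fuse/famfs_kfmap.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: a contiguous run of file bytes backed by one daxdev. */
struct fmap_extent {
	uint32_t daxdev_index;	/* index into the daxdev table */
	uint64_t dev_offset;	/* byte offset within that daxdev */
	uint64_t len;		/* extent length in bytes */
};

/* Hypothetical cached fmap: extents appear in file-offset order. */
struct fmap {
	size_t nextents;
	const struct fmap_extent *ext;
};

/*
 * Translate a file offset to (daxdev index, device offset) using only the
 * cached fmap, so no message to the fuse server is needed.
 * Returns 0 on success, or -1 if the offset lies beyond the mapped range.
 */
static int fmap_resolve(const struct fmap *map, uint64_t file_off,
			uint32_t *daxdev_index, uint64_t *dev_off)
{
	uint64_t ext_start = 0;	/* file offset where extent i begins */

	assert(map != NULL && daxdev_index != NULL && dev_off != NULL);
	for (size_t i = 0; i < map->nextents; i++) {
		const struct fmap_extent *e = &map->ext[i];

		if (file_off < ext_start + e->len) {
			*daxdev_index = e->daxdev_index;
			*dev_off = e->dev_offset + (file_off - ext_start);
			return 0;
		}
		ext_start += e->len;
	}
	return -1;
}
```

For example, take a 2 MiB extent on daxdev 0 at device offset 0x400000, followed by a 4 MiB extent on daxdev 1 at offset 0. File offset 0x300000 then resolves to daxdev 1, offset 0x100000. A linear scan keeps the sketch short. An implementation handling files with many extents might prefer a search structure.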

Famfs remains the first fs-dax file system that is backed by devdax rather
than pmem in fs-dax mode (hence the need for the dev_dax_iomap fixups).

Notes

* Once the dev_dax_iomap patches land, I suspect it may make sense for
  virtiofs to update to use the improved interface.

* I'm currently maintaining compatibility between the famfs user space and
  both the standalone famfs kernel file system and this new fuse
  implementation. In the near future I'll be running performance comparisons
  and sharing them - but there is no reason to expect significant degradation
  with fuse, since famfs caches entire "fmaps" in the kernel to resolve
  faults with no upcalls. This patch has a bit too much debug turned on to
  do that testing quite yet. A branch

* Two new fuse messages / responses are added: GET_FMAP and GET_DAXDEV.

* When a file is looked up in a famfs mount, the LOOKUP is followed by a
  GET_FMAP message and response. The "fmap" is the full file-to-dax mapping,
  allowing the fuse/famfs kernel code to handle read/write/fault without any
  upcalls.

* After each GET_FMAP, the fmap is checked for extents that reference
  previously-unknown daxdevs. Each such occurrence is handled with a
  GET_DAXDEV message and response.

* Daxdevs are stored in a table (which might become an xarray at some point).
  When entries are added to the table, we acquire exclusive access to the
  daxdev via the fs_dax_get() call (modeled after how fs-dax handles this
  with pmem devices). famfs provides holder_operations to devdax, which give
  devdax a notification path in the event of memory errors.

* If devdax notifies famfs of memory errors on a dax device, famfs currently
  blocks all subsequent accesses to data on that device. The recovery is to
  re-initialize the memory and file system. Famfs is memory, not storage...

* Because famfs uses backing (devdax) devices, only privileged mounts are
  supported.

* The famfs kernel code never accesses the memory directly - it only
  facilitates read, write and mmap on behalf of user processes. As such,
  the RAS of the shared memory affects applications, but not the kernel.

* Famfs has backing device(s), but they are devdax (char) rather than
  block. Right now there is no way to tell the vfs layer that famfs has a
  char backing device (unless we say it's block, but it's not). Currently
  we use the standard anonymous fuse fs_type - but I'm not sure that's
  ultimately optimal (thoughts?)
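
The GET_FMAP/GET_DAXDEV bookkeeping and the error-blocking behavior described in the notes above can be sketched as follows. This is a hypothetical userspace model, not code from the patch set. The fixed-size array stands in for the daxdev table. The names (`FAMFS_MAX_DAXDEVS`, `collect_unknown_daxdevs()`, `daxdev_table_add()`, `daxdev_notify_failure()`, `daxdev_accessible()`) are invented for illustration. Neither the fs_dax_get() call nor the actual fuse messaging is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAMFS_MAX_DAXDEVS 16	/* hypothetical table size */

/* One slot per daxdev index referenced by fmaps (hypothetical layout). */
struct daxdev_entry {
	bool valid;	/* set once a GET_DAXDEV response has been handled */
	bool error;	/* set by a memory-error notification */
};

struct daxdev_table {
	struct daxdev_entry dev[FAMFS_MAX_DAXDEVS];
};

/*
 * Scan the daxdev indices referenced by a new fmap's extents and collect
 * each previously unknown index exactly once; `out` must have room for
 * FAMFS_MAX_DAXDEVS entries. The caller would send one GET_DAXDEV per
 * collected index. Returns the count, or -1 for an out-of-range index.
 */
static int collect_unknown_daxdevs(const struct daxdev_table *t,
				   const uint32_t *ext_dev, size_t next,
				   uint32_t *out)
{
	bool queued[FAMFS_MAX_DAXDEVS] = { false };
	int n = 0;

	for (size_t i = 0; i < next; i++) {
		uint32_t d = ext_dev[i];

		if (d >= FAMFS_MAX_DAXDEVS)
			return -1;
		if (t->dev[d].valid || queued[d])
			continue;
		queued[d] = true;
		out[n++] = d;
	}
	return n;
}

/* Record a GET_DAXDEV response (the patch set calls fs_dax_get() here). */
static void daxdev_table_add(struct daxdev_table *t, uint32_t d)
{
	assert(d < FAMFS_MAX_DAXDEVS);
	t->dev[d].valid = true;
}

/* Memory-error notification: block all further access to that daxdev. */
static void daxdev_notify_failure(struct daxdev_table *t, uint32_t d)
{
	assert(d < FAMFS_MAX_DAXDEVS);
	t->dev[d].error = true;
}

/* Checked before read/write/fault is allowed to touch daxdev d. */
static bool daxdev_accessible(const struct daxdev_table *t, uint32_t d)
{
	return d < FAMFS_MAX_DAXDEVS && t->dev[d].valid && !t->dev[d].error;
}
```

A caller would send one GET_DAXDEV for each index returned by collect_unknown_daxdevs() and call daxdev_table_add() as each response arrives. An index that several extents reference is requested only once.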

The "poisoned page|folio problem"

* Background: before doing a kernel mount, the famfs user space [2] validates
  the superblock and log. This is done via raw mmap of the primary devdax
  device. If valid, the file system is mounted, and the superblock and log
  get exposed through a pair of files (.meta/.superblock and .meta/.log) -
  because we can't be using raw device mmap when a file system is mounted
  on the device. But this exposes a devdax bug and warning...

* Pages that have been memory mapped via devdax are left in a permanently
  problematic state. Devdax sets page|folio->mapping when a page is accessed
  via raw devdax mmap (as famfs does before mount), but never cleans it up.
  When the pages of the famfs superblock and log are accessed via the "meta"
  files after mount, we see a WARN_ONCE() in dax_insert_entry(), which
  notices that page|folio->mapping is still set. I intend to address this
  prior to asking for the famfs patches to be merged.

* Alistair Popple's recent dax patch series [6], which has been merged
  for 6.15, addresses some dax issues, but sadly does not fix the poisoned
  page|folio problem - its enhanced refcount checking turns the warning into
  an error.

* This 6.14 patch set disables the warning; a proper fix will be required for
  famfs to work at all in 6.15. Dan W. and I are actively discussing how to do
  this properly...

* In terms of the correct functionality of famfs, the warning can be ignored.
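
The sequence above can be illustrated with a deliberately simplified userspace model. `struct mapping`, `struct page`, `devdax_raw_fault()` and `fsdax_insert()` are hypothetical stand-ins, not kernel APIs. The model covers only the behavior described above for 6.14. A raw devdax fault leaves page->mapping set, and a later fs-dax insertion through a file notices the stale value and warns, but still proceeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for an address_space and a page|folio. */
struct mapping { const char *name; };
struct page { struct mapping *mapping; };

/* Raw devdax mmap fault: the page gets associated with the device mapping. */
static void devdax_raw_fault(struct page *pg, struct mapping *dev_mapping)
{
	assert(pg != NULL && dev_mapping != NULL);
	if (pg->mapping == NULL)
		pg->mapping = dev_mapping;
	/* Nothing resets pg->mapping when the raw mapping is torn down. */
}

/*
 * fs-dax insertion through a file such as .meta/.superblock. Mirrors the
 * check described above: warn if page->mapping is still set, then take
 * the page over anyway. Returns true if the warning fired.
 */
static bool fsdax_insert(struct page *pg, struct mapping *file_mapping)
{
	bool stale;

	assert(pg != NULL && file_mapping != NULL);
	stale = pg->mapping != NULL;
	if (stale)
		fprintf(stderr, "WARN: page->mapping still set (%s)\n",
			pg->mapping->name);
	pg->mapping = file_mapping;
	return stale;
}
```

A page touched via raw devdax mmap before mount triggers the warning when it is later reached through a "meta" file. A page that was never raw-mapped does not trigger it.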

References

[1] - https://github.com/libfuse/libfuse/pull/1200
[2] - https://github.com/cxl-micron-reskit/famfs
[3] - https://lore.kernel.org/linux-cxl/cover.1708709155.git.john@groves.net/
[4] - https://lore.kernel.org/linux-cxl/cover.1714409084.git.john@groves.net/
[5] - https://lwn.net/Articles/983105/
[6] - https://lore.kernel.org/linux-cxl/cover.8068ad144a7eea4a813670301f4d2a86a8e68ec4.1740713401.git-series.apopple@nvidia.com/


John Groves (19):
  dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c
  dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage
  dev_dax_iomap: Save the kva from memremap
  dev_dax_iomap: Add dax_operations for use by fs-dax on devdax
  dev_dax_iomap: export dax_dev_get()
  dev_dax_iomap: (ignore!) Drop poisoned page warning in fs/dax.c
  famfs_fuse: magic.h: Add famfs magic numbers
  famfs_fuse: Kconfig
  famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/
  famfs_fuse: Basic fuse kernel ABI enablement for famfs
  famfs_fuse: Basic famfs mount opts
  famfs_fuse: Plumb the GET_FMAP message/response
  famfs_fuse: Create files with famfs fmaps
  famfs_fuse: GET_DAXDEV message and daxdev_table
  famfs_fuse: Plumb dax iomap and fuse read/write/mmap
  famfs_fuse: Add holder_operations for dax notify_failure()
  famfs_fuse: Add famfs metadata documentation
  famfs_fuse: Add documentation
  famfs_fuse: (ignore) debug cruft

 Documentation/filesystems/famfs.rst |  142 ++++
 Documentation/filesystems/index.rst |    1 +
 MAINTAINERS                         |   10 +
 drivers/dax/Kconfig                 |    6 +
 drivers/dax/bus.c                   |  144 +++-
 drivers/dax/dax-private.h           |    1 +
 drivers/dax/device.c                |   38 +-
 drivers/dax/super.c                 |   33 +-
 fs/dax.c                            |    1 -
 fs/fuse/Kconfig                     |   13 +
 fs/fuse/Makefile                    |    4 +-
 fs/fuse/dev.c                       |   61 ++
 fs/fuse/dir.c                       |   74 +-
 fs/fuse/famfs.c                     | 1105 +++++++++++++++++++++++++++
 fs/fuse/famfs_kfmap.h               |  166 ++++
 fs/fuse/file.c                      |   27 +-
 fs/fuse/fuse_i.h                    |   67 +-
 fs/fuse/inode.c                     |   49 +-
 fs/fuse/iomode.c                    |    2 +-
 fs/namei.c                          |    1 +
 include/linux/dax.h                 |    6 +
 include/uapi/linux/fuse.h           |   63 ++
 include/uapi/linux/magic.h          |    2 +
 23 files changed, 1973 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/filesystems/famfs.rst
 create mode 100644 fs/fuse/famfs.c
 create mode 100644 fs/fuse/famfs_kfmap.h


base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
-- 
2.49.0

