Message-ID: <0b18ba6299d7cf54a96a3aa6641b9f883efb8bd2.camel@kernel.org>
Date: Thu, 25 Jan 2024 11:29:15 -0500
From: Jeff Layton <jlayton@...nel.org>
To: David Howells <dhowells@...hat.com>, Gao Xiang <xiang@...nel.org>
Cc: Christian Brauner <brauner@...nel.org>, Matthew Wilcox
<willy@...radead.org>, Eric Sandeen <esandeen@...hat.com>,
v9fs@...ts.linux.dev, linux-afs@...ts.infradead.org,
ceph-devel@...r.kernel.org, linux-cifs@...r.kernel.org,
samba-technical@...ts.samba.org, linux-nfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Roadmap for netfslib and local caching (cachefiles)
On Thu, 2024-01-25 at 14:02 +0000, David Howells wrote:
> Here's a roadmap for the future development of netfslib and local caching
> (e.g. cachefiles).
>
> Netfslib
> ========
>
> [>] Current state:
>
> The netfslib write helpers have gone upstream now and are in v6.8-rc1, with
> both the 9p and afs filesystems using them. This provides larger I/O sizes
> to 9p, and write-streaming and DIO support to afs.
>
> The helpers provide their own version of generic_perform_write() that:
>
> (1) doesn't use ->write_begin() and ->write_end() at all, completely taking
> over all of the buffered I/O operations, including writeback.
>
> (2) can perform write-through caching, setting up one or more write
> operations and adding folios to them as we copy data into the pagecache
> and then starting them as we finish. This is then used for O_SYNC and
> O_DSYNC and can be used with immediate-write caching modes in, say, cifs.
>
> Filesystems using this then deal with iov_iters and ideally would not deal
> with pages or folios at all - except incidentally where a wrapper is necessary.
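>
> To make (2) a bit more concrete, here's a toy user-space analogue of the
> write-through flow.  This is purely illustrative - none of it is the actual
> netfslib code or API:
>
>     /* Toy model of write-through: copy data into "pagecache" buffers,
>      * add each buffer to a pending write op as we go, then start the
>      * op once the copy loop finishes.
>      */
>     #include <stdio.h>
>     #include <string.h>
>
>     #define PAGE_SZ 16                      /* tiny "page" for the demo */
>
>     struct write_op {
>             const char *pages[8];
>             int n;
>     };
>
>     static void op_add_page(struct write_op *op, const char *page)
>     {
>             op->pages[op->n++] = page;      /* buffer joins the op as the
>                                              * data is copied in */
>     }
>
>     static void op_launch(const struct write_op *op)
>     {
>             for (int i = 0; i < op->n; i++) /* the writes start here */
>                     printf("write page %d: %.*s\n", i, PAGE_SZ, op->pages[i]);
>     }
>
>     int main(void)
>     {
>             static char cache[4][PAGE_SZ];  /* stands in for the pagecache */
>             const char *data = "O_SYNC data goes straight through";
>             struct write_op op = { .n = 0 };
>             size_t len = strlen(data);
>
>             for (size_t off = 0; off < len; off += PAGE_SZ) {
>                     size_t n = len - off < PAGE_SZ ? len - off : PAGE_SZ;
>
>                     memcpy(cache[off / PAGE_SZ], data + off, n);
>                     op_add_page(&op, cache[off / PAGE_SZ]);
>             }
>             op_launch(&op);                 /* as for O_SYNC/O_DSYNC */
>             return 0;
>     }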
>
>
> [>] Aims for the next merge window:
>
> Convert cifs to use netfslib. This is now in Steve French's for-next branch.
>
> Implement content crypto and bounce buffering. I have patches to do this, but
> it would only be used by ceph (see below).
>
> Make libceph and rbd use iov_iters as much as possible rather than referring
> to pages and folios. This is mostly done and rbd works - but there's one bit
> in rbd that still needs doing.
>
> Convert ceph to use netfslib. This is about half done, but there are some
> wibbly bits in the ceph RPCs that I'm not sure I fully grasp. I'm not sure
> I'll quite manage this and it might get bumped.
>
> Finally, change netfslib so that it uses ->writepages() to write data to the
> cache, even data on clean pages just read from the server. I have a patch to
> do this, but I need to move cifs and ceph over first. This means that
> netfslib, 9p, afs, cifs and ceph will no longer use PG_private_2 (aka
> PG_fscache) and Willy can have it back - he just then has to wrest control
> from NFS and btrfs.
>
>
> [>] Aims for future merge windows:
>
> Using a larger chunk size than PAGE_SIZE - for instance 256KiB - but that
> might require fiddling with the VM readahead code to avoid read/read races.
>
> Cache AFS directories - these are just files and are currently downloaded and
> parsed locally for readdir and lookup.
>
> Cache directories from other filesystems.
>
> Cache inode metadata, xattrs.
>
> Add support for fallocate().
>
> Implement content crypto in other filesystems, such as cifs which has its own
> non-fscrypt way of doing this.
>
> Support for data transport compression.
>
> Disconnected operation.
>
> NFS. NFS at the very least needs to be altered to give up the use of
> PG_private_2.
>
>
> Local Caching
> =============
>
> There are a number of things I want to look at with local caching:
>
> [>] Although cachefiles has switched from using bmap to using SEEK_HOLE and
> SEEK_DATA, this isn't sufficient as we cannot rely on the backing filesystem:
> its optimisations can introduce both false positives and false negatives.
> Cachefiles needs to track the presence/absence of data for itself.
>
> I had a partially-implemented solution that stores a block bitmap in an xattr,
> but that only worked up to files of 1G in size (with bits representing 256K
> blocks in a 512-byte bitmap).
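>
> The arithmetic behind that limit, as a quick runnable check (just the sums,
> not cachefiles code):
>
>     /* Why a 512-byte bitmap with one bit per 256K block caps at 1G. */
>     #include <stdio.h>
>
>     #define BITMAP_BYTES 512
>     #define BLOCK_SIZE   (256 * 1024)
>
>     int main(void)
>     {
>             unsigned long long bits = BITMAP_BYTES * 8;    /* 4096 blocks */
>             unsigned long long max  = bits * BLOCK_SIZE;   /* in bytes    */
>
>             printf("blocks: %llu, max file: %lluG\n", bits, max >> 30);
>             return 0;
>     }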
>
> [>] An alternative cache format might prove more fruitful. Various AFS
> implementations use a 'tagged cache' format with an index file and a bunch of
> small files each of which contains a single block (typically 256K in OpenAFS).
>
> This would offer some advantages over the current approach:
>
> - it can handle entry reuse within the index
> - doesn't require an external culling process
> - doesn't need to truncate/reallocate when invalidating
>
> There are some downsides, including:
>
> - each block is in a separate file
> - metadata coherency is more tricky - a powercut may require a cache wipe
> - the index key is highly variable in size if used for multiple filesystems
>
> But OpenAFS has been using this for something like 30 years, so it's probably
> worth a try.
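>
> For flavour, a guess at the shape of a tagged-cache index entry.  All the
> field names here are invented for illustration; the real OpenAFS format
> differs:
>
>     /* Invented layout - not OpenAFS's actual on-disk format. */
>     #include <stdint.h>
>
>     #define TC_BLOCK_SIZE (256 * 1024)  /* one small file per block */
>
>     struct tc_index_entry {
>             uint8_t  key[64];       /* fs/volume/vnode/block key - highly
>                                      * variable in size in reality, hence
>                                      * the downside noted above */
>             uint32_t block_file;    /* which small file holds the block */
>             uint32_t tag;           /* generation tag: bump it to reuse
>                                      * the entry without an external
>                                      * culling pass */
>             uint64_t data_version;  /* coherency check after a powercut */
>     };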
>
> [>] Need to work out some way to store xattrs, directory entries and inode
> metadata efficiently.
>
> [>] Using NVRAM as the cache rather than spinning rust.
>
> [>] Support for disconnected operation to pin desirable data and keep
> track of changes.
>
> [>] A user API by which the cache for specific files or volumes can be
> flushed.
>
>
> Disconnected Operation
> ======================
>
> I'm working towards providing support for disconnected operation, so that,
> provided you've got your working set pinned in the cache, you can continue to
> work on your network-provided files when the network goes away and resync the
> changes later.
>
> This is going to require a number of things:
>
> (1) A user API by which files can be preloaded into the cache and pinned (see
> the sketch after this list).
>
> (2) The ability to track changes in the cache.
>
> (3) A way to synchronise changes on reconnection.
>
> (4) A way to communicate to the user when there's a conflict with a third
> party change on reconnect. This might involve communicating via systemd
> to the desktop environment to ask the user to indicate how they'd like
> conflicts resolved.
>
> (5) A way to prompt the user to re-enter their authentication/crypto keys.
>
> (6) A way to ask the user how to handle a process that wants to access data
> we don't have (error/wait) - and how to handle the DE getting stuck in
> this fashion.
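>
> Purely to make (1) concrete, a sketch of what such an API might look like.
> The ioctl name and number are invented - no such interface exists today:
>
>     /* Hypothetical preload-and-pin ioctl, invented for illustration. */
>     #include <sys/ioctl.h>
>     #include <fcntl.h>
>     #include <stdio.h>
>     #include <unistd.h>
>
>     #define FS_IOC_CACHE_PIN _IOW('f', 0x42, int)   /* hypothetical */
>
>     int main(int argc, char *argv[])
>     {
>             int pin = 1, fd;
>
>             if (argc < 2)
>                     return 1;
>             fd = open(argv[1], O_RDONLY);
>             if (fd < 0 || ioctl(fd, FS_IOC_CACHE_PIN, &pin) < 0) {
>                     perror("cache pin");
>                     return 1;
>             }
>             close(fd);
>             return 0;
>     }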
>
> David
>
This is all great stuff, David! Would it be reasonable to request a slot
to talk about the state of all of this at LSF/MM in May?
--
Jeff Layton <jlayton@...nel.org>