Message-ID: <8ac3a2fd-1c41-493a-b6a0-a5f53afb49e1@linux.alibaba.com>
Date: Thu, 25 Jan 2024 23:22:31 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: David Howells <dhowells@...hat.com>
Cc: Jeff Layton <jlayton@...nel.org>, Christian Brauner <brauner@...nel.org>,
Matthew Wilcox <willy@...radead.org>, Eric Sandeen <esandeen@...hat.com>,
v9fs@...ts.linux.dev, linux-afs@...ts.infradead.org,
ceph-devel@...r.kernel.org, linux-cifs@...r.kernel.org,
samba-technical@...ts.samba.org, linux-nfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Roadmap for netfslib and local caching (cachefiles)
Hi David,
On 2024/1/25 22:02, David Howells wrote:
> Here's a roadmap for the future development of netfslib and local caching
> (e.g. cachefiles).
Thanks for writing this detailed email, and congrats on your work.
I'll only comment on the parts directly related to me.
>
..
>
>
> Local Caching
> =============
>
> There are a number of things I want to look at with local caching:
>
> [>] Although cachefiles has switched from using bmap to using SEEK_HOLE and
> SEEK_DATA, this isn't sufficient as we cannot rely on the backing filesystem
> optimising things and introducing both false positives and false negatives.
> Cachefiles needs to track the presence/absence of data for itself.
Yes, that is indeed an issue that needs to be resolved, and it has already
been discussed before.
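To illustrate the point in the quoted paragraph, here is a rough userspace
sketch of SEEK_DATA/SEEK_HOLE probing. The helper name
range_reported_as_data() is made up for this example and this is not the
cachefiles code; it only shows that the answer is whatever the backing
filesystem chooses to report.

```c
#define _GNU_SOURCE            /* for SEEK_DATA / SEEK_HOLE on glibc */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a kernel or cachefiles API).
 *
 * Returns 1 if the backing filesystem reports [pos, pos + len) as
 * entirely data, 0 if it reports a hole somewhere in the range, and
 * -1 on error.  Note that lseek() moves the file offset as a side
 * effect.
 */
static int range_reported_as_data(int fd, off_t pos, off_t len)
{
	off_t data, hole;

	data = lseek(fd, pos, SEEK_DATA);
	if (data < 0)
		return errno == ENXIO ? 0 : -1;	/* ENXIO: no data at/after pos */
	if (data != pos)
		return 0;			/* range starts inside a hole */

	hole = lseek(fd, pos, SEEK_HOLE);
	if (hole < 0)
		return -1;

	return hole >= pos + len;		/* next hole is past the range */
}
```

A filesystem is free to report zero-filled or preallocated regions either
way, so a "1" here does not prove the cache ever wrote that range, and a "0"
does not prove that it didn't.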
>
> I had a partially-implemented solution that stores a block bitmap in an xattr,
> but that only worked up to files of 1G in size (with bits representing 256K
> blocks in a 512-byte bitmap).
Jingbo once had an approach to use external bitmap files and
extended-attribute pointers inside the cache files:
https://listman.redhat.com/archives/linux-cachefs/2022-August/007050.html
I'm not quite sure what the performance was, but if it's worth trying or
comparing, that might be useful.
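For reference, the 1G limit quoted above follows directly from the numbers
given: 512 bytes is 4096 bits, and 4096 x 256K = 1G. A minimal sketch of such
a bitmap is below; the macro and function names are invented for
illustration and are not taken from either implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sizes from the quoted mail; the names themselves are hypothetical. */
#define XBITMAP_BYTES		512u
#define XBITMAP_BITS		(XBITMAP_BYTES * 8u)	/* 4096 bits */
#define XBITMAP_BLOCK_SIZE	(256u * 1024u)		/* 256K per bit */

/* Largest file size the bitmap can describe: 4096 * 256K = 1G. */
static inline uint64_t xbitmap_max_file_size(void)
{
	return (uint64_t)XBITMAP_BITS * XBITMAP_BLOCK_SIZE;
}

/* Mark the block containing byte 'pos' as present; false if out of range. */
static inline bool xbitmap_mark_present(uint8_t *map, uint64_t pos)
{
	uint64_t bit = pos / XBITMAP_BLOCK_SIZE;

	if (bit >= XBITMAP_BITS)
		return false;
	map[bit / 8] |= (uint8_t)(1u << (bit % 8));
	return true;
}

/* Test whether the block containing byte 'pos' is marked present. */
static inline bool xbitmap_is_present(const uint8_t *map, uint64_t pos)
{
	uint64_t bit = pos / XBITMAP_BLOCK_SIZE;

	if (bit >= XBITMAP_BITS)
		return false;
	return (map[bit / 8] & (1u << (bit % 8))) != 0;
}
```

Anything at or beyond 1G simply has no bit to set, which is why a larger or
external structure is needed for bigger files.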
>
> [>] An alternative cache format might prove more fruitful. Various AFS
> implementations use a 'tagged cache' format with an index file and a bunch of
> small files each of which contains a single block (typically 256K in OpenAFS).
>
> This would offer some advantages over the current approach:
>
> - it can handle entry reuse within the index
> - doesn't require an external culling process
> - doesn't need to truncate/reallocate when invalidating
>
> There are some downsides, including:
>
> - each block is in a separate file
I'm not quite sure about this: accessing too many small files might be
another issue, which is currently happening with AI training workloads. But
as you said, it's worth trying.
> - metadata coherency is more tricky - a powercut may require a cache wipe
> - the index key is highly variable in size if used for multiple filesystems
>
> But OpenAFS has been using this for something like 30 years, so it's probably
> worth a try.
Yes, configurable chunk sizes per blob would also be very helpful.
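To make the "entry reuse within the index" point concrete, here is a toy
in-memory model of a tagged index. It is only a sketch under my own
assumptions: the struct layout, the names and the LRU policy are all
hypothetical and are not the OpenAFS on-disk format. Each slot number would
name one small block file, and a miss recycles the least recently used slot
in place, so no separate culling pass is needed.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 4u	/* tiny on purpose, for illustration only */

/* Hypothetical index entry: one slot corresponds to one block file. */
struct slot {
	uint64_t file_key;	/* identifies the network file */
	uint64_t block;		/* block index within that file */
	uint64_t last_used;	/* logical clock value; 0 means never used */
};

struct tagged_index {
	struct slot slots[NSLOTS];
	uint64_t clock;
};

/*
 * Return the slot number holding (file_key, block) and set *hit to 1.
 * On a miss, overwrite the least recently used slot, set *hit to 0 and
 * return that slot number; the caller would then refill its block file.
 */
static size_t index_get(struct tagged_index *ix, uint64_t file_key,
			uint64_t block, int *hit)
{
	size_t victim = 0;

	for (size_t i = 0; i < NSLOTS; i++) {
		struct slot *s = &ix->slots[i];

		if (s->last_used && s->file_key == file_key &&
		    s->block == block) {
			s->last_used = ++ix->clock;
			*hit = 1;
			return i;
		}
		if (s->last_used < ix->slots[victim].last_used)
			victim = i;
	}

	ix->slots[victim].file_key = file_key;
	ix->slots[victim].block = block;
	ix->slots[victim].last_used = ++ix->clock;
	*hit = 0;
	return victim;
}
```

A per-blob chunk size could be carried as one more field in such an entry,
though how that interacts with fixed-size block files is an open question.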
Thanks,
Gao Xiang
>
> [>] Need to work out some way to store xattrs, directory entries and inode
> metadata efficiently.
>
> [>] Using NVRAM as the cache rather than spinning rust.
>
> [>] Support for disconnected operation to pin desirable data and keep
> track of changes.
>
> [>] A user API by which the cache for specific files or volumes can be
> flushed.
>
>
> Disconnected Operation
> ======================
>
> I'm working towards providing support for disconnected operation, so that,
> provided you've got your working set pinned in the cache, you can continue to
> work on your network-provided files when the network goes away and resync the
> changes later.
>
> This is going to require a number of things:
>
> (1) A user API by which files can be preloaded into the cache and pinned.
>
> (2) The ability to track changes in the cache.
>
> (3) A way to synchronise changes on reconnection.
>
> (4) A way to communicate to the user when there's a conflict with a third
> party change on reconnect. This might involve communicating via systemd
> to the desktop environment to ask the user to indicate how they'd like
> conflicts resolved.
>
> (5) A way to prompt the user to re-enter their authentication/crypto keys.
>
> (6) A way to ask the user how to handle a process that wants to access data
> we don't have (error/wait) - and how to handle the DE getting stuck in
> this fashion.
>
> David