Message-ID: <2522190.1612544534@warthog.procyon.org.uk>
Date: Fri, 05 Feb 2021 17:02:14 +0000
From: David Howells <dhowells@...hat.com>
To: torvalds@...ux-foundation.org
cc: dhowells@...hat.com, Matthew Wilcox <willy@...radead.org>,
Anna Schumaker <anna.schumaker@...app.com>,
Trond Myklebust <trondmy@...merspace.com>,
Steve French <sfrench@...ba.org>,
Dominique Martinet <asmadeus@...ewreck.org>,
Jeff Layton <jlayton@...hat.com>,
David Wysochanski <dwysocha@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
ceph-devel@...r.kernel.org, linux-afs@...ts.infradead.org,
linux-cachefs@...hat.com, linux-cifs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-nfs@...r.kernel.org,
v9fs-developer@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Upcoming for next merge window: fscache and netfs lib
Hi Linus,
To apprise you in advance, I'm intending to submit a pull request for a
partial modernisation of the fscache I/O subsystem, which can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-next
The main parts of it are:
(1) Institute a helper library for network filesystems. The first stage of
this handles ->readpage(), ->readahead() and part of ->write_begin() on
behalf of the netfs, requiring the netfs to provide a vector to perform a
read to some part of an inode (a rough sketch of the shape of that vector
follows the list below).
This allows handling of the following to be (at least partially) moved
out of all the network filesystems and consolidated in one place:
- changes in VM vectors (Matthew Wilcox's Grand Plans™;-)
- transparent huge page support
- shaping of reads
- readahead expansion
- fs alignment/granularity (ceph, pnfs)
- cache alignment/granularity
- slicing of reads
- rsize
- keeping multiple reads in flight } Steve French would like
- multichannel distribution } but for the future
- multiserver distribution (ceph, pnfs)
- stitching together reads from the cache and reads from the network
- saving data read from the server into the cache
- retry/reissue handling
- fallback after cache failure
- short reads
- fscrypt data decryption (Jeff Layton is considering for the future)
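For illustration, that vector is roughly an ops table plus a subrequest
descriptor.  The following is only a sketch with made-up "sketch_" names to
show the shape of the idea, not the structures actually in the branch above:

	#include <linux/fs.h>

	/* Sketch only: hypothetical names, not the exact API in the
	 * branch.  The netfs's ->readpage()/->readahead() hand the work
	 * to the helper library along with an ops table like this; the
	 * library slices the read and calls back to fill each slice from
	 * the server (and/or the cache).
	 */
	struct sketch_read_subrequest {
		struct inode	*inode;
		loff_t		start;	/* file offset of this slice */
		size_t		len;	/* length of this slice */
	};

	struct sketch_netfs_read_ops {
		/* Read one slice of the file from the server into the
		 * pagecache pages covering [start, start + len). */
		void (*issue_read)(struct sketch_read_subrequest *subreq);

		/* Round the request out to the netfs's own alignment or
		 * granularity (rsize, ceph object layout, pnfs layout). */
		void (*expand_readahead)(struct sketch_read_subrequest *subreq);
	};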
(2) Add an alternate cache I/O API for use with the netfs lib that makes use
of kiocbs in the cache to do direct I/O between the cache files and the
netfs pages.
This is intended to replace the current I/O API, which calls the backing fs's
readpage op, snoops the wait queues for completion to read, and uses
vfs_write() to write.  It wasn't possible to do in-kernel DIO when
I first wrote cachefiles - and this makes it a lot simpler and more
robust (and uses a lot less memory).
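As a rough illustration of the direction, in-kernel DIO against the cache
file boils down to something like the sketch below.  The helper name is made
up, and the real cachefiles code does this asynchronously with a completion
callback rather than synchronously:

	#include <linux/fs.h>
	#include <linux/uio.h>

	/* Sketch: read from the backing cache file at 'pos' straight into
	 * whatever the iov_iter describes (e.g. the netfs's pagecache
	 * pages; the iter's length sets the amount), using the backing
	 * fs's ->read_iter() with IOCB_DIRECT rather than calling its
	 * readpage op and snooping page wait queues.
	 */
	static ssize_t sketch_cache_read(struct file *cache_file,
					 struct iov_iter *iter, loff_t pos)
	{
		struct kiocb kiocb;

		init_sync_kiocb(&kiocb, cache_file);
		kiocb.ki_pos = pos;
		kiocb.ki_flags |= IOCB_DIRECT;

		return call_read_iter(cache_file, &kiocb, iter);
	}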
(3) Add an ITER_XARRAY iov_iter that allows I/O iteration to be done on an
xarray of pinned pages (such as inode->i_mapping->i_pages), thereby
avoiding the need to allocate a bvec array to represent this.
This is used to present a set of netfs pages to the cache to do DIO on
and is also used by afs to present netfs pages to sendmsg. It could also
be used by unencrypted cifs to pass the pages to the TCP socket it uses
(if it's doing TCP), and my patch for 9p (which isn't included here) can
make use of it.
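The intended usage is approximately the following; the signature reflects the
patches in the branch above, so treat it as subject to change, and the wrapper
name is made up:

	#include <linux/fs.h>
	#include <linux/uio.h>

	/* Sketch: describe a span of pages already pinned in the inode's
	 * pagecache without allocating a bvec array.  The resulting
	 * iterator can then be handed to the cache for DIO or to
	 * sendmsg().  READ here means the iterator is the destination of
	 * the I/O (data lands in the pages); a netfs passing pages out as
	 * a data source would use WRITE instead.
	 */
	static void sketch_iter_over_pagecache(struct inode *inode,
					       struct iov_iter *iter,
					       loff_t start, size_t len)
	{
		iov_iter_xarray(iter, READ, &inode->i_mapping->i_pages,
				start, len);
	}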
(4) Make afs use the above. It passes the same xfstests (and has the same
failures) as the unpatched afs client.
(5) Make ceph use the above (I've merged a branch from Jeff Layton for this).
This also passes xfstests.
Dave Wysochanski has a patch series for nfs.  Normal nfs works fine and passes
various tests, but it turned out pnfs has a problem: pnfs splits requests
itself and sends them to various places, and it needs to cooperate more
closely with netfs over this.  He's working on it.
I've given Dominique Martinet a patch for 9p and Steve French a partial patch
for cifs, but neither of those is going to be ready this merge window either.
-~-
Assuming you're willing to take this, should I submit one pull request for the
combined lot, or should I break it up into smaller requests (say, with a
separate request from Jeff for the ceph stuff)?
If we can get the netfs lib in this merge window, that simplifies dealing with
nfs and cifs in particular, as the changes specific to those can then go
through the maintainer trees.
Thanks,
David