Message-ID: <5c905e500499a07c5e4b0dcf9983b90e8746ed81.camel@kernel.org>
Date: Mon, 15 Apr 2024 08:49:39 -0400
From: Jeff Layton <jlayton@...nel.org>
To: David Howells <dhowells@...hat.com>, Christian Brauner
<christian@...uner.io>, Gao Xiang <hsiangkao@...ux.alibaba.com>, Dominique
Martinet <asmadeus@...ewreck.org>
Cc: Matthew Wilcox <willy@...radead.org>, Steve French <smfrench@...il.com>,
Marc Dionne <marc.dionne@...istor.com>, Paulo Alcantara
<pc@...guebit.com>, Shyam Prasad N <sprasad@...rosoft.com>, Tom Talpey
<tom@...pey.com>, Eric Van Hensbergen <ericvh@...nel.org>, Ilya Dryomov
<idryomov@...il.com>, netfs@...ts.linux.dev, linux-cachefs@...hat.com,
linux-afs@...ts.infradead.org, linux-cifs@...r.kernel.org,
linux-nfs@...r.kernel.org, ceph-devel@...r.kernel.org,
v9fs@...ts.linux.dev, linux-erofs@...ts.ozlabs.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use
->writepages() to copy to cache
On Thu, 2024-03-28 at 16:33 +0000, David Howells wrote:
> Hi Christian, Willy,
>
> The primary purpose of these patches is to rework the netfslib writeback
> implementation such that pages read from the cache are written to the cache
> through ->writepages(), thereby allowing the fscache page flag to be
> retired.
>
> The reworking also:
>
> (1) builds on top of the new writeback_iter() infrastructure;
>
> (2) makes it possible to use vectored write RPCs as discontiguous streams
> of pages can be accommodated;
>
> (3) makes it easier to do simultaneous content crypto and stream division;
>
> (4) provides support for retrying writes and re-dividing a stream;
>
> (5) replaces the ->launder_folio() op, so that ->writepages() is used
> instead;
>
> (6) uses mempools to allocate the netfs_io_request and netfs_io_subrequest
> structs to avoid allocation failure in the writeback path.
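Item (6) relies on the mempool idea: keep a small reserve of preallocated objects so that allocation in the writeback path can still make progress under memory pressure. The following is a minimal user-space sketch of that idea in C, not the kernel's mempool API. The hypo_pool type, its reserve of four elements and the simulate_oom test hook are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define HYPO_POOL_MIN 4	/* hypothetical reserve size */

/* Hypothetical user-space analogue of a mempool: a reserve of preallocated
 * objects that is used only when the normal allocator fails. */
struct hypo_pool {
	void	*reserve[HYPO_POOL_MIN];
	size_t	nr_free;	/* number of filled reserve slots */
	size_t	obj_size;
	bool	simulate_oom;	/* test hook: pretend malloc() always fails */
};

/* Fill the reserve up front, while memory is still available.  Returns
 * false if the reserve could not be fully populated. */
static bool hypo_pool_init(struct hypo_pool *p, size_t obj_size)
{
	p->obj_size = obj_size;
	p->simulate_oom = false;
	for (p->nr_free = 0; p->nr_free < HYPO_POOL_MIN; p->nr_free++) {
		p->reserve[p->nr_free] = malloc(obj_size);
		if (!p->reserve[p->nr_free])
			return false;
	}
	return true;
}

/* Try the normal allocator first and dip into the reserve only on failure.
 * Returns NULL once the reserve is also exhausted. */
static void *hypo_pool_alloc(struct hypo_pool *p)
{
	void *obj = p->simulate_oom ? NULL : malloc(p->obj_size);

	if (!obj && p->nr_free > 0)
		obj = p->reserve[--p->nr_free];
	return obj;
}

/* Refill the reserve before handing memory back to the allocator. */
static void hypo_pool_free(struct hypo_pool *p, void *obj)
{
	if (p->nr_free < HYPO_POOL_MIN)
		p->reserve[p->nr_free++] = obj;
	else
		free(obj);
}
```

The kernel's mempool_alloc() goes further and, when the caller may sleep, waits for an element to be returned rather than failing; the sketch simply returns NULL once the reserve is empty.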
>
> Some code that uses the fscache page flag is retained for compatibility
> purposes with nfs and ceph. The code is switched to using the synonymous
> private_2 label instead and marked with deprecation comments. I have a
> separate set of patches that convert cifs to use this code.
>
> -~-
>
> In this new implementation, writeback_iter() is used to pump folios,
> progressively creating two parallel, but separate streams. Either or both
> streams can contain gaps, and the subrequests in each stream can be of
> variable size, don't need to align with each other and don't need to align
> with the folios. (Note that more streams can be added if we have multiple
> servers to duplicate data to).
>
> Indeed, subrequests can cross folio boundaries, may cover several folios or
> a folio may be spanned by multiple subrequests, e.g.:
>
>         +---+---+-----+-----+---+----------+
> Folios: |   |   |     |     |   |          |
>         +---+---+-----+-----+---+----------+
>
>         +------+------+     +----+----+
> Upload: |      |      |.....|    |    |
>         +------+------+     +----+----+
>
>         +------+------+------+------+------+
> Cache:  |      |      |      |      |      |
>         +------+------+------+------+------+
>
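The diagram can be approximated in code: each stream divides the same file range according to its own size limit, so subrequest boundaries fall wherever that limit puts them, independent of folio boundaries. Below is a minimal sketch in C under that assumption; hypo_subreq, hypo_divide() and hypo_folios_spanned() are hypothetical names, and the units are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* A span of the file destined for one stream (server upload or cache). */
struct hypo_subreq {
	unsigned long long start;
	unsigned long long len;
};

/* Cut [start, start + len) into pieces of at most max_len (non-zero) units.
 * The cut points depend only on this stream's limit, not on folio
 * boundaries.  Returns the number of subrequests stored in out[]. */
static size_t hypo_divide(unsigned long long start, unsigned long long len,
			  unsigned long long max_len,
			  struct hypo_subreq *out, size_t max_out)
{
	size_t n = 0;

	while (len > 0 && n < max_out) {
		unsigned long long part = len < max_len ? len : max_len;

		out[n].start = start;
		out[n].len = part;
		n++;
		start += part;
		len -= part;
	}
	return n;
}

/* Count the folios a subrequest overlaps; folio i covers
 * [bounds[i], bounds[i + 1]), so bounds[] has nr_folios + 1 entries. */
static size_t hypo_folios_spanned(const struct hypo_subreq *s,
				  const unsigned long long *bounds,
				  size_t nr_folios)
{
	unsigned long long end = s->start + s->len;
	size_t n = 0;

	for (size_t i = 0; i < nr_folios; i++)
		if (bounds[i] < end && bounds[i + 1] > s->start)
			n++;
	return n;
}
```

For example, with folio boundaries at 0, 3, 6, 11, 16, 19 and 29, dividing [0, 14) for upload with a limit of 7 gives two subrequests, and the folio covering [6, 11) is spanned by both of them.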
> Data that got read from the server that needs copying to the cache is
> stored in folios that are marked dirty and have folio->private set to a
> special value.
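The marking described above can be sketched as a sentinel pointer that writeback compares against folio->private when it meets a dirty folio. This is a hedged illustration only: hypo_folio stands in for struct folio, and the sentinel name and value (HYPO_COPY_TO_CACHE, 0x356) are arbitrary choices for the sketch, not necessarily those used by the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: a small address that never points at real
 * per-folio state, meaning "dirty only so it gets copied to the cache". */
#define HYPO_COPY_TO_CACHE ((void *)(uintptr_t)0x356)

/* Stand-in for the parts of struct folio that matter here. */
struct hypo_folio {
	bool	dirty;
	void	*private;	/* NULL, real per-folio state, or the sentinel */
};

enum hypo_wb_action {
	HYPO_SKIP,		/* clean: nothing to write */
	HYPO_UPLOAD_AND_CACHE,	/* locally modified: send to server and cache */
	HYPO_CACHE_ONLY,	/* read from the server: only copy to the cache */
};

/* Decide which stream(s) a folio feeds during writeback. */
static enum hypo_wb_action hypo_classify(const struct hypo_folio *f)
{
	if (!f->dirty)
		return HYPO_SKIP;
	if (f->private == HYPO_COPY_TO_CACHE)
		return HYPO_CACHE_ONLY;
	return HYPO_UPLOAD_AND_CACHE;
}
```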
>
> The progressive subrequest construction permits the algorithm to be
> preparing both the next upload to the server and the next write to the
> cache whilst the previous ones are already in progress. Throttling can be
> applied to control the rate of production of subrequests - and, in any
> case, we probably want to write them to the server in ascending order,
> particularly if the file will be extended.
>
> Content crypto can also be prepared at the same time as the subrequests and
> run asynchronously, with the prepped requests being stalled until the
> crypto catches up with them. This might also be useful for transport
> crypto, but that happens at a lower layer, so it would probably be harder to
> pull off.
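One way to picture the stalling is an issuer that walks its prepared subrequests in ascending file order and sends only those whose bytes the asynchronous crypto has already covered. The C sketch below assumes crypto progress can be summarised as a single file position (crypto_done); hypo_subreq and hypo_issue_ready() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A prepared subrequest waiting to be sent. */
struct hypo_subreq {
	unsigned long long start, len;
	int issued;
};

/* Issue prepared subrequests strictly in ascending file order, but only
 * those whose bytes lie entirely below crypto_done, the position up to
 * which encryption has completed.  Returns the index of the first
 * subrequest that is still stalled. */
static size_t hypo_issue_ready(struct hypo_subreq *q, size_t nr,
			       size_t next, unsigned long long crypto_done)
{
	while (next < nr && q[next].start + q[next].len <= crypto_done) {
		q[next].issued = 1;	/* a real issuer would send the RPC here */
		next++;
	}
	return next;
}
```

Stopping at the first stalled subrequest, rather than skipping ahead to later ones, keeps uploads in ascending order, which matters if the write extends the file.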
>
> The algorithm is split into three parts:
>
> (1) The issuer. This walks through the data, packaging it up, encrypting
> it and creating subrequests. The part of this that generates
> subrequests only deals with file positions and spans and so is usable
> for DIO/unbuffered writes as well as buffered writes.
>
> (2) The collector. This asynchronously collects completed subrequests,
> unlocks folios, frees crypto buffers and performs any retries. This
> runs in a work queue so that the issuer can return to the caller for
> writeback (so that the VM can have its kswapd thread back) or async
> writes.
>
> Collection is slightly complex as the collector has to work out where
> discontiguities happen in the folio list so that it doesn't try to
> collect folios that weren't included in the writeout.
>
> (3) The retryer. This pauses the issuer, waits for all outstanding
> subrequests to complete and then goes through the failed subrequests
> to reissue them. This may involve reprepping them (with cifs, the
> credits must be renegotiated and a subrequest may need splitting), and
> doing RMW for content crypto if there's a conflicting change on the
> server.
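The collector's bookkeeping and the retryer's re-division can be sketched as follows, assuming each stream keeps its subrequests sorted by file position and that a folio may only be released once every stream has moved past it. Treating a stream's progress as the start of its first incomplete subrequest lets gaps in a stream be skipped, while walking only the folios on the writeout list avoids touching regions that were never written out. All types and functions here (hypo_stream, hypo_collect(), hypo_resplit() and so on) are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct hypo_subreq {
	unsigned long long start, len;
	int done;			/* 1 once completed successfully */
};

struct hypo_stream {
	struct hypo_subreq *sub;	/* sorted by start; gaps allowed */
	size_t nr;
};

/* A folio on the writeout list, covering [start, end). */
struct hypo_wfolio {
	unsigned long long start, end;
};

/* Position below which this stream has nothing outstanding: the start of
 * its first incomplete subrequest, or ULLONG_MAX if all are done. */
static unsigned long long hypo_stream_progress(const struct hypo_stream *s)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!s->sub[i].done)
			return s->sub[i].start;
	return ULLONG_MAX;
}

/* Advance the head of the writeout list past every folio that all streams
 * have moved beyond.  Returns the new head index. */
static size_t hypo_collect(const struct hypo_stream *streams, size_t nr_streams,
			   const struct hypo_wfolio *folios, size_t nr_folios,
			   size_t head)
{
	unsigned long long safe = ULLONG_MAX;

	for (size_t i = 0; i < nr_streams; i++) {
		unsigned long long p = hypo_stream_progress(&streams[i]);

		if (p < safe)
			safe = p;
	}
	while (head < nr_folios && folios[head].end <= safe)
		head++;		/* a real collector would end writeback here */
	return head;
}

/* Retry: re-divide a failed span using a possibly smaller limit (non-zero),
 * e.g. after credits have been renegotiated. */
static size_t hypo_resplit(const struct hypo_subreq *failed,
			   unsigned long long new_max,
			   struct hypo_subreq *out, size_t max_out)
{
	unsigned long long pos = failed->start, left = failed->len;
	size_t n = 0;

	while (left > 0 && n < max_out) {
		unsigned long long part = left < new_max ? left : new_max;

		out[n].start = pos;
		out[n].len = part;
		out[n].done = 0;
		n++;
		pos += part;
		left -= part;
	}
	return n;
}
```

In a real implementation the re-division limit would come from whatever the transport allows at retry time, such as the credits cifs renegotiates; here it is just a parameter.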
>
> David
>
> David Howells (26):
> cifs: Fix duplicate fscache cookie warnings
> 9p: Clean up some kdoc and unused var warnings.
> netfs: Update i_blocks when write committed to pagecache
> netfs: Replace PG_fscache by setting folio->private and marking dirty
> mm: Remove the PG_fscache alias for PG_private_2
> netfs: Remove deprecated use of PG_private_2 as a second writeback flag
> netfs: Make netfs_io_request::subreq_counter an atomic_t
> netfs: Use subreq_counter to allocate subreq debug_index values
> mm: Provide a means of invalidation without using launder_folio
> cifs: Use alternative invalidation to using launder_folio
> 9p: Use alternative invalidation to using launder_folio
> afs: Use alternative invalidation to using launder_folio
> netfs: Remove ->launder_folio() support
> netfs: Use mempools for allocating requests and subrequests
> mm: Export writeback_iter()
> netfs: Switch to using unsigned long long rather than loff_t
> netfs: Fix writethrough-mode error handling
> netfs: Add some write-side stats and clean up some stat names
> netfs: New writeback implementation
> netfs, afs: Implement helpers for new write code
> netfs, 9p: Implement helpers for new write code
> netfs, cachefiles: Implement helpers for new write code
> netfs: Cut over to using new writeback code
> netfs: Remove the old writeback code
> netfs: Miscellaneous tidy ups
> netfs, afs: Use writeback retry to deal with alternate keys
>
> fs/9p/vfs_addr.c             |  60 +--
> fs/9p/vfs_inode_dotl.c       |   4 -
> fs/afs/file.c                |   8 +-
> fs/afs/internal.h            |   6 +-
> fs/afs/validation.c          |   4 +-
> fs/afs/write.c               | 187 ++++----
> fs/cachefiles/io.c           |  75 +++-
> fs/ceph/addr.c               |  24 +-
> fs/ceph/inode.c              |   2 +
> fs/netfs/Makefile            |   3 +-
> fs/netfs/buffered_read.c     |  40 +-
> fs/netfs/buffered_write.c    | 832 ++++-------------------------------
> fs/netfs/direct_write.c      |  30 +-
> fs/netfs/fscache_io.c        |  14 +-
> fs/netfs/internal.h          |  55 ++-
> fs/netfs/io.c                | 155 +------
> fs/netfs/main.c              |  55 ++-
> fs/netfs/misc.c              |  10 +-
> fs/netfs/objects.c           |  81 +++-
> fs/netfs/output.c            | 478 --------------------
> fs/netfs/stats.c             |  17 +-
> fs/netfs/write_collect.c     | 813 ++++++++++++++++++++++++++++++++++
> fs/netfs/write_issue.c       | 673 ++++++++++++++++++++++++++++
> fs/nfs/file.c                |   8 +-
> fs/nfs/fscache.h             |   6 +-
> fs/nfs/write.c               |   4 +-
> fs/smb/client/cifsfs.h       |   1 -
> fs/smb/client/file.c         | 136 +-----
> fs/smb/client/fscache.c      |  16 +-
> fs/smb/client/inode.c        |  27 +-
> include/linux/fscache.h      |  22 +-
> include/linux/netfs.h        | 196 +++++----
> include/linux/pagemap.h      |   1 +
> include/net/9p/client.h      |   2 +
> include/trace/events/netfs.h | 249 ++++++++++-
> mm/filemap.c                 |  52 ++-
> mm/page-writeback.c          |   1 +
> net/9p/Kconfig               |   1 +
> net/9p/client.c              |  49 +++
> net/9p/trans_fd.c            |   1 -
> 40 files changed, 2492 insertions(+), 1906 deletions(-)
> delete mode 100644 fs/netfs/output.c
> create mode 100644 fs/netfs/write_collect.c
> create mode 100644 fs/netfs/write_issue.c
>
This all looks pretty reasonable. There is at least one bugfix that
looks like it ought to go in independently (#17). #19 is huge, complex
and hard to review. That will need some cycles in -next, I think. In any
case, on any that I didn't comment on, you can add:
Reviewed-by: Jeff Layton <jlayton@...nel.org>