[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZrbgRnXV-snNicjY@codewreck.org>
Date: Sat, 10 Aug 2024 12:36:38 +0900
From: Dominique Martinet <asmadeus@...ewreck.org>
To: David Howells <dhowells@...hat.com>
Cc: Eric Van Hensbergen <ericvh@...nel.org>,
Latchesar Ionkov <lucho@...kov.net>,
Christian Schoenebeck <linux_oss@...debyte.com>,
Marc Dionne <marc.dionne@...istor.com>,
Ilya Dryomov <idryomov@...il.com>, Steve French <sfrench@...ba.org>,
Paulo Alcantara <pc@...guebit.com>,
Trond Myklebust <trond.myklebust@...merspace.com>,
Christian Brauner <brauner@...nel.org>, v9fs@...ts.linux.dev,
linux-afs@...ts.infradead.org, ceph-devel@...r.kernel.org,
linux-cifs@...r.kernel.org, linux-nfs@...r.kernel.org,
netfs@...ts.linux.dev, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] 9p: Fix DIO read through netfs
David Howells wrote on Fri, Aug 09, 2024 at 02:56:09PM +0100:
> From: Dominique Martinet <asmadeus@...ewreck.org>
>
> 9p: Fix DIO read through netfs
nitpick: now sure how that ended up here but this is duplicated with the
subject (the commit message ends up with this line twice)
> If a program is watching a file on a 9p mount, it won't see any change in
> size if the file being exported by the server is changed directly in the
> source filesystem, presumably because 9p doesn't have change notifications,
> and because netfs skips the reads if the file is empty.
>
> Fix this by attempting to read the full size specified when a DIO read is
> requested (such as when 9p is operating in unbuffered mode) and dealing
> with a short read if the EOF was less than the expected read.
>
> To make this work, filesystems using netfslib must not set
> NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
> I don't want to mandatorily clear this flag in netfslib for DIO because,
> say, ceph might make a read from an object that is not completely filled,
> but does not reside at the end of file - and so we need to clear the
> excess.
>
> This can be tested by watching an empty file over 9p within a VM (such as
> in the ktest framework):
>
> while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo
(This is basically the same thing but if one wants to control the read
timing for more precise/verbose debugging:
exec 3< /host/tmp/foo
read -u 3 content && echo $content
(repeat as appropriate)
exec 3>&-
)
> then writing something into the empty file. The watcher should immediately
> display the file content and break out of the loop. Without this fix, it
> remains in the loop indefinitely.
>
> Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
> Written-by: Dominique Martinet <asmadeus@...ewreck.org>
Thanks for adding extra comments & fixing other filesystems.
I've checked this covers all cases of setting NETFS_SREQ_CLEAR_TAIL so
hopefully shouldn't have further side effects, this sounds good to me:
Signed-off-by: Dominique Martinet <asmadeus@...ewreck.org>
> Signed-off-by: David Howells <dhowells@...hat.com>
> cc: Eric Van Hensbergen <ericvh@...nel.org>
> cc: Latchesar Ionkov <lucho@...kov.net>
> cc: Christian Schoenebeck <linux_oss@...debyte.com>
> cc: Marc Dionne <marc.dionne@...istor.com>
> cc: Ilya Dryomov <idryomov@...il.com>
> cc: Steve French <sfrench@...ba.org>
> cc: Paulo Alcantara <pc@...guebit.com>
> cc: Trond Myklebust <trond.myklebust@...merspace.com>
> cc: v9fs@...ts.linux.dev
> cc: linux-afs@...ts.infradead.org
> cc: ceph-devel@...r.kernel.org
> cc: linux-cifs@...r.kernel.org
> cc: linux-nfs@...r.kernel.org
> cc: netfs@...ts.linux.dev
> cc: linux-fsdevel@...r.kernel.org
> ---
> fs/9p/vfs_addr.c | 3 ++-
> fs/afs/file.c | 3 ++-
> fs/ceph/addr.c | 6 ++++--
> fs/netfs/io.c | 17 +++++++++++------
> fs/nfs/fscache.c | 3 ++-
> fs/smb/client/file.c | 3 ++-
> 6 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> index a97ceb105cd8..24fdc74caeba 100644
> --- a/fs/9p/vfs_addr.c
> +++ b/fs/9p/vfs_addr.c
> @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
>
> /* if we just extended the file size, any portion not in
> * cache won't be on server and is zeroes */
> - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> + if (subreq->rreq->origin != NETFS_DIO_READ)
> + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>
> netfs_subreq_terminated(subreq, err ?: total, false);
> }
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index c3f0c45ae9a9..ec1be0091fdb 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op)
>
> req->error = error;
> if (subreq) {
> - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> + if (subreq->rreq->origin != NETFS_DIO_READ)
> + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
> req->subreq = NULL;
> } else if (req->done) {
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index cc0a2240de98..c4744a02db75 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req)
> if (err >= 0) {
> if (sparse && err > 0)
> err = ceph_sparse_ext_map_end(op);
> - if (err < subreq->len)
> + if (err < subreq->len &&
> + subreq->rreq->origin != NETFS_DIO_READ)
> __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> if (IS_ENCRYPTED(inode) && err > 0) {
> err = ceph_fscrypt_decrypt_extents(inode,
> @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq)
> size_t len;
> int mode;
>
> - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> + if (rreq->origin != NETFS_DIO_READ)
> + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
>
> if (subreq->start >= inode->i_size)
> diff --git a/fs/netfs/io.c b/fs/netfs/io.c
> index c179a1c73fa7..5367caf3fa28 100644
> --- a/fs/netfs/io.c
> +++ b/fs/netfs/io.c
> @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
>
> if (transferred_or_error == 0) {
> if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
> - subreq->error = -ENODATA;
> + if (rreq->origin != NETFS_DIO_READ)
> + subreq->error = -ENODATA;
> goto failed;
> }
> } else {
> @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
> }
> if (subreq->len > ictx->zero_point - subreq->start)
> subreq->len = ictx->zero_point - subreq->start;
> +
> + /* We limit buffered reads to the EOF, but let the
> + * server deal with larger-than-EOF DIO/unbuffered
> + * reads.
> + */
> + if (subreq->len > rreq->i_size - subreq->start)
> + subreq->len = rreq->i_size - subreq->start;
> }
> - if (subreq->len > rreq->i_size - subreq->start)
> - subreq->len = rreq->i_size - subreq->start;
> if (rreq->rsize && subreq->len > rreq->rsize)
> subreq->len = rreq->rsize;
>
> @@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
> do {
> _debug("submit %llx + %llx >= %llx",
> rreq->start, rreq->submitted, rreq->i_size);
> - if (rreq->origin == NETFS_DIO_READ &&
> - rreq->start + rreq->submitted >= rreq->i_size)
> - break;
> if (!netfs_rreq_submit_slice(rreq, &io_iter))
> break;
> + if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags))
> + break;
> if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
> test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
> break;
> diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> index bf29a65c5027..7a558dea75c4 100644
> --- a/fs/nfs/fscache.c
> +++ b/fs/nfs/fscache.c
> @@ -363,7 +363,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr)
> return;
>
> sreq = netfs->sreq;
> - if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
> + if (test_bit(NFS_IOHDR_EOF, &hdr->flags) &&
> + sreq->rreq->origin != NETFS_DIO_READ)
> __set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
>
> if (hdr->error)
> diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
> index b2405dd4d4d4..3f3842e7b44a 100644
> --- a/fs/smb/client/file.c
> +++ b/fs/smb/client/file.c
> @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq)
> goto out;
> }
>
> - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> + if (subreq->rreq->origin != NETFS_DIO_READ)
> + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>
> rc = rdata->server->ops->async_readv(rdata);
> out:
>
--
Dominique Martinet | Asmadeus
Powered by blists - more mailing lists