[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAH2r5muh6-PtS3TcAEAP9xjStcib3KzDtiQmmmWOPHaV2jr2xA@mail.gmail.com>
Date: Mon, 10 Feb 2025 19:07:10 -0600
From: Steve French <smfrench@...il.com>
To: David Howells <dhowells@...hat.com>
Cc: Ihor Solodrai <ihor.solodrai@...me>, Marc Dionne <marc.dionne@...istor.com>,
Steve French <stfrench@...rosoft.com>, Eric Van Hensbergen <ericvh@...nel.org>,
Latchesar Ionkov <lucho@...kov.net>, Dominique Martinet <asmadeus@...ewreck.org>,
Christian Schoenebeck <linux_oss@...debyte.com>, Paulo Alcantara <pc@...guebit.com>,
Jeff Layton <jlayton@...nel.org>, Christian Brauner <brauner@...nel.org>, v9fs@...ts.linux.dev,
linux-cifs@...r.kernel.org, netfs@...ts.linux.dev,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] netfs: Fix a number of read-retry hangs
Tested-by: Steve French <stfrench@...rosoft.com>
Verified it fixed various hangs I saw in testing cifs.ko Here is a
success run with the patch e.g.:
http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/342
On Mon, Jan 27, 2025 at 6:33 PM David Howells <dhowells@...hat.com> wrote:
>
> Hi Ihor, Marc, Steve,
>
> I think this patch should better fix the hangs that have been seen than Ihor's
> previously tested fix (which I think isn't actually correct).
>
> David
> ---
> From: David Howells <dhowells@...hat.com>
>
> netfs: Fix a number of read-retry hangs
>
> Fix a number of hangs in the netfslib read-retry code, including:
>
> (1) netfs_reissue_read() doubles up the getting of references on
> subrequests, thereby leaking the subrequest and causing inode eviction
> to wait indefinitely. This can lead to the kernel reporting a hang in
> the filesystem's evict_inode().
>
> Fix this by removing the get from netfs_reissue_read() and adding one
> to netfs_retry_read_subrequests() to deal with the one place that
> didn't double up.
>
> (2) The loop in netfs_retry_read_subrequests() that retries a sequence of
> failed subrequests doesn't record whether or not it retried the one
> that the "subreq" pointer points to when it leaves the loop. It may
> not if renegotiation/repreparation of the subrequests means that fewer
> subrequests are needed to span the cumulative range of the sequence.
>
> Because it doesn't record this, the piece of code that discards
> now-superfluous subrequests doesn't know whether it should discard the
> one "subreq" points to - and so it doesn't.
>
> Fix this by noting whether the last subreq it examines is superfluous
> and if it is, then getting rid of it and all subsequent subrequests.
>
> If that one one wasn't superfluous, then we would have tried to go
> round the previous loop again and so there can be no further unretried
> subrequests in the sequence.
>
> (3) netfs_retry_read_subrequests() gets yet an extra ref on any additional
> subrequests it has to get because it ran out of ones it could reuse to
> to renegotiation/repreparation shrinking the subrequests.
>
> Fix this by removing that extra ref.
>
> (4) In netfs_retry_reads(), it was using wait_on_bit() to wait for
> NETFS_SREQ_IN_PROGRESS to be cleared on all subrequests in the
> sequence - but netfs_read_subreq_terminated() is now using a wait
> queue on the request instead and so this wait will never finish.
>
> Fix this by waiting on the wait queue instead. To make this work, a
> new flag, NETFS_RREQ_RETRYING, is now set around the wait loop to tell
> the wake-up code to wake up the wait queue rather than requeuing the
> request's work item.
>
> Note that this flag replaces the NETFS_RREQ_NEED_RETRY flag which is
> no longer used.
>
> (5) Whilst not strictly anything to do with the hang,
> netfs_retry_read_subrequests() was also doubly incrementing the
> subreq_counter and re-setting the debug index, leaving a gap in the
> trace. This is also fixed.
>
> One of these hangs was observed with 9p and with cifs. Others were forced
> by manual code injection into fs/afs/file.c. Firstly, afs_prepare_read()
> was created to provide an changing pattern of maximum subrequest sizes:
>
> static int afs_prepare_read(struct netfs_io_subrequest *subreq)
> {
> struct netfs_io_request *rreq = subreq->rreq;
> if (!S_ISREG(subreq->rreq->inode->i_mode))
> return 0;
> if (subreq->retry_count < 20)
> rreq->io_streams[0].sreq_max_len =
> umax(200, 2222 - subreq->retry_count * 40);
> else
> rreq->io_streams[0].sreq_max_len = 3333;
> return 0;
> }
>
> and pointed to by afs_req_ops. Then the following:
>
> struct netfs_io_subrequest *subreq = op->fetch.subreq;
> if (subreq->error == 0 &&
> S_ISREG(subreq->rreq->inode->i_mode) &&
> subreq->retry_count < 20) {
> subreq->transferred = subreq->already_done;
> __clear_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);
> __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
> afs_fetch_data_notify(op);
> return;
> }
>
> was inserted into afs_fetch_data_success() at the beginning and struct
> netfs_io_subrequest given an extra field, "already_done" that was set to
> the value in "subreq->transferred" by netfs_reissue_read().
>
> When reading a 4K file, the subrequests would get gradually smaller, a new
> subrequest would be allocated around the 3rd retry and then eventually be
> rendered superfluous when the 20th retry was hit and the limit on the first
> subrequest was eased.
>
> Reported-by: Ihor Solodrai <ihor.solodrai@...me>
> Closes: https://lore.kernel.org/r/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/
> Signed-off-by: David Howells <dhowells@...hat.com>
> cc: Marc Dionne <marc.dionne@...istor.com>
> cc: Steve French <stfrench@...rosoft.com>
> cc: Eric Van Hensbergen <ericvh@...nel.org>
> cc: Latchesar Ionkov <lucho@...kov.net>
> cc: Dominique Martinet <asmadeus@...ewreck.org>
> cc: Christian Schoenebeck <linux_oss@...debyte.com>
> cc: Paulo Alcantara <pc@...guebit.com>
> cc: Jeff Layton <jlayton@...nel.org>
> cc: v9fs@...ts.linux.dev
> cc: linux-cifs@...r.kernel.org
> cc: netfs@...ts.linux.dev
> cc: linux-fsdevel@...r.kernel.org
> ---
> fs/netfs/read_collect.c | 6 ++++--
> fs/netfs/read_retry.c | 40 ++++++++++++++++++++++++++++++----------
> include/linux/netfs.h | 2 +-
> include/trace/events/netfs.h | 4 +++-
> 4 files changed, 38 insertions(+), 14 deletions(-)
>
> diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
> index f65affa5a9e4..636cc5a98ef5 100644
> --- a/fs/netfs/read_collect.c
> +++ b/fs/netfs/read_collect.c
> @@ -470,7 +470,8 @@ void netfs_read_collection_worker(struct work_struct *work)
> */
> void netfs_wake_read_collector(struct netfs_io_request *rreq)
> {
> - if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) {
> + if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags) &&
> + !test_bit(NETFS_RREQ_RETRYING, &rreq->flags)) {
> if (!work_pending(&rreq->work)) {
> netfs_get_request(rreq, netfs_rreq_trace_get_work);
> if (!queue_work(system_unbound_wq, &rreq->work))
> @@ -586,7 +587,8 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq)
> smp_mb__after_atomic(); /* Clear IN_PROGRESS before task state */
>
> /* If we are at the head of the queue, wake up the collector. */
> - if (list_is_first(&subreq->rreq_link, &stream->subrequests))
> + if (list_is_first(&subreq->rreq_link, &stream->subrequests) ||
> + test_bit(NETFS_RREQ_RETRYING, &rreq->flags))
> netfs_wake_read_collector(rreq);
>
> netfs_put_subrequest(subreq, true, netfs_sreq_trace_put_terminated);
> diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
> index 2290af0d51ac..8316c4533a51 100644
> --- a/fs/netfs/read_retry.c
> +++ b/fs/netfs/read_retry.c
> @@ -14,7 +14,6 @@ static void netfs_reissue_read(struct netfs_io_request *rreq,
> {
> __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
> __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
> - netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> subreq->rreq->netfs_ops->issue_read(subreq);
> }
>
> @@ -48,6 +47,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
> subreq->retry_count++;
> netfs_reset_iter(subreq);
> + netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> netfs_reissue_read(rreq, subreq);
> }
> }
> @@ -75,7 +75,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> struct iov_iter source;
> unsigned long long start, len;
> size_t part;
> - bool boundary = false;
> + bool boundary = false, subreq_superfluous = false;
>
> /* Go through the subreqs and find the next span of contiguous
> * buffer that we then rejig (cifs, for example, needs the
> @@ -116,8 +116,10 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> /* Work through the sublist. */
> subreq = from;
> list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
> - if (!len)
> + if (!len) {
> + subreq_superfluous = true;
> break;
> + }
> subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
> subreq->start = start - subreq->transferred;
> subreq->len = len + subreq->transferred;
> @@ -154,19 +156,21 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
>
> netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
> netfs_reissue_read(rreq, subreq);
> - if (subreq == to)
> + if (subreq == to) {
> + subreq_superfluous = false;
> break;
> + }
> }
>
> /* If we managed to use fewer subreqs, we can discard the
> * excess; if we used the same number, then we're done.
> */
> if (!len) {
> - if (subreq == to)
> + if (!subreq_superfluous)
> continue;
> list_for_each_entry_safe_from(subreq, tmp,
> &stream->subrequests, rreq_link) {
> - trace_netfs_sreq(subreq, netfs_sreq_trace_discard);
> + trace_netfs_sreq(subreq, netfs_sreq_trace_superfluous);
> list_del(&subreq->rreq_link);
> netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done);
> if (subreq == to)
> @@ -187,14 +191,12 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
> subreq->start = start;
> subreq->len = len;
> - subreq->debug_index = atomic_inc_return(&rreq->subreq_counter);
> subreq->stream_nr = stream->stream_nr;
> subreq->retry_count = 1;
>
> trace_netfs_sreq_ref(rreq->debug_id, subreq->debug_index,
> refcount_read(&subreq->ref),
> netfs_sreq_trace_new);
> - netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
>
> list_add(&subreq->rreq_link, &to->rreq_link);
> to = list_next_entry(to, rreq_link);
> @@ -256,14 +258,32 @@ void netfs_retry_reads(struct netfs_io_request *rreq)
> {
> struct netfs_io_subrequest *subreq;
> struct netfs_io_stream *stream = &rreq->io_streams[0];
> + DEFINE_WAIT(myself);
> +
> + set_bit(NETFS_RREQ_RETRYING, &rreq->flags);
>
> /* Wait for all outstanding I/O to quiesce before performing retries as
> * we may need to renegotiate the I/O sizes.
> */
> list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
> - wait_on_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS,
> - TASK_UNINTERRUPTIBLE);
> + if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
> + continue;
> +
> + trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
> + for (;;) {
> + prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
> +
> + if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
> + break;
> +
> + trace_netfs_sreq(subreq, netfs_sreq_trace_wait_for);
> + schedule();
> + trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue);
> + }
> +
> + finish_wait(&rreq->waitq, &myself);
> }
> + clear_bit(NETFS_RREQ_RETRYING, &rreq->flags);
>
> trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit);
> netfs_retry_read_subrequests(rreq);
> diff --git a/include/linux/netfs.h b/include/linux/netfs.h
> index 071d05d81d38..c86a11cfc4a3 100644
> --- a/include/linux/netfs.h
> +++ b/include/linux/netfs.h
> @@ -278,7 +278,7 @@ struct netfs_io_request {
> #define NETFS_RREQ_PAUSE 11 /* Pause subrequest generation */
> #define NETFS_RREQ_USE_IO_ITER 12 /* Use ->io_iter rather than ->i_pages */
> #define NETFS_RREQ_ALL_QUEUED 13 /* All subreqs are now queued */
> -#define NETFS_RREQ_NEED_RETRY 14 /* Need to try retrying */
> +#define NETFS_RREQ_RETRYING 14 /* Set if we're in the retry path */
> #define NETFS_RREQ_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
> * write to cache on read */
> const struct netfs_request_ops *netfs_ops;
> diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
> index 6e699cadcb29..f880835f7695 100644
> --- a/include/trace/events/netfs.h
> +++ b/include/trace/events/netfs.h
> @@ -99,7 +99,7 @@
> EM(netfs_sreq_trace_limited, "LIMIT") \
> EM(netfs_sreq_trace_need_clear, "N-CLR") \
> EM(netfs_sreq_trace_partial_read, "PARTR") \
> - EM(netfs_sreq_trace_need_retry, "NRTRY") \
> + EM(netfs_sreq_trace_need_retry, "ND-RT") \
> EM(netfs_sreq_trace_prepare, "PREP ") \
> EM(netfs_sreq_trace_prep_failed, "PRPFL") \
> EM(netfs_sreq_trace_progress, "PRGRS") \
> @@ -108,7 +108,9 @@
> EM(netfs_sreq_trace_short, "SHORT") \
> EM(netfs_sreq_trace_split, "SPLIT") \
> EM(netfs_sreq_trace_submit, "SUBMT") \
> + EM(netfs_sreq_trace_superfluous, "SPRFL") \
> EM(netfs_sreq_trace_terminated, "TERM ") \
> + EM(netfs_sreq_trace_wait_for, "_WAIT") \
> EM(netfs_sreq_trace_write, "WRITE") \
> EM(netfs_sreq_trace_write_skip, "SKIP ") \
> E_(netfs_sreq_trace_write_term, "WTERM")
>
>
--
Thanks,
Steve
Powered by blists - more mailing lists