[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e85a0515-7972-4428-9270-a982073adcb4@oracle.com>
Date: Wed, 22 Jan 2025 10:20:02 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
Tom Talpey <tom@...pey.com>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nfsd: always handle RPC_SIGNALLED earlier in
nfsd4_cb_sequence_done()
On 1/22/25 10:10 AM, Jeff Layton wrote:
> The v4.0 client always restarts the callback when the connection is shut
> down (which is indicated by RPC_SIGNALLED()). The RPC is then requeued
> and the result eventually should complete (or be aborted).
>
> The v4.1 code instead processes the result and only later decides to
> restart the call. Even more problematic is the fact that it releases the
> slot beforehand. The restarted call may get a new slot, which would
> could break DRC handling.
"break DRC handling" -- I'd like to understand this.
NFSD always sets cachethis to false in CB_SEQUENCE, so there is no DRC
for these operations. The only thing the client saves is the slot
sequence number IIUC.
Is retrying an uncached operation via a different slot a problem?
> Change nfsd4_cb_sequence_done() such that RPC_SIGNALLED() is handled the
> same way irrespective of the minorversion. Check it early, before
> processing the CB_SEQUENCE result, and immediately restart the call.
It would be lovely to have some test cases that exercise the server's
backchannel retry logic.
> Signed-off-by: Jeff Layton <jlayton@...nel.org>
> ---
> fs/nfsd/nfs4callback.c | 32 ++++++++++++++++----------------
> 1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 50e468bdb8d4838b5217346dcc2bd0fec1765c1a..05afdf94c6478c7d698b059fa33dd9e7af49b91e 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1334,21 +1334,24 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> struct nfsd4_session *session = clp->cl_cb_session;
> bool ret = true;
>
> - if (!clp->cl_minorversion) {
> - /*
> - * If the backchannel connection was shut down while this
> - * task was queued, we need to resubmit it after setting up
> - * a new backchannel connection.
> - *
> - * Note that if we lost our callback connection permanently
> - * the submission code will error out, so we don't need to
> - * handle that case here.
> - */
> - if (RPC_SIGNALLED(task))
> - goto need_restart;
> + /*
> + * If the backchannel connection was shut down while this
> + * task was queued, resubmit it after setting up a new backchannel
> + * connection.
> + *
> + * If the backchannel connection is permanently lost, the submission
> + * code will error out, so there is no need to handle that case here.
> + *
> + * For the v4.1+ case, do this without releasing the slot or doing
> + * any further processing. It's possible that the restarted call will
> + * be a retransmission of an earlier call to the server and that will
> + * help ensure that the DRC works correctly.
> + */
> + if (RPC_SIGNALLED(task))
> + goto need_restart;
>
> + if (!clp->cl_minorversion)
> return true;
> - }
>
> if (cb->cb_held_slot < 0)
> goto need_restart;
> @@ -1403,9 +1406,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> }
> trace_nfsd_cb_free_slot(task, cb);
> nfsd41_cb_release_slot(cb);
> -
> - if (RPC_SIGNALLED(task))
> - goto need_restart;
> out:
> return ret;
> retry_nowait:
>
> ---
> base-commit: a8acd0de47f22619d62d548b86bcfc9a4de2b2c6
> change-id: 20250122-nfsd-6-14-77c1ad5d9f01
>
> Best regards,
--
Chuck Lever
Powered by blists - more mailing lists