Message-ID: <421c33ce-d43b-4444-a83a-a25f4fabfce2@oracle.com>
Date: Sat, 25 Jan 2025 11:24:57 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
Tom Talpey <tom@...pey.com>, "J. Bruce Fields" <bfields@...ldses.org>,
Kinglong Mee <kinglongmee@...il.com>,
Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH 1/8] nfsd: don't restart v4.1+ callback when RPC_SIGNALLED
is set
On 1/23/25 3:25 PM, Jeff Layton wrote:
> This is problematic, since the RPC might have been entirely successful.
> There is no point in restarting a v4.1+ callback just because
> RPC_SIGNALLED is true. The v4.1+ error handling has other mechanisms for
> detecting when it should retransmit the RPC.
>
> Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
> Signed-off-by: Jeff Layton <jlayton@...nel.org>
> ---
> fs/nfsd/nfs4callback.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 50e468bdb8d4838b5217346dcc2bd0fec1765c1a..e12205ef16ca932ffbcc86d67b0817aec2436c89 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1403,9 +1403,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> }
> trace_nfsd_cb_free_slot(task, cb);
> nfsd41_cb_release_slot(cb);
> -
> - if (RPC_SIGNALLED(task))
> - goto need_restart;
> out:
> return ret;
> retry_nowait:
>

I too am skeptical about this logic, but I don't entirely understand it
yet. More importantly, though, I don't recall seeing (mis)behavior that
I can directly attribute to it, so I can't yet confirm or deny your
assertion that "This is problematic".

Before making a code change here, let's gather a little evidence of a
real problem. For instance, we might want to replace this logic with
something better rather than wholesale removing it.

You might start by enabling aggressive disconnect injection to see how
backchannel recovery works (or that it doesn't work!). I'm trying this
on my kdevops NFSD while running fstests:

  cd /sys/kernel/debug/fail_sunrpc/
  echo Y > ignore-cache-wait
  echo Y > ignore-client-disconnect
  echo 24847 > interval
  echo 97 > times
  echo 100 > probability

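A minimal sketch of wrapping the settings above in a small script, so they can be applied before a test run and switched off afterwards. The `FAIL_DIR` variable and the function names are hypothetical, introduced only for illustration; the knob names and values are the ones listed above. It assumes debugfs is mounted at /sys/kernel/debug, the kernel was built with SunRPC fault injection, and the caller is root.

```shell
#!/bin/sh
# Hypothetical helper around the fail_sunrpc knobs listed above.
# FAIL_DIR is an assumption added for illustration: it lets the script
# be pointed at a scratch directory for a dry run.
FAIL_DIR="${FAIL_DIR:-/sys/kernel/debug/fail_sunrpc}"

set_knob() {
    # $1 = knob file name, $2 = value to write
    if [ ! -w "$FAIL_DIR/$1" ]; then
        echo "cannot write $FAIL_DIR/$1 (debugfs mounted? fault injection built in? root?)" >&2
        return 1
    fi
    echo "$2" > "$FAIL_DIR/$1"
}

apply_fail_sunrpc() {
    # Same values as in the message; stop at the first failure.
    set_knob ignore-cache-wait Y &&
    set_knob ignore-client-disconnect Y &&
    set_knob interval 24847 &&
    set_knob times 97 &&
    set_knob probability 100
}

disable_fail_sunrpc() {
    # A probability of 0 stops further injection.
    set_knob probability 0
}
```

Sourcing the file and calling `apply_fail_sunrpc` before starting fstests, then `disable_fail_sunrpc` afterwards, keeps the injection window limited to the test run.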
--
Chuck Lever