netdev - Re: [PATCH 1/8] nfsd: don't restart v4.1+ callback when RPC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <421c33ce-d43b-4444-a83a-a25f4fabfce2@oracle.com>
Date: Sat, 25 Jan 2025 11:24:57 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
        Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
        Tom Talpey <tom@...pey.com>, "J. Bruce Fields" <bfields@...ldses.org>,
        Kinglong Mee <kinglongmee@...il.com>,
        Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        netdev@...r.kernel.org
Subject: Re: [PATCH 1/8] nfsd: don't restart v4.1+ callback when RPC_SIGNALLED
 is set

On 1/23/25 3:25 PM, Jeff Layton wrote:
> This is problematic, since the RPC might have been entirely successful.
> There is no point in restarting a v4.1+ callback just because
> RPC_SIGNALLED is true. The v4.1+ error handling has other mechanisms for
> detecting when it should retransmit the RPC.
> 
> Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
> Signed-off-by: Jeff Layton <jlayton@...nel.org>
> ---
>   fs/nfsd/nfs4callback.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 50e468bdb8d4838b5217346dcc2bd0fec1765c1a..e12205ef16ca932ffbcc86d67b0817aec2436c89 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1403,9 +1403,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>   	}
>   	trace_nfsd_cb_free_slot(task, cb);
>   	nfsd41_cb_release_slot(cb);
> -
> -	if (RPC_SIGNALLED(task))
> -		goto need_restart;
>   out:
>   	return ret;
>   retry_nowait:
> 

I too am skeptical about this logic, but I don't entirely understand it
yet. More importantly, though, I don't recall seeing (mis)behavior that
I can directly attribute to it, so I can't yet confirm or deny your 
assertion that "This is problematic".

Before making a code change here, let's gather a little evidence of a
real problem. For instance, we might want to replace this logic with
something better rather than wholesale removing it.

You might start by enabling aggressive disconnect injection to see how
backchannel recovery works (or that it doesn't work!). I'm trying this
on my kdevops NFSD while running fstests:

cd /sys/kernel/debug/fail_sunrpc/
echo Y > ignore-cache-wait
echo Y > ignore-client-disconnect
echo 24847 > interval
echo 97 > times
echo 100 > probability


-- 
Chuck Lever