linux-kernel - Re: [PATCH] nfsd: always handle RPC_SIGNALLED earlier in nfsd4_cb_sequence

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <e85a0515-7972-4428-9270-a982073adcb4@oracle.com>
Date: Wed, 22 Jan 2025 10:20:02 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
        Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
        Tom Talpey <tom@...pey.com>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nfsd: always handle RPC_SIGNALLED earlier in
 nfsd4_cb_sequence_done()

On 1/22/25 10:10 AM, Jeff Layton wrote:
> The v4.0 client always restarts the callback when the connection is shut
> down (which is indicated by RPC_SIGNALLED()). The RPC is then requeued
> and the result eventually should complete (or be aborted).
> 
> The v4.1 code instead processes the result and only later decides to
> restart the call. Even more problematic is the fact that it releases the
> slot beforehand. The restarted call may get a new slot, which would
> could break DRC handling.

"break DRC handling" -- I'd like to understand this.

NFSD always sets cachethis to false in CB_SEQUENCE, so there is no DRC
for these operations. The only thing the client saves is the slot
sequence number IIUC.

Is retrying an uncached operation via a different slot a problem?


> Change nfsd4_cb_sequence_done() such that RPC_SIGNALLED() is handled the
> same way irrespective of the minorversion. Check it early, before
> processing the CB_SEQUENCE result, and immediately restart the call.

It would be lovely to have some test cases that exercise the server's
backchannel retry logic.


> Signed-off-by: Jeff Layton <jlayton@...nel.org>
> ---
>   fs/nfsd/nfs4callback.c | 32 ++++++++++++++++----------------
>   1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 50e468bdb8d4838b5217346dcc2bd0fec1765c1a..05afdf94c6478c7d698b059fa33dd9e7af49b91e 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1334,21 +1334,24 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>   	struct nfsd4_session *session = clp->cl_cb_session;
>   	bool ret = true;
>   
> -	if (!clp->cl_minorversion) {
> -		/*
> -		 * If the backchannel connection was shut down while this
> -		 * task was queued, we need to resubmit it after setting up
> -		 * a new backchannel connection.
> -		 *
> -		 * Note that if we lost our callback connection permanently
> -		 * the submission code will error out, so we don't need to
> -		 * handle that case here.
> -		 */
> -		if (RPC_SIGNALLED(task))
> -			goto need_restart;
> +	/*
> +	 * If the backchannel connection was shut down while this
> +	 * task was queued, resubmit it after setting up a new backchannel
> +	 * connection.
> +	 *
> +	 * If the backchannel connection is permanently lost, the submission
> +	 * code will error out, so there is no need to handle that case here.
> +	 *
> +	 * For the v4.1+ case, do this without releasing the slot or doing
> +	 * any further processing. It's possible that the restarted call will
> +	 * be a retransmission of an earlier call to the server and that will
> +	 * help ensure that the DRC works correctly.
> +	 */
> +	if (RPC_SIGNALLED(task))
> +		goto need_restart;
>   
> +	if (!clp->cl_minorversion)
>   		return true;
> -	}
>   
>   	if (cb->cb_held_slot < 0)
>   		goto need_restart;
> @@ -1403,9 +1406,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>   	}
>   	trace_nfsd_cb_free_slot(task, cb);
>   	nfsd41_cb_release_slot(cb);
> -
> -	if (RPC_SIGNALLED(task))
> -		goto need_restart;
>   out:
>   	return ret;
>   retry_nowait:
> 
> ---
> base-commit: a8acd0de47f22619d62d548b86bcfc9a4de2b2c6
> change-id: 20250122-nfsd-6-14-77c1ad5d9f01
> 
> Best regards,


-- 
Chuck Lever