linux-kernel - Re: [PATCH] nfsd: always handle RPC_SIGNALLED earlier in nfsd4_cb_sequence

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ec156ccf-9d6c-44be-bd1d-5f3e9b166e77@talpey.com>
Date: Wed, 22 Jan 2025 10:35:32 -0500
From: Tom Talpey <tom@...pey.com>
To: Chuck Lever <chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>,
 Neil Brown <neilb@...e.de>, Olga Kornievskaia <okorniev@...hat.com>,
 Dai Ngo <Dai.Ngo@...cle.com>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nfsd: always handle RPC_SIGNALLED earlier in
 nfsd4_cb_sequence_done()

On 1/22/2025 10:20 AM, Chuck Lever wrote:
> On 1/22/25 10:10 AM, Jeff Layton wrote:
>> The v4.0 client always restarts the callback when the connection is shut
>> down (which is indicated by RPC_SIGNALLED()). The RPC is then requeued
>> and the result eventually should complete (or be aborted).
>>
>> The v4.1 code instead processes the result and only later decides to
>> restart the call. Even more problematic is the fact that it releases the
>> slot beforehand. The restarted call may get a new slot, which would
>> could break DRC handling.
> 
> "break DRC handling" -- I'd like to understand this.
> 
> NFSD always sets cachethis to false in CB_SEQUENCE, so there is no DRC
> for these operations. The only thing the client saves is the slot
> sequence number IIUC.
> 
> Is retrying an uncached operation via a different slot a problem?

The server would ignore any in-progress status and therefore might
cause two requests to be processed in parallel. It might also
rearrange the order of replies. Other than those, maybe not!

I do think it's safest to not reallocate the slot in the 4.1 case.
The slot sequence number is only meaningful in the context of the
original slot.

Tom.

>> Change nfsd4_cb_sequence_done() such that RPC_SIGNALLED() is handled the
>> same way irrespective of the minorversion. Check it early, before
>> processing the CB_SEQUENCE result, and immediately restart the call.
> 
> It would be lovely to have some test cases that exercise the server's
> backchannel retry logic.
> 
> 
>> Signed-off-by: Jeff Layton <jlayton@...nel.org>
>> ---
>>   fs/nfsd/nfs4callback.c | 32 ++++++++++++++++----------------
>>   1 file changed, 16 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>> index 
>> 50e468bdb8d4838b5217346dcc2bd0fec1765c1a..05afdf94c6478c7d698b059fa33dd9e7af49b91e 100644
>> --- a/fs/nfsd/nfs4callback.c
>> +++ b/fs/nfsd/nfs4callback.c
>> @@ -1334,21 +1334,24 @@ static bool nfsd4_cb_sequence_done(struct 
>> rpc_task *task, struct nfsd4_callback
>>       struct nfsd4_session *session = clp->cl_cb_session;
>>       bool ret = true;
>> -    if (!clp->cl_minorversion) {
>> -        /*
>> -         * If the backchannel connection was shut down while this
>> -         * task was queued, we need to resubmit it after setting up
>> -         * a new backchannel connection.
>> -         *
>> -         * Note that if we lost our callback connection permanently
>> -         * the submission code will error out, so we don't need to
>> -         * handle that case here.
>> -         */
>> -        if (RPC_SIGNALLED(task))
>> -            goto need_restart;
>> +    /*
>> +     * If the backchannel connection was shut down while this
>> +     * task was queued, resubmit it after setting up a new backchannel
>> +     * connection.
>> +     *
>> +     * If the backchannel connection is permanently lost, the submission
>> +     * code will error out, so there is no need to handle that case 
>> here.
>> +     *
>> +     * For the v4.1+ case, do this without releasing the slot or doing
>> +     * any further processing. It's possible that the restarted call 
>> will
>> +     * be a retransmission of an earlier call to the server and that 
>> will
>> +     * help ensure that the DRC works correctly.
>> +     */
>> +    if (RPC_SIGNALLED(task))
>> +        goto need_restart;
>> +    if (!clp->cl_minorversion)
>>           return true;
>> -    }
>>       if (cb->cb_held_slot < 0)
>>           goto need_restart;
>> @@ -1403,9 +1406,6 @@ static bool nfsd4_cb_sequence_done(struct 
>> rpc_task *task, struct nfsd4_callback
>>       }
>>       trace_nfsd_cb_free_slot(task, cb);
>>       nfsd41_cb_release_slot(cb);
>> -
>> -    if (RPC_SIGNALLED(task))
>> -        goto need_restart;
>>   out:
>>       return ret;
>>   retry_nowait:
>>
>> ---
>> base-commit: a8acd0de47f22619d62d548b86bcfc9a4de2b2c6
>> change-id: 20250122-nfsd-6-14-77c1ad5d9f01
>>
>> Best regards,
> 
>