linux-kernel - Re: [PATCH] nfsd: always handle RPC_SIGNALLED earlier in nfsd4_cb_sequence

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5e3f1c01-8e3f-4a18-86b4-c9494d71f30a@oracle.com>
Date: Thu, 23 Jan 2025 09:37:13 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
        Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
        Tom Talpey <tom@...pey.com>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nfsd: always handle RPC_SIGNALLED earlier in
 nfsd4_cb_sequence_done()

On 1/22/25 11:50 AM, Jeff Layton wrote:
> On Wed, 2025-01-22 at 11:06 -0500, Chuck Lever wrote:
>> On 1/22/25 10:44 AM, Jeff Layton wrote:
>>> On Wed, 2025-01-22 at 10:20 -0500, Chuck Lever wrote:
>>>> On 1/22/25 10:10 AM, Jeff Layton wrote:
>>>>> The v4.0 client always restarts the callback when the connection is shut
>>>>> down (which is indicated by RPC_SIGNALLED()). The RPC is then requeued
>>>>> and the result eventually should complete (or be aborted).
>>>>>
>>>>> The v4.1 code instead processes the result and only later decides to
>>>>> restart the call. Even more problematic is the fact that it releases the
>>>>> slot beforehand. The restarted call may get a new slot, which would
>>>>> could break DRC handling.
>>>>
>>>> "break DRC handling" -- I'd like to understand this.
>>>>
>>>> NFSD always sets cachethis to false in CB_SEQUENCE, so there is no DRC
>>>> for these operations. The only thing the client saves is the slot
>>>> sequence number IIUC.
>>>>
>>>> Is retrying an uncached operation via a different slot a problem?
>>>>
>>>
>>> Ahh, I missed that we always set cachethis to false. So, there is
>>> probably now a problem with the DRC after all. Still, I don't see a
>>> good argument for processing the CB_SEQUENCE result, when we intend to
>>> retransmit the call anyway.
>>
>> I expect that the rationale is that the slot sequence number needs to be
>> advanced appropriately before the slot can be used again.
> 
> Once RPC_SIGNALLED returns true, the callback code can either trust the
> result of the rpc_task or not. If it's going to trust that result, then
> there is no need to restart the call.
> 
> If it's not going to trust it, then the RPC call might as well have not
> happened, and there is no need to increment the slot sequence number or
> do anything else.
> 
> Is my understanding wrong here?

It might be.

The callback client is careful to initialize cb_seq_status to 1 before
it sends the RPC call. Thus if cb_seq_status is any value other than 1
in nfsd4_cb_sequence_done(), that means an RPC reply was received
successfully and the XDR decoder was successful.

RPC_SIGNALLED doesn't have anything to do with whether a reply arrived
or can be trusted.

nfsd4_cb_sequence_done() needs to process the reply unconditionally,
otherwise the server and client will disagree on the slot sequence
number for that slot.

-- 
Chuck Lever