netdev - Re: [PATCH 7/8] nfsd: clean up and amend comments around nfsd4_cb_sequence

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1f967fd7-17b6-402e-ac55-aba956ba0d65@oracle.com>
Date: Fri, 24 Jan 2025 10:05:30 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
        Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
        Tom Talpey <tom@...pey.com>, "J. Bruce Fields" <bfields@...ldses.org>,
        Kinglong Mee <kinglongmee@...il.com>,
        Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        netdev@...r.kernel.org
Subject: Re: [PATCH 7/8] nfsd: clean up and amend comments around
 nfsd4_cb_sequence_done()

On 1/24/25 9:50 AM, Jeff Layton wrote:
> On Fri, 2025-01-24 at 09:43 -0500, Chuck Lever wrote:
>> On 1/23/25 3:25 PM, Jeff Layton wrote:
>>> Add a new kerneldoc header, and clean up the comments a bit.
>>
>> Usually I'm in favor of kdoc headers, but here, it's a static function
>> whose address is not shared outside of this source file. The only
>> documentation need is the meaning of the return code, IMO.
>>
> 
> If you like. I figured it wouldn't hurt to do a full kdoc comment.

Kdoc comments are pretty noisy. This one doesn't seem to me to add
much real value -- its callers are all right here in the same file.


>>> Signed-off-by: Jeff Layton <jlayton@...nel.org>
>>> ---
>>>    fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------
>>>    1 file changed, 20 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>> index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644
>>> --- a/fs/nfsd/nfs4callback.c
>>> +++ b/fs/nfsd/nfs4callback.c
>>> @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
>>>    	rpc_call_start(task);
>>>    }
>>>    
>>> +/**
>>> + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE
>>> + * @task: rpc_task
>>> + * @cb: nfsd4_callback for this call
>>> + *
>>> + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call
>>> + * if the callback RPC client was killed. For v4.1+ the error handling
>>> + * is more sophisticated.
>>
>> It would be much clearer to pull the 4.0 error handling out of this
>> function, which is named "cb_/sequence/_done".
>>
>> Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ?
>>
> 
> If we do that then we'll need to change this function to return
> something other than a bool, and that's a larger change than I wanted
> to make here. I really wanted to keep these as small, targeted patches
> that can be backported easily.
> 
> I wouldn't object to further cleanup here on top of that though.

There's no reason to document the 4.0 logic if it's about to be moved
out. I strongly prefer making the code more self-documenting. Adding
a comment here about 4.0 then adding a patch on top moving the code
somewhere else seems silly to me.


>>> + *
>>> + * Returns true if reply processing should continue.
>>> + */
>>>    static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
>>>    {
>>>    	struct nfs4_client *clp = cb->cb_clp;
>>> @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>    	if (!clp->cl_minorversion) {
>>>    		/*
>>>    		 * If the backchannel connection was shut down while this
>>> -		 * task was queued, we need to resubmit it after setting up
>>> -		 * a new backchannel connection.
>>> +		 * task was queued, resubmit it after setting up a new
>>> +		 * backchannel connection.
>>>    		 *
>>> -		 * Note that if we lost our callback connection permanently
>>> -		 * the submission code will error out, so we don't need to
>>> +		 * Note that if the callback connection is permanently lost,
>>> +		 * the submission code will error out. There is no need to
>>>    		 * handle that case here.
>>>    		 */
>>>    		if (RPC_SIGNALLED(task))
>>> @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>    	switch (cb->cb_seq_status) {
>>>    	case 0:
>>>    		/*
>>> -		 * No need for lock, access serialized in nfsd4_cb_prepare
>>> -		 *
>>>    		 * RFC5661 20.9.3
>>>    		 * If CB_SEQUENCE returns an error, then the state of the slot
>>>    		 * (sequence ID, cached reply) MUST NOT change.
>>> @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>    		ret = true;
>>>    		break;
>>>    	case -ESERVERFAULT:
>>> +		/*
>>> +		 * Client returned NFS4_OK, but decoding failed. Mark the
>>> +		 * backchannel as faulty, but don't retransmit since the
>>> +		 * call was successful.
>>> +		 */
>>>    		++session->se_cb_seq_nr[cb->cb_held_slot];
>>>    		nfsd4_mark_cb_fault(cb->cb_clp);
>>>    		break;
>>
>> This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is
>> a better choice. But why call mark_cb_fault in this case?
>>
>> Maybe split this clean-up into a separate patch.
>>
>>
> 
> I'm only altering comments in this patch. Do you really want separate
> patches for the different comments?

Why call mark_cb_fault here? If NFSD retransmits this operation on a
fresh session/transport it will just fail to decode the reply again.

Do we believe that the decoding failure means there was a transport
problem of some kind?

It's clear we do not understand this code well enough to update the
existing comment, so my review comment above suggests a broader code
change is necessary.


-- 
Chuck Lever