netdev - Re: [PATCH 1/8] nfsd: don't restart v4.1+ callback when RPC

Message-ID: <d52cf9b9b83753434c1b0098afe1b77bf65590d4.camel@kernel.org>
Date: Sun, 26 Jan 2025 06:18:21 -0500
From: Jeff Layton <jlayton@...nel.org>
To: NeilBrown <neilb@...e.de>
Cc: Chuck Lever <chuck.lever@...cle.com>, Olga Kornievskaia
 <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey
 <tom@...pey.com>, "J. Bruce Fields" <bfields@...ldses.org>, Kinglong Mee
 <kinglongmee@...il.com>, Trond Myklebust <trondmy@...nel.org>, Anna
 Schumaker <anna@...nel.org>, "David S. Miller" <davem@...emloft.net>, Eric
 Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo
 Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
 linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
 netdev@...r.kernel.org
Subject: Re: [PATCH 1/8] nfsd: don't restart v4.1+ callback when
 RPC_SIGNALLED is set

On Sun, 2025-01-26 at 10:01 +1100, NeilBrown wrote:
> On Fri, 24 Jan 2025, Jeff Layton wrote:
> > This is problematic, since the RPC might have been entirely successful.
> > There is no point in restarting a v4.1+ callback just because
> > RPC_SIGNALLED is true. The v4.1+ error handling has other mechanisms for
> > detecting when it should retransmit the RPC.
> 
> But why might RPC_SIGNALLED() ever be true?
> The flag RPC_TASK_SIGNALLED is only ever set by rpc_signal_task() which
> is only called from rpc_killall_tasks() and __rpc_execute() for
> non-async tasks which doesn't apply to nfsd callbacks as they are
> started with rpc_call_async().
> 
> rpc_killall_tasks() is called by fs/nfs/ which isn't relevant for us,
> and from rpc_shutdown_client().  In those cases we certainly don't want
> the request to be retried, though the nfsd4_process_cb_update() case is
> a little interesting as we do want it to be retried, but in a different
> client.
>
> So the code you are removing is either dead code because something else
> will prevent the restart when a client is being shut down, or it is bad
> code because it would delay rpc_shutdown_client() while the request is
> retried. 
> 
> I haven't dug the extra step to figure out which, but either way I think
> the code should go.
> 
> 

Thanks. That was my analysis too.

rpc_shutdown_client() is called when we tear down and rebuild the
rpc_client. nfsd does this in setup_callback_client(), which gets
called from nfsd4_process_cb_update() (basically when we detect that
the backchannel is having problems).

There are really only two states: We either got a reply from the server
before the client went down, or we didn't. In the case where we got a
reply, there is no need to retry anything. In the case where we didn't,
the cb_seq_status will still be '1' (its initial value), so there is no
need to check RPC_SIGNALLED() at all here.

The existing code could lead to the call being retried when we had
already gotten a perfectly valid reply.
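
As a rough illustration of the two states, here is a standalone model
(not the kernel code). The struct, its fields and both function names
are hypothetical, and treating "no reply" as simply "needs to be resent"
is a simplifying assumption; the real error handling has more cases.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the callback state; not a kernel structure. */
struct cb_model {
	int  status;     /* assumed: stays 1 if no reply ever arrived, 0 on success */
	bool signalled;  /* models RPC_SIGNALLED(task) being true */
};

/*
 * Behaviour before the patch (simplified): a signalled task is
 * restarted even if a valid reply was already received.
 */
static bool old_needs_restart(const struct cb_model *cb)
{
	return cb->signalled || cb->status == 1;
}

/*
 * Behaviour after the patch (simplified): only the status decides,
 * so a successful reply is never resent.
 */
static bool new_needs_restart(const struct cb_model *cb)
{
	return cb->status == 1;
}
```

With status == 0 and signalled == true, the old check asks for a restart
and the new one does not. With status == 1, both agree, which is why the
extra RPC_SIGNALLED() test adds nothing in this model.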

> > 
> > Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
> > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > ---
> >  fs/nfsd/nfs4callback.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 50e468bdb8d4838b5217346dcc2bd0fec1765c1a..e12205ef16ca932ffbcc86d67b0817aec2436c89 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1403,9 +1403,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >  	}
> >  	trace_nfsd_cb_free_slot(task, cb);
> >  	nfsd41_cb_release_slot(cb);
> > -
> > -	if (RPC_SIGNALLED(task))
> > -		goto need_restart;
> >  out:
> >  	return ret;
> >  retry_nowait:
> > 
> > -- 
> > 2.48.1
> > 
> > 
> 

-- 
Jeff Layton <jlayton@...nel.org>