lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <00baf78e7d483930ddac4129fb91828707d89769.camel@kernel.org>
Date: Fri, 24 Jan 2025 11:08:58 -0500
From: Jeff Layton <jlayton@...nel.org>
To: Chuck Lever <chuck.lever@...cle.com>, Neil Brown <neilb@...e.de>, Olga
 Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>, Tom
 Talpey <tom@...pey.com>, "J. Bruce Fields" <bfields@...ldses.org>, Kinglong
 Mee <kinglongmee@...il.com>, Trond Myklebust	 <trondmy@...nel.org>, Anna
 Schumaker <anna@...nel.org>, "David S. Miller"	 <davem@...emloft.net>, Eric
 Dumazet <edumazet@...gle.com>, Jakub Kicinski	 <kuba@...nel.org>, Paolo
 Abeni <pabeni@...hat.com>, Simon Horman	 <horms@...nel.org>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	netdev@...r.kernel.org
Subject: Re: [PATCH 2/8] nfsd: fix CB_SEQUENCE error handling of
 NFS4ERR_{BADSLOT,BADSESSION,SEQ_MISORDERED}

On Fri, 2025-01-24 at 10:31 -0500, Chuck Lever wrote:
> On 1/24/25 9:46 AM, Jeff Layton wrote:
> > On Fri, 2025-01-24 at 09:32 -0500, Chuck Lever wrote:
> > > On 1/23/25 3:25 PM, Jeff Layton wrote:
> > > > The current error handling has some problems:
> > > > 
> > > > BADSLOT and BADSESSION: don't release the slot before retrying the call
> > > > 
> > > > SEQ_MISORDERED: does some sketchy resetting of the seqid? I can't find any
> > > > recommendation about doing that in the spec, and it seems wrong.
> > > 
> > > Random thought: You might use the Linux NFS client's forechannel session
> > > implementation as a code reference.
> > > 
> > > 
> > > > Handle all three errors the same way: release the slot, but then handle
> > > > it just like we would as if we hadn't gotten a reply; mark the session
> > > > as faulty, and retry the call.
> > > 
> > > Some questions:
> > > 
> > > Why does it matter whether NFSD keeps the slot if both sides plan to
> > > destroy the session?
> > > 
> > 
> > It may not be required, but there is no reason to hold onto the slot in
> > these cases.
> 
> In the BADSLOT case, if the slot is released, then another session
> consumer on the NFS server can use it and will encounter the same error.
> Best to keep it in the penalty box, IMO.
> 

There is another problem here too. Once the session is reconstituted,
there is no guarantee that the slot that the call is sitting on will
still be valid. The new CB slot table may be smaller than before. I
think we do need to release the slot in these cases for that reason
alone.

> If there are other slots, they are likely still usable. An
> implementation can choose to continue using the session rather than
> scuttling it immediately. In the past, with a single backchannel slot,
> NFSD had no choice but to replace the session. But now it can be more
> conservative.
> 
> 
> > Also, at this point, only nfsd has declared that it needs
> > a new session (see below).
> 
> If the client's backchannel service has returned BADSESSION, then the
> client already knows this session is unusable.
> 
> 
> > > Also, AFAICT marking CB_FAULT does not destroy the session, it simply
> > > tries to recreate backchannel's rpc_clnt. Perhaps NFSD's callback code
> > > should actively destroy the session and let the client drive a fresh
> > > CREATE_SESSION to recover?
> > > 
> > 
> > Marking it with a fault just sets the cl_cb_state to NFSD4_CB_FAULT.
> > Then, on the next SEQUENCE call, that makes nfsd set
> > SEQ4_STATUS_BACKCHANNEL_FAULT, which should make the client recreate
> > the session. Obviously, there is some delay involved there since we
> > might have to wait for the client to do a lease renewal before this
> > happens.
> > 
> > > 
> > > > Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
> > > > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > > > ---
> > > >    fs/nfsd/nfs4callback.c | 27 +++++++++++----------------
> > > >    1 file changed, 11 insertions(+), 16 deletions(-)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index e12205ef16ca932ffbcc86d67b0817aec2436c89..bfc9de1fcb67b4f05ed2f7a28038cd8290809c17 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1371,17 +1371,24 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > >    		nfsd4_mark_cb_fault(cb->cb_clp);
> > > >    		ret = false;
> > > >    		break;
> > > > +	case -NFS4ERR_BADSESSION:
> > > > +	case -NFS4ERR_BADSLOT:
> > > > +	case -NFS4ERR_SEQ_MISORDERED:
> > > > +		/*
> > > > +		 * These errors indicate that something has gone wrong
> > > > +		 * with the server and client's synchronization. Release
> > > > +		 * the slot, but handle it as if we hadn't gotten a reply.
> > > > +		 */
> > > > +		nfsd41_cb_release_slot(cb);
> > > > +		fallthrough;
> > > >    	case 1:
> > > >    		/*
> > > >    		 * cb_seq_status remains 1 if an RPC Reply was never
> > > >    		 * received. NFSD can't know if the client processed
> > > >    		 * the CB_SEQUENCE operation. Ask the client to send a
> > > > -		 * DESTROY_SESSION to recover.
> > > > +		 * DESTROY_SESSION to recover, but keep the slot.
> > > >    		 */
> > > > -		fallthrough;
> > > > -	case -NFS4ERR_BADSESSION:
> > > >    		nfsd4_mark_cb_fault(cb->cb_clp);
> > > > -		ret = false;
> > > >    		goto need_restart;
> > > >    	case -NFS4ERR_DELAY:
> > > >    		cb->cb_seq_status = 1;
> > > > @@ -1390,14 +1397,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > >    
> > > >    		rpc_delay(task, 2 * HZ);
> > > >    		return false;
> > > > -	case -NFS4ERR_BADSLOT:
> > > > -		goto retry_nowait;
> > > > -	case -NFS4ERR_SEQ_MISORDERED:
> > > > -		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > > > -			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > > > -			goto retry_nowait;
> > > > -		}
> > > > -		break;
> > > >    	default:
> > > >    		nfsd4_mark_cb_fault(cb->cb_clp);
> > > >    	}
> > > > @@ -1405,10 +1404,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > >    	nfsd41_cb_release_slot(cb);
> > > >    out:
> > > >    	return ret;
> > > > -retry_nowait:
> > > > -	if (rpc_restart_call_prepare(task))
> > > > -		ret = false;
> > > > -	goto out;
> > > >    need_restart:
> > > >    	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
> > > >    		trace_nfsd_cb_restart(clp, cb);
> > > > 
> > > 
> > > 
> > 
> 
> 

-- 
Jeff Layton <jlayton@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ