Message-ID: <2ebbe58df9575136583f65cd19d133bdb61d5c20.camel@redhat.com>
Date: Fri, 05 Apr 2024 22:11:45 +0200
From: vbenes@...hat.com
To: Chuck Lever <chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>
Cc: Neil Brown <neilb@...e.de>, Olga Kornievskaia <kolga@...app.com>, Dai
Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] nfsd: hold a lighter-weight client reference over
CB_RECALL_ANY
On Fri, 2024-04-05 at 14:07 -0400, Chuck Lever wrote:
> On Fri, Apr 05, 2024 at 01:56:18PM -0400, Jeff Layton wrote:
> > Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
> > client. While a callback job is technically an RPC, that counter is
> > really more for client-driven RPCs, and this has the effect of
> > preventing the client from being unhashed until the callback
> > completes.
> >
> > If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
> > can end up in a situation where the callback can't complete on the
> > (now dead) callback channel, but the new client can't connect because
> > the old client can't be unhashed. This usually manifests as an
> > NFS4ERR_DELAY return on the CREATE_SESSION operation.
> >
> > The job is only holding a reference to the client so it can clear a
> > flag after the RPC completes. Fix this by having CB_RECALL_ANY instead
> > hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that
> > sort of reference when dealing with the nfsdfs info files, but it
> > should work appropriately here to ensure that the nfs4_client doesn't
> > disappear.
> >
> > Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
> > Reported-by: Vladimir Benes <vbenes@...hat.com>
> > Signed-off-by: Jeff Layton <jlayton@...nel.org>
>
> Applied to nfsd-fixes while waiting for review and testing. Thanks!
>
>
> > ---
> > Changes in v2:
> > - Clean up the changelog
> > - Add Fixes: tag
> > - Use kref_get instead of kref_get_unless_zero
> > ---
> > fs/nfsd/nfs4state.c | 7 ++-----
> > 1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 5fcd93f7cb8c..3cef81e196c6 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -3042,12 +3042,9 @@ static void
> > nfsd4_cb_recall_any_release(struct nfsd4_callback *cb)
> > {
> > struct nfs4_client *clp = cb->cb_clp;
> > - struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
> >
> > - spin_lock(&nn->client_lock);
> > clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
> > - put_client_renew_locked(clp);
> > - spin_unlock(&nn->client_lock);
> > + drop_client(clp);
> > }
> >
> > static int
> > @@ -6616,7 +6613,7 @@ deleg_reaper(struct nfsd_net *nn)
> > list_add(&clp->cl_ra_cblist, &cblist);
> >
> > /* release in nfsd4_cb_recall_any_release */
> > - atomic_inc(&clp->cl_rpc_users);
> > + kref_get(&clp->cl_nfsdfs.cl_ref);
> > set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
> > clp->cl_ra_time = ktime_get_boottime_seconds();
> > }
> >
> > ---
> > base-commit: 05258a0a69b3c5d2c003f818702c0a52b6fea861
> > change-id: 20240405-rhel-31513-028ab6f14252
> >
> > Best regards,
> > --
> > Jeff Layton <jlayton@...nel.org>
> >
> >
>
Hi,
I've just finished testing the new patch on the same hardware
configuration, and the dracut test suite is stable again.
Thank you for your patches!
Vladimir Benes
Tested-by: Vladimir Benes <vbenes@...hat.com>
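
For anyone reading this thread later, here is a rough userspace sketch of why the type of reference matters. It is not the nfsd code: struct toy_client and every toy_* name are hypothetical stand-ins, with rpc_users modelling cl_rpc_users (a nonzero value makes the expire attempt fail, which is what the changelog describes as NFS4ERR_DELAY on CREATE_SESSION) and lifetime_ref modelling cl_nfsdfs.cl_ref (it only keeps the structure alive). Locking and the real callback machinery are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct nfs4_client. */
struct toy_client {
	int rpc_users;           /* models cl_rpc_users: nonzero blocks unhashing */
	int lifetime_ref;        /* models cl_nfsdfs.cl_ref: keeps memory alive only */
	bool hashed;             /* still reachable by new CREATE_SESSION lookups */
	bool recall_any_pending; /* models the NFSD4_CLIENT_CB_RECALL_ANY flag */
	bool freed;              /* set where real code would free the client */
};

/* Which counter the queued callback pins (old vs. patched behaviour). */
enum toy_scheme { TOY_RPC_USERS, TOY_LIFETIME_REF };

void toy_client_init(struct toy_client *c)
{
	c->rpc_users = 0;
	c->lifetime_ref = 1; /* the owner's reference */
	c->hashed = true;
	c->recall_any_pending = false;
	c->freed = false;
}

/* Drop one lifetime reference; the last put "frees" the client. */
void toy_put_ref(struct toy_client *c)
{
	assert(c->lifetime_ref > 0);
	if (--c->lifetime_ref == 0)
		c->freed = true;
}

/* Try to unhash the client; false models the "busy, retry later" reply. */
bool toy_expire(struct toy_client *c)
{
	if (c->rpc_users > 0)
		return false;
	c->hashed = false;
	toy_put_ref(c); /* drop the owner's reference */
	return true;
}

/* Queue a recall-any style callback, pinning the client per the scheme. */
void toy_queue_recall_any(struct toy_client *c, enum toy_scheme s)
{
	if (s == TOY_RPC_USERS)
		c->rpc_users++;
	else
		c->lifetime_ref++;
	c->recall_any_pending = true;
}

/* Callback release: clear the flag, then drop whatever was pinned. */
void toy_recall_any_release(struct toy_client *c, enum toy_scheme s)
{
	c->recall_any_pending = false;
	if (s == TOY_RPC_USERS)
		c->rpc_users--;
	else
		toy_put_ref(c);
}

/* Can the old client be expired while its callback is still outstanding? */
bool toy_can_expire_during_callback(enum toy_scheme s)
{
	struct toy_client c;

	toy_client_init(&c);
	toy_queue_recall_any(&c, s);
	return toy_expire(&c);
}
```

In this model, toy_can_expire_during_callback(TOY_RPC_USERS) fails because the stuck callback keeps rpc_users elevated, while the TOY_LIFETIME_REF variant lets the client be unhashed right away and defers only the final free until toy_recall_any_release() runs.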