[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <51fdd36542d253e92153ed3bc28f69866821735c.camel@kernel.org>
Date: Fri, 05 Apr 2024 13:08:02 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Chuck Lever <chuck.lever@...cle.com>, Neil Brown <neilb@...e.de>, Olga
Kornievskaia <kolga@...app.com>, Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey
<tom@...pey.com>
Cc: Vladimir Benes <vbenes@...hat.com>, linux-nfs@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nfsd: hold a lighter-weight client reference over
CB_RECALL_ANY
On Fri, 2024-04-05 at 13:00 -0400, Jeff Layton wrote:
> Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
> client. While a callback job is technically an RPC, this has the effect
> of preventing the client from being unhashed.
>
> If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
> can end up in a situation where the callback can't complete on the (now
> dead) callback channel, but the new client also can't connect because
> the old client can't be unhashed.
>
> The job is only holding a reference to the client so it can clear a flag
> in the client after it completes. This patch attempts to alleviate this
> by having the job instead hold a reference to the cl_nfsdfs.cl_ref.
>
> Typically we only take that sort of reference when dealing with the
> nfsdfs info files, but it should work appropriately here too.
>
This should probably also have:
Fixes: 44df6f439a17 NFSD: add delegation reaper to react to low memory condition
> Reported-by: Vladimir Benes <vbenes@...hat.com>
> Signed-off-by: Jeff Layton <jlayton@...nel.org>
> ---
> This patch seems to break the livelock in my testing. It's not the
> prettiest fix, but it's narrowly targeted and should be appropriate for
> 6.9-rc. Longer term, I think we need to rework how the nfs4_clients
> refcounts are managed, but that's a larger project.
>
> Many thanks to Vladimir for all his help with tracking this down!
> ---
> fs/nfsd/nfs4state.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 5fcd93f7cb8c..4311d608a297 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -3042,12 +3042,9 @@ static void
> nfsd4_cb_recall_any_release(struct nfsd4_callback *cb)
> {
> struct nfs4_client *clp = cb->cb_clp;
> - struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
>
> - spin_lock(&nn->client_lock);
> clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
> - put_client_renew_locked(clp);
> - spin_unlock(&nn->client_lock);
> + drop_client(clp);
> }
>
> static int
> @@ -6613,10 +6610,12 @@ deleg_reaper(struct nfsd_net *nn)
> clp->cl_ra_time < 5)) {
> continue;
> }
> - list_add(&clp->cl_ra_cblist, &cblist);
>
> /* release in nfsd4_cb_recall_any_release */
> - atomic_inc(&clp->cl_rpc_users);
> + if (!kref_get_unless_zero(&clp->cl_nfsdfs.cl_ref))
> + continue;
> +
> + list_add(&clp->cl_ra_cblist, &cblist);
> set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
> clp->cl_ra_time = ktime_get_boottime_seconds();
> }
>
> ---
> base-commit: 05258a0a69b3c5d2c003f818702c0a52b6fea861
> change-id: 20240405-rhel-31513-028ab6f14252
>
> Best regards,
--
Jeff Layton <jlayton@...nel.org>
Powered by blists - more mailing lists