Message-ID: <de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com>
Date: Mon, 1 Sep 2025 17:07:39 +0800
From: Li Lingfeng <lilingfeng3@...wei.com>
To: Trond Myklebust <trondmy@...nel.org>, "zhangjian (CG)"
<zhangjian496@...wei.com>, <anna@...nel.org>
CC: <linux-nfs@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Chuck Lever
<chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>, NeilBrown
<neil@...wn.name>, yangerkun <yangerkun@...wei.com>, "zhangyi (F)"
<yi.zhang@...wei.com>, Hou Tao <houtao1@...wei.com>,
"chengzhihao1@...wei.com" <chengzhihao1@...wei.com>, Li Lingfeng
<lilingfeng@...weicloud.com>
Subject: Re: [Question]nfs: never returned delegation
Hi,
On 2025/8/11 21:03, Trond Myklebust wrote:
> On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
>> Recently we hit an NFS problem on 5.10. The tcpdump output shows a
>> large number of test_state_id requests after each non-privileged
>> request. There are 400,000+ delegations on the client (I read the
>> delegation list from /proc/kcore).
>> At first I thought the state manager was spending a lot of time in
>> nfs_server_reap_expired_delegations. But all of them are in the
>> NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED
>> (I read this from /proc/kcore too).
>> I analyzed the NFS code and found that if the
>> NFSPROC4_CLNT_DELEGRETURN procedure hits ETIMEDOUT, the delegation
>> is marked as NFS4ERR_DELEG_REVOKED and is never returned again. The
>> NFS server keeps the revoked delegation in clp->cl_revoked forever.
>> As a result, subsequent SEQUENCE responses carry the
>> RECALLABLE_STATE_REVOKED flag, and the client sends a test_state_id
>> request for every non-revoked delegation.
>> This can only be resolved by restarting the NFS server.
>> I think ETIMEDOUT in the NFSPROC4_CLNT_DELEGRETURN procedure may not
>> be the only case that causes a flood of never-ending test_state_id
>> requests after any non-privileged request.
>> I hope the NFS experts can give some advice on this problem.
>>
> You have the following options:
>
> 1. Don't ever use "soft" or "softerr" on the NFS client.
> 2. Reboot your server every now and again.
> 3. Change the server code to not bother caching revoked state. Doing
> so is rather pointless, since there is nothing a client can do
> differently when presented with NFS4ERR_DELEG_REVOKED vs.
> NFS4ERR_BAD_STATEID.
> 4. Change the server code to garbage collect revoked stateids after
> a while.
>
I found that a server-side bug can also cause this behavior, and I've
reproduced the issue on master (commit b320789d6883).

nfs4_laundromat                       nfsd4_delegreturn
 list_add // add dp to reaplist
          // by dl_recall_lru
 list_del_init // delete dp from
               // reaplist
                                       destroy_delegation
                                        unhash_delegation_locked
                                         list_del_init
                                         // dp was not added to any list
                                         // via dl_recall_lru
 revoke_delegation
  list_add // add dp to cl_revoked
           // by dl_recall_lru

The delegation will be left in cl_revoked.
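
To make the interleaving above easier to follow, here is a minimal
userspace sketch that replays the same list operations one after
another in a single thread. It is not kernel code: the list helpers are
simplified reimplementations of the kernel ones, struct deleg and
simulate_race() are hypothetical names used only for illustration, and
locking and the rest of the delegation state are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical, stripped-down delegation: only the list linkage matters. */
struct deleg { struct list_head dl_recall_lru; };

/* Returns true if dp ends up on cl_revoked after the client returned it. */
bool simulate_race(void)
{
	struct list_head reaplist, cl_revoked;
	struct deleg dp;

	INIT_LIST_HEAD(&reaplist);
	INIT_LIST_HEAD(&cl_revoked);
	INIT_LIST_HEAD(&dp.dl_recall_lru);

	/* nfs4_laundromat: queue dp on reaplist, then take it off again */
	list_add(&dp.dl_recall_lru, &reaplist);
	list_del_init(&dp.dl_recall_lru);

	/*
	 * nfsd4_delegreturn -> destroy_delegation -> unhash_delegation_locked:
	 * dp is on no list at this point, so list_del_init changes nothing.
	 */
	list_del_init(&dp.dl_recall_lru);
	assert(list_empty(&dp.dl_recall_lru));

	/* nfs4_laundromat resumes: revoke_delegation adds dp to cl_revoked */
	list_add(&dp.dl_recall_lru, &cl_revoked);

	return cl_revoked.next == &dp.dl_recall_lru;
}
```

In this sketch simulate_race() reports that dp is linked on cl_revoked
even though the DELEGRETURN path already ran, which is the state
described above.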
I agree with Trond's suggestion to change the server code to fix it.
Thanks,
Lingfeng