[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1e1eadf5-fab9-4919-a71a-864aa7109c7b@huawei.com>
Date: Tue, 12 Aug 2025 10:45:52 +0800
From: "zhangjian (CG)" <zhangjian496@...wei.com>
To: Jeff Layton <jlayton@...nel.org>, Trond Myklebust <trondmy@...nel.org>,
<anna@...nel.org>
CC: <linux-nfs@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [Question]nfs: never returned delegation
Thanks a lot for reply.
Stateid is marked NFS4_INVALID_STATEID_TYPE when delegation is marked
NFS4ERR_DELEG_REVOKED. nfs_mark_test_expired_delegation will not mark
delegation as NFS_DELEGATION_TEST_EXPIRED again. In this case,
TEST_STATEID and FREE_STATEID will not be send to server any more.
This means that if return-delegation-procedure meet ETIMEOUT, delegation
will be in server clp->cl_revoked list forever.
On 2025/8/11 21:03, Jeff Layton wrote:
> On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
>> Recently, we meet a NFS problem in 5.10. There are so many test_state_id request after a non-privilaged request in tcpdump result. There are 40w+ delegations in client (I read the delegation list from /proc/kcore).
>> Firstly, I think state manager cost a lot in nfs_server_reap_expired_delegations. But I see they are all in NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I read this from /proc/kcore too).
>> I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and never return it again. NFS server will keep the revoked delegation in clp->cl_revoked forever. This will result in following sequence response with RECALLABLE_STATE_REVOKED flag. Client will send test_state_id request for all non-revoked delegation.
>> This can only be solved by restarting NFS server.
>> I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not the only case that cause lots of non-terminable test_state_id requests after any non-privilaged request.
>> Wish NFS experts give some advices on this problem.
>>
>
> What should happen is that the client should issue a TEST_STATEID and
> then follow up with a FREE_STATEID once it's clear that it has been
> revoked. Alternately, if the client expires then the server will purge
> any state it held at that point. The server is required to keep a
> record of these objects until one of those events occurs.
>
> v5.10 is pretty old, and there have been a number of fixes in this area
> in both the client and server over the last several years. You may want
> to try a newer kernel (or look at doing some backporting).
>
> Cheers,
Powered by blists - more mailing lists