Message-ID: <efc327e3-5956-4c61-bca5-e41f1e7c3e78@huaweicloud.com>
Date: Wed, 3 Sep 2025 14:45:40 +0800
From: Li Lingfeng <lilingfeng@...weicloud.com>
To: "zhangjian (CG)" <zhangjian496@...wei.com>,
Benjamin Coddington <bcodding@...hat.com>,
Li Lingfeng <lilingfeng3@...wei.com>
Cc: Jeff Layton <jlayton@...nel.org>, chuck.lever@...cle.com,
neil@...wn.name, okorniev@...hat.com, Dai.Ngo@...cle.com, tom@...pey.com,
linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
yukuai1@...weicloud.com, houtao1@...wei.com, yi.zhang@...wei.com,
yangerkun@...wei.com
Subject: Re: [PATCH] nfsd: remove long-standing revoked delegations by force
Hi,
On 2025/9/3 11:46, zhangjian (CG) wrote:
> Hello experts,
>
> If we see that all delegations on a hard-mounted NFS client that are also
> on the server's cl_revoked list have changed from
> NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED|NFS_DELEGATION_TEST_EXPIRED
> to NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED, can we form a
> hypothesis about this problem?
>
> By the way, this problem can be masked by decreasing the file count on
> the server.
>
> Thanks,
> zhangjian
I think NFS_DELEGATION_TEST_EXPIRED is cleared as follows:
nfs4_state_manager
 nfs4_do_reclaim
  nfs4_reclaim_open_state
   __nfs4_reclaim_open_state // get nfs4_state from sp->so_states
    nfs41_open_expired // status = ops->recover_open
     nfs41_check_delegation_stateid
      test_and_clear_bit // NFS_DELEGATION_TEST_EXPIRED
After the bug in [1] is triggered, although the delegation is no longer on
server->delegations, it can still be obtained by traversing sp->so_states.
However, I cannot find the connection between the number of files on the
server and this issue.
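
For illustration only, here is a minimal user-space sketch of the flag
transition described above. It is not kernel code: the bit numbers, the
struct names and recover_open_states() are hypothetical stand-ins, and
test_and_clear_flag() is a non-atomic imitation of test_and_clear_bit().
It only shows that a delegation still reachable from an open state loses
TEST_EXPIRED during recovery, whether or not it is still on any delegation
list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit numbers only; NOT the kernel's actual enum values. */
enum {
    DELEG_RETURN_IF_CLOSED = 0,
    DELEG_REVOKED          = 1,
    DELEG_TEST_EXPIRED     = 2,
};

/* Hypothetical stand-in for a client-side delegation. */
struct delegation {
    unsigned long flags;
};

/* Hypothetical stand-in for an open state that still points at a delegation. */
struct open_state {
    struct delegation *deleg;
};

/* Non-atomic user-space imitation of the kernel's test_and_clear_bit(). */
static bool test_and_clear_flag(unsigned int nr, unsigned long *flags)
{
    assert(nr < 8 * sizeof(*flags));

    unsigned long mask = 1UL << nr;
    bool was_set = (*flags & mask) != 0;

    *flags &= ~mask;
    return was_set;
}

/*
 * Walk the open states (analogous to traversing sp->so_states) and clear
 * TEST_EXPIRED on every delegation reached. Returns how many were cleared.
 */
static size_t recover_open_states(struct open_state *states, size_t n)
{
    size_t cleared = 0;

    for (size_t i = 0; i < n; i++) {
        struct delegation *d = states[i].deleg;

        if (d && test_and_clear_flag(DELEG_TEST_EXPIRED, &d->flags))
            cleared++;
    }
    return cleared;
}
```

A delegation starting with all three flags set ends up with only
RETURN_IF_CLOSED|REVOKED after one pass, and a second pass changes nothing,
which matches the transition zhangjian observed.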
Thanks,
Lingfeng
>
> On 2025/9/2 20:43, Benjamin Coddington wrote:
>> On 2 Sep 2025, at 8:10, Li Lingfeng wrote:
>>
>>> Our expected outcome was that the client would release the abnormal
>>> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
>>> However, this problematic delegation is no longer present in the
>>> client's server->delegations list—whether due to client-side timeouts or
>>> the server-side bug [1].
>> How does the client time out TEST_STATEID - are you mounting with 'soft'?
>>
>> We should find the server-side bug and fix it rather than write code to
>> paper over it. I do think the synchronization of state here is a bit
>> fragile and wish the protocol had a generation, sequence, or marker for
>> setting SEQ4_STATUS_ bits.
>>
>>>> Should we instead just administratively evict the client since it's
>>>> clearly not behaving right in this case?
>>> Thanks for the suggestion. While administratively evicting the client would
>>> certainly resolve the immediate delegation issue, I'm concerned that approach
>>> might be a bit heavy-handed.
>>> The problematic behavior seems isolated to a single delegation. Meanwhile,
>>> the client itself likely has numerous other open files and active state on
>>> the server. Forcing a complete client reconnect would tear down all that
>>> state, which could cause significant application disruption and be perceived
>>> as a service outage from the client's perspective.
>>>
>>> [1] https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/
>> ^^ in this thread you reference v5.10 - there was a knfsd fix for a
>> cl_revoked leak "3b816601e279", and there have been 3 or 4 fixes to fix
>> problems and optimize the client walk of delegations since then. Jeff
>> pointed out that there have been fixes in these areas. Are you finding this
>> problem still with all those fixes included?
>>
>> Ben
>>
>>