Message-ID: <BF48C6D1-ED2E-4B9C-A833-FF48D9ACC044@redhat.com>
Date: Tue, 02 Sep 2025 08:43:36 -0400
From: Benjamin Coddington <bcodding@...hat.com>
To: Li Lingfeng <lilingfeng3@...wei.com>
Cc: Jeff Layton <jlayton@...nel.org>, chuck.lever@...cle.com, neil@...wn.name,
 okorniev@...hat.com, Dai.Ngo@...cle.com, tom@...pey.com,
 linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
 yukuai1@...weicloud.com, houtao1@...wei.com, yi.zhang@...wei.com,
 yangerkun@...wei.com, lilingfeng@...weicloud.com, zhangjian496@...wei.com
Subject: Re: [PATCH] nfsd: remove long-standing revoked delegations by force

On 2 Sep 2025, at 8:10, Li Lingfeng wrote:

> Our expected outcome was that the client would release the abnormal
> delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
> However, this problematic delegation is no longer present in the
> client's server->delegations list—whether due to client-side timeouts or
> the server-side bug [1].

How does the client time out TEST_STATEID? Are you mounting with 'soft'?

We should find the server-side bug and fix it rather than write code to
paper over it.  I do think the synchronization of state here is a bit
fragile and wish the protocol had a generation, sequence, or marker for
setting SEQ4_STATUS_ bits.
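
To illustrate the kind of marker meant here, a minimal sketch follows. It is
purely hypothetical: NFSv4.1 carries no generation alongside the SEQUENCE
status flags, and the revoke_gen field, both structs, and both helpers are
names made up for this example. Only the SEQ4_STATUS_RECALLABLE_STATE_REVOKED
value comes from the RFC. The idea is that the client could tell a stale flag
from a new revocation by comparing generations instead of re-walking its
delegations every time the bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Real flag value from RFC 8881 (status flags in the SEQUENCE reply). */
#define SEQ4_STATUS_RECALLABLE_STATE_REVOKED 0x00000040u

/* HYPOTHETICAL reply layout: revoke_gen does not exist in the protocol. */
struct seq_reply {
    uint32_t status_flags;
    uint64_t revoke_gen;   /* assumed: server bumps this on every revocation */
};

/* HYPOTHETICAL per-client bookkeeping. */
struct client_state {
    uint64_t walked_gen;   /* newest generation covered by a completed walk */
};

/* Walk delegations (TEST_STATEID/FREE_STATEID) only if the flag is set
 * and the server has revoked something since the last completed walk. */
static bool needs_walk(const struct client_state *clp,
                       const struct seq_reply *r)
{
    if (!(r->status_flags & SEQ4_STATUS_RECALLABLE_STATE_REVOKED))
        return false;
    return r->revoke_gen > clp->walked_gen;
}

/* Record the generation that was current when the finished walk started. */
static void walk_done(struct client_state *clp, uint64_t gen_at_start)
{
    if (gen_at_start > clp->walked_gen)
        clp->walked_gen = gen_at_start;
}
```

With something like this, a reply that still carries the bit from a revocation
the client has already processed would not trigger another walk, while a
later revocation (higher generation) would.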

>>
>> Should we instead just administratively evict the client since it's
>> clearly not behaving right in this case?
> Thanks for the suggestion. While administratively evicting the client would
> certainly resolve the immediate delegation issue, I'm concerned that approach
> might be a bit heavy-handed.
> The problematic behavior seems isolated to a single delegation. Meanwhile,
> the client itself likely has numerous other open files and active state on
> the server. Forcing a complete client reconnect would tear down all that
> state, which could cause significant application disruption and be perceived
> as a service outage from the client's perspective.
>
> [1] https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@huawei.com/

^^ In this thread you reference v5.10. There was a knfsd fix for a
cl_revoked leak, "3b816601e279", and there have been 3 or 4 fixes that
address problems in, and optimize, the client walk of delegations since
then.  Jeff pointed out that there have been fixes in these areas.  Are you
still finding this problem with all those fixes included?

Ben

