Message-ID: <6e7361b4-7511-4630-9f1b-d7968cbebd41@oracle.com>
Date: Fri, 13 Jun 2025 15:00:57 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: Mike Snitzer <snitzer@...nel.org>, Jeff Layton <jlayton@...nel.org>
Cc: Neil Brown <neilb@...e.de>, Olga Kornievskaia <okorniev@...hat.com>,
        Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
        linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-trace-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface

On 6/13/25 2:57 PM, Mike Snitzer wrote:
> On Thu, Jun 12, 2025 at 11:57:59AM -0400, Jeff Layton wrote:
>> On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
>>> The old nfsdfs interface for starting a server with multiple pools
>>> handles the special case of a single entry array passed down from
>>> userland by distributing the threads over every NUMA node.
>>>
>>> The netlink control interface however constructs an array of length
>>> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
>>> defeats the special casing that the old interface relies on.
>>>
>>> Change nfsd_nl_threads_set_doit() to pass down the array from userland
>>> as-is.
>>>
>>> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
>>> Reported-by: Mike Snitzer <snitzer@...nel.org>
>>> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
>>> Signed-off-by: Jeff Layton <jlayton@...nel.org>
>>> ---
>>>  fs/nfsd/nfsctl.c | 5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>> index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
>>> --- a/fs/nfsd/nfsctl.c
>>> +++ b/fs/nfsd/nfsctl.c
>>> @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>>>   */
>>>  int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>  {
>>> -	int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
>>> +	int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
>>>  	struct net *net = genl_info_net(info);
>>>  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>  	const struct nlattr *attr;
>>> @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>  	/* count number of SERVER_THREADS values */
>>>  	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>  		if (nla_type(attr) == NFSD_A_SERVER_THREADS)
>>> -			count++;
>>> +			nrpools++;
>>>  	}
>>>  
>>>  	mutex_lock(&nfsd_mutex);
>>>  
>>> -	nrpools = max(count, nfsd_nrpools(net));
>>>  	nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
>>>  	if (!nthreads) {
>>>  		ret = -ENOMEM;
>>
>> I noticed that this didn't go in to the recent merge window.
>>
>> This patch fixes a rather nasty regression when you try to start the
>> server on a NUMA-capable box. It all looks like it works, but some RPCs
>> get silently dropped on the floor (if they happen to be received into a
>> node with no threads). It took me a while to track down the problem
>> after Mike reported it.
>>
>> Can we go ahead and pull this in and send it to stable?
>>
>> Also, did this patch fix the problem for you, Mike?
> 
> Hi Jeff,
> 
> I saw your other mail asking the same, figured it best to reply to this
> thread with the patch.
> 
> YES, I just verified this patch fixes the issue I reported.  I didn't
> think I was critical path for confirming the fix, and since I had
> worked around it (by downgrading nfs-utils from EL10's 2.8.2 to EL9's
> 2.5.4), it wasn't a super quick thing for me to test, so it became
> out-of-sight-out-of-mind.
> 
> BTW, Chuck, I think the reason there aren't many/any reports (even
> with RHEL10 or Fedora users) is that the user needs to:
> 1) have a NUMA system
> 2) explicitly change sunrpc's default for pool_mode from global to pernode.

Not a very common thing to do, IME.


> Anyway:
> 
> Tested-by: Mike Snitzer <snitzer@...nel.org>

Tag applied, thanks.


-- 
Chuck Lever
