lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7DCDEBE1-1416-4A93-B994-49A6D21DC065@redhat.com>
Date: Fri, 13 Jun 2025 07:33:22 -0400
From: Benjamin Coddington <bcodding@...hat.com>
To: Chuck Lever <chuck.lever@...cle.com>
Cc: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
 Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
 Tom Talpey <tom@...pey.com>, Steven Rostedt <rostedt@...dmis.org>,
 Masami Hiramatsu <mhiramat@...nel.org>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, Mike Snitzer <snitzer@...nel.org>,
 linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface

On 12 Jun 2025, at 12:42, Chuck Lever wrote:

> On 6/12/25 12:15 PM, Jeff Layton wrote:
>> On Thu, 2025-06-12 at 12:05 -0400, Chuck Lever wrote:
>>> On 6/12/25 11:57 AM, Jeff Layton wrote:
>>>> On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
>>>>> The old nfsdfs interface for starting a server with multiple pools
>>>>> handles the special case of a single entry array passed down from
>>>>> userland by distributing the threads over every NUMA node.
>>>>>
>>>>> The netlink control interface however constructs an array of length
>>>>> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
>>>>> defeats the special casing that the old interface relies on.
>>>>>
>>>>> Change nfsd_nl_threads_set_doit() to pass down the array from userland
>>>>> as-is.
>>>>>
>>>>> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
>>>>> Reported-by: Mike Snitzer <snitzer@...nel.org>
>>>>> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
>>>>> Signed-off-by: Jeff Layton <jlayton@...nel.org>
>>>>> ---
>>>>>  fs/nfsd/nfsctl.c | 5 ++---
>>>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>>> index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
>>>>> --- a/fs/nfsd/nfsctl.c
>>>>> +++ b/fs/nfsd/nfsctl.c
>>>>> @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>>>>>   */
>>>>>  int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>>  {
>>>>> -	int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
>>>>> +	int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
>>>>>  	struct net *net = genl_info_net(info);
>>>>>  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>>>  	const struct nlattr *attr;
>>>>> @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>>  	/* count number of SERVER_THREADS values */
>>>>>  	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>>>  		if (nla_type(attr) == NFSD_A_SERVER_THREADS)
>>>>> -			count++;
>>>>> +			nrpools++;
>>>>>  	}
>>>>>
>>>>>  	mutex_lock(&nfsd_mutex);
>>>>>
>>>>> -	nrpools = max(count, nfsd_nrpools(net));
>>>>>  	nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
>>>>>  	if (!nthreads) {
>>>>>  		ret = -ENOMEM;
>>>>
>>>> I noticed that this didn't go in to the recent merge window.
>>>>
>>>> This patch fixes a rather nasty regression when you try to start the
>>>> server on a NUMA-capable box.
>>>
>>> The NFSD netlink interface is not broadly used yet, is it?
>>>
>>
>> It is. RHEL10 shipped with it, for instance and it's been in Fedora for
>> a while.
>
> RHEL 10 is shiny and new, and Fedora is bleeding edge. It's not likely
> either of these are deployed in production environments yet. Just sayin
> that in this case, the Bayesian filter leans towards waiting a full dev
> cycle.

We don't consider it acceptable to allow known defects to persist in our
products just because they are bleeding edge.

>>> Since this one came in late during the 6.16 dev cycle and the Fixes: tag
>>> references a commit that is already in released kernels, I put in the
>>> "next merge window" pile. On it's own it doesn't look urgent to me.
>>>
>>
>> I'd really like to see this go in soon and to stable. If you want me to
>> respin the changelog, I can. It's not a crash, but it manifests as lost
>> RPCs that just hang. It took me quite a while to figure out what was
>> going on, and I'd prefer that we not put users through that.
>
> If someone can confirm that it is effective, I'll add it to nfsd-fixes.

I'm sure it is if Jeff spent time on it.

We're going to try to get this into RHEL-10 ASAP, because dropped RPCs
manifest as datacenter-wide problems that are very hard to diagnose.  Its a
real pain that we won't have an upstream commit assigned for it.

Ben


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ