[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <df0a6fc5-6ef9-44b6-b6c2-e3cb4a2d1512@oracle.com>
Date: Fri, 13 Jun 2025 10:56:42 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: Benjamin Coddington <bcodding@...hat.com>
Cc: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
Tom Talpey <tom@...pey.com>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Mike Snitzer <snitzer@...nel.org>, linux-nfs@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface
On 6/13/25 7:33 AM, Benjamin Coddington wrote:
> On 12 Jun 2025, at 12:42, Chuck Lever wrote:
>
>> On 6/12/25 12:15 PM, Jeff Layton wrote:
>>> On Thu, 2025-06-12 at 12:05 -0400, Chuck Lever wrote:
>>>> On 6/12/25 11:57 AM, Jeff Layton wrote:
>>>>> On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
>>>>>> The old nfsdfs interface for starting a server with multiple pools
>>>>>> handles the special case of a single entry array passed down from
>>>>>> userland by distributing the threads over every NUMA node.
>>>>>>
>>>>>> The netlink control interface however constructs an array of length
>>>>>> nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
>>>>>> defeats the special casing that the old interface relies on.
>>>>>>
>>>>>> Change nfsd_nl_threads_set_doit() to pass down the array from userland
>>>>>> as-is.
>>>>>>
>>>>>> Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
>>>>>> Reported-by: Mike Snitzer <snitzer@...nel.org>
>>>>>> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
>>>>>> Signed-off-by: Jeff Layton <jlayton@...nel.org>
>>>>>> ---
>>>>>> fs/nfsd/nfsctl.c | 5 ++---
>>>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>>>> index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
>>>>>> --- a/fs/nfsd/nfsctl.c
>>>>>> +++ b/fs/nfsd/nfsctl.c
>>>>>> @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>>>>>> */
>>>>>> int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>>> {
>>>>>> - int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
>>>>>> + int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
>>>>>> struct net *net = genl_info_net(info);
>>>>>> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>>>> const struct nlattr *attr;
>>>>>> @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>>> /* count number of SERVER_THREADS values */
>>>>>> nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>>>> if (nla_type(attr) == NFSD_A_SERVER_THREADS)
>>>>>> - count++;
>>>>>> + nrpools++;
>>>>>> }
>>>>>>
>>>>>> mutex_lock(&nfsd_mutex);
>>>>>>
>>>>>> - nrpools = max(count, nfsd_nrpools(net));
>>>>>> nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
>>>>>> if (!nthreads) {
>>>>>> ret = -ENOMEM;
>>>>>
>>>>> I noticed that this didn't go in to the recent merge window.
>>>>>
>>>>> This patch fixes a rather nasty regression when you try to start the
>>>>> server on a NUMA-capable box.
>>>>
>>>> The NFSD netlink interface is not broadly used yet, is it?
>>>>
>>>
>>> It is. RHEL10 shipped with it, for instance and it's been in Fedora for
>>> a while.
>>
>> RHEL 10 is shiny and new, and Fedora is bleeding edge. It's not likely
>> either of these are deployed in production environments yet. Just sayin
>> that in this case, the Bayesian filter leans towards waiting a full dev
>> cycle.
>
> We don't consider it acceptable to allow known defects to persist in our
> products just because they are bleeding edge.
I'm not letting this issue persist. Proper testing takes time.
The patch description and discussion around this change did not include
any information about its pervasiveness and only a little about its
severity. I used my best judgement and followed my usual rules, which
are:
1. Crashers, data corrupters, and security bugs with public bug reports
and confirmed fix effectiveness go in as quickly as we can test.
Note well that we have to balance the risk of introducing regressions
in this case, since going in quickly means the fix lacks significant
test experience.
1a. Rashes and bug bites require application of topical hydrocortisone.
2. Patches sit in nfsd-testing for at least two weeks; better if they
are there for four. I have CI running daily on that branch, and
sometimes it takes a while for a problem to surface and be noticed.
3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
as it takes for them to matriculate into linux-next and fs-next.
4. If the patch fixes an issue that was introduced in the most recent
merge window, it goes in -fixes .
5. If the patch fixes an issue that is already in released kernels
(and we are at rule 5 because the patch does not fix an immediate
issue) then it goes in -next .
These evidence-oriented guidelines are in place to ensure that we don't
panic and rush commits into the kernel without careful review and
testing. There have been plenty of times when a fix that was pushed
urgently was not complete or even made things worse. It's a long
pipeline on purpose.
The issues with this patch were:
- It was posted very late in the dev cycle for v6.16. (Jeff's urgent
fixes always seem to happen during -rc7 ;-)
- The Fixes: tag refers to a commit that was several releases ago, and
I am not aware of specific reports of anyone hitting a similar issue.
- IME, the adoption of enterprise distributions is slow. RHEL 10 is
still only on its GA release. Therefore my estimation is that the
number of potentially impacted customers will be small for some time,
enough time for us to test Jeff's fix appropriately.
- The issue did not appear to me to be severe, but maybe I didn't read
the patch description carefully enough.
- Although I respect, admire, and greatly appreciate the effort Jeff
made to nail this one, that does not mean it is a pervasive problem.
Jeff is quite capable of applying his own work to the kernels he and
his employer care about.
>>>> Since this one came in late during the 6.16 dev cycle and the Fixes: tag
>>>> references a commit that is already in released kernels, I put in the
>>>> "next merge window" pile. On it's own it doesn't look urgent to me.
>>>>
>>>
>>> I'd really like to see this go in soon and to stable. If you want me to
>>> respin the changelog, I can. It's not a crash, but it manifests as lost
>>> RPCs that just hang. It took me quite a while to figure out what was
>>> going on, and I'd prefer that we not put users through that.
>>
>> If someone can confirm that it is effective, I'll add it to nfsd-fixes.
>
> I'm sure it is if Jeff spent time on it.
>
> We're going to try to get this into RHEL-10 ASAP, because dropped RPCs
> manifest as datacenter-wide problems that are very hard to diagnose.
It sounds like Red Hat also does not have clear evidence that links this
patch to a specific failure experienced by your customers. This affirms
my understanding that this fix is defensive rather than urgent.
As a rule, defensive fixes go in during merge windows.
> Its a
> real pain that we won't have an upstream commit assigned for it.
It's not reasonable for any upstream maintainer not employed by Red Hat
to know about or cleave to Red Hat's internal processes. But, if an
issue is on Red Hat's radar, then you are welcome to make its priority
known to me so I can schedule fixes appropriately.
All that said, I've promoted the fix to nfsd-fixes, since it's narrow
and has several weeks of test experience now.
--
Chuck Lever
Powered by blists - more mailing lists