Message-ID: <64c216db-57c8-4486-bd40-1d6135478487@oracle.com>
Date: Fri, 13 Jun 2025 11:38:43 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: Benjamin Coddington <bcodding@...hat.com>
Cc: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
Tom Talpey <tom@...pey.com>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Mike Snitzer <snitzer@...nel.org>, linux-nfs@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface

On 6/13/25 11:23 AM, Benjamin Coddington wrote:
> On 13 Jun 2025, at 10:56, Chuck Lever wrote:
>
>> On 6/13/25 7:33 AM, Benjamin Coddington wrote:
>>> We don't consider it acceptable to allow known defects to persist in our
>>> products just because they are bleeding edge.
>>
>> I'm not letting this issue persist. Proper testing takes time.
>>
>> The patch description and discussion around this change did not include
>> any information about its pervasiveness and only a little about its
>> severity. I used my best judgement and followed my usual rules, which
>> are:
>>
>> 1. Crashers, data corrupters, and security bugs with public bug reports
>> and confirmed fix effectiveness go in as quickly as we can test.
>> Note well that we have to balance the risk of introducing regressions
>> in this case, since going in quickly means the fix lacks significant
>> test experience.
>>
>> 1a. Rashes and bug bites require application of topical hydrocortisone.
>
> :) no rash here, this response is very soothing.
>
>> 2. Patches sit in nfsd-testing for at least two weeks; better if they
>> are there for four. I have CI running daily on that branch, and
>> sometimes it takes a while for a problem to surface and be noticed.
>>
>> 3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
>> as it takes for them to matriculate into linux-next and fs-next.
>>
>> 4. If the patch fixes an issue that was introduced in the most recent
>> merge window, it goes in -fixes .
>>
>> 5. If the patch fixes an issue that is already in released kernels
>> (and we are at rule 5 because the patch does not fix an immediate
>> issue) then it goes in -next .
>>
>> These evidence-oriented guidelines are in place to ensure that we don't
>> panic and rush commits into the kernel without careful review and
>> testing. There have been plenty of times when a fix that was pushed
>> urgently was not complete or even made things worse. It's a long
>> pipeline on purpose.
>
> I totally understand, thanks very much for having a set of rules and
> guidelines and even more for taking the time to spell them out here.

Apologies for the length. I wanted to get these out in the open just
so you and others can slap me with a clue bat if I'm doing something
vastly strange or inappropriate.

> I wanted to express that Red Hat does consider all of its releases to be
> important to fix and maintain. I'd like to speak against arguments about fix
> urgency based on distro versions. I think in this case we innocently crept
> into these arguments as Jeff presented evidence that the problem exists in
> the wild.

I was estimating pervasiveness based on the position of the RHEL 10
distro in its life cycle, nothing more.

>> The issues with this patch were:
>>
>> - It was posted very late in the dev cycle for v6.16. (Jeff's urgent
>> fixes always seem to happen during -rc7 ;-)
>>
>> - The Fixes: tag refers to a commit that was several releases ago, and
>> I am not aware of specific reports of anyone hitting a similar issue.
>>
>> - IME, the adoption of enterprise distributions is slow. RHEL 10 is
>> still only on its GA release. Therefore my estimation is that the
>> number of potentially impacted customers will be small for some time,
>> enough time for us to test Jeff's fix appropriately.
>
> While this is true, I hope we can still treat every release version equally
> /if/ we make any arguments about urgency based on what's currently released
> in a particular distro. Your point is a good counter-argument to Jeff's
> assertion that the problem has been widely distributed - but it does start
> to creep into a space which feels like we're treating certain early versions
> of a specific distro differently and didn't sit well for me. I'd rather not
> have our upstream work or decisions appear to favor a particular distro.

Understood. I hope I convinced you that I was merely making an
evidence-based estimation about the pervasiveness of any problem this
patch might have been attempting to address.

The shorthand term "bleeding edge" was not intended to be disrespectful,
only descriptive.

--
Chuck Lever