linux-kernel - Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface

Message-ID: <64c216db-57c8-4486-bd40-1d6135478487@oracle.com>
Date: Fri, 13 Jun 2025 11:38:43 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: Benjamin Coddington <bcodding@...hat.com>
Cc: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
        Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
        Tom Talpey <tom@...pey.com>, Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
        Mike Snitzer <snitzer@...nel.org>, linux-nfs@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
        netdev@...r.kernel.org
Subject: Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface

On 6/13/25 11:23 AM, Benjamin Coddington wrote:
> On 13 Jun 2025, at 10:56, Chuck Lever wrote:
> 
>> On 6/13/25 7:33 AM, Benjamin Coddington wrote:
>>> We don't consider it acceptable to allow known defects to persist in our
>>> products just because they are bleeding edge.
>>
>> I'm not letting this issue persist. Proper testing takes time.
>>
>> The patch description and discussion around this change did not include
>> any information about its pervasiveness and only a little about its
>> severity. I used my best judgement and followed my usual rules, which
>> are:
>>
>> 1. Crashers, data corrupters, and security bugs with public bug reports
>>    and confirmed fix effectiveness go in as quickly as we can test.
>>    Note well that we have to balance the risk of introducing regressions
>>    in this case, since going in quickly means the fix lacks significant
>>    test experience.
>>
>> 1a. Rashes and bug bites require application of topical hydrocortisone.
> 
> :) no rash here, this response is very soothing.
> 
>> 2. Patches sit in nfsd-testing for at least two weeks; better if they
>>    are there for four. I have CI running daily on that branch, and
>>    sometimes it takes a while for a problem to surface and be noticed.
>>
>> 3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
>>    as it takes for them to matriculate into linux-next and fs-next.
>>
>> 4. If the patch fixes an issue that was introduced in the most recent
>>    merge window, it goes in -fixes .
>>
>> 5. If the patch fixes an issue that is already in released kernels
>>    (and we are at rule 5 because the patch does not fix an immediate
>>    issue) then it goes in -next .
>>
>> These evidence-oriented guidelines are in place to ensure that we don't
>> panic and rush commits into the kernel without careful review and
>> testing. There have been plenty of times when a fix that was pushed
>> urgently was not complete or even made things worse. It's a long
>> pipeline on purpose.
> 
> I totally understand, thanks very much for having a set of rules and
> guidelines and even more for taking the time to spell them out here.

Apologies for the length. I wanted to get these out in the open just
so you and others can slap me with a clue bat if I'm doing something
vastly strange or inappropriate.


> I wanted to express that Red Hat does consider all of its releases to be
> important to fix and maintain. I'd like to speak against arguments about fix
> urgency based on distro versions.  I think in this case we innocently crept
> into these arguments as Jeff presented evidence that the problem exists in
> the wild.

I was estimating pervasiveness based on the position of the RHEL 10
distro in its life cycle, nothing more.


>> The issues with this patch were:
>>
>> - It was posted very late in the dev cycle for v6.16. (Jeff's urgent
>>   fixes always seem to happen during -rc7 ;-)
>>
>> - The Fixes: tag refers to a commit that was several releases ago, and
>>   I am not aware of specific reports of anyone hitting a similar issue.
>>
>> - IME, the adoption of enterprise distributions is slow. RHEL 10 is
>>   still only on its GA release. Therefore my estimation is that the
>>   number of potentially impacted customers will be small for some time,
>>   enough time for us to test Jeff's fix appropriately.
> 
> While this is true, I hope we can still treat every release version equally
> /if/ we make any arguments about urgency based on what's currently released
> in a particular distro.  Your point is a good counter-argument to Jeff's
> assertion that the problem has been widely distributed - but it does start
> to creep into a space which feels like we're treating certain early versions
> of a specific distro differently and didn't sit well for me.  I'd rather not
> have our upstream work or decisions appear to favor a particular distro.

Understood. I hope I convinced you that I was merely making an evidence-
based estimation about the pervasiveness of any problem this patch might
have been attempting to address.

The shorthand term "bleeding edge" was not intended to be disrespectful,
only descriptive.


-- 
Chuck Lever