linux-kernel - Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface

Message-ID: <6D3B09C4-0E35-4A98-8C29-C2EDDBD17163@redhat.com>
Date: Fri, 13 Jun 2025 11:23:07 -0400
From: Benjamin Coddington <bcodding@...hat.com>
To: Chuck Lever <chuck.lever@...cle.com>
Cc: Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
 Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
 Tom Talpey <tom@...pey.com>, Steven Rostedt <rostedt@...dmis.org>,
 Masami Hiramatsu <mhiramat@...nel.org>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, Mike Snitzer <snitzer@...nel.org>,
 linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface

On 13 Jun 2025, at 10:56, Chuck Lever wrote:

> On 6/13/25 7:33 AM, Benjamin Coddington wrote:
>> We don't consider it acceptable to allow known defects to persist in our
>> products just because they are bleeding edge.
>
> I'm not letting this issue persist. Proper testing takes time.
>
> The patch description and discussion around this change did not include
> any information about its pervasiveness and only a little about its
> severity. I used my best judgement and followed my usual rules, which
> are:
>
> 1. Crashers, data corrupters, and security bugs with public bug reports
>    and confirmed fix effectiveness go in as quickly as we can test.
>    Note well that we have to balance the risk of introducing regressions
>    in this case, since going in quickly means the fix lacks significant
>    test experience.
>
> 1a. Rashes and bug bites require application of topical hydrocortisone.

:) no rash here, this response is very soothing.

> 2. Patches sit in nfsd-testing for at least two weeks; better if they
>    are there for four. I have CI running daily on that branch, and
>    sometimes it takes a while for a problem to surface and be noticed.
>
> 3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
>    as it takes for them to matriculate into linux-next and fs-next.
>
> 4. If the patch fixes an issue that was introduced in the most recent
>    merge window, it goes in -fixes .
>
> 5. If the patch fixes an issue that is already in released kernels
>    (and we are at rule 5 because the patch does not fix an immediate
>    issue) then it goes in -next .
>
> These evidence-oriented guidelines are in place to ensure that we don't
> panic and rush commits into the kernel without careful review and
> testing. There have been plenty of times when a fix that was pushed
> urgently was not complete or even made things worse. It's a long
> pipeline on purpose.

I totally understand, thanks very much for having a set of rules and
guidelines and even more for taking the time to spell them out here.

I wanted to express that Red Hat does consider all of its releases to be
important to fix and maintain. I'd like to speak against arguments about fix
urgency based on distro versions.  I think in this case we innocently crept
into these arguments as Jeff presented evidence that the problem exists in
the wild.

> The issues with this patch were:
>
> - It was posted very late in the dev cycle for v6.16. (Jeff's urgent
>   fixes always seem to happen during -rc7 ;-)
>
> - The Fixes: tag refers to a commit that was several releases ago, and
>   I am not aware of specific reports of anyone hitting a similar issue.
>
> - IME, the adoption of enterprise distributions is slow. RHEL 10 is
>   still only on its GA release. Therefore my estimation is that the
>   number of potentially impacted customers will be small for some time,
>   enough time for us to test Jeff's fix appropriately.

While this is true, I hope we can still treat every release version equally
/if/ we make any arguments about urgency based on what's currently released
in a particular distro.  Your point is a good counter-argument to Jeff's
assertion that the problem has been widely distributed, but it does start
to creep into a space where it feels like we're treating certain early
versions of a specific distro differently, and that didn't sit well with
me.  I'd rather not
have our upstream work or decisions appear to favor a particular distro.

> - The issue did not appear to me to be severe, but maybe I didn't read
>   the patch description carefully enough.
>
> - Although I respect, admire, and greatly appreciate the effort Jeff
>   made to nail this one, that does not mean it is a pervasive problem.
>   Jeff is quite capable of applying his own work to the kernels he and
>   his employer care about.
>
<snip>
>
> It sounds like Red Hat also does not have clear evidence that links this
> patch to a specific failure experienced by your customers. This affirms
> my understanding that this fix is defensive rather than urgent.

Also true - not yet, but there's a significant lag between customers
discovering a problem and our engineers knowing about it, and during that
lag all sorts of time, money, and reputation points are lost.

> As a rule, defensive fixes go in during merge windows.
>
>> It's a real pain that we won't have an upstream commit assigned for it.
>
> It's not reasonable for any upstream maintainer not employed by Red Hat
> to know about or cleave to Red Hat's internal processes. But, if an
> issue is on Red Hat's radar, then you are welcome to make its priority
> known to me so I can schedule fixes appropriately.

Thanks!  I realize that, which is why I spoke up.

> All that said, I've promoted the fix to nfsd-fixes, since it's narrow
> and has several weeks of test experience now.

Again, thanks!  We greatly appreciate the work you're doing.

Best,
Ben