Message-ID: <CAHk-=wi8+Ecn9VJH8WYPb7BR4ECYRZGKiiWdhcCjTKZbNkbTkQ@mail.gmail.com>
Date: Fri, 11 Jul 2025 14:46:01 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Jakub Kicinski <kuba@...nel.org>, Frederic Weisbecker <frederic@...nel.org>, 
	Valentin Schneider <vschneid@...hat.com>, Nam Cao <namcao@...utronix.de>, 
	Christian Brauner <brauner@...nel.org>
Cc: Thomas Zimmermann <tzimmermann@...e.de>, Simona Vetter <simona@...ll.ch>, 
	Dave Airlie <airlied@...il.com>, davem@...emloft.net, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, pabeni@...hat.com, 
	dri-devel <dri-devel@...ts.freedesktop.org>
Subject: Re: [GIT PULL] Networking for v6.16-rc6 (follow up)

On Fri, 11 Jul 2025 at 13:35, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Indeed. It turns out that the problem actually started somewhere
> between rc4 and rc5, and all my previous bisections never even came
> close, because kernels usually work well enough that I never realized
> that it went back that far.

It looks like it's actually due to commit 8c44dac8add7 ("eventpoll:
Fix priority inversion problem"), and it's been going on for a while
now and the behavior was just too subtle for me to have noticed.

It doesn't look hardware-specific, except in the sense that it probably
needs several CPUs along with the odd startup pattern to trigger
this.

It's possible that the bisection ended up wrong, and when it appeared
to start going off in the weeds I was like "this is broken again", but
before I marked a kernel "good" I tested it several times, and then in
the end that "eventpoll: Fix priority inversion problem" kind of makes
sense after all.

I would never have guessed at that commit otherwise (well, considering
that I blamed both the drm code and the netlink code first, that goes
without saying), but at the same time, that *is* the kind of change
that would certainly make user space get hung up with odd timeouts.

I've only tested the previous commit being good twice now, but I'll go
back to the head of tree and try a revert to verify that this is
really it. Because maybe this is now the Nth time I've found something
that hides the problem, not the real issue.

Fingers crossed that this very timing-dependent odd problem really did
bisect right finally, after many false starts.

                 Linus
