linux-kernel - Re: [GIT PULL] Scheduler updates for v6.17

Message-ID: <CAHk-=wg7Ad6zjs8QdgDkS-8oJD2EbLK2Ne-WRo36ZXVHa=hmWw@mail.gmail.com>
Date: Sat, 2 Aug 2025 11:43:40 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Ingo Molnar <mingo@...nel.org>
Cc: linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, Tejun Heo <tj@...nel.org>, 
	Valentin Schneider <vschneid@...hat.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>
Subject: Re: [GIT PULL] Scheduler updates for v6.17

On Wed, 30 Jul 2025 at 20:31, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Sun, 27 Jul 2025 at 23:48, Ingo Molnar <mingo@...nel.org> wrote:
> >
> > PSI:
> >
> >  - Improve scalability by optimizing psi_group_change() cpu_clock() usage
> >    (Peter Zijlstra)
>
> I suspect this is buggy.
>
> Maybe this is coincidence, but that sounds very unlikely:
>
>   watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:3:7996]
>   CPU#0 Utilization every 4s during lockup:

Happened again this morning, and as far as I can tell the machine was
just sitting there idle at the desktop.

I've only seen this on my laptop, so maybe it's some hw dependency,
but it *really* smells like commit 570c8efd5eb7 ("sched/psi: Optimize
psi_group_change() cpu_clock() usage") from the symptoms. It's
literally hanging on that psi_read_begin(), which is that
read_seqcount_begin() on that new per-cpu psi_seq counter.

Now, I'm not seeing how it could possibly trigger - I looked through
all the psi_write_begin() users, and they all *seem* to be (a) under
rq_lock_irq and (b) paired with a psi_write_end() with the same cpu.

But the symptoms have been very consistent both times it happened: the
RIP the watchdog reports is always in collect_percpu_times(), always at
that 'pause' in the "wait for seqcount to be even" loop.
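
For illustration, the "wait for seqcount to be even" pattern can be
sketched in user space roughly as below. This is a minimal sketch, not
the kernel's seqcount_t code: the struct and every sketch_* name are
hypothetical, and it uses default (sequentially consistent) C11 atomics
instead of the kernel's barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in; NOT the kernel's seqcount_t API. */
struct seq_sketch {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	atomic_ullong payload;	/* the data the counter protects */
};

static void sketch_write_begin(struct seq_sketch *s)
{
	atomic_fetch_add(&s->seq, 1);	/* count becomes odd */
}

static void sketch_write_end(struct seq_sketch *s)
{
	atomic_fetch_add(&s->seq, 1);	/* count becomes even again */
}

/* Spin until the count is even, then return the value that was seen. */
static unsigned int sketch_read_begin(struct seq_sketch *s)
{
	unsigned int seq;

	while ((seq = atomic_load(&s->seq)) & 1)
		;	/* the kernel calls cpu_relax() here ('pause' on x86) */
	return seq;
}

/* True if a writer ran since sketch_read_begin() returned 'start'. */
static bool sketch_read_retry(struct seq_sketch *s, unsigned int start)
{
	return atomic_load(&s->seq) != start;
}

static unsigned long long sketch_read(struct seq_sketch *s)
{
	unsigned long long v;
	unsigned int seq;

	do {
		seq = sketch_read_begin(s);
		v = atomic_load(&s->payload);
	} while (sketch_read_retry(s, seq));
	return v;
}

/* Single-threaded demo: a paired begin/end leaves the count even. */
static unsigned long long sketch_demo(void)
{
	struct seq_sketch s;

	atomic_init(&s.seq, 0);
	atomic_init(&s.payload, 0);

	sketch_write_begin(&s);
	atomic_store(&s.payload, 42);
	sketch_write_end(&s);

	assert((atomic_load(&s.seq) & 1) == 0);
	return sketch_read(&s);
}
```

The reader can only leave the inner loop once some writer has executed
its matching end, which is why a stuck reader points at the write side.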

It's typically been in that psi_avgs_work kworker, but once it was
systemd-oomd that apparently had done a "read()" on it, so it went
through "psi_show()" instead.

Now, the *writers* all take the proper locks, but the readers don't.
And my laptop has CONFIG_PREEMPT_VOLUNTARY in its config (random old
setting).

I'm not seeing why that would matter, since the seq count should
become even at some point, but it does mean that the seqcount read
loop is effectively an endless kernel loop when it triggers. The
writers shouldn't be preempted and end up in some kind of priority
inversion with a reader that doesn't go away, but *if* there is some
bug in this area, maybe that config is why I'm seeing it and others
aren't?
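
As a purely hypothetical illustration of why such a loop never ends on
its own: if the count were ever left odd (for example by a begin that
never reaches its end), an unbounded reader would spin forever. Nothing
in this message establishes that this is what happens here. The sketch
below uses a bounded spin so it terminates; the sketch_* names and the
spin limit are assumptions for the example, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: wait for an even count, but give up after
 * max_spins attempts instead of spinning forever.
 */
static bool sketch_wait_even(atomic_uint *seq, unsigned long max_spins)
{
	while (max_spins--) {
		if (!(atomic_load(seq) & 1))
			return true;	/* count is even, reader may proceed */
	}
	return false;			/* still odd: reader would be stuck */
}

/* Returns true if the reader is stuck while odd and unstuck once even. */
static bool sketch_stuck_demo(void)
{
	atomic_uint seq;
	bool before, after;

	atomic_init(&seq, 0);

	atomic_fetch_add(&seq, 1);	/* "write begin" with no matching end */
	before = sketch_wait_even(&seq, 1000);

	atomic_fetch_add(&seq, 1);	/* the missing "write end" */
	after = sketch_wait_even(&seq, 1000);

	return !before && after;
}
```

With no preemption point in the real read loop, the equivalent of the
first wait would simply keep the CPU busy until the watchdog fires.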

Any ideas, people?

              Linus