lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANDhNCrYbL+E9M1vYf-JQHLWohs+6YEQDz=9DKx62ed=fFj2NA@mail.gmail.com>
Date: Mon, 17 Nov 2025 16:45:22 -0800
From: John Stultz <jstultz@...gle.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Johannes Weiner <hannes@...xchg.org>, Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>
Subject: Re: [RFC PATCH 0/5] sched/psi: Fix PSI accounting with proxy execution

On Mon, Nov 17, 2025 at 10:56 AM K Prateek Nayak <kprateek.nayak@....com> wrote:
>
> When booting into a kernel with CONFIG_SCHED_PROXY_EXEC and CONFIG_PSI,
> a inconsistent task state warning was noticed soon after the boot
> similar to:
>
>     psi: inconsistent task state! task=... cpu=... psi_flags=4 clear=0 set=4
>
> On analysis, the following sequence of event was found to be the cause
> of the splat:
>
> o Blocked task is retained on the runqueue.
> o psi_sched_switch() sees task_on_rq_queued() and retains the runnable
>   signals for the task.
> o Tasks blocks later via proxy_deactivate() but psi_dequeue() doesn't
>   adjust the PSI flags since DEQUEUE_SLEEP is set expecting
>   psi_sched_switch() to fix the signals.
> o The blocked task is woken up with the PSI state still reflecting that
>   the task is runnable (TSK_RUNNING) leading to the splat.

Hey, K Prateek!
  Thanks for chasing this down and sending this series out!

I'm still getting my head around the description above (its been
awhile since I last looked at the PSI code), but early on I often hit
PSI splats, and I thought I had addressed it with the patch here:
  https://github.com/johnstultz-work/linux-dev/commit/f60923a6176b3778a8fc9b9b0bbe4953153ce565

And with that I've not run across any warnings since.

Now, I hadn't tripped over the issue recently with the subset of the
full series I've been pushing upstream, and as I most easily ran into
it with the sleeping owner enqueuing feature I was holding the fix
back for those changes. But I realize unfortunately CONFIG_PSI at some
point got disabled in my test defconfig, so I've not had the
opportunity to trip it, and sure enough I can trivially see it booting
with the current upstream code.

Applying that fix does seem to avoid the warnings in my trivial
testing, but again I've not dug through the logic in awhile, so you
may have a better sense of the inadequacies of that fix.

If it looks reasonable to you, I'll rework the commit message so it
isn't so focused on the sleeping-owner-enquing case and submit it.

I'll have to spend some time here looking more at your proposed
solution. On the initial glance, I do fret a little with the
task->sched_proxy bit overlapping a bit in meaning with the
task->blocked_on value.

thanks
-john

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ