linux-kernel - Re: Error: psi: inconsistent task state! task=1:swapper/0 cpu=0 psi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20241001012153.GC1349@cmpxchg.org>
Date: Mon, 30 Sep 2024 21:21:53 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Parag W <parag.lkml@...il.com>
Cc: anna-maria@...utronix.de, frederic@...nel.org,
	linux-kernel@...r.kernel.org, peterz@...radead.org,
	pmenzel@...gen.mpg.de, regressions@...ts.linux.dev,
	surenb@...gle.com, tglx@...utronix.de
Subject: Re: Error: psi: inconsistent task state! task=1:swapper/0 cpu=0
 psi_flags=4 clear=0 set=4

On Mon, Sep 23, 2024 at 11:46:08AM -0400, Johannes Weiner wrote:
> On Mon, Sep 23, 2024 at 08:03:39AM -0400, Parag W wrote:
> > FWIW, moving psi_enqueue to be after ->enqueue_task() in
> > sched/core.c made no difference - I still get the inconsistent task
> > state error. psi_dequeue() is already before ->dequeue_task() in
> > line with uclamp.
> 
> Yes, that isn't enough.
> 
> AFAICS, in psi want to know when a task gets dequeued from a core POV,
> even if the class holds on to it until picked again. If it's later
> picked and dequeued by the class, I don't think there is a possible
> call into psi. Lastly, if a sched_delayed task is woken and enqueued
> from core, psi wants to know - we should call psi_enqueue() after
> ->enqueue_task has cleared sched_delayed.
> 
> I don't think we want the ttwu_runnable() callback: since the task
> hasn't been dequeued yet from a core & PSI perspective, we shouldn't
> update psi states either. The sched_delayed check in psi_enqueue()
> should accomplish that. Oh, but wait: ->enqueue_task() will clear
> sched_delayed beforehand. We should probably filter ENQUEUE_DELAYED?
> 
> This leaves me with the below diff. But I'm still getting the double
> enqueue with it applied:
> 
> [root@ham ~]# dmesg | grep -i psi
> [    0.350533] psi: inconsistent task state! task=1:swapper/0 cpu=0 psi_flags=4 clear=0 set=4
> 
> Peter, what am I missing here?

Peter, any thoughts on this? It appears to be a regression caused by
152e11f6df293e816a6a37c69757033cdc72667d.

It's not just the warning in dmesg. The task state corruption causes a
permanent CPU pressure indication, which messes with workload/machine
health monitoring.