Message-ID: <20241001012153.GC1349@cmpxchg.org>
Date: Mon, 30 Sep 2024 21:21:53 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Parag W <parag.lkml@...il.com>
Cc: anna-maria@...utronix.de, frederic@...nel.org,
linux-kernel@...r.kernel.org, peterz@...radead.org,
pmenzel@...gen.mpg.de, regressions@...ts.linux.dev,
surenb@...gle.com, tglx@...utronix.de
Subject: Re: Error: psi: inconsistent task state! task=1:swapper/0 cpu=0
psi_flags=4 clear=0 set=4
On Mon, Sep 23, 2024 at 11:46:08AM -0400, Johannes Weiner wrote:
> On Mon, Sep 23, 2024 at 08:03:39AM -0400, Parag W wrote:
> > FWIW, moving psi_enqueue to be after ->enqueue_task() in
> > sched/core.c made no difference - I still get the inconsistent task
> > state error. psi_dequeue() is already before ->dequeue_task() in
> > line with uclamp.
>
> Yes, that isn't enough.
>
> AFAICS, psi wants to know when a task gets dequeued from a core POV,
> even if the class holds on to it until picked again. If it's later
> picked and dequeued by the class, I don't think there is a possible
> call into psi. Lastly, if a sched_delayed task is woken and enqueued
> from core, psi wants to know - we should call psi_enqueue() after
> ->enqueue_task has cleared sched_delayed.
>
> I don't think we want the ttwu_runnable() callback: since the task
> hasn't been dequeued yet from a core & PSI perspective, we shouldn't
> update psi states either. The sched_delayed check in psi_enqueue()
> should accomplish that. Oh, but wait: ->enqueue_task() will clear
> sched_delayed beforehand. We should probably filter ENQUEUE_DELAYED?
>
> This leaves me with the below diff. But I'm still getting the double
> enqueue with it applied:
>
> [root@ham ~]# dmesg | grep -i psi
> [ 0.350533] psi: inconsistent task state! task=1:swapper/0 cpu=0 psi_flags=4 clear=0 set=4
>
> Peter, what am I missing here?

Peter, any thoughts on this? It appears to be a regression caused by
commit 152e11f6df293e816a6a37c69757033cdc72667d.

It's not just the warning in dmesg. The task state corruption causes a
permanent CPU pressure indication, which messes with workload/machine
health monitoring.
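
To illustrate why a double enqueue is more than a cosmetic warning, here is a minimal userspace sketch of the consistency check behind the message. It is not the kernel code: `model_state`, `model_task_change()` and `MODEL_TSK_RUNNING` are hypothetical names, and treating bit value 4 as "runnable" is an assumption based on psi_flags=4 in the log above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: bit value 4 stands for "runnable", matching psi_flags=4 in the log. */
#define MODEL_TSK_RUNNING 0x4u

/* Hypothetical, simplified stand-in for per-task and per-CPU psi state. */
struct model_state {
	unsigned int psi_flags;  /* states currently accounted for the task */
	unsigned int nr_running; /* runnable count the pressure state is derived from */
};

/*
 * Apply a state change. Returns false if the request is inconsistent:
 * setting a bit that is already set, or clearing one that is not set.
 * The counter is updated regardless, which is how the model leaks a count.
 */
static bool model_task_change(struct model_state *s,
			      unsigned int clear, unsigned int set)
{
	bool ok = !(s->psi_flags & set) && (s->psi_flags & clear) == clear;

	if (!ok)
		fprintf(stderr,
			"model: inconsistent task state! psi_flags=%x clear=%x set=%x\n",
			s->psi_flags, clear, set);

	if ((clear & MODEL_TSK_RUNNING) && s->nr_running > 0)
		s->nr_running--;
	if (set & MODEL_TSK_RUNNING)
		s->nr_running++;

	s->psi_flags = (s->psi_flags & ~clear) | set;
	return ok;
}
```

In this model, calling model_task_change(&s, 0, MODEL_TSK_RUNNING) twice and then model_task_change(&s, MODEL_TSK_RUNNING, 0) once leaves psi_flags at 0 but nr_running at 1. A leaked count of that kind would be one plausible way for the CPU pressure indication to stay raised after the task has gone away.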