Message-ID: <20240923154601.GC437832@cmpxchg.org>
Date: Mon, 23 Sep 2024 11:46:01 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Parag W <parag.lkml@...il.com>
Cc: anna-maria@...utronix.de, frederic@...nel.org,
	linux-kernel@...r.kernel.org, peterz@...radead.org,
	pmenzel@...gen.mpg.de, regressions@...ts.linux.dev,
	surenb@...gle.com, tglx@...utronix.de
Subject: Re: Error: psi: inconsistent task state! task=1:swapper/0 cpu=0
 psi_flags=4 clear=0 set=4

On Mon, Sep 23, 2024 at 08:03:39AM -0400, Parag W wrote:
> FWIW, moving psi_enqueue to be after ->enqueue_task() in
> sched/core.c made no difference - I still get the inconsistent task
> state error. psi_dequeue() is already before ->dequeue_task() in
> line with uclamp.

Yes, that isn't enough.

AFAICS, in psi we want to know when a task gets dequeued from a core POV,
even if the class holds on to it until picked again. If it's later
picked and dequeued by the class, I don't think there is a possible
call into psi. Lastly, if a sched_delayed task is woken and enqueued
from core, psi wants to know - we should call psi_enqueue() after
->enqueue_task has cleared sched_delayed.

I don't think we want the ttwu_runnable() callback: since the task
hasn't been dequeued yet from a core & PSI perspective, we shouldn't
update psi states either. The sched_delayed check in psi_enqueue()
should accomplish that. Oh, but wait: ->enqueue_task() will clear
sched_delayed beforehand. We should probably filter ENQUEUE_DELAYED?
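
To make the proposed filtering concrete, here is a rough userspace sketch of the predicate core's enqueue_task() would apply before calling into psi. The helper names psi_wants_enqueue() and psi_wakeup_arg() are hypothetical, and the flag values are illustrative only; the authoritative definitions live in kernel/sched/sched.h.

```c
#include <stdbool.h>

/* Illustrative bit values only; see kernel/sched/sched.h for the real ones. */
#define ENQUEUE_WAKEUP   0x01
#define ENQUEUE_RESTORE  0x02
#define ENQUEUE_MIGRATED 0x40
#define ENQUEUE_DELAYED  0x200

/*
 * Hypothetical helper: should this enqueue be reported to psi?
 * RESTORE enqueues are bookkeeping-only, and DELAYED enqueues requeue a
 * task that core and psi never considered dequeued in the first place.
 */
static bool psi_wants_enqueue(int flags)
{
	return !(flags & (ENQUEUE_RESTORE | ENQUEUE_DELAYED));
}

/* Hypothetical helper: the "wakeup" argument handed to psi_enqueue(). */
static bool psi_wakeup_arg(int flags)
{
	return (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED);
}
```

With this predicate, a plain wakeup is reported to psi, while a wakeup that carries ENQUEUE_DELAYED is skipped.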

This leaves me with the below diff. But I'm still getting the double
enqueue with it applied:

[root@ham ~]# dmesg | grep -i psi
[    0.350533] psi: inconsistent task state! task=1:swapper/0 cpu=0 psi_flags=4 clear=0 set=4
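
For anyone decoding that message: a simplified userspace model of the consistency check behind it, assuming the TSK_* bit layout from include/linux/psi_types.h (TSK_RUNNING being bit 2, i.e. 4). struct model_task and model_flags_change() are hypothetical stand-ins, not the kernel's code.

```c
#include <stdbool.h>

/* Bit layout assumed from include/linux/psi_types.h. */
#define TSK_IOWAIT           (1 << 0)
#define TSK_MEMSTALL         (1 << 1)
#define TSK_RUNNING          (1 << 2)
#define TSK_MEMSTALL_RUNNING (1 << 3)

/* Hypothetical stand-in for the psi-relevant part of task_struct. */
struct model_task {
	unsigned int psi_flags;
};

/*
 * Apply a flag change and report whether it was consistent. A change is
 * inconsistent if a bit to be set is already set, or a bit to be cleared
 * is not currently set. The change is applied either way.
 */
static bool model_flags_change(struct model_task *t,
			       unsigned int clear, unsigned int set)
{
	bool consistent = !(t->psi_flags & set) &&
			  (t->psi_flags & clear) == clear;

	t->psi_flags &= ~clear;
	t->psi_flags |= set;
	return consistent;
}
```

Under that assumption, psi_flags=4 clear=0 set=4 reads as: TSK_RUNNING was already set when another enqueue tried to set it again, i.e. two enqueues without a dequeue in between.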

Peter, what am I missing here?

---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b6cc1cf499d6..4f036c66cf07 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2012,11 +2012,6 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 	if (!(flags & ENQUEUE_NOCLOCK))
 		update_rq_clock(rq);
 
-	if (!(flags & ENQUEUE_RESTORE)) {
-		sched_info_enqueue(rq, p);
-		psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED));
-	}
-
 	p->sched_class->enqueue_task(rq, p, flags);
 	/*
 	 * Must be after ->enqueue_task() because ENQUEUE_DELAYED can clear
@@ -2024,6 +2019,11 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 	 */
 	uclamp_rq_inc(rq, p);
 
+	if (!(flags & (ENQUEUE_RESTORE|ENQUEUE_DELAYED))) {
+		sched_info_enqueue(rq, p);
+		psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED));
+	}
+
 	if (sched_core_enabled(rq))
 		sched_core_enqueue(rq, p);
 }
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 237780aa3c53..138c52c2f2c9 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -129,6 +129,9 @@ static inline void psi_enqueue(struct task_struct *p, bool wakeup)
 	if (static_branch_likely(&psi_disabled))
 		return;
 
+	if (p->se.sched_delayed)
+		return;
+
 	if (p->in_memstall)
 		set |= TSK_MEMSTALL_RUNNING;
 
@@ -148,6 +151,9 @@ static inline void psi_dequeue(struct task_struct *p, bool sleep)
 	if (static_branch_likely(&psi_disabled))
 		return;
 
+	if (p->se.sched_delayed)
+		return;
+
 	/*
 	 * A voluntary sleep is a dequeue followed by a task switch. To
 	 * avoid walking all ancestors twice, psi_task_switch() handles

