lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250801072946.nTiUlMwS@linutronix.de>
Date: Fri, 1 Aug 2025 09:29:46 +0200
From: Nam Cao <namcao@...utronix.de>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Gabriele Monaco <gmonaco@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH 4/5] sched: Add rt task enqueue/dequeue trace points

On Fri, Aug 01, 2025 at 09:12:08AM +0530, K Prateek Nayak wrote:
> Just thinking out loud, putting this tracepoint here can lead to a
> "dequeued -> dequeued" transition for fair task when they are in delayed
> dequeue state.
> 
>     dequeue_task(p)
>       trace_dequeue_task_tp(p) # First time
>       dequeue_task_fair(p)
>         p->se.delayed = 1
>     ...
>     <sched_switch> # p is still delayed
>     ...
>     sched_setscheduler(p)
>       if (prev_class != next_class && p->se.sched_delayed)
>         dequeue_task(p, DEQUEUE_DELAYED);
>           trace_dequeue_task_tp(p) # Second time
> 
> It is not an issue as such but it might come as a surprise if users are
> expecting a behavior like below which would be the case for !fair task
> currently (and for all tasks before v6.12):
> 
>     digraph state_automaton {
>         center = true;
>         size = "7,11";
>         {node [shape = plaintext, style=invis, label=""] "__init_enqueue_dequeue_cycle"};
>         {node [shape = ellipse] "enqueued"};
>         {node [shape = ellipse] "dequeued"};
>         "__init_enqueue_dequeue_cycle" -> "enqueued";
>         "__init_enqueue_dequeue_cycle" -> "dequeued";
>         "enqueued" [label = "enqueued", color = green3];
>         "enqueued" -> "dequeued" [ label = "dequeue_task" ];
>         "dequeued" [label = "dequeued", color = red];
>         "dequeued" -> "enqueued" [ label = "enqueue_task" ];
>         { rank = min ;
>             "__init_enqueue_dequeue_cycle";
>             "dequeued";
>             "enqueued";
>         }
>     }
> 
> 
> Another:
> 
>    "dequeued" -> "dequeued" [ label = "dequeue_task" ];
> 
> edge would be needed in that case for >= v6.12. It is probably nothing
> and can be easily handled by the users if they run into it but just
> putting it out there for the record in case you only want to consider a
> complete dequeue as "dequeued". Feel free to ignore since I'm completely
> out of my depth when it comes to the usage of RV in the field :)

Ah, thanks for pointing this out. I do want to only consider complete
dequeue as "dequeued".

These tracepoints are not visible from userspace, and RV does not care
about enqueue/dequeue of fair tasks at the moment, so it is not a problem
for now. But as a precaution, I trust the below patch will do.

Nam

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index c38f12f7f903..b50668052f99 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -906,6 +906,14 @@ DECLARE_TRACE(dequeue_task_rt,
 	TP_PROTO(int cpu, struct task_struct *task),
 	TP_ARGS(cpu, task));
 
+DECLARE_TRACE(enqueue_task,
+	TP_PROTO(int cpu, struct task_struct *task),
+	TP_ARGS(cpu, task));
+
+DECLARE_TRACE(dequeue_task,
+	TP_PROTO(int cpu, struct task_struct *task),
+	TP_ARGS(cpu, task));
+
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b485e0639616..553c08a63395 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2077,6 +2077,8 @@ unsigned long get_wchan(struct task_struct *p)
 
 void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 {
+	trace_enqueue_task_tp(rq->cpu, p);
+
 	if (!(flags & ENQUEUE_NOCLOCK))
 		update_rq_clock(rq);
 
@@ -2119,7 +2121,11 @@ inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 	 * and mark the task ->sched_delayed.
 	 */
 	uclamp_rq_dec(rq, p);
-	return p->sched_class->dequeue_task(rq, p, flags);
+	if (p->sched_class->dequeue_task(rq, p, flags)) {
+		trace_dequeue_task_tp(rq->cpu, p);
+		return true;
+	}
+	return false;
 }
 
 void activate_task(struct rq *rq, struct task_struct *p, int flags)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ