Message-ID: <c403aacc94dc09aa9ad4441be6095d00b2091f68.camel@redhat.com>
Date: Fri, 01 Aug 2025 13:04:48 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: K Prateek Nayak <kprateek.nayak@....com>, Nam Cao <namcao@...utronix.de>
Cc: Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu
<mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org, Ingo
Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri
Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin
Schneider <vschneid@...hat.com>
Subject: Re: [PATCH 4/5] sched: Add rt task enqueue/dequeue trace points
On Fri, 2025-08-01 at 15:26 +0530, K Prateek Nayak wrote:
> There are a few more cases with delayed dequeue:
>
> 1. A delayed task can be reenqueued before it is completely dequeued
> and will lead to an enqueue -> enqueue transition if we don't trace
> the first dequeue.
>
> 2. There are cases in set_user_nice() and __sched_setscheduler()
> where a delayed task is dequeued in delayed state and put back in the
> cfs_rq in delayed state - an attempt to handle 1. can trip this.
>
> The best I could think of is the below diff on top of your
> suggestion, where a "delayed -> reenqueue" is skipped but case 2. is
> captured in case one needs to inspect some properties from
> set_user_nice() / __sched_setscheduler():
>
> (only build tested on top of the diff you had pasted)
>
Hello Prateek,
thanks for the comments; this looks much more convoluted than I would
have expected.
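If I read your diff right, the net effect of the guard is: the enqueue
tracepoint is skipped only when a still-delayed task is genuinely
re-enqueued (move_entity() returning true), while the SAVE-without-MOVE
cycles from set_user_nice() / __sched_setscheduler() are still traced.
A quick userspace harness just to illustrate the combinations (flag
values copied from kernel/sched/sched.h; sketch only, not part of any
patch):

#include <assert.h>
#include <stdbool.h>

/* Flag values copied from kernel/sched/sched.h, for illustration. */
#define DEQUEUE_SAVE	0x02 /* matches ENQUEUE_RESTORE */
#define DEQUEUE_MOVE	0x04 /* matches ENQUEUE_MOVE */

/* Same body as the helper your diff moves to sched.h. */
static inline bool move_entity(unsigned int flags)
{
	if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
		return false;

	return true;
}

int main(void)
{
	/* Plain enqueue: a reenqueued delayed task is not traced. */
	assert(move_entity(0));
	/* SAVE && !MOVE (set_user_nice() & co.): still traced. */
	assert(!move_entity(DEQUEUE_SAVE));
	/* SAVE with an actual MOVE: not traced when delayed. */
	assert(move_entity(DEQUEUE_SAVE | DEQUEUE_MOVE));
	return 0;
}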
As Nam pointed out, currently RV is not going to rely on those events
for fair tasks (existing monitors are fine with tracepoints like
wakeup/set_state/switch).
For the time being it might be better to just add the tracepoints in
the RT enqueue/dequeue paths (the only ones needed for this series) and
not complicate things.
At most, we may want to keep the tracepoint names general, allowing the
tracing calls to be added to other locations later (or moved to a
common one) without changing too much besides the existing users.
And perhaps a comment saying the tracepoints are currently only
supported for RT tasks would do.
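Something along these lines, just to make the idea concrete (a sketch
only; the actual declarations and hook placement in the series may of
course differ):

/*
 * Sketch only: a bare tracepoint (no trace event) with a general name.
 * DECLARE_TRACE() appends the _tp suffix, giving trace_enqueue_task_tp().
 */
DECLARE_TRACE(enqueue_task,
	TP_PROTO(int cpu, struct task_struct *p),
	TP_ARGS(cpu, p));

/* kernel/sched/rt.c */
static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
{
	/*
	 * The tracepoint is currently only supported for RT tasks;
	 * the general name lets the call move to enqueue_task()
	 * later without churn beyond the existing users.
	 */
	trace_enqueue_task_tp(rq->cpu, p);

	/* ... existing RT enqueue logic unchanged ... */
}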
Anyway, that's your call, Nam; I'm fine with your initial proposal as
well, unless some scheduler guys complain.
Thanks,
Gabriele
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9598984bee8d..1fc5a97bba6b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2071,7 +2071,8 @@ unsigned long get_wchan(struct task_struct *p)
>
>  void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
>  {
> -	trace_enqueue_task_tp(rq->cpu, p);
> +	if (!p->se.sched_delayed || !move_entity(flags))
> +		trace_enqueue_task_tp(rq->cpu, p);
>
>  	if (!(flags & ENQUEUE_NOCLOCK))
>  		update_rq_clock(rq);
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b173a059315c..1e2a636d6804 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5444,7 +5444,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	 * put back on, and if we advance min_vruntime, we'll be placed back
>  	 * further than we started -- i.e. we'll be penalized.
>  	 */
> -	if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
> +	if (move_entity(flags))
>  		update_min_vruntime(cfs_rq);
>
>  	if (flags & DEQUEUE_DELAYED)
> @@ -7054,6 +7054,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
>
>  		/* Fix-up what dequeue_task_fair() skipped */
>  		hrtick_update(rq);
> +		trace_dequeue_task_tp(rq->cpu, p);
>
>  		/*
>  		 * Fix-up what block_task() skipped.
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 7936d4333731..33897d35744a 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1196,19 +1196,6 @@ void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
>  	dec_rt_group(rt_se, rt_rq);
>  }
>
> -/*
> - * Change rt_se->run_list location unless SAVE && !MOVE
> - *
> - * assumes ENQUEUE/DEQUEUE flags match
> - */
> -static inline bool move_entity(unsigned int flags)
> -{
> -	if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
> -		return false;
> -
> -	return true;
> -}
> -
>  static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_array *array)
>  {
>  	list_del_init(&rt_se->run_list);
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index d3f33d10c58c..37730cd834ba 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2361,6 +2361,20 @@ extern const u32 sched_prio_to_wmult[40];
>
>  #define RETRY_TASK	((void *)-1UL)
>
> +/*
> + * Checks for a SAVE/RESTORE without MOVE. Returns false if
> + * SAVE and !MOVE.
> + *
> + * Assumes ENQUEUE/DEQUEUE flags match.
> + */
> +static inline bool move_entity(unsigned int flags)
> +{
> +	if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
> +		return false;
> +
> +	return true;
> +}
> +
>  struct affinity_context {
>  	const struct cpumask *new_mask;
>  	struct cpumask *user_mask;
> ---
>
> Thoughts?