lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3882929e-d4a1-48b3-b518-1582e6e1c872@amd.com>
Date: Tue, 1 Jul 2025 16:30:07 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Zihuan Zhang <zhangzihuan@...inos.cn>, xuewen.yan@...soc.com,
 vincent.guittot@...aro.org, mingo@...hat.com, peterz@...radead.org,
 juri.lelli@...hat.com
Cc: rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
 vschneid@...hat.com, hongyan.xia2@....com, linux-kernel@...r.kernel.org,
 ke.wang@...soc.com, di.shen@...soc.com, xuewen.yan94@...il.com,
 kuyo.chang@...iatek.com, juju.sung@...iatek.com, qyousef@...alina.io
Subject: Re: [PATCH v1] sched/uclamp: Skip uclamp_rq_dec() for non-final
 dequeue of delayed tasks

Hello Zihuan Zhang,

On 7/1/2025 3:04 PM, Zihuan Zhang wrote:
> Currently, uclamp_rq_inc() skips updating the clamp aggregation for
> delayed tasks unless ENQUEUE_DELAYED is set, to ensure we only track the
> real enqueue of a task that was previously marked as sched_delayed.
> 
> However, the corresponding uclamp_rq_dec() path only checks
> sched_delayed, and misses the DEQUEUE_DELAYED flag. As a result, we may
> skip dec for a delayed task that is now being truly dequeued, leading to
> uclamp aggregation mismatch.
> 
> This patch makes uclamp_rq_dec() consistent with uclamp_rq_inc() by
> checking both sched_delayed and DEQUEUE_DELAYED, ensuring correct
> accounting symmetry.
> 
> Fixes: 90ca9410dab2 ("sched/uclamp: Align uclamp and util_est and call before freq update")
> Signed-off-by: Zihuan Zhang <zhangzihuan@...inos.cn>
> ---
>   kernel/sched/core.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8988d38d46a3..99f1542cff7d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1781,7 +1781,7 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags
>   		rq->uclamp_flags &= ~UCLAMP_FLAG_IDLE;
>   }
>   
> -static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
> +static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p, int flags)
>   {
>   	enum uclamp_id clamp_id;
>   
> @@ -1797,7 +1797,8 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
>   	if (unlikely(!p->sched_class->uclamp_enabled))
>   		return;
>   
> -	if (p->se.sched_delayed)
> +	/* Skip dec if this is a delayed task not being truly dequeued */
> +	if (p->se.sched_delayed && !(flags & DEQUEUE_DELAYED))
>   		return;

Consider the following case:

- p is a fair task with uclamp constraints set.


- P blocks and dequeue_task() calls uclamp_rq_dec() and later
   p->sched_class->dequeue_task() sets "p->se.sched_delayed" to 1.

   uclamp_rq_dec() is called for the first time here and has already
   decremented the clamp_id from the hierarchy.


- Before P can be completely dequeued, P is moved to an RT class
   with p->se.sched_delayed still set to 1 which invokes the following
   call-chain:
   
   __sched_setscheduler() {
     dequeue_task(DEQUEUE_DELAYED) {
       uclamp_rq_dec() {
         if (p->se.sched_delayed && !(flags & DEQUEUE_DELAYED)) /* false */
           return;

         /* !! Decrements clamp_id again !! */
       }
       /* Dequeues from fair class */
     }
     /* Enqueues onto the RT class */
   }


 From my reading, the current code is correct and the special handling in
uclamp_rq_inc() is required because enqueue_task() does a
uclamp_rq_inc() first before calling p->sched_class->enqueue_task()
which will clear "p->se.sched_delayed" if ENQUEUE_DELAYED is set.

dequeue_task() already does a uclamp_rq_dec() before task is delayed and
any further dequeue of a delayed task should not decrement the
uclamp_id.

Please let me know if I've missed something.

>   
>   	for_each_clamp_id(clamp_id)
> @@ -2039,7 +2040,7 @@ static void __init init_uclamp(void)
>   
>   #else /* !CONFIG_UCLAMP_TASK */
>   static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags) { }
> -static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
> +static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p, int flags) { }
>   static inline void uclamp_fork(struct task_struct *p) { }
>   static inline void uclamp_post_fork(struct task_struct *p) { }
>   static inline void init_uclamp(void) { }
> @@ -2112,7 +2113,7 @@ inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
>   	 * Must be before ->dequeue_task() because ->dequeue_task() can 'fail'
>   	 * and mark the task ->sched_delayed.
>   	 */
> -	uclamp_rq_dec(rq, p);
> +	uclamp_rq_dec(rq, p, flags);
>   	return p->sched_class->dequeue_task(rq, p, flags);
>   }
>   

-- 
Thanks and Regards,
Prateek


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ