linux-kernel - Re: [PATCH v2] sched/uclamp: Align uclamp and util

Message-ID: <d19cc24f-32a4-4d10-a51c-466476616e7d@amd.com>
Date: Wed, 26 Mar 2025 10:07:20 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Xuewen Yan <xuewen.yan94@...il.com>
CC: Xuewen Yan <xuewen.yan@...soc.com>, <dietmar.eggemann@....com>,
	<mingo@...hat.com>, <peterz@...radead.org>, <juri.lelli@...hat.com>,
	<vincent.guittot@...aro.org>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
	<mgorman@...e.de>, <vschneid@...hat.com>, <hongyan.xia2@....com>,
	<qyousef@...alina.io>, <ke.wang@...soc.com>, <di.shen@...soc.com>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] sched/uclamp: Align uclamp and util_est and call
 before freq update

Hello Xuewen,

On 3/26/2025 8:27 AM, Xuewen Yan wrote:
> Hi Prateek,
> 
> On Wed, Mar 26, 2025 at 12:54 AM K Prateek Nayak <kprateek.nayak@....com> wrote:
>>
>> Hello Xuewen,
>>
>> On 3/25/2025 7:17 AM, Xuewen Yan wrote:
>>> When a task's uclamp is set, we want the CPU frequency to
>>> increase as quickly as possible when the task is enqueued.
>>> Because the CPU frequency update happens during enqueue_task(),
>>> the rq's uclamp needs to be updated before the task is enqueued,
>>> just like util_est.

I thought the frequency ramp up / ramp down was a problem with
delayed tasks being requeued.

>>> So, align the uclamp and util_est updates and call them before the freq update.
>>>
>>> For sched-delayed tasks, the rq uclamp/util_est should only be updated
>>> when they are enqueued upon being awakened.
>>> So simplify the logic of util_est's enqueue/dequeue check.
>>>
>>> Signed-off-by: Xuewen Yan <xuewen.yan@...soc.com>

[..snip..]

>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index c798d2795243..c92fee07fb7b 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6930,7 +6930,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>>         * Let's add the task's estimated utilization to the cfs_rq's
>>>         * estimated utilization, before we update schedutil.
>>>         */
>>> -     if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & ENQUEUE_RESTORE))))
>>> +     if (!p->se.sched_delayed || (flags & ENQUEUE_DELAYED))
>>>                util_est_enqueue(&rq->cfs, p);
>>
>> Wouldn't this also do a util_est_{dequeue,enqueue}() for a save/restore
>> operation of a non-delayed task? Is that desired?
> 
> For a delayed task, its util_est should be dequeued/enqueued only when
> it goes to sleep and when it wakes up.
> For the save/restore operation, there is no need to enqueue it,
> because it is not being woken up.
> So the enqueue condition actually is:
> if (!p->se.sched_delayed || (p->se.sched_delayed && (flags & ENQUEUE_DELAYED)))
> which is equivalent to:
> if (!p->se.sched_delayed || (flags & ENQUEUE_DELAYED))
> 
> More details here:
> https://lore.kernel.org/all/84441660bef0a5e67fd09dc3787178d0276dad31.1740664400.git.hongyan.xia2@arm.com/T/#ma2505e90489316eb354390b42dee9d053f6fd1e9
> 

Ah! Correct! I got my "&&"s and "||"s confused. Sorry about that.

>>
>> On a larger note ...
>>
>> An enqueue of a delayed task will call requeue_delayed_entity() which
>> will only enqueue p->se on its cfs_rq and do an update_load_avg() for
>> that cfs_rq alone.
>>
>> With cgroups enabled, this cfs_rq might not be the root cfs_rq and
>> cfs_rq_util_change() will not call cpufreq_update_util(), leaving the
>> CPU running at the older frequency despite the updated uclamp
>> constraints.
>>
>> I think cfs_rq_util_change() should be called for the root cfs_rq
>> when a task is delayed or when it is re-enqueued to re-evaluate
>> the uclamp constraints.
> 
> I think you're referring to a different issue from the delayed task's
> util_est/uclamp handling.
> That issue is independent of util_est and uclamp: even without
> these two features, the problem you're describing still exists.
> Specifically, if the delayed task is not on the root cfs_rq, the CPU
> frequency might not be updated in time when the delayed task is
> enqueued.
> Maybe we could add an update_load_avg() in clear_delayed() to solve the issue?

I was thinking of something like:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a0c4cd26ee07..007b0bb91529 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5473,6 +5473,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
  		if (sched_feat(DELAY_DEQUEUE) && delay &&
  		    !entity_eligible(cfs_rq, se)) {
  			update_load_avg(cfs_rq, se, 0);
+			/* Reevaluate frequency since uclamp may have changed */
+			if (cfs_rq != &rq_of(cfs_rq)->cfs)
+				cfs_rq_util_change(&rq_of(cfs_rq)->cfs, 0);
  			set_delayed(se);
  			return false;
  		}
@@ -6916,6 +6919,9 @@ requeue_delayed_entity(struct sched_entity *se)
  	}
  
  	update_load_avg(cfs_rq, se, 0);
+	/* Reevaluate frequency since uclamp may have changed */
+	if (cfs_rq != &rq_of(cfs_rq)->cfs)
+		cfs_rq_util_change(&rq_of(cfs_rq)->cfs, 0);
  	clear_delayed(se);
  }
  
---

to ensure that schedutil learns about any change in the uclamp
constraints both at the first (delayed) dequeue and at re-enqueue.

> 
> -->8--
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a0c4cd26ee07..c75d50dab86b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5435,6 +5435,7 @@ static void clear_delayed(struct sched_entity *se)
>          for_each_sched_entity(se) {
>                  struct cfs_rq *cfs_rq = cfs_rq_of(se);
> 
> +               update_load_avg(cfs_rq, se, UPDATE_TG);

When finish_delayed_dequeue_entity() calls into clear_delayed(),
UPDATE_TG would already have been done in dequeue_entity().

For requeue, I believe the motivation for skipping UPDATE_TG was to
let the entity compete with its original weight until it is picked
off later.

>                  cfs_rq->h_nr_runnable++;
>                  if (cfs_rq_throttled(cfs_rq))
>                          break;
> 
> ---
> 
> BR
> xuewen

-- 
Thanks and Regards,
Prateek