[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d19cc24f-32a4-4d10-a51c-466476616e7d@amd.com>
Date: Wed, 26 Mar 2025 10:07:20 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Xuewen Yan <xuewen.yan94@...il.com>
CC: Xuewen Yan <xuewen.yan@...soc.com>, <dietmar.eggemann@....com>,
<mingo@...hat.com>, <peterz@...radead.org>, <juri.lelli@...hat.com>,
<vincent.guittot@...aro.org>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
<mgorman@...e.de>, <vschneid@...hat.com>, <hongyan.xia2@....com>,
<qyousef@...alina.io>, <ke.wang@...soc.com>, <di.shen@...soc.com>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] sched/uclamp: Align uclamp and util_est and call
before freq update
Hello Xuewen,
On 3/26/2025 8:27 AM, Xuewen Yan wrote:
> Hi Prateek,
>
> On Wed, Mar 26, 2025 at 12:54 AM K Prateek Nayak <kprateek.nayak@....com> wrote:
>>
>> Hello Xuewen,
>>
>> On 3/25/2025 7:17 AM, Xuewen Yan wrote:
>>> When task's uclamp is set, we hope that the CPU frequency
>>> can increase as quickly as possible when the task is enqueued.
>>> Because the cpu frequency updating happens during the enqueue_task(),
>>> so the rq's uclamp needs to be updated before the task is enqueued,
>>> just like util_est.
I thought the frequency ramp up / ramp down was a problem with
delayed tasks being requeued.
>>> So, aline the uclamp and util_est and call before freq update.
>>>
>>> For sched-delayed tasks, the rq uclamp/util_est should only be updated
>>> when they are enqueued upon being awakened.
>>> So simply the logic of util_est's enqueue/dequeue check.
>>>
>>> Signed-off-by: Xuewen Yan <xuewen.yan@...soc.com>
[..snip..]
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index c798d2795243..c92fee07fb7b 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6930,7 +6930,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>> * Let's add the task's estimated utilization to the cfs_rq's
>>> * estimated utilization, before we update schedutil.
>>> */
>>> - if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & ENQUEUE_RESTORE))))
>>> + if (!p->se.sched_delayed || (flags & ENQUEUE_DELAYED))
>>> util_est_enqueue(&rq->cfs, p);
>>
>> Wouldn't this do a util_est_{dequeue,enqueue}() for a save restore
>> operation too of a non-delayed task? Is that desired?
>
> For delayed-task, its util_est should dequeue/enqueue only for its
> sleeping and waking up,
> For the save restore operation, there is no need to enqueue it,
> because it is not woken up.
> So the condition of enqueue actually is:
> if (!p->se.sched_delayed || (p->se.sched_delayed && (flags & ENQUEUE_DELAYED)))
> And, this is equal to :
> if (!p->se.sched_delayed || (flags & ENQUEUE_DELAYED))
>
> More details here:
> https://lore.kernel.org/all/84441660bef0a5e67fd09dc3787178d0276dad31.1740664400.git.hongyan.xia2@arm.com/T/#ma2505e90489316eb354390b42dee9d053f6fd1e9
>
Ah! Correct! I got my "&&"s and "||"s confused. Sorry about that.
>>
>> On a larger note ...
>>
>> An enqueue of a delayed task will call requeue_delayed_entity() which
>> will only enqueue p->se on its cfs_rq and do an update_load_avg() for
>> that cfs_rq alone.
>>
>> With cgroups enabled, this cfs_rq might not be the root cfs_rq and
>> cfs_rq_util_change() will not call cpufreq_update_util() leaving the
>> CPU running at the older frequency despite the updated uclamp
>> constraints.
>>
>> If think cfs_rq_util_change() should be called for the root cfs_rq
>> when a task is delayed or when it is re-enqueued to re-evaluate
>> the uclamp constraints.
>
> I think you're referring to a different issue with the delayed-task's
> util_ets/uclamp.
> This issue is unrelated to util-est and uclamp, because even without
> these two features, the problem you're mentioning still exists.
> Specifically, if the delayed-task is not the root CFS task, the CPU
> frequency might not be updated in time when the delayed-task is
> enqueued.
> Maybe we could add the update_load_avg() in clear_delayed to solve the issue?
I thought something like:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a0c4cd26ee07..007b0bb91529 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5473,6 +5473,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
if (sched_feat(DELAY_DEQUEUE) && delay &&
!entity_eligible(cfs_rq, se)) {
update_load_avg(cfs_rq, se, 0);
+ /* Reevaluate frequency since uclamp may have changed */
+ if (cfs_rq != rq->cfs)
+ cfs_rq_util_change(rq->cfs, 0);
set_delayed(se);
return false;
}
@@ -6916,6 +6919,9 @@ requeue_delayed_entity(struct sched_entity *se)
}
update_load_avg(cfs_rq, se, 0);
+ /* Reevaluate frequency since uclamp may have changed */
+ if (cfs_rq != rq->cfs)
+ cfs_rq_util_change(rq->cfs, 0);
clear_delayed(se);
}
---
to ensure that schedutil knows about any changes in the uclamp
constraints at the first dequeue, at reenqueue.
>
> -->8--
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a0c4cd26ee07..c75d50dab86b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5435,6 +5435,7 @@ static void clear_delayed(struct sched_entity *se)
> for_each_sched_entity(se) {
> struct cfs_rq *cfs_rq = cfs_rq_of(se);
>
> + update_load_avg(cfs_rq, se, UPDATE_TG);
For finish_delayed_dequeue_entity() calling into clear_delayed(),
UPDATE_TG would be done already in dequeue_entity().
For requeue, I believe the motivation to skip UPDATE_TG was for
the entity to compete with its original weight to be picked off
later.
> cfs_rq->h_nr_runnable++;
> if (cfs_rq_throttled(cfs_rq))
> break;
>
> ---
>
> BR
> xuewen
--
Thanks and Regards,
Prateek
Powered by blists - more mailing lists