[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAB8ipk-G_Q20Cyx69GRY5pdTj9C4SyVyiuRVFi-i-otQ1zBVFg@mail.gmail.com>
Date: Wed, 16 Apr 2025 19:07:18 +0800
From: Xuewen Yan <xuewen.yan94@...il.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Xuewen Yan <xuewen.yan@...soc.com>, dietmar.eggemann@....com, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
hongyan.xia2@....com, qyousef@...alina.io, ke.wang@...soc.com,
di.shen@...soc.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched/uclamp: Align uclamp and util_est and call
before freq update
On Wed, Apr 16, 2025 at 5:42 PM Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Wed, 16 Apr 2025 at 04:55, Xuewen Yan <xuewen.yan94@...il.com> wrote:
> >
> > On Wed, Apr 16, 2025 at 1:05 AM Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > On Tue, 25 Mar 2025 at 02:48, Xuewen Yan <xuewen.yan@...soc.com> wrote:
> > > >
> > > > When task's uclamp is set, we hope that the CPU frequency
> > > > can increase as quickly as possible when the task is enqueued.
> > > > Because the cpu frequency updating happens during the enqueue_task(),
> > >
> > > Strictly speaking, it doesn't happen during enqueue_task but when :
> > > - attach/detach tasks when migrating
> > > - update_load_avg decayed
> > > - io_wait
> > >
> > > This often happens during an enqueue but not always ...
> >
> > Okay, I would make some adjustments to these descriptions.
> >
> > >
> > > > so the rq's uclamp needs to be updated before the task is enqueued,
> > >
> > > this doesn't ensure that new rq's uclamp will be taken into account
> >
> > Did I miss something?
> >
> > As following stack:
> > enqueue_task_fair()
> > update_load_avg()
> > cfs_rq_util_change(cfs_rq, 0);
>
> As mentioned above, this doesn't always happen so you are not ensured
> to take uclamp into account. If you mandate to take uclamp value into
> account immediately this is not enough
I understand your point now. I think what you're referring to is a
different issue, just like what we discussed earlier with Prateek:
https://lore.kernel.org/all/CAB8ipk_1=U_HgVQrfQ4VRUDrcHJBQd2LJ9aXh8PG6E-Z4_xS+g@mail.gmail.com/
However, I think the purpose of this patch is to ensure that during
the enqueue_task process, if a frequency change is triggered, the
uclamp has already been updated before the frequency is changed.
>
> > cpufreq_update_util()
> > sugov_update_shared()
> > sugov_next_freq_shared()
> > sugov_get_util()
> > effective_cpu_util()
> > *min = max(irq + cpu_bw_dl(rq), uclamp_rq_get(rq, UCLAMP_MIN));
> > *max = min(scale, uclamp_rq_get(rq, UCLAMP_MAX));
> >
> > So, the rq's uclamp value should update before enqueue_task().
> > >
> > > > just like util_est.
> > >
> > > just like util_est
> > >
> > > > So, aline the uclamp and util_est and call before freq update.
> > >
> > > nit s/aline/align/ ?
> > align.
> > >
> > > >
> > > > For sched-delayed tasks, the rq uclamp/util_est should only be updated
> > > > when they are enqueued upon being awakened.
> > > > So simply the logic of util_est's enqueue/dequeue check.
> > > >
> > > > Signed-off-by: Xuewen Yan <xuewen.yan@...soc.com>
> > > > ---
> > > > v2:
> > > > - simply the util-est's en/dequeue check;
> > > > ---
> > > > Previous discussion:
> > > > https://lore.kernel.org/all/CAB8ipk8pEvOtCm-d0o1rsekwxPWUHk9iBGtt9TLTWW-iWTQKiA@mail.gmail.com/
> > > > https://lore.kernel.org/all/84441660bef0a5e67fd09dc3787178d0276dad31.1740664400.git.hongyan.xia2@arm.com/T/#u
> > > > https://lore.kernel.org/all/CAB8ipk9LpbiUDnbcV6+59+Sa=Ai7tFzO===mpLD3obNdV4=J-A@mail.gmail.com/T/#u
> > > > https://lore.kernel.org/all/aa8baf67-a8ec-4ad8-a6a8-afdcd7036771@arm.com/
> > > > ---
> > > > kernel/sched/core.c | 17 ++++++++++-------
> > > > kernel/sched/fair.c | 4 ++--
> > > > 2 files changed, 12 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > > index 042351c7afce..72fbe2031e54 100644
> > > > --- a/kernel/sched/core.c
> > > > +++ b/kernel/sched/core.c
> > > > @@ -1747,7 +1747,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
> > > > }
> > > > }
> > > >
> > > > -static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> > > > +static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags)
> > > > {
> > > > enum uclamp_id clamp_id;
> > > >
> > > > @@ -1763,7 +1763,8 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> > > > if (unlikely(!p->sched_class->uclamp_enabled))
> > > > return;
> > > >
> > > > - if (p->se.sched_delayed)
> > > > + /* Only inc the delayed task which being woken up. */
> > > > + if (p->se.sched_delayed && !(flags & ENQUEUE_DELAYED))
> > > > return;
> > > >
> > > > for_each_clamp_id(clamp_id)
> > > > @@ -2031,7 +2032,7 @@ static void __init init_uclamp(void)
> > > > }
> > > >
> > > > #else /* !CONFIG_UCLAMP_TASK */
> > > > -static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p) { }
> > > > +static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags) { }
> > > > static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
> > > > static inline void uclamp_fork(struct task_struct *p) { }
> > > > static inline void uclamp_post_fork(struct task_struct *p) { }
> > > > @@ -2067,12 +2068,14 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
> > > > if (!(flags & ENQUEUE_NOCLOCK))
> > > > update_rq_clock(rq);
> > > >
> > > > - p->sched_class->enqueue_task(rq, p, flags);
> > > > /*
> > > > - * Must be after ->enqueue_task() because ENQUEUE_DELAYED can clear
> > > > - * ->sched_delayed.
> > > > + * Can be before ->enqueue_task() because uclamp considers the
> > > > + * ENQUEUE_DELAYED task before its ->sched_delayed gets cleared
> > > > + * in ->enqueue_task().
> > > > */
> > > > - uclamp_rq_inc(rq, p);
> > > > + uclamp_rq_inc(rq, p, flags);
> > > > +
> > > > + p->sched_class->enqueue_task(rq, p, flags);
> > > >
> > > > psi_enqueue(p, flags);
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index c798d2795243..c92fee07fb7b 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -6930,7 +6930,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > * Let's add the task's estimated utilization to the cfs_rq's
> > > > * estimated utilization, before we update schedutil.
> > > > */
> > > > - if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & ENQUEUE_RESTORE))))
> > > > + if (!p->se.sched_delayed || (flags & ENQUEUE_DELAYED))
> > >
> > > commit message doesn't explain why you change util_est condition
> >
> > Because, the sched_delayed flag is set when dequeue_entity, and clear
> > after the condition,
> > so for migrating/prio_change, we could just check the sched_delayed.
>
> Why is testing sched_delayed enough for migrating/prio_change ?
> With your change, we will remove then add back util_est when changing
> prio of the task which is useless
I sincerely apologize for any misunderstanding my previous description
may have caused.
When changing prio without changing class, the delayed_task's
sched_delayed flag is not changed,
we would not remove then add back util_est.
If the class was changed:
if (prev_class != next_class && p->se.sched_delayed)
dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED |
DEQUEUE_NOCLOCK);
It will dequeue the delayed-task first, and will not enqueue it.
As for normal tasks which are not delayed, indeed, the issue you
mentioned can occur, but it seems that this problem has always
existed. Perhaps this is a new issue that has come up.
Thanks!
---
xuewen.yan
>
>
> > And for the wakeup, because the the sched_delayed flag is cleared after this,
> > so use the ENQUEUE_DELAYED flag to ensure the util_est could enqueue.
> >
> > >
> > > > util_est_enqueue(&rq->cfs, p);
> > > >
> > > > if (flags & ENQUEUE_DELAYED) {
> > > > @@ -7168,7 +7168,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
> > > > */
> > > > static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > {
> > > > - if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & DEQUEUE_SAVE))))
> > > > + if (!p->se.sched_delayed)
> > >
> > > same here, you should explain in commit message why it's okay to do so
> >
> > Same as above, the sched_delayed flag is set when dequeue_entity, so
> > this place,
> > the sched_delayed was not set when sleeping, If the flag is set, it
> > indicates that it
> > was migrating or prio changing.
> >
> > By the way, I will kindly update these reasons in the commit message.
> >
> > Thanks!
> >
> > BR
> > ---
> > xuewen
Powered by blists - more mailing lists