Message-ID: <7ed142f6-ec00-4cd2-ac26-eff198d6a4d4@arm.com>
Date: Tue, 20 Aug 2024 17:23:03 +0100
From: Hongyan Xia <hongyan.xia2@....com>
To: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, linux-kernel@...r.kernel.org
Cc: kprateek.nayak@....com, wuyun.abel@...edance.com,
youssefesmat@...omium.org, tglx@...utronix.de, efault@....de,
Luis Machado <luis.machado@....com>
Subject: Re: [PATCH 10/24] sched/uclamg: Handle delayed dequeue
On 27/07/2024 11:27, Peter Zijlstra wrote:
> Delayed dequeue has tasks sit around on the runqueue that are not
> actually runnable -- specifically, they will be dequeued the moment
> they get picked.
>
> One side-effect is that such a task can get migrated, which leads to a
> 'nested' dequeue_task() scenario that messes up uclamp if we don't
> take care.
>
> Notably, dequeue_task(DEQUEUE_SLEEP) can 'fail' and keep the task on
> the runqueue. This however will have removed the task from uclamp --
> per uclamp_rq_dec() in dequeue_task(). So far so good.
>
> However, if at that point the task gets migrated -- or nice adjusted
> or any of a myriad of operations that does a dequeue-enqueue cycle --
> we'll pass through dequeue_task()/enqueue_task() again. Without
> modification this will lead to a double decrement for uclamp, which is
> wrong.
>
> Reported-by: Luis Machado <luis.machado@....com>
> Reported-by: Hongyan Xia <hongyan.xia2@....com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
> kernel/sched/core.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1676,6 +1676,9 @@ static inline void uclamp_rq_inc(struct
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;
>
> + if (p->se.sched_delayed)
> + return;
> +
> for_each_clamp_id(clamp_id)
> uclamp_rq_inc_id(rq, p, clamp_id);
>
> @@ -1700,6 +1703,9 @@ static inline void uclamp_rq_dec(struct
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;
>
> + if (p->se.sched_delayed)
> + return;
> +
> for_each_clamp_id(clamp_id)
> uclamp_rq_dec_id(rq, p, clamp_id);
> }
> @@ -1979,8 +1985,12 @@ void enqueue_task(struct rq *rq, struct
> psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED));
> }
>
> - uclamp_rq_inc(rq, p);
> p->sched_class->enqueue_task(rq, p, flags);
> + /*
> + * Must be after ->enqueue_task() because ENQUEUE_DELAYED can clear
> + * ->sched_delayed.
> + */
> + uclamp_rq_inc(rq, p);
Apart from the typo in the title, this is a notable functional change.
Both classes that support uclamp update the CPU frequency in
enqueue_task(). Before, a task with a uclamp_min would immediately
drive up the frequency the moment it was enqueued. Now, raising the
frequency is deferred until the next util update.

I do not yet have evidence that this is quantitatively bad (e.g. first
frame drops), but we might want to keep an eye on it and switch back
to the old ordering if possible.
>
> if (sched_core_enabled(rq))
> sched_core_enqueue(rq, p);
> @@ -2002,6 +2012,10 @@ inline bool dequeue_task(struct rq *rq,
> psi_dequeue(p, flags & DEQUEUE_SLEEP);
> }
>
> + /*
> + * Must be before ->dequeue_task() because ->dequeue_task() can 'fail'
> + * and mark the task ->sched_delayed.
> + */
> uclamp_rq_dec(rq, p);
> return p->sched_class->dequeue_task(rq, p, flags);
> }
>
>