[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7b26df23-4977-4fad-8721-137a23932b6e@arm.com>
Date: Wed, 5 Jun 2024 10:14:47 +0100
From: Luis Machado <luis.machado@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org, kprateek.nayak@....com,
wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de,
John Stultz <jstultz@...gle.com>, Hongyan.Xia2@....com
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue
On 6/5/24 08:22, Peter Zijlstra wrote:
> On Tue, Jun 04, 2024 at 09:12:20PM +0200, Peter Zijlstra wrote:
>
>> But with the above, you skip inc for sched_delayed, but dequeue_task()
>> will have done the dec, so isn't it then still unbalanced?
>>
>> Oh well, I'll go stare at this in tomorrow.
It did not appear to keep things unbalanced towards uclamp_rq_dec, but
might've been luck of not hitting the right code path.
In any case ...
>
> OK, just before I went to play in my giant hamster wheel it hit me.
>
> When dequeue_task() 'fails' and sets sched_delayed, we'll have done
> uclamp_rq_dec().
>
> Then, since the delayed task is still on the rq -- per the failure -- it
> can be migrated, this will again do dequeue_task() which will *agian* do
> a uclamp_rq_dec().
>
> So now we have a double dequeue -- *FAIL*.
>
> Worse, the migration will then do an enqueue_task() on the new rq
> causing uclamp_rq_inc(). If you then get a ttwu() / ENQUEUE_DELAYED, you
> can tickle yet another uclamp_rq_inc() for another fail.
>
> Something like the below avoids uclamp_rq_{inc,dec}() when
> ->sched_delayed, and moves it post class->enqueue_task() such that for
> the ENQUEUE_DELAYED case, we *will* do the inc after ->sched_delayed
> gets cleared.
>
> Hmm?
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9ac1054c2a4bb..965e6464e68e9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1676,6 +1676,9 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;
>
> + if (p->se.sched_delayed)
> + return;
> +
> for_each_clamp_id(clamp_id)
> uclamp_rq_inc_id(rq, p, clamp_id);
>
> @@ -1700,6 +1703,9 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;
>
> + if (p->se.sched_delayed)
> + return;
> +
> for_each_clamp_id(clamp_id)
> uclamp_rq_dec_id(rq, p, clamp_id);
> }
> @@ -1979,9 +1985,10 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
> psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED));
> }
>
> - uclamp_rq_inc(rq, p);
> p->sched_class->enqueue_task(rq, p, flags);
>
> + uclamp_rq_inc(rq, p);
> +
> if (sched_core_enabled(rq))
> sched_core_enqueue(rq, p);
> }
> @@ -2003,6 +2010,7 @@ bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
> }
>
> uclamp_rq_dec(rq, p);
> +
> return p->sched_class->dequeue_task(rq, p, flags);
> }
>
... thanks for the patch! The above seems to do it for me. I can see
more reasonable energy use with the eevdf-complete series. Still a
bit higher. Might be noise, we'll see.
I'll go stare at it and run some more tests on our end with this fix
applied on top and will report back.
Powered by blists - more mailing lists