[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7cad55e3-3a61-498a-9364-7a2d69a20757@arm.com>
Date: Tue, 5 Dec 2023 14:24:23 +0000
From: Hongyan Xia <hongyan.xia2@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Juri Lelli <juri.lelli@...hat.com>,
Qais Yousef <qyousef@...alina.io>,
Morten Rasmussen <morten.rasmussen@....com>,
Lukasz Luba <lukasz.luba@....com>,
Christian Loehle <christian.loehle@....com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/6] sched/uclamp: Track uclamped util_avg in
sched_avg
On 04/12/2023 16:07, Vincent Guittot wrote:
> On Wed, 4 Oct 2023 at 11:05, Hongyan Xia <Hongyan.Xia2@....com> wrote:
>>
>> From: Hongyan Xia <hongyan.xia2@....com>
>>
>> Track a uclamped version of util_avg in sched_avg, which clamps util_avg
>> within [uclamp[UCLAMP_MIN], uclamp[UCLAMP_MAX]] every time util_avg is
>> updated. At the CFS rq level, cfs_rq->avg.util_avg_uclamp must always be
>> the sum of all util_avg_uclamp of entities on this cfs_rq. So, each
>> time the util_avg_uclamp of an entity gets updated, we also track the
>> delta and update the cfs_rq.
>
> No please don't do that. Don't start to mix several different signals
> in one. Typically uclamp_min doen't say you want to use at least this
> amount of cpu capacity.
But I'd say with uclamp, PELT is already broken and is a mixed signal
anyway. When uclamp is used, a util_avg value X doesn't mean X, it means
X under some rq uclamp, and the worst part is that the rq uclamp may or
may not have anything to do with this task's uclamp.
Pretending X is a true PELT value now (and not a mixed value) is why we
have so many problems today. For example in the frequency spike problem,
if a task A has no idle time under uclamp_max, its PELT does not reflect
reality. The moment another task B comes in and uncap the rq uclamp_max,
the current scheduler code thinks the 1024 of A means a real 1024, which
is wrong and is exactly why we see a spike when B joins. It's also why
we need to special case 0 spare capacity with recent patches, because rq
util_avg has lost its original PELT meaning under uclamp.
Because X is not the true PELT, we always have to do something to bring
it back into reality. What the current max aggregation code does is to
introduce corner cases, like treating 0 spare capacity as potential
opportunity to queue more tasks (which creates further problems in my
tests), and maybe introducing uclamp load balancing special cases in the
future, or introducing uclamp filtering to delay the effect of wrong
PELT values.
What this series does is not much different. We keep the somewhat wrong
value X, but we also remember under what uclamp values did we get that X
to bring things back into reality, which means now we have [X,
uclamp_min when X happens, uclamp_max when X happens]. To save space,
this becomes [X, clamped X], which is what this series does. The
original PELT value X is kept, but we use the clamped X in several
places to improve our decisions.
>
> With tasks with a util_avg of 10 but a uclamp min of 1024 what does it
> mean when you start to sum this value ?
Like I replied in another comment, assuming a uclamp_min of 1024 is a
hint to run the task on the big CPUs, I don't think it's right to
directly use uclamp as a CPU placement indicator. A uclamp value may
come from ADFP from userspace. An ADPF uclamp_min value of little CPU
capacity + 1 certainly doesn't mean a game on purpose wants to avoid the
little core. It simply means it wants at least this much performance,
and whether this results in placing the game thread on a big CPU is
purely the job of EAS (or CAS, etc.). We want to use little CPU + 1 as
uclamp_min because we know the SoC and the little CPU is bad, but uclamp
should be generic and should not rely on knowing the SoC.
Basically, under sum aggregation one would not use a uclamp_min value of
1024 to place a small task on a big core. A uclamp_min of 1024 under sum
aggregation has the meaning in ADPF, which is a hint to try to run me as
fast as possible.
>
>
>> [...]
Powered by blists - more mailing lists