Message-ID: <f7725461-bb0e-27fd-04cc-a225c97a9940@arm.com>
Date: Mon, 12 Nov 2018 18:53:01 -0800
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>, peterz@...radead.org,
mingo@...nel.org, linux-kernel@...r.kernel.org
Cc: rjw@...ysocki.net, Morten.Rasmussen@....com,
patrick.bellasi@....com, pjt@...gle.com, bsegall@...gle.com,
thara.gopinath@...aro.org, pkondeti@...eaurora.org,
quentin.perret@....com
Subject: Re: [PATCH v6 2/2] sched/fair: update scale invariance of PELT
On 11/9/18 8:20 AM, Vincent Guittot wrote:
[...]
> In order to achieve this time scaling, a new clock_pelt is created per rq.
> The increase of this clock scales with current capacity when something
> is running on rq and synchronizes with clock_task when rq is idle. With
> this mecanism, we ensure the same running and idle time whatever the
nitpick: s/mecanism/mechanism
[...]
> The responsivness of PELT is improved when CPU is not running at max
nitpick: s/responsivness/responsiveness
> capacity with this new algorithm. I have put below some examples of
> duration to reach some typical load values according to the capacity of the
> CPU with current implementation and with this patch. These values has been
> computed based on the geometric serie and the half period value:
nitpick: s/serie/series (and s/values has been/values have been)
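As an aside, those durations can be cross-checked directly from the
geometric series: with the usual PELT decay factor y (y^32 = 1/2, i.e.
a 32ms half-life), a task running continuously reaches
util ~= 1024 * (1 - y^n) after n ms. A quick userspace sketch of my
own, just to illustrate (not kernel code):

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* PELT decay factor: y^32 = 0.5 (32ms half-life) */
	double y = pow(0.5, 1.0 / 32.0);
	int targets[] = { 256, 512, 768, 896 };

	for (int i = 0; i < 4; i++) {
		double u = targets[i] / 1024.0;
		/* util(n) ~= 1024 * (1 - y^n)  =>  n = log(1 - u) / log(y) */
		double n = log(1.0 - u) / log(y);
		printf("util %4d reached after ~%.0f ms at max capacity\n",
		       targets[i], n);
	}
	return 0;
}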
[...]
> +/*
> + * The clock_pelt scales the time to reflect the effective amount of
> + * computation done during the running delta time but then sync back to
> + * clock_task when rq is idle.
> + *
> + *
> + * absolute time | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
> + * @ max capacity ------******---------------******---------------
> + * @ half capacity ------************---------************---------
> + * clock pelt | 1| 2| 3| 4| 7| 8| 9| 10| 11|14|15|16
> + *
> + */
> +static inline void update_rq_clock_pelt(struct rq *rq, s64 delta)
> +{
> + if (unlikely(is_idle_task(rq->curr))) {
> + /* The rq is idle, we can sync to clock_task */
> + rq->clock_pelt = rq_clock_task(rq);
> + return;
I think the term (time) stretching was used to describe both what
happens to the clock_pelt values at lower capacity and this re-sync
with clock_task. But IMHO, one has to be called stretching and the
other compressing for it to make sense. I think it's a question of
definition.
> + }
> +
> + /*
> + * When a rq runs at a lower compute capacity, it will need
> + * more time to do the same amount of work than at max
> + * capacity: either because it takes more time to compute the
> + * same amount of work or because taking more time means
> + * sharing more often the CPU between entities.
Since clock_pelt is related to the sched_avg(s) of the rq, I wonder if
the only reason isn't the first one, "it takes more time to do the
same amount of work"? IMHO, the sharing of the CPU between sched
entities shouldn't be visible here.
> + * In order to be invariant, we scale the delta to reflect how
> + * much work has been really done.
> + * Running at lower capacity also means running longer to do
> + * the same amount of work and this results in stealing some
This is already mentioned above.
> + * idle time that will disturb the load signal compared to
> + * max capacity; This stolen idle time will be automaticcally
nitpick: s/automaticcally/automatically
> + * reflected when the rq will be idle and the clock will be
> + * synced with rq_clock_task.
> + */
> +
> + /*
> + * scale the elapsed time to reflect the real amount of
> + * computation
> + */
> + delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu_of(rq)));
> + delta = cap_scale(delta, arch_scale_freq_capacity(cpu_of(rq)));
> +
> + rq->clock_pelt += delta;
> +}
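For readers following along: with cap_scale() defined as in
kernel/sched/fair.c, i.e.
#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT), the two
scalings above boil down to the following arithmetic (standalone
sketch with made-up capacity values, not kernel code):

#include <stdio.h>

#define SCHED_CAPACITY_SHIFT	10
#define cap_scale(v, s)		((v) * (s) >> SCHED_CAPACITY_SHIFT)

int main(void)
{
	long long delta = 8000000;	/* 8ms wall-clock delta, in ns */
	unsigned long cpu_cap = 512;	/* little CPU: half of max (1024) */
	unsigned long freq_cap = 512;	/* running at half the max frequency */

	delta = cap_scale(delta, cpu_cap);	/* -> 4ms */
	delta = cap_scale(delta, freq_cap);	/* -> 2ms */

	/* clock_pelt advances by only 2ms for 8ms of wall-clock time */
	printf("scaled delta = %lld ns\n", delta);
	return 0;
}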
> +
> +/*
> + * When rq becomes idle, we have to check if it has lost some idle time
> + * because it was fully busy. A rq is fully used when the /Sum util_sum
> + * is greater or equal to:
> + * (LOAD_AVG_MAX - 1024 + rq->cfs.avg.period_contrib) << SCHED_CAPACITY_SHIFT;
> + * For optimization and computing rounding purpose, we don't take into account
> + * the position in the current window (period_contrib) and we use the maximum
> + * util_avg value minus 1
> + */
In v4 you were using:
u32 divider = (LOAD_AVG_MAX - 1024 + rq->cfs.avg.period_contrib) <<
SCHED_CAPACITY_SHIFT;
and switched in v5 to:
u32 divider = ((LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT) -
LOAD_AVG_MAX;
The period_contrib values of rq->cfs.avg, rq->avg_rt and rq->avg_dl
are not necessarily aligned, but for overload you sum up the util_sum
values for cfs, rt and dl. Was this also a reason why you now assume
the maximum util_avg - 1?
> +static inline void update_idle_rq_clock_pelt(struct rq *rq)
> +{
> + u32 divider = ((LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT) - LOAD_AVG_MAX;
util_avg = util_sum / divider, with maximum util_avg = 1024

1024 = util_sum / (LOAD_AVG_MAX - 1024)   w/ period_contrib = 0
util_sum >= (LOAD_AVG_MAX - 1024) * 1024
util_sum >= (LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT

So you want to use 1024 - 1 = 1023 instead. Wouldn't you have to
subtract (LOAD_AVG_MAX - 1024) from
(LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT in this case?

util_sum >= ((LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT) -
            (LOAD_AVG_MAX - 1024)
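Plugging in LOAD_AVG_MAX = 47742 (assuming that's still the value used
by the PELT code) makes the difference between the variants concrete:

#include <stdio.h>

#define SCHED_CAPACITY_SHIFT	10
#define LOAD_AVG_MAX		47742	/* maximum possible util_sum */

int main(void)
{
	/* threshold for util_avg == 1024 (w/ period_contrib == 0) */
	unsigned int t1024 = (LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT;

	/* threshold for util_avg == 1023: one (LOAD_AVG_MAX - 1024) less */
	unsigned int t1023 = ((LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT)
				- (LOAD_AVG_MAX - 1024);

	/* the divider actually used since v5 of the patch */
	unsigned int v5 = ((LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT)
				- LOAD_AVG_MAX;

	printf("util_avg 1024: %u\n", t1024);	/* 47839232 */
	printf("util_avg 1023: %u\n", t1023);	/* 47792514 */
	printf("v5 divider:    %u\n", v5);	/* 47791490 */
	return 0;
}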
> + u32 overload = rq->cfs.avg.util_sum;
> + overload += rq->avg_rt.util_sum;
> + overload += rq->avg_dl.util_sum;
> +
> + /*
> + * Reflecting some stolen time makes sense only if the idle
> + * phase would be present at max capacity. As soon as the
> + * utilization of a rq has reached the maximum value, it is
> + * considered as an always runnnig rq without idle time to
nitpick: s/runnnig/running
> + * steal. This potential idle time is considered as lost in
> + * this case. We keep track of this lost idle time compare to
> + * rq's clock_task.
> + */
> + if ((overload >= divider))
> + rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
Shouldn't overload still be called util_sum? Overload (or
overutilized) is IMHO the state where util_sum >= divider.
[...]