Message-ID: <ae8412cb-fd78-4e3e-b51a-ee290fd076bd@efficios.com>
Date: Tue, 24 Oct 2023 10:49:37 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Swapnil Sapkal <Swapnil.Sapkal@....com>,
Aaron Lu <aaron.lu@...el.com>, Tim Chen <tim.c.chen@...el.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>, x86@...nel.org
Subject: Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY feature (v2)

On 2023-10-24 02:10, Chen Yu wrote:
> On 2023-10-23 at 11:04:49 -0400, Mathieu Desnoyers wrote:
>> On 2023-10-23 10:11, Dietmar Eggemann wrote:
>>> On 19/10/2023 18:05, Mathieu Desnoyers wrote:
>>
>> [...]
>>>> +static unsigned long scale_rt_capacity(int cpu);
>>>> +
>>>> +/*
>>>> + * Returns true if adding the task utilization to the estimated
>>>> + * utilization of the runnable tasks on @cpu does not exceed the
>>>> + * capacity of @cpu.
>>>> + *
>>>> + * This considers only the utilization of _runnable_ tasks on the @cpu
>>>> + * runqueue, excluding blocked and sleeping tasks. This is achieved by
>>>> + * using the runqueue util_est.enqueued.
>>>> + */
>>>> +static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
>>>> + int cpu)
>>>
>>> Or like find_energy_efficient_cpu() (feec(), used in
>>> Energy-Aware-Scheduling (EAS)) which uses cpu_util(cpu, p, cpu, 0) to get:
>>>
>>> max(util_avg(CPU + p), util_est(CPU + p))
>>
>> I've tried using cpu_util(), but unfortunately anything that considers
>> blocked/sleeping tasks in its utilization total does not work for my
>> use-case.
>>
>> From cpu_util():
>>
>> * CPU utilization is the sum of running time of runnable tasks plus the
>> * recent utilization of currently non-runnable tasks on that CPU.
>>
>
> I thought cpu_util() indicates the utilization decay sum of task that was once
> "running" on this CPU, but will not sum up the "util/load" of the blocked/sleeping
> task?
>
> accumulate_sum()
> /* only the running task's util will be sum up */
> if (running)
> sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
>
> WRITE_ONCE(sa->util_avg, sa->util_sum / divider);

Accumulation into cfs_rq->avg.util_sum indeed only happens while the task is
running, so a blocked/sleeping task does not actively increment util_sum.

However, a blocked/sleeping task remains attached to the runqueue, so its
historic util_sum still contributes to the cfs_rq util_sum/util_avg. This
differs completely from what happens when the task is migrated to a different
runqueue, in which case its util_sum contribution is entirely removed from the
cfs_rq util_sum:

static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
[...]
        update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH);
[...]

static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
[...]
        if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
                action |= DO_DETACH;
[...]

static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
[...]
        if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
                /*
                 * DO_ATTACH means we're here from enqueue_entity().
                 * !last_update_time means we've passed through
                 * migrate_task_rq_fair() indicating we migrated.
                 *
                 * IOW we're enqueueing a task on a new CPU.
                 */
                attach_entity_load_avg(cfs_rq, se);
                update_tg_load_avg(cfs_rq);
        } else if (flags & DO_DETACH) {
                /*
                 * DO_DETACH means we're here from dequeue_entity()
                 * and we are migrating task out of the CPU.
                 */
                detach_entity_load_avg(cfs_rq, se);
                update_tg_load_avg(cfs_rq);
[...]
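
To illustrate the attach/detach behaviour in the excerpts above, here is a
minimal userspace sketch, not kernel code. The toy_* names and structures are
hypothetical, and PELT decay is deliberately ignored so that only the
attach/detach accounting is visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct toy_task {
        unsigned long util_avg; /* historic utilization carried by the task */
        bool attached;          /* contribution currently summed into a rq */
};

struct toy_rq {
        unsigned long util_avg; /* sum over attached tasks, runnable or not */
};

/* Models the DO_ATTACH path: first enqueue of a task on a new CPU. */
static void toy_attach(struct toy_rq *rq, struct toy_task *p)
{
        assert(!p->attached);
        rq->util_avg += p->util_avg;
        p->attached = true;
}

/* Models the DO_DETACH path: the task is migrating out of the CPU. */
static void toy_detach(struct toy_rq *rq, struct toy_task *p)
{
        assert(p->attached);
        rq->util_avg -= p->util_avg;
        p->attached = false;
}

/*
 * Dequeue: a task that merely blocks/sleeps stays attached, so the rq
 * total is left untouched. Only a migrating dequeue detaches.
 */
static void toy_dequeue(struct toy_rq *rq, struct toy_task *p, bool migrating)
{
        if (migrating)
                toy_detach(rq, p);
}
```

With two tasks attached at 300 and 200, blocking the first one leaves the rq
total at 500. Only a migrating dequeue lowers it.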

In comparison, util_est_enqueue()/util_est_dequeue() are called from
enqueue_task_fair() and dequeue_task_fair(), which also cover the cases where a
task blocks/sleeps and later wakes up. Therefore, cfs_rq->avg.util_est.enqueued
only accounts for runnable tasks.
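
The same kind of toy model can show why the util_est sum tracks runnable tasks
only. This is a sketch under simplifying assumptions: the toy_* names are
hypothetical, and each task's estimate is passed in as a plain number rather
than derived from its history.

```c
#include <assert.h>

/* Hypothetical stand-in for cfs_rq->avg.util_est.enqueued. */
struct toy_est_rq {
        unsigned long enqueued; /* sum of estimates of runnable tasks */
};

/* Called on every enqueue, including wakeups from sleep. */
static void toy_util_est_enqueue(struct toy_est_rq *rq, unsigned long task_est)
{
        assert(rq);
        rq->enqueued += task_est;
}

/* Called on every dequeue, including when the task blocks or sleeps. */
static void toy_util_est_dequeue(struct toy_est_rq *rq, unsigned long task_est)
{
        unsigned long sub;

        assert(rq);
        /* Clamp so that the sum never underflows. */
        sub = task_est < rq->enqueued ? task_est : rq->enqueued;
        rq->enqueued -= sub;
}
```

A task that blocks is subtracted from the sum immediately and is added back
only when it wakes up, unlike its util_avg contribution.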

The rq utilization total used for rq selection should not include the historic
utilization of blocked/sleeping tasks, because at that point we are deciding
where to bring a recently blocked/sleeping task back onto a runqueue. Counting
the historic util_sum of the other blocked/sleeping tasks still attached to that
runqueue as current utilization misleads the rq selection into thinking that
the rq is busier than it really is.
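
A small sketch of how the two totals can lead to opposite decisions. This is
not the patch itself: the structure, the field names and the "<=" comparison
are assumptions made for illustration, based only on the comment quoted above
("does not exceed the capacity of @cpu").

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU snapshot; field names are illustrative only. */
struct toy_cpu {
        unsigned long capacity;          /* capacity left for fair tasks */
        unsigned long util_avg;          /* includes attached blocked tasks */
        unsigned long util_est_enqueued; /* runnable tasks only */
};

/* True if adding task_util to rq_util does not exceed capacity. */
static bool toy_fits(unsigned long rq_util, unsigned long task_util,
                     unsigned long capacity)
{
        assert(capacity > 0);
        return rq_util + task_util <= capacity;
}

/* Decision based on a total that still counts blocked/sleeping tasks. */
static bool toy_fits_with_blocked(const struct toy_cpu *c,
                                  unsigned long task_util)
{
        return toy_fits(c->util_avg, task_util, c->capacity);
}

/* Decision based on the runnable-only estimate. */
static bool toy_fits_runnable_only(const struct toy_cpu *c,
                                   unsigned long task_util)
{
        return toy_fits(c->util_est_enqueued, task_util, c->capacity);
}
```

For a CPU with capacity 1024 whose util_avg is 900 (mostly from sleeping tasks)
and whose runnable-only estimate is 100, a waking task with utilization 200 is
rejected by the first check and accepted by the second.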

I suspect that cpu_util_without() is a half-successful attempt at solving this
by removing the task p from the considered utilization, but it does not take
into account scenarios where many other tasks happen to be blocked/sleeping as
well.
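
The gap can be shown with made-up numbers. The sketch below is a simplified
userspace model with hypothetical toy_* helpers. It does not reproduce
cpu_util_without(); it only contrasts "everything attached except p" with
"runnable tasks only".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task record attached to one runqueue. */
struct toy_entry {
        unsigned long util; /* historic utilization of the task */
        bool runnable;      /* false when blocked/sleeping */
};

/* Sum over all attached tasks except index skip (pass n to skip none). */
static unsigned long toy_util_without(const struct toy_entry *t, size_t n,
                                      size_t skip)
{
        unsigned long sum = 0;

        assert(t || n == 0);
        for (size_t i = 0; i < n; i++)
                if (i != skip)
                        sum += t[i].util;
        return sum;
}

/* Sum over runnable tasks only. */
static unsigned long toy_util_runnable(const struct toy_entry *t, size_t n)
{
        unsigned long sum = 0;

        assert(t || n == 0);
        for (size_t i = 0; i < n; i++)
                if (t[i].runnable)
                        sum += t[i].util;
        return sum;
}
```

With one runnable task at 100 and three sleeping tasks at 300 (p), 250 and 200,
removing p alone still leaves 550, while the runnable-only total is 100.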
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com