Message-ID: <ae8412cb-fd78-4e3e-b51a-ee290fd076bd@efficios.com>
Date:   Tue, 24 Oct 2023 10:49:37 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Chen Yu <yu.c.chen@...el.com>
Cc:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Swapnil Sapkal <Swapnil.Sapkal@....com>,
        Aaron Lu <aaron.lu@...el.com>, Tim Chen <tim.c.chen@...el.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>, x86@...nel.org
Subject: Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY
 feature (v2)

On 2023-10-24 02:10, Chen Yu wrote:
> On 2023-10-23 at 11:04:49 -0400, Mathieu Desnoyers wrote:
>> On 2023-10-23 10:11, Dietmar Eggemann wrote:
>>> On 19/10/2023 18:05, Mathieu Desnoyers wrote:
>>
>> [...]
>>>> +static unsigned long scale_rt_capacity(int cpu);
>>>> +
>>>> +/*
>>>> + * Returns true if adding the task utilization to the estimated
>>>> + * utilization of the runnable tasks on @cpu does not exceed the
>>>> + * capacity of @cpu.
>>>> + *
>>>> + * This considers only the utilization of _runnable_ tasks on the @cpu
>>>> + * runqueue, excluding blocked and sleeping tasks. This is achieved by
>>>> + * using the runqueue util_est.enqueued.
>>>> + */
>>>> +static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
>>>> +						    int cpu)
>>>
>>> Or like find_energy_efficient_cpu() (feec(), used in
>>> Energy-Aware-Scheduling (EAS)) which uses cpu_util(cpu, p, cpu, 0) to get:
>>>
>>>     max(util_avg(CPU + p), util_est(CPU + p))
>>
>> I've tried using cpu_util(), but unfortunately anything that considers
>> blocked/sleeping tasks in its utilization total does not work for my
>> use-case.
>>
>>  From cpu_util():
>>
>>   * CPU utilization is the sum of running time of runnable tasks plus the
>>   * recent utilization of currently non-runnable tasks on that CPU.
>>
> 
> I thought cpu_util() indicates the utilization decay sum of task that was once
> "running" on this CPU, but will not sum up the "util/load" of the blocked/sleeping
> task?
> 
> accumulate_sum()
>      /* only the running task's util will be sum up */
>      if (running)
>         sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
> 
> WRITE_ONCE(sa->util_avg, sa->util_sum / divider);

Accumulation into cfs_rq->avg.util_sum indeed only happens while the task is
running, so a blocked/sleeping task does not actively increment util_sum.

However, when the task is blocked/sleeping, the task is still attached to the
runqueue, and therefore its historic util_sum still contributes to the cfs_rq
util_sum/util_avg. This completely differs from what happens when the task is
migrated to a different runqueue, in which case its util_sum contribution is
entirely removed from the cfs_rq util_sum:

static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
[...]
         update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH);
[...]

static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
[...]
         if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
                 action |= DO_DETACH;
[...]

static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
[...]
         if (!se->avg.last_update_time && (flags & DO_ATTACH)) {

                 /*
                  * DO_ATTACH means we're here from enqueue_entity().
                  * !last_update_time means we've passed through
                  * migrate_task_rq_fair() indicating we migrated.
                  *
                  * IOW we're enqueueing a task on a new CPU.
                  */
                 attach_entity_load_avg(cfs_rq, se);
                 update_tg_load_avg(cfs_rq);

         } else if (flags & DO_DETACH) {
                 /*
                  * DO_DETACH means we're here from dequeue_entity()
                  * and we are migrating task out of the CPU.
                  */
                 detach_entity_load_avg(cfs_rq, se);
                 update_tg_load_avg(cfs_rq);
[...]
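
To make the attach/detach behaviour above concrete, here is a minimal toy
model in plain C. It is only a sketch: the toy_* types and functions are
hypothetical, PELT decay is ignored entirely, and nothing here is kernel
code. It only shows that a blocked task keeps its contribution in the rq
total, while a migrating task takes it away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types; not the kernel's cfs_rq / sched_entity. */
struct toy_task {
	unsigned long util;	/* utilization carried by the task (no decay) */
	bool attached;		/* contribution is part of the rq total */
	bool runnable;		/* currently enqueued */
};

struct toy_rq {
	unsigned long util_avg;	/* runnable + blocked-but-attached tasks */
};

/* Enqueue on a new CPU: analogous to the DO_ATTACH path. */
static void toy_attach(struct toy_rq *rq, struct toy_task *t)
{
	assert(!t->attached);
	rq->util_avg += t->util;
	t->attached = true;
	t->runnable = true;
}

/* Task blocks/sleeps: dequeued without DO_DETACH, contribution stays. */
static void toy_block(struct toy_rq *rq, struct toy_task *t)
{
	(void)rq;		/* rq total is deliberately left untouched */
	t->runnable = false;
}

/* Task migrates away: analogous to the DO_DETACH path. */
static void toy_migrate_out(struct toy_rq *rq, struct toy_task *t)
{
	assert(t->attached);
	rq->util_avg -= t->util;
	t->attached = false;
	t->runnable = false;
}
```

With two attached tasks of utilization 300 and 200, blocking the first
leaves the toy total at 500; only migrating the second away brings it down
to 300.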

In comparison, util_est_enqueue()/util_est_dequeue() are called from enqueue_task_fair()
and dequeue_task_fair(), and the dequeue path covers tasks that block or sleep, not only
tasks that migrate. Therefore, cfs_rq->avg.util_est.enqueued only accounts for runnable tasks.

The rq utilization total used for rq selection should not include the historic
utilization of all blocked/sleeping tasks, because at that point we are deciding where
to bring a recently blocked/sleeping task back onto a runqueue. Counting the historic
util_sum of the other blocked/sleeping tasks still attached to that runqueue as current
utilization makes the rq selection think that the rq is busier than it really is.
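
A small numeric sketch of that effect, in plain C. Everything named toy_*
is hypothetical, decay is ignored, and the "fits" test is a bare
"sum <= capacity" comparison, which is an assumption for illustration and
not necessarily what the patch does (it may apply a margin). The capacity
constant uses the same value as SCHED_CAPACITY_SCALE (1024).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CAPACITY 1024UL	/* same value as SCHED_CAPACITY_SCALE */

/* Hypothetical per-CPU totals; not the kernel's data structures. */
struct toy_cpu {
	unsigned long util_avg;			/* runnable + blocked-but-attached */
	unsigned long util_est_enqueued;	/* runnable tasks only */
};

/* First enqueue on this CPU: counts in both totals. */
static void toy_enqueue(struct toy_cpu *c, unsigned long util)
{
	c->util_avg += util;
	c->util_est_enqueued += util;
}

/* Task blocks/sleeps: leaves the runnable total, stays in util_avg. */
static void toy_dequeue_sleep(struct toy_cpu *c, unsigned long util)
{
	assert(c->util_est_enqueued >= util);
	c->util_est_enqueued -= util;
}

/* Fit check based on the total that still includes blocked tasks. */
static bool toy_fits_util_avg(const struct toy_cpu *c, unsigned long task_util)
{
	return c->util_avg + task_util <= TOY_CAPACITY;
}

/* Fit check based on runnable tasks only. */
static bool toy_fits_util_est(const struct toy_cpu *c, unsigned long task_util)
{
	return c->util_est_enqueued + task_util <= TOY_CAPACITY;
}
```

With three tasks of utilization 300 enqueued and two of them then going to
sleep, the toy util_avg stays at 900 while the runnable total drops to 300.
A waking task of utilization 400 is rejected by the first check
(900 + 400 > 1024) and accepted by the second (300 + 400 <= 1024).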

I suspect that cpu_util_without() is a half-successful attempt at solving this by removing
the task p from the considered utilization, but it does not account for scenarios where many
other tasks happen to be blocked/sleeping as well.
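
As a rough illustration of why removing only p is not enough, here is a
toy calculation in plain C. The toy_* helpers are hypothetical and are a
deliberate simplification, not the actual cpu_util_without()
implementation: they just sum per-task contributions still attached to one
runqueue and subtract a single entry.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: util[] holds the contribution of each task still attached
 * to one runqueue, whether it is runnable or blocked (no decay).
 */
static unsigned long toy_rq_util(const unsigned long *util, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += util[i];
	return sum;
}

/* Same total with only task p's contribution removed. */
static unsigned long toy_rq_util_without(const unsigned long *util,
					 size_t n, size_t p)
{
	assert(p < n);
	return toy_rq_util(util, n) - util[p];
}
```

If four tasks of utilization 250 are all blocked on the same runqueue,
removing the waking task p still leaves 750 in the toy total, even though
no task is runnable there.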

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
