linux-kernel - Re: [PATCH v3 10/13] sched/fair: Compute task/cpu utilization at wake-up more correctly

Date:   Fri, 19 Aug 2016 15:03:08 +0100
From:   Morten Rasmussen <morten.rasmussen@....com>
To:     Wanpeng Li <kernellwp@...il.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Yuyang Du <yuyang.du@...el.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Mike Galbraith <mgalbraith@...e.de>, sgurrappadi@...dia.com,
        Koan-Sin Tan <freedom.tan@...iatek.com>,
        小林敬太 <keita.kobayashi.ym@...esas.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 10/13] sched/fair: Compute task/cpu utilization at
 wake-up more correctly

On Fri, Aug 19, 2016 at 09:43:00AM +0800, Wanpeng Li wrote:
> 2016-08-18 21:45 GMT+08:00 Morten Rasmussen <morten.rasmussen@....com>:
> > I assume you are referring to using task_util_peak() instead of
> > task_util() in wake_cap()?
> 
> Yes.
> 
> >
> > The peak value should never exceed the util_avg accumulated by the task
> > last time it ran. So any spike has to be caused by the task accumulating
> > more utilization last time it ran. We don't know if it a spike or a more
> 
> I see.
> 
> > permanent change in behaviour, so we have to guess. So a spike on an
> > asymmetric system could cause us to disable wake affine in some
> > circumstances (either prev_cpu or waker cpu has to be low compute
> > capacity) for the following wake-up.
> >
> > SMP should be unaffected as we should bail out on the previous
> > condition.
> 
> Why capacity_orig instead of capacity? It is checked on every
> wake-up, and rt class/interrupts may already have consumed much of the
> cpu's capacity.

We could switch to capacity for this condition if we also change the
spare capacity evaluation in find_idlest_group() to do the same. That
would allow SMP systems to take the find_idlest_group() route if the
SD_BALANCE_WAKE flag is set.

The reason why I have avoided capacity and used capacity_orig instead
is that in previous discussions about scheduling behaviour under
rt/dl/irq pressure it has not been clear to me whether we want to move
tasks away from cpus with capacity < capacity_orig or not. The choice
depends on the use-case.

In some cases taking rt/dl/irq pressure into account is more complicated
as we don't know the capacities available in a sched_group without
iterating over all the cpus. However, I don't think it would complicate
these patches. It is more a question of whether everyone is happy with
additional conditions in their wake-up path. I guess we could make it a
sched_feature if people are interested?

In short, I used capacity_orig to play it safe ;-)
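
To make the trade-off concrete, here is a minimal user-space sketch of
the kind of check being discussed. It is not the code from the patch:
the struct, the function name wake_cap_sketch(), the use_pressure
switch and the 1280/1024 margin are all assumptions made for
illustration. It only shows how swapping capacity_orig for capacity
changes which systems get past the early bail-out.

```c
#include <stdbool.h>

/* Hypothetical per-cpu record; field names follow the discussion. */
struct cpu_cap {
	unsigned long capacity_orig;	/* full compute capacity of the cpu */
	unsigned long capacity;		/* what is left after rt/dl/irq pressure */
};

/* Assumed headroom of roughly 25%, expressed on a 1024 scale. */
#define CAPACITY_MARGIN 1280UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Return true if wake affine should be skipped because the task may
 * not fit on the smaller of the two candidate cpus.
 * use_pressure selects capacity instead of capacity_orig.
 */
static bool wake_cap_sketch(unsigned long task_util,
			    const struct cpu_cap *prev,
			    const struct cpu_cap *waker,
			    unsigned long max_cap, bool use_pressure)
{
	unsigned long a = use_pressure ? prev->capacity : prev->capacity_orig;
	unsigned long b = use_pressure ? waker->capacity : waker->capacity_orig;
	unsigned long min_cap = min_ul(a, b);

	/* Both cpus are close to the maximum: nothing to gain, bail out. */
	if (max_cap - min_cap < (max_cap >> 3))
		return false;

	/* Does the task, plus margin, exceed the smaller cpu? */
	return min_cap * 1024 < task_util * CAPACITY_MARGIN;
}
```

With capacity_orig, two identical cpus always hit the bail-out, so SMP
is unaffected. With capacity, an SMP cpu under heavy rt/irq pressure
can fall below the threshold and the check starts to matter there too.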

> > The counter-example is a task with a fairly long busy period and a much
> > longer period (cycle). Its util_avg might have decayed away since the
> > last activation so it appears very small at wake-up and we end up
> > putting it on a low capacity cpu every time even though it keeps the cpu
> > busy for a long time every time it wakes up.
> 
> Agreed, that's the reason for under-estimation concern.
> 
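
The under-estimation can be illustrated with a rough model of the
decay. This is a sketch only, assuming the usual PELT behaviour where
a sleeping task's contribution halves every 32 periods of about 1 ms;
decay_util_sketch() and the 1002/1024 per-period factor are
illustrative approximations, not the kernel's implementation.

```c
/* Assumed half-life of the geometric decay, in ~1 ms periods. */
#define HALFLIFE_PERIODS 32U

/*
 * Approximate the utilization left after a task has slept for
 * 'periods' periods: halve once per full half-life, then apply an
 * approximate per-period factor (about 0.5^(1/32)) for the remainder.
 */
static unsigned long decay_util_sketch(unsigned long util,
				       unsigned int periods)
{
	unsigned int halvings = periods / HALFLIFE_PERIODS;
	unsigned int rest = periods % HALFLIFE_PERIODS;

	/* Shifting by the full width or more would be undefined. */
	if (halvings >= 8 * sizeof(util))
		return 0;

	util >>= halvings;
	while (rest--)
		util = util * 1002 / 1024;

	return util;
}
```

Under this model a task that left the cpu with a utilization of 800
looks like 200 after sleeping 64 periods, and like 0 after 320, even
though it will keep the cpu busy again as soon as it wakes up.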
> >
> > Did that answer your question?
> 
> Yeah, thanks for the clarification.

You are welcome.