linux-kernel - Re: [PATCH v3 10/13] sched/fair: Compute task/cpu utilization at wake-up more correctly


Message-ID: <CANRm+Cx7YScEtxhSag_uqzdOei+kEjSYsTMHeoMdYq-ijLGGAQ@mail.gmail.com>
Date:   Fri, 19 Aug 2016 09:43:00 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Morten Rasmussen <morten.rasmussen@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Yuyang Du <yuyang.du@...el.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Mike Galbraith <mgalbraith@...e.de>, sgurrappadi@...dia.com,
        Koan-Sin Tan <freedom.tan@...iatek.com>,
        小林敬太 <keita.kobayashi.ym@...esas.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 10/13] sched/fair: Compute task/cpu utilization at
 wake-up more correctly

2016-08-18 21:45 GMT+08:00 Morten Rasmussen <morten.rasmussen@....com>:
> On Thu, Aug 18, 2016 at 07:46:44PM +0800, Wanpeng Li wrote:
>> 2016-08-18 18:24 GMT+08:00 Morten Rasmussen <morten.rasmussen@....com>:
>> > On Thu, Aug 18, 2016 at 09:40:55AM +0100, Morten Rasmussen wrote:
>> >> On Mon, Aug 15, 2016 at 04:42:37PM +0100, Morten Rasmussen wrote:
>> >> > On Mon, Aug 15, 2016 at 04:23:42PM +0200, Peter Zijlstra wrote:
>> >> > > But unlike that function, it doesn't actually use __update_load_avg().
>> >> > > Why not?
>> >> >
>> >> > Fair question :)
>> >> >
>> >> > We currently exploit the fact that the task utilization is _not_ updated
>> >> > in wake-up balancing to make sure we don't under-estimate the capacity
>> >> > requirements for tasks that have slept for a while. If we update it, we
>> >> > lose the non-decayed 'peak' utilization, but I guess we could just
>> >> > store it somewhere when we do the wake-up decay.
>> >> >
>> >> > I thought there was a better reason when I wrote the patch, but I don't
>> >> > recall right now. I will look into it again and see if we can use
>> >> > __update_load_avg() to do a proper update instead of doing things twice.
>> >>
>> >> AFAICT, we should be able to synchronize the task utilization to the
>> >> previous rq utilization using __update_load_avg() as you suggest. The
>> >> patch below should work as a replacement without any changes to
>> >> subsequent patches. It doesn't solve the under-estimation issue, but I
>> >> have another patch for that.
>> >
>> > And here is a possible solution to the under-estimation issue. The patch
>> > would have to go at the end of this set.
>> >
>> > ---8<---
>> >
>> > From 5bc918995c6c589b833ba1f189a8b92fa22202ae Mon Sep 17 00:00:00 2001
>> > From: Morten Rasmussen <morten.rasmussen@....com>
>> > Date: Wed, 17 Aug 2016 15:30:43 +0100
>> > Subject: [PATCH] sched/fair: Track peak per-entity utilization
>> >
>> > When PELT (per-entity load tracking) utilization is used to place tasks
>> > at wake-up, using the decayed utilization (due to sleep) leads to
>> > under-estimation of the true utilization of the task. This could mean
>> > putting the task on a cpu with less available capacity than is actually
>> > needed. This issue can be mitigated by using 'peak' utilization instead
>> > of the decayed utilization for placement decisions, e.g. at task
>> > wake-up.
>> >
>> > The 'peak' utilization metric, util_peak, tracks util_avg when the task
>> > is running and retains its previous value while the task is
>> > blocked/waiting on the rq. It is instantly updated to track util_avg
>> > again as soon as the task is running again.
>>
>> Maybe this will lead to wake affine being disabled because of a spiky
>> peak value for a task with a low average load.
>
> I assume you are referring to using task_util_peak() instead of
> task_util() in wake_cap()?

Yes.

>
> The peak value should never exceed the util_avg accumulated by the task
> last time it ran. So any spike has to be caused by the task accumulating
> more utilization last time it ran. We don't know if it is a spike or a more

I see.

> permanent change in behaviour, so we have to guess. So a spike on an
> asymmetric system could cause us to disable wake affine in some
> circumstances (either prev_cpu or waker cpu has to be low compute
> capacity) for the following wake-up.
>
> SMP should be unaffected as we should bail out on the previous
> condition.

Why use capacity_orig instead of capacity? The check is made at every
wake-up, and the rt class or interrupts may already be consuming much of
the cpu's capacity.
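
To make the question concrete, here is a rough userspace sketch of the
kind of check being discussed. It is only loosely modelled on the idea of
wake_cap() and is not the code from the patch: the name wake_cap_model(),
the 1/8 "close enough to max" threshold and the 1280/1024 margin are all
assumptions chosen for illustration.

```c
/* Hypothetical userspace model; NOT the kernel's wake_cap(). */
#define CAPACITY_SCALE   1024UL
#define CAPACITY_MARGIN  1280UL  /* assumed: roughly 20% headroom */

/*
 * Returns non-zero when wake affine should be skipped because the task
 * may not fit on the smaller of the two candidate cpus.
 * Assumes prev_cap <= max_cap and this_cap <= max_cap.
 */
static int wake_cap_model(unsigned long task_util, unsigned long prev_cap,
			  unsigned long this_cap, unsigned long max_cap)
{
	unsigned long min_cap = prev_cap < this_cap ? prev_cap : this_cap;

	/* Both cpus are close to the biggest cpu: treat as SMP, bail out. */
	if (max_cap - min_cap < max_cap >> 3)
		return 0;

	/* Does the task, plus margin, exceed the smaller cpu's capacity? */
	return min_cap * CAPACITY_SCALE < task_util * CAPACITY_MARGIN;
}
```

With capacity_orig passed in (1024 for both cpus on SMP) the model always
bails out, whatever the rt/irq pressure. If the remaining capacity were
passed instead, say 700 out of 1024 on prev_cpu, a task with utilization
600 would be reported as not fitting. The numbers are made up.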

>
> The counter-example is a task with a fairly long busy period and a much
> longer period (cycle). Its util_avg might have decayed away since the
> last activation so it appears very small at wake-up and we end up
> putting it on a low capacity cpu every time even though it keeps the cpu
> busy for a long time every time it wakes up.

Agreed, that's the reason for the under-estimation concern.
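
A small userspace model may help show the effect. This is a sketch under
stated assumptions, not kernel code: struct util_model and the model_*()
helpers are hypothetical names, the task is assumed to run continuously
while active, and the only PELT property used is that the contribution of
a ~1 ms period halves after 32 periods (y^32 ~= 0.5).

```c
#include <assert.h>

/* Hypothetical userspace model of the behaviour discussed; not kernel code. */
#define UTIL_MAX  1024.0       /* full utilization of one cpu */
#define PELT_Y    0.97857206   /* per-period decay factor, y^32 ~= 0.5 */

struct util_model {
	double util_avg;   /* decays while the task sleeps */
	double util_peak;  /* retains the last value seen while running */
};

/* Decay val over the given number of ~1 ms periods. */
static double pelt_decay(double val, unsigned int periods)
{
	while (periods--)
		val *= PELT_Y;
	return val;
}

/* Task runs continuously: util_avg converges to UTIL_MAX, peak follows it. */
static void model_run(struct util_model *m, unsigned int periods)
{
	assert(m->util_avg >= 0.0 && m->util_avg <= UTIL_MAX);
	m->util_avg = UTIL_MAX - pelt_decay(UTIL_MAX - m->util_avg, periods);
	m->util_peak = m->util_avg;
}

/* Task sleeps: util_avg decays, util_peak is left untouched. */
static void model_sleep(struct util_model *m, unsigned int periods)
{
	m->util_avg = pelt_decay(m->util_avg, periods);
}
```

Starting from zero, model_run(&m, 64) leaves util_avg at about 768. A
following model_sleep(&m, 160) decays util_avg to about 24 while
util_peak stays at about 768, so a placement based on util_avg alone
would see a very small task at the next wake-up.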

>
> Did that answer your question?

Yeah, thanks for the clarification.

Regards,
Wanpeng Li