linux-kernel - Re: [PATCHv4 00/12] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKfTPtCVDJqj64HNKnOM0ZN5d8BXycJj86Sm7ZOyEPCiAHhB_Q@mail.gmail.com>
Date:   Tue, 31 Jul 2018 14:13:07 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Valentin Schneider <valentin.schneider@....com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, gaku.inami.xh@...esas.com,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCHv4 00/12] sched/fair: Migrate 'misfit' tasks on asymmetric
 capacity systems

On Mon, 30 Jul 2018 at 16:30, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>
> On 07/26/2018 07:14 PM, Valentin Schneider wrote:
> > Hi,
> >
> > On 09/07/18 16:08, Morten Rasmussen wrote:
> >> On Fri, Jul 06, 2018 at 12:18:27PM +0200, Vincent Guittot wrote:
> >>> Hi Morten,
> >>>
> >>> On Wed, 4 Jul 2018 at 12:18, Morten Rasmussen <morten.rasmussen@....com> wrote:
>
> [...]
>
> > With that out of the way, I did some lmbench runs:
> >> lat_mem_rd 10 1024
> >
> > With ASYM_PACKING, I still see lmbench tasks remaining on LITTLE CPUs while
> > bigs are free, because ASYM_PACKING only does explicit active balancing on
> > CPU_NEWLY_IDLE balancing - otherwise it'll rely on the nr_balance_failed counter.
> >
> > However, that counter can be reset before it reaches the threshold at which
> > active balance is done, which can lead to huge upmigration delays (almost a
> > full second). I also see the same kind of issues on Juno r0.
> >
> > This could be resolved by extending ASYM_PACKING active balancing to
> > non NEWLY_IDLE cases, but then we'd be thrashing everything. That's another
> > argument for basing upmigration on task load-tracking signals, as we can
> > determine which tasks need active balancing much faster than the
> > nr_balance_failed counter way while not active balancing the world.
>
> The task layout of the test looks like n=85 always running tasks (each
> for ~ 1.25ms on big or little) and they all get created and run one

How mistfit task can make a difference for a benchmark which uses 1.25ms tasks ?

> after the other. So on a big cpu, their util values go from 512 to 1024
> and from 223 to 446 on little cpu (Juno board). Latter thanks to
> Quentin's 'sched/fair: Fix util_avg of new tasks for asymmetric systems'.
>
> root@...o:~# cat /sys/devices/system/cpu/cpu[01]/cpu_capacity
> 446
> 1024
>
> > (lat_mem_rd 10 1024) with ASYM_PACKING:
>
> ...
> > 4.0 148.66   <-----
> > 4.5 10.191
> ...
> > 7.5 10.203
> > 8.0 154.354   <-----
>
> I ran the test affine to big, little and all cpus on tip/sched/core w/o
> ASYM_PACKING or Misfit:
>
> cputype:     big  little     all
> cpumask:    0x06    0x39    0xff
>
> mem size   <---- latency  ---->
>
>   0.00098   3.668   3.595   3.669
>   0.00195   3.668   3.594   3.594
>   0.00293   3.668   3.593   3.595
>   0.00391   3.669   3.596   3.595
>   ...
>   3.75000  58.687 121.934 122.293
>   4.00000  57.054 121.771 120.489
>   4.50000  56.914 121.851  56.729
>   5.00000  57.347 121.777  56.975
>   5.50000  57.705 121.738  68.981
>   6.00000  57.935 121.728  57.542
>   6.50000  58.119 121.694 121.799
>   7.00000  58.194 121.502  57.844
>   7.50000  58.258 121.684  58.050
>   8.00000  58.293 121.725  58.030
>   9.00000  58.309 121.793  58.188
> 10.00000  58.561 122.252 122.078
>
> There is no diff between big and little cpus with small memory sizes,
> just with the MB range.
> If I look into the trace for 'all' it turns out that their are cases in
> which, even if the task only run for ~15% of the time on big, the
> latency value is printed as when it was running affine to big. So using
> the latency value as an indicator where the task was scheduled is IMHO
> not really possible.