Message-ID: <CAKfTPtDUa7nARKxT+hR_ci3J_exE85sZFT1oEM84akZ+i_-UgA@mail.gmail.com>
Date: Fri, 6 Jul 2018 12:18:17 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Morten Rasmussen <morten.rasmussen@....com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
gaku.inami.xh@...esas.com,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCHv4 12/12] sched/core: Disable SD_PREFER_SIBLING on
asymmetric cpu capacity domains
On Wed, 4 Jul 2018 at 12:18, Morten Rasmussen <morten.rasmussen@....com> wrote:
>
> The 'prefer sibling' sched_domain flag is intended to encourage
> spreading tasks to sibling sched_domains to take advantage of more
> cache and cores on SMT systems. It has recently been changed to be set
> on all non-NUMA topology levels. However, spreading across domains with
> cpu capacity asymmetry isn't desirable, e.g. spreading from high
> capacity to low capacity cpus, even if the high capacity cpus aren't
> overutilized, might give access to more cache but the cpus will be
> slower and possibly lead to worse overall throughput.
>
> To prevent this, we need to remove SD_PREFER_SIBLING on the sched_domain
> level immediately below SD_ASYM_CPUCAPACITY.

This makes sense. Nevertheless, this patch also raises a scheduling
problem and breaks the one-task-per-CPU policy that is enforced by
SD_PREFER_SIBLING. When running the tests from your cover letter, one
long-running task is often co-scheduled on a big core while short pinned
tasks are still running and a little core sits idle, which is not an
optimal scheduling decision.
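
The spreading behaviour in question comes from the prefer-sibling
handling in the fair-class load balancer: when the child domain has
SD_PREFER_SIBLING, a sibling group running noticeably more tasks than
the local group is treated as a candidate to pull from even though it
is not overloaded. A minimal, self-contained sketch of that idea (the
structure, helper name and threshold below are illustrative, not the
exact kernel/sched/fair.c code):

/*
 * Simplified model of the SD_PREFER_SIBLING spreading heuristic.
 * Group accounting is reduced to plain task counts.
 */
#include <stdbool.h>
#include <stdio.h>

#define SD_PREFER_SIBLING 0x0040	/* illustrative value */

struct sched_group_stats {
	unsigned int sum_nr_running;	/* runnable tasks in the group */
	unsigned int group_weight;	/* number of cpus in the group */
};

static bool should_pull_from(const struct sched_group_stats *local,
			     const struct sched_group_stats *busiest,
			     unsigned int child_flags)
{
	/* Classic overload condition: more tasks than cpus. */
	if (busiest->sum_nr_running > busiest->group_weight)
		return true;

	/* Prefer-sibling condition: spread even without overload. */
	if ((child_flags & SD_PREFER_SIBLING) &&
	    local->sum_nr_running + 1 < busiest->sum_nr_running)
		return true;

	return false;
}

int main(void)
{
	struct sched_group_stats local = { .sum_nr_running = 0, .group_weight = 4 };
	struct sched_group_stats big   = { .sum_nr_running = 2, .group_weight = 4 };

	printf("with prefer_sibling:    pull=%d\n",
	       should_pull_from(&local, &big, SD_PREFER_SIBLING));
	printf("without prefer_sibling: pull=%d\n",
	       should_pull_from(&local, &big, 0));
	return 0;
}

With the flag, the idle group pulls a task from its sibling even though
the sibling is far from overloaded, which is what produces the roughly
one-task-per-CPU spread; without it, only a genuinely overloaded group
gets drained, and the situation described above (an idle little core
next to a co-scheduled big core) can persist.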
>
> cc: Ingo Molnar <mingo@...hat.com>
> cc: Peter Zijlstra <peterz@...radead.org>
>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@....com>
> ---
> kernel/sched/topology.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 29c186961345..00c7a08c7f77 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1140,7 +1140,7 @@ sd_init(struct sched_domain_topology_level *tl,
>  					| 0*SD_SHARE_CPUCAPACITY
>  					| 0*SD_SHARE_PKG_RESOURCES
>  					| 0*SD_SERIALIZE
> -					| 0*SD_PREFER_SIBLING
> +					| 1*SD_PREFER_SIBLING
>  					| 0*SD_NUMA
>  					| sd_flags
>  					,
> @@ -1186,17 +1186,21 @@ sd_init(struct sched_domain_topology_level *tl,
>  	if (sd->flags & SD_ASYM_CPUCAPACITY) {
>  		struct sched_domain *t = sd;
>
> +		/*
> +		 * Don't attempt to spread across cpus of different capacities.
> +		 */
> +		if (sd->child)
> +			sd->child->flags &= ~SD_PREFER_SIBLING;
> +
>  		for_each_lower_domain(t)
>  			t->flags |= SD_BALANCE_WAKE;
>  	}
>
>  	if (sd->flags & SD_SHARE_CPUCAPACITY) {
> -		sd->flags |= SD_PREFER_SIBLING;
>  		sd->imbalance_pct = 110;
>  		sd->smt_gain = 1178; /* ~15% */
>
>  	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
> -		sd->flags |= SD_PREFER_SIBLING;
>  		sd->imbalance_pct = 117;
>  		sd->cache_nice_tries = 1;
>  		sd->busy_idx = 2;
> @@ -1207,6 +1211,7 @@ sd_init(struct sched_domain_topology_level *tl,
>  		sd->busy_idx = 3;
>  		sd->idle_idx = 2;
>
> +		sd->flags &= ~SD_PREFER_SIBLING;
>  		sd->flags |= SD_SERIALIZE;
>  		if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
>  			sd->flags &= ~(SD_BALANCE_EXEC |
> @@ -1216,7 +1221,6 @@ sd_init(struct sched_domain_topology_level *tl,
>
>  #endif
>  	} else {
> -		sd->flags |= SD_PREFER_SIBLING;
>  		sd->cache_nice_tries = 1;
>  		sd->busy_idx = 2;
>  		sd->idle_idx = 1;
> --
> 2.7.4
>
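
For completeness, a small standalone sketch of what the sd_init() hunk
quoted above does on a two-cluster asymmetric (big.LITTLE-style)
topology with MC below DIE; the level names and flag values here are
illustrative, not read from a real machine:

/*
 * After the patch, SD_PREFER_SIBLING is set by default on non-NUMA
 * levels, and a domain with SD_ASYM_CPUCAPACITY clears it on its child.
 */
#include <stdio.h>

#define SD_PREFER_SIBLING	0x0040	/* illustrative value */
#define SD_ASYM_CPUCAPACITY	0x0080	/* illustrative value */

struct sched_domain {
	const char *name;
	unsigned int flags;
	struct sched_domain *child;
};

int main(void)
{
	struct sched_domain mc  = { "MC",  SD_PREFER_SIBLING, NULL };
	struct sched_domain die = { "DIE", SD_PREFER_SIBLING | SD_ASYM_CPUCAPACITY, &mc };

	/* The new hunk: an asymmetric domain clears the flag on its child. */
	if ((die.flags & SD_ASYM_CPUCAPACITY) && die.child)
		die.child->flags &= ~SD_PREFER_SIBLING;

	printf("%s  prefer_sibling: %d\n", mc.name,  !!(mc.flags & SD_PREFER_SIBLING));
	printf("%s prefer_sibling: %d\n", die.name, !!(die.flags & SD_PREFER_SIBLING));
	return 0;
}

Since the load balancer consults the child's SD_PREFER_SIBLING flag
when balancing at the parent level, clearing it on MC is what stops the
forced spreading across the big and little clusters when balancing at
the DIE level. (The flag left set on DIE itself would only matter for
balancing at a level above it.)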