Date:   Wed, 29 Nov 2017 10:24:57 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Morten Rasmussen <morten.rasmussen@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        mgalbraith@...e.de, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/topology: Set SD_PREFER_SIBLING consistently on
 non-NUMA levels

Hi Morten,

On 27 November 2017 at 18:29, Morten Rasmussen <morten.rasmussen@....com> wrote:
> SD_PREFER_SIBLING adds an additional bias towards spreading tasks on the
> _parent_ sched_domain even if a sched_group isn't overloaded. It is
> currently set on:
>
>    1. SMT level to promote spreading to sibling cores rather than using
>       sibling HW-threads (caff37ef96eac7fe96a).
>
>    2. Non-NUMA levels which don't have the SD_SHARE_PKG_RESOURCES flag
>       set (= DIE level in the default topology) as it was found to
>       improve benchmarks on certain NUMA systems (6956dc568f34107f1d02b).

So the goal is to have the flag set on the last non-NUMA level so we
can spread tasks between "DIE" groups.

>
>    3. Any non-NUMA level that inherits the flag due to elimination of
>       its parent sched_domain level in the de-generate step of the
>       sched_domain hierarchy set up (= MC level in the default
>       topology).

This is to ensure that the last non-NUMA level has the flag when the
DIE one disappears, but the goal is the same as in 2.

>
> Preferring siblings seems to be a useful tweak for all non-NUMA levels,
> so we should enable it on all non-NUMA levels. As it is, it is possible
> to have it on SMT and DIE, but not on MC in between, when using the
> default topology.

So you want to extend it to all non-NUMA levels, and in particular you
want to spread tasks between MC groups when we have both DIE and MC
levels. Have you got benchmark results that show an improvement, or is
this just to make the topology configuration consistent?
The fact that this flag improves benchmarks at the SMT level and on
NUMA systems doesn't mean that it will improve things at the MC level
as well. We have the wake_wide/wake_affine mechanisms that try to do a
similar thing dynamically, and they regularly improve/regress
benchmarks like sysbench or hackbench.

>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@....com>
> cc: Ingo Molnar <mingo@...hat.com>
> cc: Peter Zijlstra <peterz@...radead.org>
> ---
>  kernel/sched/topology.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 6798276d29af..7f70806bfa0f 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1122,7 +1122,7 @@ sd_init(struct sched_domain_topology_level *tl,
>                                         | 0*SD_SHARE_CPUCAPACITY
>                                         | 0*SD_SHARE_PKG_RESOURCES
>                                         | 0*SD_SERIALIZE
> -                                       | 0*SD_PREFER_SIBLING
> +                                       | 1*SD_PREFER_SIBLING
>                                         | 0*SD_NUMA
>                                         | sd_flags
>                                         ,
> @@ -1153,7 +1153,6 @@ sd_init(struct sched_domain_topology_level *tl,
>         }
>
>         if (sd->flags & SD_SHARE_CPUCAPACITY) {
> -               sd->flags |= SD_PREFER_SIBLING;
>                 sd->imbalance_pct = 110;
>                 sd->smt_gain = 1178; /* ~15% */
>
> @@ -1168,6 +1167,7 @@ sd_init(struct sched_domain_topology_level *tl,
>                 sd->busy_idx = 3;
>                 sd->idle_idx = 2;
>
> +               sd->flags &= ~SD_PREFER_SIBLING;
>                 sd->flags |= SD_SERIALIZE;
>                 if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
>                         sd->flags &= ~(SD_BALANCE_EXEC |
> @@ -1177,7 +1177,6 @@ sd_init(struct sched_domain_topology_level *tl,
>
>  #endif
>         } else {
> -               sd->flags |= SD_PREFER_SIBLING;
>                 sd->cache_nice_tries = 1;
>                 sd->busy_idx = 2;
>                 sd->idle_idx = 1;
> --
> 2.7.4
>
