Message-ID: <CAKfTPtDA5GamN4A1SnegYwYCk123TqUDE9EHFbHTgKCMR+yqGQ@mail.gmail.com>
Date: Thu, 13 Feb 2020 09:05:35 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Phil Auld <pauld@...hat.com>, Parth Shah <parth@...ux.ibm.com>,
Valentin Schneider <valentin.schneider@....com>
Subject: Re: [RFC 2/4] sched/numa: replace runnable_load_avg by load_avg
On Wed, 12 Feb 2020 at 20:49, Mel Gorman <mgorman@...hsingularity.net> wrote:
>
> On Wed, Feb 12, 2020 at 01:37:15PM +0000, Mel Gorman wrote:
> > > static void task_numa_assign(struct task_numa_env *env,
> > > struct task_struct *p, long imp)
> > > {
> > > @@ -1556,6 +1594,11 @@ static bool load_too_imbalanced(long src_load, long dst_load,
> > > long orig_src_load, orig_dst_load;
> > > long src_capacity, dst_capacity;
> > >
> > > +
> > > + /* If dst node has spare capacity, there is no real load imbalance */
> > > + if (env->dst_stats.node_type == node_has_spare)
> > > + return false;
> > > +
> >
> > Not exactly what the load balancer thinks, though; the load balancer
> > may decide to balance tasks between NUMA groups even when there is
> > spare capacity. That said, what you did here is compatible with the
> > current code.
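
Fair point. Just to make the difference explicit for myself, below is a
simplified standalone model (illustrative only, not the actual
calculate_imbalance() code) of what the reworked load balancer roughly
does when both groups have spare capacity: it evens out the number of
idle CPUs between the groups, without comparing load against capacity,
so it can still move tasks between nodes that NUMA balancing considers
to have spare capacity:

/*
 * Illustrative model only, not the kernel code: with spare capacity
 * the reworked load balancer mostly evens out idle CPUs between the
 * groups instead of comparing load against capacity.
 */
#include <stdio.h>

struct group_stats {
        int idle_cpus;
};

/* Rough number of tasks the balancer wants to pull to "local" */
static int spare_capacity_imbalance(const struct group_stats *local,
                                    const struct group_stats *busiest)
{
        int diff = local->idle_cpus - busiest->idle_cpus;

        return diff > 0 ? diff / 2 : 0;
}

int main(void)
{
        /* dst node mostly idle, src node busier: the balancer still pulls */
        struct group_stats local = { .idle_cpus = 10 };
        struct group_stats busiest = { .idle_cpus = 2 };

        printf("tasks to pull: %d\n",
               spare_capacity_imbalance(&local, &busiest));
        return 0;
}
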
> >
> > I'll want to test this further but in general
> >
> > Acked-by: Mel Gorman <mgorman@...hsingularity.net>
> >
>
> And I was wrong. I worried that the conflict with the load balancer would
> be a problem when I wrote this comment. That was based on finding that
> I had to account for the load balancer when using capacity to decide
> whether an idle CPU can be used. When I didn't, performance went to
> hell, but I thought that maybe you had somehow avoided the same problem.
> Unfortunately, testing indicates that the same, or a similar, problem is
> here.
>
> This is specjbb running two JVMs on a 2-socket Haswell machine with 48
> cores in total. This is just your first two patches; I have found that
> the group classify patches do not recover it.

Thanks for the test results. Unfortunately I don't have specjbb
available to analyse the metrics and study what is going wrong.
>
> Hmean tput-1 39748.93 ( 0.00%) 39568.77 ( -0.45%)
> Hmean tput-2 88648.59 ( 0.00%) 85302.00 ( -3.78%)
> Hmean tput-3 136285.01 ( 0.00%) 131280.85 ( -3.67%)
> Hmean tput-4 181312.69 ( 0.00%) 179903.17 ( -0.78%)
> Hmean tput-5 228725.85 ( 0.00%) 223499.65 ( -2.28%)
> Hmean tput-6 273246.83 ( 0.00%) 268984.52 ( -1.56%)
> Hmean tput-7 317708.89 ( 0.00%) 314593.10 ( -0.98%)
> Hmean tput-8 362378.08 ( 0.00%) 337205.87 * -6.95%*
> Hmean tput-9 403792.00 ( 0.00%) 349014.39 * -13.57%*
> Hmean tput-10 446000.88 ( 0.00%) 379242.88 * -14.97%*
> Hmean tput-11 486188.58 ( 0.00%) 341951.27 * -29.67%*
> Hmean tput-12 522288.84 ( 0.00%) 388877.07 * -25.54%*
> Hmean tput-13 532394.06 ( 0.00%) 437005.34 * -17.92%*
> Hmean tput-14 539440.66 ( 0.00%) 467479.65 * -13.34%*
> Hmean tput-15 541843.50 ( 0.00%) 455477.03 ( -15.94%)
> Hmean tput-16 546510.71 ( 0.00%) 483599.58 ( -11.51%)
> Hmean tput-17 544501.21 ( 0.00%) 506189.64 ( -7.04%)
> Hmean tput-18 544802.98 ( 0.00%) 535869.05 ( -1.64%)
> Hmean tput-19 545265.27 ( 0.00%) 533050.57 ( -2.24%)
> Hmean tput-20 543284.33 ( 0.00%) 528100.03 ( -2.79%)
> Hmean tput-21 543375.11 ( 0.00%) 528692.97 ( -2.70%)
> Hmean tput-22 542536.60 ( 0.00%) 527160.27 ( -2.83%)
> Hmean tput-23 536402.28 ( 0.00%) 521023.49 ( -2.87%)
> Hmean tput-24 532307.76 ( 0.00%) 519208.85 * -2.46%*
> Stddev tput-1 1426.23 ( 0.00%) 464.57 ( 67.43%)
> Stddev tput-2 4437.10 ( 0.00%) 1489.17 ( 66.44%)
> Stddev tput-3 3021.47 ( 0.00%) 750.95 ( 75.15%)
> Stddev tput-4 4098.39 ( 0.00%) 1678.67 ( 59.04%)
> Stddev tput-5 3524.22 ( 0.00%) 1025.30 ( 70.91%)
> Stddev tput-6 3237.13 ( 0.00%) 2198.39 ( 32.09%)
> Stddev tput-7 2534.27 ( 0.00%) 3261.18 ( -28.68%)
> Stddev tput-8 3847.37 ( 0.00%) 4355.78 ( -13.21%)
> Stddev tput-9 5278.55 ( 0.00%) 4145.06 ( 21.47%)
> Stddev tput-10 5311.08 ( 0.00%) 10274.26 ( -93.45%)
> Stddev tput-11 7537.76 ( 0.00%) 16882.17 (-123.97%)
> Stddev tput-12 5023.29 ( 0.00%) 12735.70 (-153.53%)
> Stddev tput-13 3852.32 ( 0.00%) 15385.23 (-299.38%)
> Stddev tput-14 11859.59 ( 0.00%) 24273.56 (-104.67%)
> Stddev tput-15 16298.10 ( 0.00%) 45409.69 (-178.62%)
> Stddev tput-16 9041.77 ( 0.00%) 58839.77 (-550.75%)
> Stddev tput-17 9322.50 ( 0.00%) 77205.45 (-728.16%)
> Stddev tput-18 16040.01 ( 0.00%) 15021.07 ( 6.35%)
> Stddev tput-19 8814.09 ( 0.00%) 27509.99 (-212.11%)
> Stddev tput-20 7812.82 ( 0.00%) 31578.68 (-304.19%)
> Stddev tput-21 6584.58 ( 0.00%) 3639.48 ( 44.73%)
> Stddev tput-22 8294.36 ( 0.00%) 3033.49 ( 63.43%)
> Stddev tput-23 6887.93 ( 0.00%) 5450.38 ( 20.87%)
> Stddev tput-24 6081.83 ( 0.00%) 390.32 ( 93.58%)
>
> Note how, as it reaches the point where the node is almost fully utilised
> (near tput-12), performance drops. Graphing mpstat on a per-node basis
> shows there is an imbalance in CPU utilisation between the nodes.
OK. So the node_has_spare condition that I added in
load_too_imbalanced() is probably too aggressive, although the goal
was to let tasks move to their preferred node.
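
Something along these lines might be less aggressive: only skip the
imbalance check when the destination node would still have spare
capacity once the task's utilization is added. Completely untested; the
snippet below is only a standalone sketch of the idea, with the util
and compute_capacity fields mirroring what this series already tracks
per node and the ~20% margin mimicking fits_capacity():

/*
 * Illustrative, untested model (not a patch): treat the destination
 * as free to take the task only if it keeps ~20% headroom after the
 * move, in the spirit of fits_capacity().
 */
#include <stdbool.h>
#include <stdio.h>

struct node_stats {
        unsigned long util;             /* current utilization of the node */
        unsigned long compute_capacity; /* total capacity of the node */
};

static bool dst_has_spare_after_move(const struct node_stats *dst,
                                     unsigned long task_util)
{
        return (dst->util + task_util) * 1280 < dst->compute_capacity * 1024;
}

int main(void)
{
        /* 24 CPUs worth of capacity, node only lightly loaded */
        struct node_stats dst = { .util = 8192, .compute_capacity = 24576 };

        printf("small task fits: %d\n", dst_has_spare_after_move(&dst, 512));

        /* near saturation, the imbalance check should apply again */
        dst.util = 21504;
        printf("same task fits:  %d\n", dst_has_spare_after_move(&dst, 512));
        return 0;
}
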
>
> NUMA locality is not bad, and scanning and migrations are only a bit
> higher, but overall this is a problem.
>
> On a 4-socket machine, I see slightly different results -- there is still
> a big drop although not a statistically significant one. However, the
> per-node CPU utilisation is skewed as nodes approach being fully utilised.
>
> I see similarly bad results on a 2-socket Broadwell machine.
>
> I don't have other results yet for other workloads but I'm not optimistic
> that I'll see anything different. I still think that this probably didn't
> show up with hackbench because it's too short-lived for NUMA balancing
> to be a factor.
>
> --
> Mel Gorman
> SUSE Labs