Message-ID: <CAKfTPtDnbnNLgE40Xk1r_Hv6J3=wUkLzV15cdmijoVuf9Cy2+A@mail.gmail.com>
Date:   Tue, 31 May 2022 12:26:51 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Valentin Schneider <valentin.schneider@....com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Ying Huang <ying.huang@...el.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 0/4] Mitigate inconsistent NUMA imbalance behaviour

On Wed, 25 May 2022 at 14:49, Mel Gorman <mgorman@...hsingularity.net> wrote:
>
> On Tue, May 24, 2022 at 06:01:07PM +0200, Vincent Guittot wrote:
> > > This is the min, max and range of run time for mg.D parallelised with ~25%
> > > of the CPUs parallelised by MPICH running on a 2-socket machine (80 CPUs,
> > > 16 active for mg.D due to limitations of mg.D).
> > >
> > > v5.3                                     Min  95.84 Max  96.55 Range   0.71 Mean  96.16
> > > v5.7                                     Min  95.44 Max  96.51 Range   1.07 Mean  96.14
> > > v5.8                                     Min  96.02 Max 197.08 Range 101.06 Mean 154.70
> > > v5.12                                    Min 104.45 Max 111.03 Range   6.58 Mean 105.94
> > > v5.13                                    Min 104.38 Max 170.37 Range  65.99 Mean 117.35
> > > v5.13-revert-c6f886546cb8                Min 104.40 Max 110.70 Range   6.30 Mean 105.68
> > > v5.18rc4-baseline                        Min 110.78 Max 169.84 Range  59.06 Mean 131.22
> > > v5.18rc4-revert-c6f886546cb8             Min 113.98 Max 117.29 Range   3.31 Mean 114.71
> > > v5.18rc4-this_series                     Min  95.56 Max 163.97 Range  68.41 Mean 105.39
> > > v5.18rc4-this_series-revert-c6f886546cb8 Min  95.56 Max 104.86 Range   9.30 Mean  97.00
> >
> > I'm interested to understand why such instability can be introduced by
> > c6f886546cb8 as it aims to do the opposite by not waking up a random
> > idle cpu but using the current cpu which is becoming idle, instead. I
> > haven't been able to reproduce your problem with my current setup but
> > I assume this is specific to some use cases so I will try to reproduce
> > the mg.D test above. If you have more details on the setup to ease the
> > reproduction of the problem I'm interested.
> >
>
> Thanks Vincent,
>
> The most straight-forward way to reproduce is via mmtests.
>
> # git clone https://github.com/gormanm/mmtests/
> # cd mmtests
> # ./bin/generate-generic-configs
> # ./run-mmtests.sh --run-monitor --config configs/config-hpc-nas-mpich-quarter-mgD-many test-mgD-many
> # cd work/log
> # ../../compare-kernels.sh
>
> nas-mpich-mg NAS Time
>                                  test
>                              mgD-many
> Min       mg.D       95.80 (   0.00%)
> Amean     mg.D      110.77 (   0.00%)
> Stddev    mg.D       21.55 (   0.00%)
> CoeffVar  mg.D       19.46 (   0.00%)
> Max       mg.D      155.35 (   0.00%)
> BAmean-50 mg.D       96.05 (   0.00%)
> BAmean-95 mg.D      107.83 (   0.00%)
> BAmean-99 mg.D      109.23 (   0.00%)
>
> Note the min of 95.80 seconds, max of 155.35 and high stddev indicating
> the results are not stable.
>
> The generated config is for openSUSE so it may not work for you. After
> installing the mpich package, you'll need to adjust these lines
>
> export NAS_MPICH_PATH=/usr/$MMTESTS_LIBDIR/mpi/gcc/$NAS_MPICH_VERSION/bin
> export NAS_MPICH_LIBPATH=/usr/$MMTESTS_LIBDIR/mpi/gcc/$NAS_MPICH_VERSION/$MMTESTS_LIBDIR
>
> NAS_MPICH_PATH and NAS_MPICH_LIBPATH need to point to the bin and lib
> path for the mpich package your distribution ships.

I have been able to run your tests on my setup (aarch64, 2 nodes * 28
cores * 4 threads), but I can't reproduce the problem: results stay
stable both before and after reverting c6f886546cb8.

I will keep trying to reproduce it.

nas-mpich-mg NAS Time
                                 test                   test
                     mgD-many-v5.18-0 mgD-many-v5.18-revert-0
Min       mg.D       78.76 (   0.00%)       78.78 (  -0.03%)
Amean     mg.D       81.13 (   0.00%)       81.45 *  -0.40%*
Stddev    mg.D        0.96 (   0.00%)        1.12 ( -16.84%)
CoeffVar  mg.D        1.18 (   0.00%)        1.37 ( -16.38%)
Max       mg.D       82.71 (   0.00%)       82.91 (  -0.24%)
BAmean-50 mg.D       80.41 (   0.00%)       80.65 (  -0.30%)
BAmean-95 mg.D       81.02 (   0.00%)       81.34 (  -0.39%)
BAmean-99 mg.D       81.07 (   0.00%)       81.40 (  -0.40%)
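
For anyone comparing the two sets of numbers, here is a minimal sketch of
how the CoeffVar row and the percentage column can be sanity-checked from
the values quoted in this thread. It assumes CoeffVar is Stddev / Amean * 100
and that the percentage is the change relative to the first column, with
negative meaning a longer run time. That matches the tables to within
rounding, but it is my reading of the output and not taken from the mmtests
sources; the helper names are hypothetical.

```python
def coeff_var(stddev, mean):
    """Coefficient of variation, as a percentage of the mean (assumed formula)."""
    return stddev / mean * 100.0


def pct_gain(baseline, other):
    """Relative change vs. baseline in percent; negative means 'other' ran longer."""
    return (baseline - other) / baseline * 100.0


# Amean and Stddev copied from the tables in this thread (already rounded,
# so recomputed values can differ from the tables in the last digit).
runs = {
    "x86 2-socket, mgD-many": (110.77, 21.55),
    "aarch64, v5.18": (81.13, 0.96),
    "aarch64, v5.18 + revert": (81.45, 1.12),
}

for name, (amean, stddev) in runs.items():
    print(f"{name}: CoeffVar ~ {coeff_var(stddev, amean):.2f}%")

# Min on aarch64: v5.18 vs v5.18 with c6f886546cb8 reverted
print(f"Min change: {pct_gain(78.76, 78.78):.2f}%")
```

On these inputs the x86 run varies by roughly 19% of its mean while both
aarch64 runs stay under 1.5%, which is the gap between "not stable" and
"stable" discussed above.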

>
> --
> Mel Gorman
> SUSE Labs
