Message-ID: <20220525124933.GA3441@techsingularity.net>
Date:   Wed, 25 May 2022 13:49:33 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Valentin Schneider <valentin.schneider@....com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Ying Huang <ying.huang@...el.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 0/4] Mitigate inconsistent NUMA imbalance behaviour

On Tue, May 24, 2022 at 06:01:07PM +0200, Vincent Guittot wrote:
> > This is the min, max and range of run time for mg.D parallelised with ~25%
> > of the CPUs parallelised by MPICH running on a 2-socket machine (80 CPUs,
> > 16 active for mg.D due to limitations of mg.D).
> >
> > v5.3                                     Min  95.84 Max  96.55 Range   0.71 Mean  96.16
> > v5.7                                     Min  95.44 Max  96.51 Range   1.07 Mean  96.14
> > v5.8                                     Min  96.02 Max 197.08 Range 101.06 Mean 154.70
> > v5.12                                    Min 104.45 Max 111.03 Range   6.58 Mean 105.94
> > v5.13                                    Min 104.38 Max 170.37 Range  65.99 Mean 117.35
> > v5.13-revert-c6f886546cb8                Min 104.40 Max 110.70 Range   6.30 Mean 105.68
> > v5.18rc4-baseline                        Min 110.78 Max 169.84 Range  59.06 Mean 131.22
> > v5.18rc4-revert-c6f886546cb8             Min 113.98 Max 117.29 Range   3.31 Mean 114.71
> > v5.18rc4-this_series                     Min  95.56 Max 163.97 Range  68.41 Mean 105.39
> > v5.18rc4-this_series-revert-c6f886546cb8 Min  95.56 Max 104.86 Range   9.30 Mean  97.00
> 
> I'm interested to understand why such instability can be introduced by
> c6f886546cb8 as it aims to do the opposite by not waking up a random
> idle cpu but using the current cpu which is becoming idle, instead. I
> haven't been able to reproduce your problem with my current setup but
> I assume this is specific to some use cases so I will try to reproduce
> the mg.D test above. If you have more details on the setup to ease the
> reproduction of the problem I'm interested.
> 

Thanks Vincent,

The most straightforward way to reproduce this is via mmtests:

# git clone https://github.com/gormanm/mmtests/
# cd mmtests
# ./bin/generate-generic-configs
# ./run-mmtests.sh --run-monitor --config configs/config-hpc-nas-mpich-quarter-mgD-many test-mgD-many
# cd work/log
# ../../compare-kernels.sh

nas-mpich-mg NAS Time
                                 test
                             mgD-many
Min       mg.D       95.80 (   0.00%)
Amean     mg.D      110.77 (   0.00%)
Stddev    mg.D       21.55 (   0.00%)
CoeffVar  mg.D       19.46 (   0.00%)
Max       mg.D      155.35 (   0.00%)
BAmean-50 mg.D       96.05 (   0.00%)
BAmean-95 mg.D      107.83 (   0.00%)
BAmean-99 mg.D      109.23 (   0.00%)

Note the min of 95.80 seconds, the max of 155.35 seconds and the high
stddev, which together indicate that the results are not stable.
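
For anyone collecting run times by hand rather than through
compare-kernels.sh, the min/max/range/mean view used earlier in the thread
can be approximated with a few lines of awk. This is only a sketch: the
function name summarise_times is hypothetical, it is not how mmtests
computes its report, and the sample values are illustrative rather than
measured.

```shell
# Hypothetical helper: summarise run times given one value per line on stdin.
summarise_times() {
    awk '
        NR == 1 { min = max = $1 }
        {
            if ($1 < min) min = $1
            if ($1 > max) max = $1
            sum += $1
        }
        END {
            if (NR == 0) exit 1   # no samples supplied
            printf "Min %.2f Max %.2f Range %.2f Mean %.2f\n",
                   min, max, max - min, sum / NR
        }
    '
}

# Illustrative values only, not measured data.
printf '%s\n' 96.00 100.00 156.00 | summarise_times
```

A range that is a large fraction of the mean, as in the illustrative input,
is the kind of spread that points to unstable results.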

The generated config is for openSUSE, so it may not work for you. After
installing the mpich package, you'll need to adjust these lines:

export NAS_MPICH_PATH=/usr/$MMTESTS_LIBDIR/mpi/gcc/$NAS_MPICH_VERSION/bin
export NAS_MPICH_LIBPATH=/usr/$MMTESTS_LIBDIR/mpi/gcc/$NAS_MPICH_VERSION/$MMTESTS_LIBDIR

NAS_MPICH_PATH and NAS_MPICH_LIBPATH need to point to the bin and lib
path for the mpich package your distribution ships.
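
One possible way to find candidate values on another distribution is to ask
the shell where the MPICH compiler wrapper lives. The sketch below makes
several assumptions: that mpicc is on PATH and belongs to mpich, that the
helper name mpich_bin_dir is hypothetical, and that the libraries sit in a
lib directory next to bin (many distributions use lib64 or a different
layout, so check before using the result).

```shell
# Hypothetical helper: print the directory holding an MPI binary.
# Assumes the binary (default: mpicc) is already on PATH.
mpich_bin_dir() {
    bin=$(command -v "${1:-mpicc}") || return 1
    dirname "$bin"
}

if dir=$(mpich_bin_dir mpicc); then
    echo "export NAS_MPICH_PATH=$dir"
    # Assumption: libraries sit next to bin/; verify for your distribution.
    echo "export NAS_MPICH_LIBPATH=$(dirname "$dir")/lib"
else
    echo "mpicc not found on PATH; install mpich or set the paths by hand" >&2
fi
```

The printed lines are only suggestions to compare against the generated
config; they do not modify it.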

-- 
Mel Gorman
SUSE Labs
