Date:	Fri, 20 Dec 2013 23:51:43 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Alex Shi <alex.shi@...aro.org>, Ingo Molnar <mingo@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	H Peter Anvin <hpa@...or.com>, Linux-X86 <x86@...nel.org>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB
 range flush v2

On Thu, Dec 19, 2013 at 02:34:50PM +0000, Mel Gorman wrote:
> On Wed, Dec 18, 2013 at 03:28:14PM +0800, Fengguang Wu wrote:
> > Hi Mel,
> > 
> > I'd like to share some test numbers with your patches applied on top of v3.13-rc3.
> > 
> > Basically there are
> > 
> > 1) no big performance changes
> > 
> >   76628486           -0.7%   76107841       TOTAL vm-scalability.throughput
> >     407038           +1.2%     412032       TOTAL hackbench.throughput
> >      50307           -1.5%      49549       TOTAL ebizzy.throughput
> > 
> 
> I'm assuming this was an ivybridge processor.

The test boxes brickland2 and lkp-ib03 are Ivy Bridge; lkp-snb01 is Sandy Bridge.

> How many threads were ebizzy tested with?

The case below has the params string "400%-5-30", which means

        nr_threads = 400% * nr_cpu = 4 * 48 = 192
        iterations = 5
        duration = 30
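As a sketch (POSIX shell; the variable names are mine, not the test harness's), the params string decodes like this:

```shell
#!/bin/sh
# Decode an ebizzy params string "<cpu-percent>%-<iterations>-<duration>".
# nr_cpu=48 matches the lkp-ib03 box above; the parsing itself is illustrative.
params="400%-5-30"
nr_cpu=48

percent=${params%%-*}            # "400%"
rest=${params#*-}                # "5-30"
iterations=${rest%%-*}           # "5"
duration=${rest#*-}              # "30"
nr_threads=$(( ${percent%\%} * nr_cpu / 100 ))

echo "nr_threads=$nr_threads iterations=$iterations duration=$duration"
```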

      v3.13-rc3       eabb1f89905a0c809d13
---------------  -------------------------  
     50307 ~ 1%      -1.5%      49549 ~ 0%  lkp-ib03/micro/ebizzy/400%-5-30
     50307           -1.5%      49549       TOTAL ebizzy.throughput
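For reference, the "-1.5%" column is just the relative delta between the two kernels' throughput; a minimal awk sketch, with the numbers copied from the row above:

```shell
#!/bin/sh
# Relative change from base (v3.13-rc3) to head (eabb1f89905a0c809d13).
old=50307
new=49549
awk -v o="$old" -v n="$new" 'BEGIN { printf "%+.1f%%\n", (n - o) / o * 100 }'
```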

> The memory ranges used by the vm scalability benchmarks are
> probably too large to be affected by the series but I'm guessing.

Do you mean these lines?

   3345155 ~ 0%      -0.3%    3335172 ~ 0%  brickland2/micro/vm-scalability/16G-shm-pread-rand-mt
  33249939 ~ 0%      +3.3%   34336155 ~ 1%  brickland2/micro/vm-scalability/1T-shm-pread-seq     

The two cases run 128 threads/processes, each concurrently accessing a 64GB
shm file randomly or sequentially. Sorry, the 16G/1T prefixes are somewhat misleading.

> I doubt hackbench is doing any flushes and the 1.2% is noise.

Here are the proc-vmstat.nr_tlb_remote_flush numbers for hackbench:

       513 ~ 3%  +4.3e+16%  2.192e+17 ~85%  lkp-nex05/micro/hackbench/800%-process-pipe
       603 ~ 3%  +7.7e+16%  4.669e+17 ~13%  lkp-nex05/micro/hackbench/800%-process-socket
      6124 ~17%  +5.7e+15%  3.474e+17 ~26%  lkp-nex05/micro/hackbench/800%-threads-pipe
      7565 ~49%  +5.5e+15%  4.128e+17 ~68%  lkp-nex05/micro/hackbench/800%-threads-socket
     21252 ~ 6%  +1.3e+15%  2.728e+17 ~39%  lkp-snb01/micro/hackbench/1600%-threads-pipe
     24516 ~16%  +8.3e+14%  2.034e+17 ~53%  lkp-snb01/micro/hackbench/1600%-threads-socket

I tried rebuilding the kernels with distclean and this time got the
hackbench changes below. I'll queue the hackbench test on all our test
boxes to get a more complete evaluation.

      v3.13-rc3       eabb1f89905a0c809d13  
---------------  -------------------------  
    232925 ~ 0%      -8.4%     213339 ~ 5%  lkp-snb01/micro/hackbench/1600%-process-pipe
    232925           -8.4%     213339       TOTAL hackbench.throughput

This time, the ebizzy params were refreshed and the test case was
exercised on all our test machines. The results that changed are:

      v3.13-rc3       eabb1f89905a0c809d13  
---------------  -------------------------  
       873 ~ 0%      +0.7%        879 ~ 0%  lkp-a03/micro/ebizzy/200%-100-10
       873 ~ 0%      +0.7%        879 ~ 0%  lkp-a04/micro/ebizzy/200%-100-10
       873 ~ 0%      +0.8%        880 ~ 0%  lkp-a06/micro/ebizzy/200%-100-10
     49242 ~ 0%      -1.2%      48650 ~ 0%  lkp-ib03/micro/ebizzy/200%-100-10
     26176 ~ 0%      -1.6%      25760 ~ 0%  lkp-sbx04/micro/ebizzy/200%-100-10
      2738 ~ 0%      +0.2%       2744 ~ 0%  lkp-t410/micro/ebizzy/200%-100-10
     80776           -1.2%      79793       TOTAL ebizzy.throughput

The full change set is attached.

> > 2) huge proc-vmstat.nr_tlb_* increases
> > 
> >   99986527         +3e+14%  2.988e+20       TOTAL proc-vmstat.nr_tlb_local_flush_one
> >  3.812e+08       +2.2e+13%  8.393e+19       TOTAL proc-vmstat.nr_tlb_remote_flush_received
> >  3.301e+08       +2.2e+13%  7.241e+19       TOTAL proc-vmstat.nr_tlb_remote_flush
> >    5990864       +1.2e+15%  7.032e+19       TOTAL proc-vmstat.nr_tlb_local_flush_all
> > 
> 
> The accounting changes can be mostly explained by "x86: mm: Clean up
> inconsistencies when flushing TLB ranges". flush_all was simply not
> being counted before so I would claim that the old figure was simply
> wrong and did not reflect reality.
> 
> Alterations on when range versus global flushes would affect the other
> counters but arguably it's now behaving as originally intended by the tlb
> flush shift.

OK.
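(For context: these nr_tlb_* counters come straight from /proc/vmstat, and they are only exported on kernels built with the TLB flush accounting, so the grep below may legitimately match nothing on other configs.)

```shell
#!/bin/sh
# Dump the TLB flush counters discussed above. On kernels without the
# accounting compiled in, no lines will match.
grep '^nr_tlb_' /proc/vmstat || echo "no nr_tlb_* counters exported"
```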

> > Here are the detailed numbers. eabb1f89905a0c809d13 is the HEAD commit
> > with 4 patches applied. The "~ N%" notations are the stddev percent.
> > The "[+-] N%" notations are the increase/decrease percent. The
> > brickland2, lkp-snb01, lkp-ib03 etc. are testbox names.
> > 
> 
> Are positive numbers always better?

Not necessarily. A positive change merely means the absolute numbers of
hackbench.throughput, ebizzy.throughput, etc. increased in the new
kernel. But yes, for the above stats they happen to be "the higher,
the better".

> If so, most of these figures look good to me and support the series
> being merged. Please speak up if that is in error.

Agreed, except that I'll need to re-evaluate the hackbench test case.

> I do see a few major regressions like this
> 
> >     324497 ~ 0%    -100.0%          0 ~ 0%  brickland2/micro/vm-scalability/16G-truncate
> 
> but I have no idea what the test is doing and whether something happened
> that the test broke that time or if it's something to be really
> concerned about.

This test case simply creates sparse files, populates them with zeros,
then deletes them in parallel. Here $mem is the physical memory size
(128G) and $nr_cpu is 120.

# create one sparse file per CPU and read it back through the page cache
for i in `seq $nr_cpu`
do
        create_sparse_file $SPARSE_FILE-$i $((mem / nr_cpu))
        cp $SPARSE_FILE-$i /dev/null
done

# then delete all the files in parallel
for i in `seq $nr_cpu`
do
        rm $SPARSE_FILE-$i &
done
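create_sparse_file itself isn't shown in the email; as an assumption, it is presumably a small truncate(1)-style helper along these lines:

```shell
#!/bin/sh
# Hypothetical stand-in for the create_sparse_file helper used above:
# make a file of the given size without allocating data blocks on disk.
create_sparse_file() {
        # $1 = path, $2 = size in bytes
        truncate -s "$2" "$1"
}

create_sparse_file /tmp/sparse-demo $((16 * 1024 * 1024))
du -k /tmp/sparse-demo        # near-zero blocks actually allocated
rm /tmp/sparse-demo
```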

Thanks,
Fengguang

View attachment "eabb1f89905a0c809d13ec27795ced089c107eb8" of type "text/plain" (74167 bytes)
