Date:	Fri, 27 Mar 2015 16:49:18 +0800
From:	Huang Ying <ying.huang@...el.com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	LKML <linux-kernel@...r.kernel.org>, LKP ML <lkp@...org>
Subject: Re: [LKP] [mm] 3484b2de949: -46.2% aim7.jobs-per-min

On Wed, 2015-03-25 at 10:54 +0000, Mel Gorman wrote:
> On Mon, Mar 23, 2015 at 04:46:21PM +0800, Huang Ying wrote:
> > > My attention is occupied by the automatic NUMA regression at the moment
> > > but I haven't forgotten this. Even with the high client count, I was not
> > > able to reproduce this, so it appears to depend on the number of CPUs
> > > available to stress the allocator hard enough to bypass the per-cpu
> > > allocator and contend heavily on the zone lock. I'm hoping to find a
> > > better alternative than adding more padding and increasing the cache
> > > footprint of the allocator, but so far I haven't thought of a good
> > > one. Moving the lock to the end of the freelists would probably
> > > address the problem but still increases the footprint for order-0
> > > allocations by a cache line.
> > 
> > Any update on this?  Do you have a better idea?  I guess this may be
> > fixed by putting some fields that are only read during order-0
> > allocation in the same cache line as the lock, if there are any.
> > 
> 
> Sorry for the delay, the automatic NUMA regression took a long time to
> close and it potentially affected anybody with a NUMA machine, not just
> stress tests on large machines.
> 
> Moving it beside other fields just shifts the problem. The lock is related
> to the free areas so it really belongs nearby, and from my own testing,
> it does not affect mid-sized machines. I'd rather not put the lock in its
> own cache line unless we have to. Can you try the following patch
> instead? It is untested but builds and should be safe.
> 
> It'll increase the footprint of the page allocator but so would padding.
> It means it will contend with high-order free page breakups but that
> is not likely to happen during stress tests. It also collides with flags
> but they are relatively rarely updated.
> 
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f279d9c158cd..2782df47101e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -474,16 +474,15 @@ struct zone {
>  	unsigned long		wait_table_bits;
>  
>  	ZONE_PADDING(_pad1_)
> -
> -	/* Write-intensive fields used from the page allocator */
> -	spinlock_t		lock;
> -
>  	/* free areas of different sizes */
>  	struct free_area	free_area[MAX_ORDER];
>  
>  	/* zone flags, see below */
>  	unsigned long		flags;
>  
> +	/* Write-intensive fields used from the page allocator */
> +	spinlock_t		lock;
> +
>  	ZONE_PADDING(_pad2_)
>  
>  	/* Write-intensive fields used by page reclaim */
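
(For illustration only; not part of the patch above. A minimal user-space
sketch of the layout idea, with made-up field names, a plain int standing
in for spinlock_t, and 64-byte cache lines assumed: the read-mostly fields
stay on their own line ahead of _pad1_, while the write-hot lock shares a
line with free_area and the rarely-updated flags instead of costing an
extra padded line.)

#include <stdio.h>
#include <stddef.h>

#define CACHELINE	64
#define MAX_ORDER	11

struct fake_free_area {			/* stand-in for struct free_area */
	unsigned long nr_free;
};

struct fake_zone {
	/* Read-mostly fields used on the allocation fast path. */
	unsigned long watermark[3];
	unsigned long wait_table_bits;

	/* Starts a new cache line, like ZONE_PADDING(_pad1_). */
	char _pad1_[0] __attribute__((aligned(CACHELINE)));

	/* free areas of different sizes */
	struct fake_free_area free_area[MAX_ORDER];

	/* Rarely updated, so sharing the lock's line is cheap. */
	unsigned long flags;

	/* Write-intensive lock, now adjacent to the data it protects. */
	int lock;			/* stand-in for spinlock_t */

	char _pad2_[0] __attribute__((aligned(CACHELINE)));

	/* Write-intensive fields used by page reclaim would follow. */
	unsigned long vm_stat;
};

int main(void)
{
	/* Print which 64-byte cache line each field starts on. */
	printf("watermark  line %zu\n", offsetof(struct fake_zone, watermark) / CACHELINE);
	printf("free_area  line %zu\n", offsetof(struct fake_zone, free_area) / CACHELINE);
	printf("flags      line %zu\n", offsetof(struct fake_zone, flags) / CACHELINE);
	printf("lock       line %zu\n", offsetof(struct fake_zone, lock) / CACHELINE);
	return 0;
}

Built with gcc on x86-64 this prints line 0 for the read-mostly fields and
line 2 for flags and lock, which share a line with the tail of free_area.
The real struct free_area is much larger, but the grouping is the same idea
as in the diff above: the lock collides only with free_area and the rarely
updated flags, not with the read-mostly fast-path fields.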

Stress page allocator tests here show that performance is restored to
its previous level with the patch above.  I applied your patch on the
latest upstream kernel.  Results are below:

testbox/testcase/testparams: brickland1/aim7/performance-6000-page_test

c875f421097a55d9  dbdc458f1b7d07f32891509c06  
----------------  --------------------------  
         %stddev     %change         %stddev
             \          |                \  
     84568 ±  1%     +94.3%     164280 ±  1%  aim7.jobs-per-min
   2881944 ±  2%     -35.1%    1870386 ±  8%  aim7.time.voluntary_context_switches
       681 ±  1%      -3.4%        658 ±  0%  aim7.time.user_time
   5538139 ±  0%     -12.1%    4867884 ±  0%  aim7.time.involuntary_context_switches
     44174 ±  1%     -46.0%      23848 ±  1%  aim7.time.system_time
       426 ±  1%     -48.4%        219 ±  1%  aim7.time.elapsed_time
       426 ±  1%     -48.4%        219 ±  1%  aim7.time.elapsed_time.max
       468 ±  1%     -43.1%        266 ±  2%  uptime.boot
     13691 ±  0%     -24.2%      10379 ±  1%  softirqs.NET_RX
    931382 ±  2%     +24.9%    1163065 ±  1%  softirqs.RCU
    407717 ±  2%     -36.3%     259521 ±  9%  softirqs.SCHED
  19690372 ±  0%     -34.8%   12836548 ±  1%  softirqs.TIMER
      2442 ±  1%     -28.9%       1737 ±  5%  vmstat.procs.b
      3016 ±  3%     +19.4%       3603 ±  4%  vmstat.procs.r
    104330 ±  1%     +34.6%     140387 ±  0%  vmstat.system.in
     22172 ±  0%     +48.3%      32877 ±  2%  vmstat.system.cs
      1891 ± 12%     -48.2%        978 ± 10%  numa-numastat.node0.other_node
      1785 ± 14%     -47.7%        933 ±  6%  numa-numastat.node1.other_node
      1790 ± 12%     -47.8%        935 ± 10%  numa-numastat.node2.other_node
      1766 ± 14%     -47.0%        935 ± 12%  numa-numastat.node3.other_node
       426 ±  1%     -48.4%        219 ±  1%  time.elapsed_time.max
       426 ±  1%     -48.4%        219 ±  1%  time.elapsed_time
   5538139 ±  0%     -12.1%    4867884 ±  0%  time.involuntary_context_switches
     44174 ±  1%     -46.0%      23848 ±  1%  time.system_time
   2881944 ±  2%     -35.1%    1870386 ±  8%  time.voluntary_context_switches
   7831898 ±  4%     +31.8%   10325919 ±  5%  meminfo.Active
   7742498 ±  4%     +32.2%   10237222 ±  5%  meminfo.Active(anon)
   7231211 ±  4%     +28.7%    9308183 ±  5%  meminfo.AnonPages
  7.55e+11 ±  4%     +19.6%  9.032e+11 ±  8%  meminfo.Committed_AS
     14010 ±  1%     -17.4%      11567 ±  1%  meminfo.Inactive(anon)
    668946 ±  4%     +40.8%     941815 ± 27%  meminfo.PageTables
     15392 ±  1%     -15.9%      12945 ±  1%  meminfo.Shmem
      1185 ±  0%      -4.4%       1133 ±  0%  turbostat.Avg_MHz
      3.29 ±  6%     -64.5%       1.17 ± 14%  turbostat.CPU%c1
      0.10 ± 12%     -90.3%       0.01 ±  0%  turbostat.CPU%c3
      2.95 ±  3%     +73.9%       5.13 ±  3%  turbostat.CPU%c6
       743 ±  9%     -70.7%        217 ± 17%  turbostat.CorWatt
       300 ±  0%      -9.4%        272 ±  0%  turbostat.PKG_%
      1.58 ±  2%     +59.6%       2.53 ± 20%  turbostat.Pkg%pc2
       758 ±  9%     -69.3%        232 ± 16%  turbostat.PkgWatt
     15.08 ±  0%      +5.4%      15.90 ±  1%  turbostat.RAMWatt
    105729 ±  6%     -47.0%      56005 ± 25%  cpuidle.C1-IVT-4S.usage
 2.535e+08 ± 12%     -62.7%   94532092 ± 22%  cpuidle.C1-IVT-4S.time
 4.386e+08 ±  4%     -79.4%   90246312 ± 23%  cpuidle.C1E-IVT-4S.time
     83425 ±  6%     -71.7%      23571 ± 23%  cpuidle.C1E-IVT-4S.usage
     14237 ±  8%     -79.0%       2983 ± 19%  cpuidle.C3-IVT-4S.usage
 1.242e+08 ±  7%     -87.5%   15462238 ± 18%  cpuidle.C3-IVT-4S.time
     87857 ±  7%     -71.1%      25355 ±  5%  cpuidle.C6-IVT-4S.usage
 2.359e+09 ±  2%     -38.2%  1.458e+09 ±  2%  cpuidle.C6-IVT-4S.time
   1960460 ±  3%     +31.7%    2582336 ±  4%  proc-vmstat.nr_active_anon
      5548 ±  2%     +53.2%       8498 ±  3%  proc-vmstat.nr_alloc_batch
   1830492 ±  3%     +28.4%    2349846 ±  3%  proc-vmstat.nr_anon_pages
      3514 ±  1%     -17.7%       2893 ±  1%  proc-vmstat.nr_inactive_anon
    168712 ±  4%     +40.3%     236768 ± 27%  proc-vmstat.nr_page_table_pages
      3859 ±  1%     -16.1%       3238 ±  1%  proc-vmstat.nr_shmem
   1997823 ±  5%     -27.4%    1450005 ±  5%  proc-vmstat.numa_hint_faults
   1413076 ±  6%     -25.3%    1056268 ±  5%  proc-vmstat.numa_hint_faults_local
      7213 ±  6%     -47.3%       3799 ±  7%  proc-vmstat.numa_other
    406056 ±  3%     -41.9%     236064 ±  6%  proc-vmstat.numa_pages_migrated
   7242333 ±  3%     -29.2%    5130788 ± 10%  proc-vmstat.numa_pte_updates
    406056 ±  3%     -41.9%     236064 ±  6%  proc-vmstat.pgmigrate_success
    484141 ±  3%     +32.7%     642529 ±  5%  numa-vmstat.node0.nr_active_anon
 1.509e+08 ±  0%     -12.6%  1.319e+08 ±  3%  numa-vmstat.node0.numa_hit
    452041 ±  3%     +29.9%     587214 ±  5%  numa-vmstat.node0.nr_anon_pages
      1484 ±  1%     +36.5%       2026 ± 24%  numa-vmstat.node0.nr_alloc_batch
 1.509e+08 ±  0%     -12.6%  1.319e+08 ±  3%  numa-vmstat.node0.numa_local
    493672 ±  8%     +30.5%     644195 ± 11%  numa-vmstat.node1.nr_active_anon
      1481 ±  9%     +52.5%       2259 ±  8%  numa-vmstat.node1.nr_alloc_batch
    462466 ±  8%     +27.4%     589287 ± 10%  numa-vmstat.node1.nr_anon_pages
    485463 ±  6%     +29.1%     626539 ±  4%  numa-vmstat.node2.nr_active_anon
       422 ± 15%     -63.1%        156 ± 38%  numa-vmstat.node2.nr_inactive_anon
     32587 ±  9%     +71.0%      55722 ± 32%  numa-vmstat.node2.nr_page_table_pages
      1365 ±  5%     +68.7%       2303 ± 11%  numa-vmstat.node2.nr_alloc_batch
    453583 ±  6%     +26.1%     572097 ±  4%  numa-vmstat.node2.nr_anon_pages
 1.378e+08 ±  2%      -8.5%   1.26e+08 ±  2%  numa-vmstat.node3.numa_local
    441345 ± 10%     +28.4%     566740 ±  6%  numa-vmstat.node3.nr_anon_pages
 1.378e+08 ±  2%      -8.5%  1.261e+08 ±  2%  numa-vmstat.node3.numa_hit
    471252 ± 10%     +31.9%     621440 ±  7%  numa-vmstat.node3.nr_active_anon
      1359 ±  4%     +75.1%       2380 ± 16%  numa-vmstat.node3.nr_alloc_batch
   1826489 ±  0%     +30.0%    2375174 ±  4%  numa-meminfo.node0.AnonPages
   2774145 ±  8%     +26.1%    3497281 ±  9%  numa-meminfo.node0.MemUsed
   1962338 ±  0%     +32.5%    2599292 ±  4%  numa-meminfo.node0.Active(anon)
   1985987 ±  0%     +32.0%    2621356 ±  4%  numa-meminfo.node0.Active
   2768321 ±  6%     +27.7%    3534224 ± 11%  numa-meminfo.node1.MemUsed
   1935382 ±  5%     +34.2%    2597532 ± 11%  numa-meminfo.node1.Active
   1913696 ±  5%     +34.6%    2575266 ± 11%  numa-meminfo.node1.Active(anon)
   1784346 ±  6%     +31.7%    2349891 ± 10%  numa-meminfo.node1.AnonPages
      1678 ± 15%     -62.7%        625 ± 39%  numa-meminfo.node2.Inactive(anon)
   2532834 ±  4%     +27.4%    3227116 ±  8%  numa-meminfo.node2.MemUsed
    132885 ±  9%     +67.9%     223159 ± 32%  numa-meminfo.node2.PageTables
   2004439 ±  5%     +26.1%    2528019 ±  5%  numa-meminfo.node2.Active
   1856674 ±  5%     +23.0%    2283461 ±  5%  numa-meminfo.node2.AnonPages
   1981962 ±  5%     +26.4%    2505422 ±  5%  numa-meminfo.node2.Active(anon)
   1862203 ±  8%     +33.0%    2476954 ±  6%  numa-meminfo.node3.Active(anon)
   1883841 ±  7%     +32.6%    2498686 ±  6%  numa-meminfo.node3.Active
   2572461 ± 11%     +24.2%    3195556 ±  8%  numa-meminfo.node3.MemUsed
   1739646 ±  8%     +29.4%    2250696 ±  6%  numa-meminfo.node3.AnonPages

Best Regards,
Huang, Ying


