Message-ID: <1427446158.17170.72.camel@intel.com>
Date: Fri, 27 Mar 2015 16:49:18 +0800
From: Huang Ying <ying.huang@...el.com>
To: Mel Gorman <mgorman@...e.de>
Cc: LKML <linux-kernel@...r.kernel.org>, LKP ML <lkp@...org>
Subject: Re: [LKP] [mm] 3484b2de949: -46.2% aim7.jobs-per-min
On Wed, 2015-03-25 at 10:54 +0000, Mel Gorman wrote:
> On Mon, Mar 23, 2015 at 04:46:21PM +0800, Huang Ying wrote:
> > > My attention is occupied by the automatic NUMA regression at the moment,
> > > but I haven't forgotten this. Even with the high client count, I was not
> > > able to reproduce this, so it appears to depend on having enough CPUs
> > > available to stress the allocator hard enough to bypass the per-cpu
> > > allocator and contend heavily on the zone lock. I'm hoping to find a
> > > better alternative than adding more padding and increasing the cache
> > > footprint of the allocator, but so far I haven't thought of one. Moving
> > > the lock to the end of the freelists would probably address the problem
> > > but would still increase the footprint for order-0 allocations by a
> > > cache line.
> >
> > Any update on this? Do you have a better idea? I guess this might be
> > fixed by putting some fields that are only read during order-0
> > allocation in the same cache line as the lock, if there are any.
> >
>
> Sorry for the delay; the automatic NUMA regression took a long time to
> close, and it potentially affected anybody with a NUMA machine, not just
> stress tests on large machines.
>
> Moving it beside other fields just shifts the problem. The lock is related
> to the free areas, so it really belongs nearby, and from my own testing
> it does not affect mid-sized machines. I'd rather not put the lock in its
> own cache line unless we have to. Can you try the following patch instead?
> It is untested, but it builds and should be safe.
>
> It'll increase the footprint of the page allocator, but so would padding.
> It means the lock will contend with high-order free page breakups, but that
> is not likely to happen during stress tests. It also collides with flags,
> but they are updated relatively rarely.
>
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f279d9c158cd..2782df47101e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -474,16 +474,15 @@ struct zone {
> unsigned long wait_table_bits;
>
> ZONE_PADDING(_pad1_)
> -
> - /* Write-intensive fields used from the page allocator */
> - spinlock_t lock;
> -
> /* free areas of different sizes */
> struct free_area free_area[MAX_ORDER];
>
> /* zone flags, see below */
> unsigned long flags;
>
> + /* Write-intensive fields used from the page allocator */
> + spinlock_t lock;
> +
> ZONE_PADDING(_pad2_)
>
> /* Write-intensive fields used by page reclaim */
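
For a rough feel of the layout the patch produces, the userspace sketch
below mimics roughly what ZONE_PADDING does in mmzone.h: a zero-size
member aligned to a cache-line boundary, so the fields that follow start
on a fresh cache line. Everything here is an illustrative stand-in, not
kernel code; the PAD macro, struct toy_zone and its field names are made
up, a 64-byte cache line is assumed, and a plain int stands in for
spinlock_t.

/* build with: gcc -O2 toy_zone.c -o toy_zone (zero-size arrays are a
 * GCC/clang extension, as in the kernel's own ZONE_PADDING) */
#include <stdio.h>
#include <stddef.h>

#define CACHELINE 64
#define PAD(name) char name[0] __attribute__((aligned(CACHELINE)))

struct toy_zone {
        /* read-mostly fields consulted on every allocation */
        unsigned long watermark[3];
        long lowmem_reserve[4];

        PAD(_pad1_);
        /* the free lists and the lock that protects them, kept together
         * as in the patch above; 'flags' rides along since it is rarely
         * updated */
        struct { void *next; unsigned long nr_free; } free_area[11];
        unsigned long flags;
        int lock;                       /* stand-in for spinlock_t */

        PAD(_pad2_);
        /* write-intensive fields used by a different path (page reclaim) */
        unsigned long reclaim_stat[8];
};

int main(void)
{
        /* print which cache line each group of fields lands on */
        printf("free_area:    offset %3zu, line %zu\n",
               offsetof(struct toy_zone, free_area),
               offsetof(struct toy_zone, free_area) / CACHELINE);
        printf("lock:         offset %3zu, line %zu\n",
               offsetof(struct toy_zone, lock),
               offsetof(struct toy_zone, lock) / CACHELINE);
        printf("reclaim_stat: offset %3zu, line %zu\n",
               offsetof(struct toy_zone, reclaim_stat),
               offsetof(struct toy_zone, reclaim_stat) / CACHELINE);
        return 0;
}

On a 64-byte-cache-line machine this shows the lock sharing a line with
flags and the tail of free_area, while the read-mostly fields before
_pad1_ and the reclaim-side fields after _pad2_ stay on their own lines,
which is the trade-off described above.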
Stress page allocator tests here show that performance is restored to
its previous level with the patch above. I applied your patch on the
latest upstream kernel. The result is as below:
testbox/testcase/testparams: brickland1/aim7/performance-6000-page_test
c875f421097a55d9  dbdc458f1b7d07f32891509c06
----------------  --------------------------
         %stddev     %change         %stddev
             \          |                \
84568 ± 1% +94.3% 164280 ± 1% aim7.jobs-per-min
2881944 ± 2% -35.1% 1870386 ± 8% aim7.time.voluntary_context_switches
681 ± 1% -3.4% 658 ± 0% aim7.time.user_time
5538139 ± 0% -12.1% 4867884 ± 0% aim7.time.involuntary_context_switches
44174 ± 1% -46.0% 23848 ± 1% aim7.time.system_time
426 ± 1% -48.4% 219 ± 1% aim7.time.elapsed_time
426 ± 1% -48.4% 219 ± 1% aim7.time.elapsed_time.max
468 ± 1% -43.1% 266 ± 2% uptime.boot
13691 ± 0% -24.2% 10379 ± 1% softirqs.NET_RX
931382 ± 2% +24.9% 1163065 ± 1% softirqs.RCU
407717 ± 2% -36.3% 259521 ± 9% softirqs.SCHED
19690372 ± 0% -34.8% 12836548 ± 1% softirqs.TIMER
2442 ± 1% -28.9% 1737 ± 5% vmstat.procs.b
3016 ± 3% +19.4% 3603 ± 4% vmstat.procs.r
104330 ± 1% +34.6% 140387 ± 0% vmstat.system.in
22172 ± 0% +48.3% 32877 ± 2% vmstat.system.cs
1891 ± 12% -48.2% 978 ± 10% numa-numastat.node0.other_node
1785 ± 14% -47.7% 933 ± 6% numa-numastat.node1.other_node
1790 ± 12% -47.8% 935 ± 10% numa-numastat.node2.other_node
1766 ± 14% -47.0% 935 ± 12% numa-numastat.node3.other_node
426 ± 1% -48.4% 219 ± 1% time.elapsed_time.max
426 ± 1% -48.4% 219 ± 1% time.elapsed_time
5538139 ± 0% -12.1% 4867884 ± 0% time.involuntary_context_switches
44174 ± 1% -46.0% 23848 ± 1% time.system_time
2881944 ± 2% -35.1% 1870386 ± 8% time.voluntary_context_switches
7831898 ± 4% +31.8% 10325919 ± 5% meminfo.Active
7742498 ± 4% +32.2% 10237222 ± 5% meminfo.Active(anon)
7231211 ± 4% +28.7% 9308183 ± 5% meminfo.AnonPages
7.55e+11 ± 4% +19.6% 9.032e+11 ± 8% meminfo.Committed_AS
14010 ± 1% -17.4% 11567 ± 1% meminfo.Inactive(anon)
668946 ± 4% +40.8% 941815 ± 27% meminfo.PageTables
15392 ± 1% -15.9% 12945 ± 1% meminfo.Shmem
1185 ± 0% -4.4% 1133 ± 0% turbostat.Avg_MHz
3.29 ± 6% -64.5% 1.17 ± 14% turbostat.CPU%c1
0.10 ± 12% -90.3% 0.01 ± 0% turbostat.CPU%c3
2.95 ± 3% +73.9% 5.13 ± 3% turbostat.CPU%c6
743 ± 9% -70.7% 217 ± 17% turbostat.CorWatt
300 ± 0% -9.4% 272 ± 0% turbostat.PKG_%
1.58 ± 2% +59.6% 2.53 ± 20% turbostat.Pkg%pc2
758 ± 9% -69.3% 232 ± 16% turbostat.PkgWatt
15.08 ± 0% +5.4% 15.90 ± 1% turbostat.RAMWatt
105729 ± 6% -47.0% 56005 ± 25% cpuidle.C1-IVT-4S.usage
2.535e+08 ± 12% -62.7% 94532092 ± 22% cpuidle.C1-IVT-4S.time
4.386e+08 ± 4% -79.4% 90246312 ± 23% cpuidle.C1E-IVT-4S.time
83425 ± 6% -71.7% 23571 ± 23% cpuidle.C1E-IVT-4S.usage
14237 ± 8% -79.0% 2983 ± 19% cpuidle.C3-IVT-4S.usage
1.242e+08 ± 7% -87.5% 15462238 ± 18% cpuidle.C3-IVT-4S.time
87857 ± 7% -71.1% 25355 ± 5% cpuidle.C6-IVT-4S.usage
2.359e+09 ± 2% -38.2% 1.458e+09 ± 2% cpuidle.C6-IVT-4S.time
1960460 ± 3% +31.7% 2582336 ± 4% proc-vmstat.nr_active_anon
5548 ± 2% +53.2% 8498 ± 3% proc-vmstat.nr_alloc_batch
1830492 ± 3% +28.4% 2349846 ± 3% proc-vmstat.nr_anon_pages
3514 ± 1% -17.7% 2893 ± 1% proc-vmstat.nr_inactive_anon
168712 ± 4% +40.3% 236768 ± 27% proc-vmstat.nr_page_table_pages
3859 ± 1% -16.1% 3238 ± 1% proc-vmstat.nr_shmem
1997823 ± 5% -27.4% 1450005 ± 5% proc-vmstat.numa_hint_faults
1413076 ± 6% -25.3% 1056268 ± 5% proc-vmstat.numa_hint_faults_local
7213 ± 6% -47.3% 3799 ± 7% proc-vmstat.numa_other
406056 ± 3% -41.9% 236064 ± 6% proc-vmstat.numa_pages_migrated
7242333 ± 3% -29.2% 5130788 ± 10% proc-vmstat.numa_pte_updates
406056 ± 3% -41.9% 236064 ± 6% proc-vmstat.pgmigrate_success
484141 ± 3% +32.7% 642529 ± 5% numa-vmstat.node0.nr_active_anon
1.509e+08 ± 0% -12.6% 1.319e+08 ± 3% numa-vmstat.node0.numa_hit
452041 ± 3% +29.9% 587214 ± 5% numa-vmstat.node0.nr_anon_pages
1484 ± 1% +36.5% 2026 ± 24% numa-vmstat.node0.nr_alloc_batch
1.509e+08 ± 0% -12.6% 1.319e+08 ± 3% numa-vmstat.node0.numa_local
493672 ± 8% +30.5% 644195 ± 11% numa-vmstat.node1.nr_active_anon
1481 ± 9% +52.5% 2259 ± 8% numa-vmstat.node1.nr_alloc_batch
462466 ± 8% +27.4% 589287 ± 10% numa-vmstat.node1.nr_anon_pages
485463 ± 6% +29.1% 626539 ± 4% numa-vmstat.node2.nr_active_anon
422 ± 15% -63.1% 156 ± 38% numa-vmstat.node2.nr_inactive_anon
32587 ± 9% +71.0% 55722 ± 32% numa-vmstat.node2.nr_page_table_pages
1365 ± 5% +68.7% 2303 ± 11% numa-vmstat.node2.nr_alloc_batch
453583 ± 6% +26.1% 572097 ± 4% numa-vmstat.node2.nr_anon_pages
1.378e+08 ± 2% -8.5% 1.26e+08 ± 2% numa-vmstat.node3.numa_local
441345 ± 10% +28.4% 566740 ± 6% numa-vmstat.node3.nr_anon_pages
1.378e+08 ± 2% -8.5% 1.261e+08 ± 2% numa-vmstat.node3.numa_hit
471252 ± 10% +31.9% 621440 ± 7% numa-vmstat.node3.nr_active_anon
1359 ± 4% +75.1% 2380 ± 16% numa-vmstat.node3.nr_alloc_batch
1826489 ± 0% +30.0% 2375174 ± 4% numa-meminfo.node0.AnonPages
2774145 ± 8% +26.1% 3497281 ± 9% numa-meminfo.node0.MemUsed
1962338 ± 0% +32.5% 2599292 ± 4% numa-meminfo.node0.Active(anon)
1985987 ± 0% +32.0% 2621356 ± 4% numa-meminfo.node0.Active
2768321 ± 6% +27.7% 3534224 ± 11% numa-meminfo.node1.MemUsed
1935382 ± 5% +34.2% 2597532 ± 11% numa-meminfo.node1.Active
1913696 ± 5% +34.6% 2575266 ± 11% numa-meminfo.node1.Active(anon)
1784346 ± 6% +31.7% 2349891 ± 10% numa-meminfo.node1.AnonPages
1678 ± 15% -62.7% 625 ± 39% numa-meminfo.node2.Inactive(anon)
2532834 ± 4% +27.4% 3227116 ± 8% numa-meminfo.node2.MemUsed
132885 ± 9% +67.9% 223159 ± 32% numa-meminfo.node2.PageTables
2004439 ± 5% +26.1% 2528019 ± 5% numa-meminfo.node2.Active
1856674 ± 5% +23.0% 2283461 ± 5% numa-meminfo.node2.AnonPages
1981962 ± 5% +26.4% 2505422 ± 5% numa-meminfo.node2.Active(anon)
1862203 ± 8% +33.0% 2476954 ± 6% numa-meminfo.node3.Active(anon)
1883841 ± 7% +32.6% 2498686 ± 6% numa-meminfo.node3.Active
2572461 ± 11% +24.2% 3195556 ± 8% numa-meminfo.node3.MemUsed
1739646 ± 8% +29.4% 2250696 ± 6% numa-meminfo.node3.AnonPages
Best Regards,
Huang, Ying