Message-ID: <20140106082048.GA567@localhost>
Date: Mon, 6 Jan 2014 16:20:48 +0800
From: fengguang.wu@...el.com
To: Dave Chinner <david@...morbit.com>
Cc: Glauber Costa <glommer@...allels.com>,
Linux Memory Management List <linux-mm@...ck.org>,
linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
lkp@...ux.intel.com
Subject: [numa shrinker] 9b17c62382: -36.6% regression on sparse file copy
Hi Dave,
We noticed a throughput drop in the test case
vm-scalability/300s-lru-file-readtwice (*)
between v3.11 and v3.12, and it's still low as of v3.13-rc6:
v3.11 v3.12 v3.13-rc6
--------------- ------------------------- -------------------------
14934707 ~ 0% -48.8% 7647311 ~ 0% -47.6% 7829487 ~ 0% vm-scalability.throughput
^^ ^^^^^^
stddev% change%
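Each value is the mean with its stddev%; change% is relative to the
v3.11 column. As a sanity check, the -48.8% for v3.12 can be recomputed
from the two throughput means in the table above:

```shell
# Recompute the v3.12 change% from the throughput means above:
# (7647311 - 14934707) / 14934707 * 100
awk 'BEGIN { v311 = 14934707; v312 = 7647311
             printf "%.1f%%\n", (v312 - v311) / v311 * 100 }'
# prints -48.8%
```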
The bisect leads us to commit 9b17c62382 ("fs: convert inode and
dentry shrinking to be node aware"). It's not necessarily the root
cause; however, it's worth sharing with you all the changes compared
to its parent commit:
1d3d4437eae1bb2 9b17c62382dd2e7507984b989
--------------- -------------------------
78.86 ~23% -83.1% 13.35 ~ 3% vm-scalability.stddev
13222395 ~ 2% -36.6% 8380368 ~ 0% vm-scalability.throughput
160 ~ 7% -83.9% 25 ~ 6% numa-vmstat.node2.nr_isolated_file
161 ~ 5% -83.2% 27 ~10% numa-vmstat.node1.nr_isolated_file
322 ~ 5% -87.9% 39 ~12% numa-vmstat.node0.nr_isolated_file
828 ~ 6% -85.1% 123 ~11% proc-vmstat.nr_isolated_file
3.318e+08 ~ 2% -71.6% 94121050 ~ 0% proc-vmstat.pgsteal_direct_normal
3.319e+08 ~ 2% -71.6% 94296363 ~ 0% proc-vmstat.pgscan_direct_normal
81841025 ~ 2% -71.5% 23286086 ~ 0% proc-vmstat.pgscan_direct_dma32
81839986 ~ 2% -71.5% 23285993 ~ 0% proc-vmstat.pgsteal_direct_dma32
2317157 ~ 1% -71.3% 665435 ~ 0% proc-vmstat.allocstall
86404323 ~ 1% -69.7% 26171022 ~ 0% proc-vmstat.pgalloc_dma32
83815289 ~ 2% -57.4% 35703181 ~ 2% numa-numastat.node0.numa_miss
83815794 ~ 2% -57.4% 35704184 ~ 2% numa-numastat.node0.other_node
1.283e+08 ~ 2% -56.2% 56147285 ~ 1% numa-numastat.node0.local_node
1.283e+08 ~ 2% -56.2% 56148288 ~ 1% numa-numastat.node0.numa_hit
50429805 ~ 3% -55.3% 22560307 ~ 2% numa-vmstat.node0.numa_miss
50485560 ~ 3% -55.2% 22616231 ~ 2% numa-vmstat.node0.numa_other
1.19e+08 ~ 3% -55.0% 53536204 ~ 1% numa-numastat.node1.local_node
1.19e+08 ~ 3% -55.0% 53537244 ~ 1% numa-numastat.node1.numa_hit
4.835e+08 ~ 1% -54.9% 2.18e+08 ~ 1% proc-vmstat.numa_local
4.835e+08 ~ 1% -54.9% 2.18e+08 ~ 1% proc-vmstat.numa_hit
1.182e+08 ~ 1% -54.5% 53767747 ~ 0% numa-numastat.node2.local_node
1.182e+08 ~ 1% -54.5% 53768794 ~ 0% numa-numastat.node2.numa_hit
1.179e+08 ~ 2% -53.7% 54601288 ~ 1% numa-numastat.node3.local_node
1.179e+08 ~ 2% -53.7% 54602308 ~ 1% numa-numastat.node3.numa_hit
80067455 ~ 2% -52.5% 38004701 ~ 1% numa-vmstat.node0.numa_local
80123207 ~ 2% -52.5% 38060622 ~ 1% numa-vmstat.node0.numa_hit
75049747 ~ 3% -51.2% 36624834 ~ 0% numa-vmstat.node1.numa_local
75075335 ~ 3% -51.2% 36639687 ~ 0% numa-vmstat.node1.numa_hit
74169375 ~ 2% -50.4% 36795263 ~ 0% numa-vmstat.node2.numa_local
74228290 ~ 2% -50.3% 36865360 ~ 0% numa-vmstat.node2.numa_hit
6.414e+08 ~ 2% -49.3% 3.251e+08 ~ 0% proc-vmstat.pgfree
74088721 ~ 1% -49.6% 37377389 ~ 1% numa-vmstat.node3.numa_local
74158434 ~ 1% -49.5% 37447354 ~ 1% numa-vmstat.node3.numa_hit
5.55e+08 ~ 2% -46.1% 2.989e+08 ~ 1% proc-vmstat.pgalloc_normal
744013 ~ 1% -40.9% 439561 ~ 1% softirqs.RCU
46158212 ~ 0% -40.3% 27537130 ~ 1% numa-numastat.node1.numa_foreign
42093581 ~ 3% -36.1% 26915182 ~ 1% numa-numastat.node2.numa_foreign
3722419 ~ 0% -20.1% 2975619 ~ 0% proc-vmstat.pgfault
1999446 ~ 6% -73.1% 537069 ~ 0% time.involuntary_context_switches
7774 ~ 7% -70.1% 2322 ~ 0% vmstat.system.cs
260 ~ 6% -48.9% 133 ~ 1% time.user_time
35106 ~ 0% -2.4% 34268 ~ 0% time.system_time
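The proc-vmstat rows above are per-counter deltas of /proc/vmstat
sampled around the run. A minimal sketch of that delta computation
(the "before" values below are made up so the example is
self-contained; the resulting deltas match the 9b17c62382 column):

```shell
# Diff two /proc/vmstat snapshots per counter. In a real run:
#   before=$(cat /proc/vmstat); <workload>; after=$(cat /proc/vmstat)
# The snapshot values here are fabricated for illustration.
before='pgscan_direct_normal 1000
allocstall 10'
after='pgscan_direct_normal 94297363
allocstall 665445'
paste -d' ' <(echo "$before") <(echo "$after") |
    awk '{ print $1, $4 - $2 }'
# prints:
#   pgscan_direct_normal 94296363
#   allocstall 665435
```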
(*) The test case basically does
truncate -s 135080058880 /tmp/vm-scalability.img
mkfs.xfs -q /tmp/vm-scalability.img
mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability

nr_cpu=120
for i in $(seq 1 $nr_cpu)
do
        sparse_file=/tmp/vm-scalability/sparse-lru-file-readtwice-$i
        truncate $sparse_file -s 36650387592
        dd if=$sparse_file of=/dev/null &
        dd if=$sparse_file of=/dev/null &
done
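Each sparse file gets two concurrent dd readers (hence "readtwice"),
so with nr_cpu=120 the loop starts 240 streams; the harness presumably
time-boxes the run (the 300s in the test name). A scaled-down,
self-contained variant of the same pattern, with sizes shrunk so it
runs anywhere without the loop device or xfs:

```shell
# Miniature of the readtwice pattern: N sparse files, each read by
# two concurrent dd processes. Sizes are tiny here; the real test
# uses nr_cpu=120 and ~34 GiB per file.
dir=$(mktemp -d)
nr=4
for i in $(seq 1 $nr)
do
    f=$dir/sparse-$i
    truncate -s $((1 << 20)) "$f"    # 1 MiB instead of 36650387592
    dd if="$f" of=/dev/null bs=64k 2>/dev/null &
    dd if="$f" of=/dev/null bs=64k 2>/dev/null &
done
wait
echo "read $((2 * nr)) streams from $nr sparse files"
rm -r "$dir"
```

This prints "read 8 streams from 4 sparse files" once all readers
finish.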
The test box is a 4-socket machine with 128G memory.
The bisect looks stable, each point below represents a sample run:
vm-scalability.throughput
1.5e+07 *+-*---*--*---*--------------------------------------------------+
| : |
1.4e+07 ++ : .* |
| : ..*.. .. + .*..|
1.3e+07 ++ *...*..*..*...*..*. *..*...*..* + .. *
| * |
1.2e+07 ++ |
| |
1.1e+07 ++ |
| |
1e+07 ++ |
| |
9e+06 ++ |
O O O O O O O O O |
8e+06 ++------------O--------------------------------------------------+
Attached are the kconfig and dmesg.
Thanks,
Fengguang