Message-ID: <20161221185350.GA21332@yexl-desktop>
Date: Thu, 22 Dec 2016 02:53:50 +0800
From: kernel test robot <xiaolong.ye@...el.com>
To: Jan Kara <jack@...e.cz>
Cc: Stephen Rothwell <sfr@...b.auug.org.au>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Dan Williams <dan.j.williams@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [lkp-developer] [mm] c379ee89cd: vm-scalability.throughput 29.8%
improvement
Greetings,
FYI, we noticed a 29.8% improvement in vm-scalability.throughput due to commit:
commit: c379ee89cd61733d1fc16327eb01d7a65223a970 ("mm: provide helper for finishing mkwrite faults")
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
in testcase: vm-scalability
on test machine: 28 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory
with the following parameters:
    runtime: 300s
    size: 1T
    test: msync-mt
    cpufreq_governor: performance
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
    cd lkp-tests
    bin/lkp install job.yaml  # job file is attached in this email
    bin/lkp run     job.yaml
testcase/path_params/tbox_group/run: vm-scalability/300s-1T-msync-mt-performance/lkp-hsw-ep5
14c0048ae3fa1be3           c379ee89cd61733d1fc16327eb
----------------           --------------------------
    value %stddev  change      value %stddev  metric
  4555181             30%    5914712          vm-scalability.throughput
1.997e+08             67%   3.33e+08          interrupts.CAL:Function_call_interrupts
    25195             94%      48805          vmstat.io.bo
   266151             52%     404168          vmstat.system.in
   222772             32%     293582          vmstat.system.cs
    26.03             27%      32.98          turbostat.%Busy
      320             27%        405          turbostat.Avg_MHz
    83.62              5%      87.79          turbostat.RAMWatt
    81.33              5%      85.34          turbostat.PkgWatt
 24837853           6e+07   85525782          latency_stats.sum.wait_on_page_bit.__migration_entry_wait.migration_entry_wait.do_swap_page.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
  5117937 ± 10%     3e+07   36496037          latency_stats.sum.call_rwsem_down_write_failed.xfs_ilock.xfs_vn_update_time.file_update_time.xfs_filemap_page_mkwrite.do_page_mkwrite.do_wp_page.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
  3072274 ± 9%      2e+07   21091822          latency_stats.sum.call_rwsem_down_write_failed.xfs_ilock.xfs_file_iomap_begin.iomap_apply.iomap_page_mkwrite.xfs_filemap_page_mkwrite.do_page_mkwrite.do_wp_page.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
     1233 ± 72%     3e+04      34873 ± 16%    latency_stats.sum.call_rwsem_down_write_failed.xfs_ilock.xfs_vn_update_time.file_update_time.xfs_filemap_page_mkwrite.do_page_mkwrite.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      348 ± 80%     2e+04      16815 ± 11%    latency_stats.sum.call_rwsem_down_write_failed.xfs_ilock.xfs_file_iomap_begin.iomap_apply.iomap_page_mkwrite.xfs_filemap_page_mkwrite.do_page_mkwrite.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
6.334e+09             76%  1.118e+10          perf-stat.node-store-misses
  5442307             56%    8508591 ± 5%     perf-stat.cpu-migrations
2.224e+12             49%   3.32e+12          perf-stat.branch-instructions
9.221e+12             46%  1.346e+13          perf-stat.instructions
1.604e+10             46%  2.338e+10          perf-stat.cache-misses
2.252e+12             42%  3.199e+12          perf-stat.dTLB-loads
8.423e+11             36%  1.144e+12          perf-stat.dTLB-stores
7.875e+10             35%  1.063e+11          perf-stat.cache-references
3.121e+09             34%  4.167e+09          perf-stat.iTLB-loads
2.016e+09             33%   2.69e+09          perf-stat.node-stores
2.135e+08             33%  2.832e+08          perf-stat.context-switches
1.079e+09             32%  1.423e+09 ± 3%     perf-stat.dTLB-store-misses
    22979 ± 6%        32%      30286 ± 7%     perf-stat.instructions-per-iTLB-miss
 2.12e+09 ± 4%        29%  2.735e+09 ± 11%    perf-stat.dTLB-load-misses
1.687e+13             28%  2.161e+13          perf-stat.cpu-cycles
7.131e+09             25%  8.949e+09          perf-stat.node-load-misses
 8.29e+09             22%  1.011e+10          perf-stat.branch-misses
     0.55             14%       0.62          perf-stat.ipc
 3.84e+08             13%  4.342e+08 ± 4%     perf-stat.node-loads
    20.37              8%      21.99          perf-stat.cache-miss-rate%
    75.86              6%      80.60          perf-stat.node-store-miss-rate%
    11.43 ± 6%       -15%       9.69 ± 7%     perf-stat.iTLB-load-miss-rate%
     0.37            -18%       0.30          perf-stat.branch-miss-rate%
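The headline figure and two of the ratio rows in the comparison above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the robot's tooling. It assumes the change column is computed as (new - base) / base, that perf-stat.ipc is instructions divided by cpu-cycles, and that perf-stat.cache-miss-rate% is cache-misses divided by cache-references. The helper name pct_change is hypothetical, and the inputs are the rounded values printed in the report, so small rounding differences are possible for other rows.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (assumed formula)."""
    return (new - base) / base * 100.0


# Values copied from the comparison table: (base commit, tested commit).
throughput = (4555181, 5914712)
instructions = (9.221e12, 1.346e13)
cpu_cycles = (1.687e13, 2.161e13)
cache_misses = (1.604e10, 2.338e10)
cache_references = (7.875e10, 1.063e11)

print(f"throughput change: {pct_change(*throughput):.1f}%")

for label, idx in (("14c0048ae3fa1be3", 0), ("c379ee89cd61733d1fc16327eb", 1)):
    # Assumed derivations of the ratio rows from the raw counters.
    ipc = instructions[idx] / cpu_cycles[idx]
    miss_rate = cache_misses[idx] / cache_references[idx] * 100.0
    print(f"{label}: ipc={ipc:.2f} cache-miss-rate={miss_rate:.2f}%")
```

Under these assumptions the recomputed values round to the figures reported for vm-scalability.throughput, perf-stat.ipc and perf-stat.cache-miss-rate%.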
[ASCII plots not reproduced. One plot per metric, each showing bisect-good and bisect-bad samples: perf-stat.cpu-cycles, perf-stat.instructions, perf-stat.cache-references, perf-stat.cache-misses, perf-stat.branch-instructions, perf-stat.dTLB-loads, perf-stat.dTLB-stores, perf-stat.iTLB-loads, perf-stat.node-load-misses, perf-stat.node-stores, perf-stat.node-store-misses, perf-stat.context-switches, perf-stat.branch-miss-rate%, perf-stat.node-store-miss-rate%, vm-scalability.throughput]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong
View attachment "config-4.9.0-11883-gc379ee8" of type "text/plain" (155637 bytes)
View attachment "job-script" of type "text/plain" (6613 bytes)
View attachment "job.yaml" of type "text/plain" (4279 bytes)
View attachment "reproduce" of type "text/plain" (1185 bytes)