Message-ID: <20161114171800.GA8780@yexl-desktop>
Date: Tue, 15 Nov 2016 01:18:00 +0800
From: kernel test robot <xiaolong.ye@...el.com>
To: Jaegeuk Kim <jaegeuk@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Stephen Rothwell <sfr@...b.auug.org.au>, lkp@...org
Subject: [lkp] [f2fs] 442d0256a5: fsmark.files_per_sec 46.1% improvement
Greetings,
FYI, we noticed a 46.1% improvement in fsmark.files_per_sec due to commit:
commit 442d0256a5407a1b89d505f0346d92bf14bb1bf5 ("f2fs: remove percpu_count due to performance regression")
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
in testcase: fsmark
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
with the following parameters:
iterations: 1x
nr_threads: 1t
disk: 1BRD_48G
fs: f2fs
filesize: 4M
test_size: 40G
sync_method: NoSync
cpufreq_governor: performance
test-description: fsmark is a file system benchmark for testing synchronous write workloads, such as mail server workloads.
test-url: https://sourceforge.net/projects/fsmark/
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # the job file is attached to this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-6/performance/1BRD_48G/4M/f2fs/1x/x86_64-rhel-7.2/1t/debian-x86_64-2016-08-31.cgz/NoSync/ivb44/40G/fsmark
commit:
670be5e771 ("f2fs: make clean inodes when flushing inode page")
442d0256a5 ("f2fs: remove percpu_count due to performance regression")
670be5e771171195 442d0256a5407a1b89d505f034
---------------- --------------------------
value ± %stddev  %change  value ± %stddev  metric
242.02 ± 1% +46.1% 353.62 ± 0% fsmark.files_per_sec
42.33 ± 1% -31.6% 28.97 ± 0% fsmark.time.elapsed_time
42.33 ± 1% -31.6% 28.97 ± 0% fsmark.time.elapsed_time.max
55.50 ± 0% +43.7% 79.75 ± 1% fsmark.time.percent_of_cpu_this_job_got
2773 ± 1% -64.0% 997.25 ± 5% fsmark.time.voluntary_context_switches
34455 ± 2% -16.8% 28678 ± 0% interrupts.CAL:Function_call_interrupts
7351302 ± 0% -15.8% 6191740 ± 1% meminfo.Dirty
16.50 ± 31% -89.4% 1.75 ±109% numa-numastat.node1.other_node
1873 ± 33% +385.7% 9099 ± 79% softirqs.NET_RX
821551 ± 1% +45.3% 1193734 ± 0% vmstat.io.bo
6944 ± 7% -11.5% 6145 ± 4% slabinfo.cred_jar.active_objs
6944 ± 7% -11.5% 6145 ± 4% slabinfo.cred_jar.num_objs
5230 ± 8% +18.8% 6211 ± 2% sched_debug.cpu.nr_switches.max
11.00 ± 26% +100.0% 22.00 ± 8% sched_debug.cpu.nr_uninterruptible.max
4.01 ± 20% +25.5% 5.03 ± 4% sched_debug.cpu.nr_uninterruptible.stddev
34291 ±132% -54.7% 15533 ± 3% latency_stats.avg.max
46416 ± 83% -35.7% 29824 ± 0% latency_stats.max.max
18117134 ± 2% -69.2% 5577804 ± 6% latency_stats.sum.balance_dirty_pages.balance_dirty_pages_ratelimited.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
18117134 ± 2% -69.2% 5577804 ± 6% latency_stats.sum.max
6.40 ± 2% +9.1% 6.98 ± 0% turbostat.%Busy
166.00 ± 2% +13.0% 187.50 ± 0% turbostat.Avg_MHz
26.09 ± 3% -9.7% 23.55 ± 7% turbostat.Pkg%pc2
6.26 ± 1% +11.1% 6.96 ± 1% turbostat.RAMWatt
26255 ± 0% -12.2% 23058 ± 1% cpuidle.C1-IVT.usage
1083489 ± 6% -30.9% 748928 ± 38% cpuidle.C1E-IVT.time
11379459 ± 25% -56.7% 4925050 ± 39% cpuidle.C3-IVT.time
1.929e+09 ± 1% -31.2% 1.328e+09 ± 0% cpuidle.C6-IVT.time
2280801 ± 2% -30.9% 1575163 ± 0% cpuidle.C6-IVT.usage
790.00 ± 22% -73.8% 207.00 ± 55% proc-vmstat.kswapd_low_wmark_hit_quickly
1837687 ± 0% -15.8% 1547845 ± 1% proc-vmstat.nr_dirty
4095 ± 25% -56.9% 1765 ± 25% proc-vmstat.nr_vmscan_immediate_reclaim
1837844 ± 0% -15.8% 1548021 ± 1% proc-vmstat.nr_zone_write_pending
1043 ± 25% -62.9% 387.25 ± 57% proc-vmstat.pageoutrun
100774 ± 1% -26.3% 74263 ± 1% proc-vmstat.pgfault
67078 ± 33% +55.0% 103946 ± 24% numa-meminfo.node0.Active
37255 ± 28% +85.2% 68987 ± 19% numa-meminfo.node0.Active(anon)
11658 ± 46% +150.0% 29143 ± 30% numa-meminfo.node0.AnonHugePages
35098 ± 27% +91.7% 67270 ± 21% numa-meminfo.node0.AnonPages
3756859 ± 8% -19.7% 3017524 ± 4% numa-meminfo.node0.Dirty
52647 ± 19% -62.5% 19762 ± 67% numa-meminfo.node1.Active(anon)
29839 ± 20% -84.9% 4510 ± 77% numa-meminfo.node1.AnonHugePages
52124 ± 18% -64.4% 18556 ± 74% numa-meminfo.node1.AnonPages
10638108 ± 42% +32.6% 14110975 ± 38% numa-meminfo.node1.MemFree
9308 ± 28% +85.2% 17241 ± 19% numa-vmstat.node0.nr_active_anon
8768 ± 27% +91.8% 16815 ± 21% numa-vmstat.node0.nr_anon_pages
939188 ± 8% -19.7% 754348 ± 4% numa-vmstat.node0.nr_dirty
9308 ± 28% +85.3% 17247 ± 19% numa-vmstat.node0.nr_zone_active_anon
939274 ± 8% -19.7% 754420 ± 4% numa-vmstat.node0.nr_zone_write_pending
13154 ± 19% -62.5% 4929 ± 67% numa-vmstat.node1.nr_active_anon
13026 ± 18% -64.5% 4629 ± 75% numa-vmstat.node1.nr_anon_pages
2660256 ± 42% +32.6% 3527067 ± 38% numa-vmstat.node1.nr_free_pages
13154 ± 19% -62.5% 4929 ± 67% numa-vmstat.node1.nr_zone_active_anon
11.00 ± 25% -90.9% 1.00 ±173% numa-vmstat.node1.numa_other
5.459e+10 ± 5% -45.7% 2.963e+10 ± 5% perf-stat.branch-instructions
3.795e+08 ± 2% -40.6% 2.255e+08 ± 6% perf-stat.branch-misses
63.89 ± 0% +5.3% 67.28 ± 1% perf-stat.cache-miss-rate%
2.509e+09 ± 3% -7.3% 2.326e+09 ± 5% perf-stat.cache-misses
3.927e+09 ± 3% -12.0% 3.455e+09 ± 3% perf-stat.cache-references
71651 ± 2% -32.3% 48512 ± 3% perf-stat.context-switches
4.504e+11 ± 5% -25.4% 3.361e+11 ± 5% perf-stat.cpu-cycles
1461 ± 5% -25.8% 1085 ± 2% perf-stat.cpu-migrations
3.19 ± 2% -91.1% 0.28 ± 12% perf-stat.dTLB-load-miss-rate%
1.748e+09 ± 3% -94.5% 95740875 ± 10% perf-stat.dTLB-load-misses
5.302e+10 ± 0% -36.3% 3.376e+10 ± 5% perf-stat.dTLB-loads
3.923e+10 ± 1% -27.2% 2.854e+10 ± 0% perf-stat.dTLB-stores
70.07 ± 1% -6.4% 65.61 ± 2% perf-stat.iTLB-load-miss-rate%
21266825 ± 1% -30.5% 14782098 ± 1% perf-stat.iTLB-load-misses
9081380 ± 2% -14.4% 7773636 ± 9% perf-stat.iTLB-loads
3.06e+11 ± 5% -46.2% 1.648e+11 ± 5% perf-stat.instructions
14385 ± 4% -22.5% 11146 ± 4% perf-stat.instructions-per-iTLB-miss
0.68 ± 2% -27.9% 0.49 ± 1% perf-stat.ipc
88712 ± 1% -28.0% 63845 ± 1% perf-stat.minor-faults
32.29 ± 1% +4.8% 33.84 ± 2% perf-stat.node-store-miss-rate%
88736 ± 1% -28.0% 63861 ± 1% perf-stat.page-faults
8.76 ± 13% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.__percpu_counter_sum.f2fs_balance_fs.f2fs_write_data_page.f2fs_write_cache_pages.f2fs_write_data_pages
1.79 ± 10% -36.2% 1.14 ± 58% perf-profile.calltrace.cycles-pp.__tick_nohz_idle_enter.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt
1.59 ± 10% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp._find_next_bit.__percpu_counter_sum.f2fs_balance_fs.f2fs_write_data_page.f2fs_write_cache_pages
1.09 ± 14% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp._find_next_bit.find_next_bit.__percpu_counter_sum.f2fs_balance_fs.f2fs_write_data_page
9.78 ± 12% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.f2fs_balance_fs.f2fs_write_data_page.f2fs_write_cache_pages.f2fs_write_data_pages.do_writepages
21.65 ± 13% -41.1% 12.75 ± 60% perf-profile.calltrace.cycles-pp.f2fs_write_data_page.f2fs_write_cache_pages.f2fs_write_data_pages.do_writepages.__writeback_single_inode
1.48 ± 14% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.find_next_bit.__percpu_counter_sum.f2fs_balance_fs.f2fs_write_data_page.f2fs_write_cache_pages
0.89 ± 9% -40.1% 0.53 ± 57% perf-profile.calltrace.cycles-pp.irq_enter.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
2.23 ± 9% -34.7% 1.46 ± 59% perf-profile.calltrace.cycles-pp.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
1.44 ± 9% -35.8% 0.92 ± 58% perf-profile.calltrace.cycles-pp.tick_nohz_stop_sched_tick.__tick_nohz_idle_enter.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt
8.82 ± 12% -100.0% 0.00 ± -1% perf-profile.children.cycles-pp.__percpu_counter_sum
1.86 ± 9% -35.7% 1.20 ± 58% perf-profile.children.cycles-pp.__tick_nohz_idle_enter
2.86 ± 11% -96.1% 0.11 ± 58% perf-profile.children.cycles-pp._find_next_bit
9.83 ± 12% -99.2% 0.08 ± 57% perf-profile.children.cycles-pp.f2fs_balance_fs
21.65 ± 13% -41.1% 12.75 ± 60% perf-profile.children.cycles-pp.f2fs_write_data_page
2.52 ± 12% -94.2% 0.15 ± 61% perf-profile.children.cycles-pp.find_next_bit
0.93 ± 10% -38.6% 0.57 ± 58% perf-profile.children.cycles-pp.irq_enter
1.91 ± 10% -35.0% 1.24 ± 58% perf-profile.children.cycles-pp.tick_nohz_irq_exit
1.51 ± 8% -35.9% 0.97 ± 58% perf-profile.children.cycles-pp.tick_nohz_stop_sched_tick
5.12 ± 14% -100.0% 0.00 ± -1% perf-profile.self.cycles-pp.__percpu_counter_sum
2.86 ± 11% -96.1% 0.11 ± 58% perf-profile.self.cycles-pp._find_next_bit
1.36 ± 11% -92.8% 0.10 ± 59% perf-profile.self.cycles-pp.find_next_bit
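The perf-profile rows above attribute roughly 9% of cycles in the parent commit to __percpu_counter_sum (and the find_next_bit calls beneath it), reached from f2fs_balance_fs on the page write path. The following is a toy Python model, not kernel code, of why an exact sum over a batched per-CPU counter is costly when it runs on every write: the sum visits one slot per CPU. The class name, the batch size of 32, and the loop that calls sum() on every write are assumptions made for illustration; only the figure of 48 CPUs comes from the test machine description.

```python
class PerCpuCounter:
    """Toy model of a batched per-CPU counter (illustrative, not kernel code)."""

    def __init__(self, nr_cpus, batch=32):
        self.count = 0                 # shared, approximate total
        self.local = [0] * nr_cpus     # per-CPU deltas not yet folded into count
        self.batch = batch             # assumed batch size
        self.slots_visited = 0         # work done by exact sums

    def add(self, cpu, amount):
        """Cheap update: touch only this CPU's slot until it reaches the batch."""
        self.local[cpu] += amount
        if abs(self.local[cpu]) >= self.batch:
            self.count += self.local[cpu]
            self.local[cpu] = 0

    def read(self):
        """Approximate read: constant cost, may miss unfolded per-CPU deltas."""
        return self.count

    def sum(self):
        """Exact read: walks every CPU slot, so the cost grows with CPU count."""
        total = self.count
        for delta in self.local:
            total += delta
            self.slots_visited += 1
        return total


counter = PerCpuCounter(nr_cpus=48)
for page in range(1000):
    counter.add(cpu=page % 48, amount=1)
    counter.sum()                      # exact sum on every write (assumed hot path)

print("exact:", counter.sum())
print("approximate:", counter.read())
print("slots visited by exact sums:", counter.slots_visited)
```

In this model each exact sum costs 48 slot visits regardless of how little changed, while the update itself touches a single slot. That is consistent with the profile, where the summing function rather than the counter update dominates.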
fsmark.files_per_sec
[ASCII plot; layout lost in extraction. Y axis: fsmark.files_per_sec, 0 to 400. The [O] samples lie roughly between 350 and 400; the [*] samples lie near 250, with one sample at 0.]
[*] bisect-good sample
[O] bisect-bad sample
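The %change column in the comparison table can be recomputed from the two mean values on each row. The sketch below parses a row in the flattened form shown in this message and recomputes the relative change. The regular expression and the function names parse_row and percent_change are illustrative assumptions, not part of lkp-tests.

```python
import re

# One comparison row: base ± sd%  change%  new ± sd%  metric
ROW = re.compile(
    r"^\s*(?P<base>\S+)\s+±\s*(?P<base_sd>-?\d+)%\s+"
    r"(?P<change>[+-]\d+(?:\.\d+)?)%\s+"
    r"(?P<new>\S+)\s+±\s*(?P<new_sd>-?\d+)%\s+"
    r"(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Split a comparison row into its metric name, means and reported change."""
    m = ROW.match(line)
    if m is None:
        raise ValueError("not a comparison row: %r" % line)
    return {
        "metric": m["metric"],
        "base": float(m["base"]),
        "new": float(m["new"]),
        "reported_change": float(m["change"]),
    }


def percent_change(base, new):
    """Relative change of new against base, in percent."""
    return (new - base) / base * 100.0


rows = [
    "242.02 ± 1% +46.1% 353.62 ± 0% fsmark.files_per_sec",
    "42.33 ± 1% -31.6% 28.97 ± 0% fsmark.time.elapsed_time",
]
for line in rows:
    row = parse_row(line)
    recomputed = percent_change(row["base"], row["new"])
    print(row["metric"], row["reported_change"], round(recomputed, 1))
```

Rows whose means are themselves rounded (for example values printed in scientific notation) may differ from the reported change in the last digit, so a comparison with a small tolerance is safer than an exact match.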
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong
View attachment "config-4.9.0-rc1-00088-g442d025" of type "text/plain" (153709 bytes)
View attachment "job-script" of type "text/plain" (7166 bytes)
View attachment "job.yaml" of type "text/plain" (4788 bytes)
View attachment "reproduce" of type "text/plain" (330 bytes)