Message-ID: <20171225024513.GB31543@yexl-desktop>
Date: Mon, 25 Dec 2017 10:45:13 +0800
From: kernel test robot <xiaolong.ye@...el.com>
To: Scott Mayhew <smayhew@...hat.com>
Cc: Anna Schumaker <Anna.Schumaker@...app.com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [lkp-robot] [nfs] dc4fd9ab01: fsmark.app_overhead 850.9% regression
Greetings,
FYI, we noticed an 850.9% regression of fsmark.app_overhead due to commit:
commit: dc4fd9ab01ab379ae5af522b3efd4187a7c30a31 ("nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
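As a rough, hypothetical sketch of the behavioural change named in the commit title (this is an illustrative toy model, not the actual kernel diff): before the patch, nfs_commit_inode() would wait on commit unconditionally; after it, the synchronous wait is skipped when no commit requests are outstanding.

```python
# Toy model of the behaviour described by the commit title
# ("don't wait on commit ... if there were no commit requests").
# Purely illustrative; names and structure are assumptions, not kernel code.
def commit_inode(ncommit, patched):
    """Return True if the caller had to wait for a COMMIT to complete."""
    if patched and ncommit == 0:
        return False          # nothing queued: return without waiting
    # ... issue COMMIT to the server and wait for completion ...
    return True

print(commit_inode(0, patched=False))  # unpatched: waits even with nothing queued
print(commit_inode(0, patched=True))   # patched: returns immediately
```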
in testcase: fsmark
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
with following parameters:
iterations: 1x
nr_threads: 64t
disk: 1BRD_48G
fs: f2fs
fs2: nfsv4
filesize: 4M
test_size: 40G
sync_method: fsyncBeforeClose
cpufreq_governor: performance
test-description: fsmark is a file system benchmark that tests synchronous write workloads, such as those of mail servers.
test-url: https://sourceforge.net/projects/fsmark/
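For reference, the %change column in the comparison table below is the simple relative difference between the base and patched means. A quick sanity check, using values copied from the fsmark.app_overhead and perf-stat.cpu-migrations rows (the table shows rounded means, so the result differs slightly from the reported 850.9%):

```python
# %change as used in the robot's comparison table:
#   (patched_mean - base_mean) / base_mean * 100
def pct_change(base, patched):
    """Percentage change from the base commit's mean to the patched one's."""
    return (patched - base) / base * 100.0

# Means copied from the table rows below (already rounded there).
app_overhead = pct_change(13296395, 1.264e8)  # ~ +850.6%
migrations   = pct_change(16599, 12360)       # ~ -25.5%

print(f"fsmark.app_overhead:      {app_overhead:+.1f}%")
print(f"perf-stat.cpu-migrations: {migrations:+.1f}%")
```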
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-7/performance/1BRD_48G/4M/nfsv4/f2fs/1x/x86_64-rhel-7.2/64t/debian-x86_64-2016-08-31.cgz/fsyncBeforeClose/ivb44/40G/fsmark
commit:
ccede75985 ("xprtrdma: Spread reply processing over more CPUs")
dc4fd9ab01 ("nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests")
ccede7598588ae34 dc4fd9ab01ab379ae5af522b3e
---------------- --------------------------
%stddev %change %stddev
\ | \
13296395 ± 7% +850.9% 1.264e+08 fsmark.app_overhead
1230639 -2.7% 1197079 fsmark.time.voluntary_context_switches
1.08 +0.1 1.21 mpstat.cpu.iowait%
6.00 +16.7% 7.00 nfsstat.Client.nfs.v4.getattr
910.00 ± 2% -9.5% 823.50 proc-vmstat.kswapd_low_wmark_hit_quickly
23577 -2.2% 23058 vmstat.system.cs
53057401 ± 2% +18.5% 62851465 cpuidle.C3.time
29194791 ± 44% -54.6% 13240888 ± 33% cpuidle.POLL.time
807076 ± 29% -47.3% 425456 ± 13% numa-numastat.node0.numa_miss
812559 ± 29% -46.9% 431720 ± 12% numa-numastat.node0.other_node
807076 ± 29% -47.3% 425456 ± 13% numa-numastat.node1.numa_foreign
6413 +14.7% 7358 numa-meminfo.node0.KernelStack
2369 ± 8% +46.6% 3474 ± 7% numa-meminfo.node0.PageTables
6744 -12.4% 5908 ± 2% numa-meminfo.node1.KernelStack
3512 ± 6% -31.9% 2392 ± 13% numa-meminfo.node1.PageTables
565.50 ± 64% +79.5% 1015 ± 4% slabinfo.btrfs_extent_buffer.active_objs
565.50 ± 64% +79.5% 1015 ± 4% slabinfo.btrfs_extent_buffer.num_objs
1827 ± 4% +20.9% 2210 ± 5% slabinfo.fsnotify_mark_connector.active_objs
1827 ± 4% +20.9% 2210 ± 5% slabinfo.fsnotify_mark_connector.num_objs
809.25 ± 9% +16.9% 946.00 ± 4% slabinfo.nfsd4_openowners.active_objs
809.25 ± 9% +16.9% 946.00 ± 4% slabinfo.nfsd4_openowners.num_objs
194.75 ± 2% -5.0% 185.00 turbostat.Avg_MHz
0.77 ± 2% +0.1 0.91 turbostat.C3%
0.65 ± 2% +22.7% 0.80 turbostat.CPU%c3
62.34 ± 2% -4.5% 59.52 turbostat.CorWatt
70.25 ± 3% -6.8% 65.50 turbostat.PkgTmp
93.02 -3.4% 89.82 turbostat.PkgWatt
6411 +14.7% 7357 numa-vmstat.node0.nr_kernel_stack
590.75 ± 8% +46.7% 866.75 ± 7% numa-vmstat.node0.nr_page_table_pages
120900 ± 13% +27.2% 153728 ± 6% numa-vmstat.node0.numa_foreign
77163 ± 42% -49.3% 39123 ± 24% numa-vmstat.node0.numa_miss
82748 ± 35% -45.1% 45452 ± 25% numa-vmstat.node0.numa_other
6741 -12.4% 5907 ± 2% numa-vmstat.node1.nr_kernel_stack
877.00 ± 6% -31.9% 597.50 ± 13% numa-vmstat.node1.nr_page_table_pages
77170 ± 42% -49.3% 39127 ± 24% numa-vmstat.node1.numa_foreign
120916 ± 13% +27.2% 153763 ± 6% numa-vmstat.node1.numa_miss
285504 ± 5% +11.4% 318085 ± 3% numa-vmstat.node1.numa_other
1737 ± 10% -34.5% 1137 ± 50% sched_debug.cfs_rq:/.exec_clock.max
281.24 ± 5% -33.3% 187.70 ± 48% sched_debug.cfs_rq:/.exec_clock.stddev
417.83 ± 4% -16.9% 347.04 ± 22% sched_debug.cfs_rq:/.runnable_load_avg.max
553.42 ± 45% -37.5% 346.04 ± 22% sched_debug.cpu.cpu_load[0].max
129.77 ± 7% -29.0% 92.14 ± 37% sched_debug.cpu.cpu_load[0].stddev
388.83 ± 14% -24.7% 292.79 ± 26% sched_debug.cpu.cpu_load[2].max
347.25 ± 8% -27.3% 252.50 ± 34% sched_debug.cpu.cpu_load[3].max
320.17 ± 2% -30.7% 222.00 ± 42% sched_debug.cpu.cpu_load[4].max
62796 ± 6% -31.8% 42829 ± 20% sched_debug.cpu.sched_count.max
7742 ± 6% -26.2% 5714 ± 14% sched_debug.cpu.sched_count.stddev
1.374e+11 ± 3% -11.8% 1.212e+11 ± 5% perf-stat.branch-instructions
0.93 ± 5% +0.1 1.07 ± 7% perf-stat.branch-miss-rate%
4.55e+09 ± 3% -6.8% 4.242e+09 ± 3% perf-stat.cache-misses
9.994e+09 ± 2% -5.4% 9.455e+09 ± 3% perf-stat.cache-references
3386653 -1.4% 3340360 perf-stat.context-switches
1.559e+12 -11.8% 1.375e+12 ± 4% perf-stat.cpu-cycles
16599 -25.5% 12360 perf-stat.cpu-migrations
1.712e+11 ± 3% -8.6% 1.564e+11 perf-stat.dTLB-loads
7.092e+11 ± 3% -11.7% 6.263e+11 ± 5% perf-stat.instructions
7701 ± 4% -12.5% 6741 ± 5% perf-stat.instructions-per-iTLB-miss
39.52 -0.5 39.06 perf-stat.node-load-miss-rate%
9.658e+08 -4.0% 9.27e+08 perf-stat.node-load-misses
22.56 -0.6 21.96 perf-stat.node-store-miss-rate%
5.902e+08 -3.8% 5.678e+08 perf-stat.node-store-misses
12.98 ± 48% -1.6 11.41 ± 32% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle
13.80 ± 11% -0.1 13.66 ± 10% perf-profile.calltrace.cycles-pp.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle
13.64 ± 10% -0.1 13.51 ± 11% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter.call_cpuidle
87.54 -0.1 87.48 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle.cpu_startup_entry
86.74 -0.0 86.71 perf-profile.calltrace.cycles-pp.cpuidle_enter.call_cpuidle.do_idle.cpu_startup_entry.start_secondary
86.83 -0.0 86.82 perf-profile.calltrace.cycles-pp.call_cpuidle.do_idle.cpu_startup_entry.start_secondary.verify_cpu
93.11 +0.0 93.12 perf-profile.calltrace.cycles-pp.verify_cpu
5.83 ± 9% +0.1 5.89 ± 14% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter
90.70 +0.1 90.79 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.verify_cpu
91.28 +0.2 91.44 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.verify_cpu
91.28 +0.2 91.44 perf-profile.calltrace.cycles-pp.start_secondary.verify_cpu
57.64 ± 6% +1.5 59.11 ± 3% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle
13.04 ± 49% -1.6 11.41 ± 32% perf-profile.children.cycles-pp.poll_idle
88.49 -0.2 88.33 perf-profile.children.cycles-pp.cpuidle_enter
14.31 ± 10% -0.1 14.17 ± 10% perf-profile.children.cycles-pp.__irqentry_text_start
14.06 ± 10% -0.1 13.92 ± 10% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
88.60 -0.1 88.49 perf-profile.children.cycles-pp.call_cpuidle
87.57 -0.1 87.51 perf-profile.children.cycles-pp.cpuidle_enter_state
92.53 -0.1 92.47 perf-profile.children.cycles-pp.do_idle
93.11 +0.0 93.12 perf-profile.children.cycles-pp.verify_cpu
93.11 +0.0 93.12 perf-profile.children.cycles-pp.cpu_startup_entry
6.00 ± 9% +0.1 6.09 ± 14% perf-profile.children.cycles-pp.hrtimer_interrupt
91.28 +0.2 91.44 perf-profile.children.cycles-pp.start_secondary
57.72 ± 6% +1.5 59.18 ± 3% perf-profile.children.cycles-pp.intel_idle
13.02 ± 49% -1.6 11.41 ± 32% perf-profile.self.cycles-pp.poll_idle
57.72 ± 6% +1.5 59.18 ± 3% perf-profile.self.cycles-pp.intel_idle
fsmark.app_overhead
1.6e+08 +-+---------------------------------------------------------------+
| O |
1.4e+08 OO+OOOOOO |
1.2e+08 +-+ OOOOOOO O |
| |
1e+08 +-+ |
| |
8e+07 +-+ |
| |
6e+07 +-+ |
4e+07 +-+ |
| |
2e+07 +-+ + |
|+ ++ + ++++++++.++++++++++++++++++++++++++++++++.++++++++++++++++|
0 +-+---------------------------------------------------------------+
perf-stat.cpu-migrations
20000 +-+-----------------------------------------------------------------+
18000 +-+ + |
| ::+ : ++++++++. ++ ++++ ++++++.++ ++ +++++ + +++++++|
16000 +-++ + ++++++++ ++ + + ++ + + ++.+ |
14000 +-+ : : |
| O: :OOOOOOOO |
12000 OOOOO OO:O |
10000 +-+ : : |
8000 +-+ : : |
| :: |
6000 +-+ :: |
4000 +-+ :: |
| : |
2000 +-+ : |
0 +-+-----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong
View attachment "config-4.14.0-rc5-00068-gdc4fd9a" of type "text/plain" (163335 bytes)
View attachment "job-script" of type "text/plain" (7670 bytes)
View attachment "job.yaml" of type "text/plain" (5275 bytes)
View attachment "reproduce" of type "text/plain" (2124 bytes)