Message-ID: <20200805090614.GH23458@shao2-debian>
Date: Wed, 5 Aug 2020 17:06:14 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Vishal Verma <vishal.l.verma@...el.com>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Vivek Goyal <vgoyal@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Erwin Tsaur <erwin.tsaur@...el.com>,
Tony Luck <tony.luck@...el.com>,
LKML <linux-kernel@...r.kernel.org>, linux-nvdimm@...ts.01.org,
lkp@...ts.01.org
Subject: [x86/copy_mc] fb406088ce: fio.read_iops -55.3% regression
Greetings,
FYI, we noticed a -55.3% regression of fio.read_iops due to commit:
commit: fb406088ce0e36122cff0ffeed823023074c7dc6 ("x86/copy_mc: Introduce copy_mc_generic()")
https://git.kernel.org/cgit/linux/kernel/git/nvdimm/nvdimm.git for-5.9/copy_mc
in testcase: fio-basic
on test machine: 96-thread Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
with the following parameters (see the fio sketch after the test description):
disk: 2pmem
fs: xfs
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: randread
bs: 2M
ioengine: sync
test_size: 200G
cpufreq_governor: performance
ucode: 0x5002f01
test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
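For orientation, the parameters above correspond roughly to a standalone fio invocation like the sketch below. This is only an illustration, not the attached job.yaml; the target directory and the numjobs value (50% of the 96 CPU threads) are assumptions.

	# Hedged sketch: approximate fio equivalent of the parameters listed above.
	# /mnt/pmem0 is a placeholder for the dax-mounted xfs filesystem on the pmem disks.
	fio --name=randread \
	    --directory=/mnt/pmem0 \
	    --rw=randread --bs=2M --ioengine=sync \
	    --size=200G --runtime=200 --time_based \
	    --numjobs=48 --group_reporting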
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <rong.a.chen@...el.com>
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
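The steps above only set up lkp-tests; the kernel under test still has to be built from the suspect commit. A hedged sketch of that part follows, assuming the nvdimm tree from the URL above has already been cloned into ./nvdimm and that the attached kernel config is used; the build and boot commands themselves are generic and not taken from this report.

	# Hedged sketch: build a kernel at the suspect commit with the attached config.
	cd nvdimm
	git checkout fb406088ce0e36122cff0ffeed823023074c7dc6
	cp ../config-5.8.0-rc5-00002-gfb406088ce0e36 .config   # attached config
	make olddefconfig
	make -j"$(nproc)"
	# boot the resulting kernel on the test machine, then run: bin/lkp run job.yaml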
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based/ucode:
2M/gcc-9/performance/2pmem/xfs/sync/x86_64-rhel-8.3/dax/50%/debian-10.4-x86_64-20200603.cgz/200s/randread/lkp-csl-2sp6/200G/fio-basic/tb/0x5002f01
commit:
0a78de3d4b ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()")
fb406088ce ("x86/copy_mc: Introduce copy_mc_generic()")
0a78de3d4b7b1b80 fb406088ce0e36122cff0ffeed8
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.44 ± 31% -0.3 0.17 ± 50% fio.latency_1000us%
0.02 ± 65% +1.3 1.31 ± 20% fio.latency_10ms%
0.00 ±173% +0.0 0.03 ± 59% fio.latency_20ms%
97.37 -96.7 0.72 ± 74% fio.latency_2ms%
0.99 ± 10% +96.5 97.44 fio.latency_4ms%
0.56 ± 58% -0.5 0.03 ±100% fio.latency_500us%
74412 -55.3% 33285 fio.read_bw_MBps
1376256 +118.5% 3006464 fio.read_clat_90%_us
1400832 +118.1% 3055616 fio.read_clat_95%_us
1980416 ± 12% +160.6% 5160960 ± 6% fio.read_clat_99%_us
1282194 +124.0% 2872613 fio.read_clat_mean_us
207458 ± 6% +127.8% 472559 ± 7% fio.read_clat_stddev
37206 -55.3% 16642 fio.read_iops
80.95 ± 2% -38.8% 49.50 ± 12% fio.time.user_time
21418 -1.3% 21134 fio.time.voluntary_context_switches
7441285 -55.3% 3328617 fio.workload
30156 ± 4% -24.0% 22920 ± 6% cpuidle.C1.usage
1675 -4.2% 1604 vmstat.system.cs
0.11 ± 3% +0.0 0.14 ± 4% mpstat.cpu.all.soft%
0.51 ± 7% -0.2 0.32 ± 10% mpstat.cpu.all.usr%
114802 -1.7% 112839 proc-vmstat.nr_shmem
20196 ± 5% -21.1% 15925 ± 10% proc-vmstat.pgactivate
63.47 ± 11% -22.2 41.28 ± 10% perf-profile.calltrace.cycles-pp.copy_mc_fragile.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
0.00 +6.6 6.59 ± 27% perf-profile.calltrace.cycles-pp.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
0.00 +31.2 31.16 ± 7% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter
63.59 ± 11% -22.2 41.37 ± 10% perf-profile.children.cycles-pp.copy_mc_fragile
1.54 ± 73% -1.0 0.58 ± 44% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
1.51 ± 74% -1.0 0.56 ± 42% perf-profile.children.cycles-pp.hrtimer_interrupt
0.30 ±112% -0.2 0.05 ± 60% perf-profile.children.cycles-pp.clockevents_program_event
2.07 ± 79% +14.4 16.48 ± 9% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.00 +22.0 22.04 ± 4% perf-profile.children.cycles-pp.copy_mc_generic
62.62 ± 11% -21.6 40.98 ± 10% perf-profile.self.cycles-pp.copy_mc_fragile
0.31 ±123% -0.3 0.03 ±100% perf-profile.self.cycles-pp.ktime_get
0.00 +21.7 21.73 ± 4% perf-profile.self.cycles-pp.copy_mc_generic
188.70 ± 17% +223.4% 610.19 ± 64% sched_debug.cfs_rq:/.exec_clock.min
33475 ± 4% -18.2% 27395 ± 5% sched_debug.cfs_rq:/.exec_clock.stddev
36013 ± 3% -15.7% 30367 ± 3% sched_debug.cfs_rq:/.min_vruntime.stddev
36013 ± 3% -15.7% 30372 ± 3% sched_debug.cfs_rq:/.spread0.stddev
9.33 ± 2% +38.6% 12.94 sched_debug.cpu.clock.stddev
3468 ± 30% -26.4% 2552 ± 7% sched_debug.cpu.nr_switches.stddev
366.94 ± 5% +23.2% 451.94 ± 7% sched_debug.cpu.sched_count.min
2255 ± 8% -24.3% 1706 ± 9% sched_debug.cpu.sched_count.stddev
1145 ± 8% -22.8% 884.59 ± 8% sched_debug.cpu.sched_goidle.stddev
9875 ± 7% -47.4% 5196 ± 18% sched_debug.cpu.ttwu_count.max
1295 ± 5% -32.5% 874.15 ± 11% sched_debug.cpu.ttwu_count.stddev
734.16 ± 6% -16.5% 613.31 ± 5% sched_debug.cpu.ttwu_local.stddev
7840 ± 81% +126.3% 17742 ± 39% softirqs.CPU10.SCHED
24923 ± 11% -36.6% 15811 ± 29% softirqs.CPU12.SCHED
24257 ± 17% -51.8% 11689 ± 48% softirqs.CPU18.SCHED
26470 ± 2% -63.9% 9562 ± 59% softirqs.CPU2.SCHED
23842 ± 20% -50.6% 11786 ± 57% softirqs.CPU20.SCHED
20162 ± 36% -67.9% 6472 ± 74% softirqs.CPU21.SCHED
26425 ± 2% -31.0% 18224 ± 50% softirqs.CPU23.SCHED
13228 ±107% -67.5% 4294 ± 14% softirqs.CPU50.RCU
94827 ± 31% -23.0% 73031 ± 5% softirqs.CPU6.TIMER
5223 ± 62% +162.0% 13683 ± 35% softirqs.CPU60.SCHED
5924 ± 92% +148.7% 14736 ± 35% softirqs.CPU65.SCHED
5072 ± 85% +265.7% 18553 ± 31% softirqs.CPU66.SCHED
6035 ± 87% +191.7% 17605 ± 38% softirqs.CPU68.SCHED
8816 ± 82% +169.1% 23722 ± 18% softirqs.CPU69.SCHED
5842 ± 41% -35.7% 3758 ± 4% softirqs.CPU81.RCU
53.50 ± 31% +77.1% 94.75 ± 36% interrupts.CPU12.RES:Rescheduling_interrupts
3554 ± 33% +42.5% 5066 ± 24% interrupts.CPU18.NMI:Non-maskable_interrupts
3554 ± 33% +42.5% 5066 ± 24% interrupts.CPU18.PMI:Performance_monitoring_interrupts
40.25 ± 91% +243.5% 138.25 ± 11% interrupts.CPU18.RES:Rescheduling_interrupts
7150 ± 11% -42.4% 4121 ± 31% interrupts.CPU19.NMI:Non-maskable_interrupts
7150 ± 11% -42.4% 4121 ± 31% interrupts.CPU19.PMI:Performance_monitoring_interrupts
29.50 ± 48% +422.0% 154.00 ± 20% interrupts.CPU2.RES:Rescheduling_interrupts
69.00 ± 64% +153.6% 175.00 ± 4% interrupts.CPU21.RES:Rescheduling_interrupts
437.50 ± 2% +58.6% 694.00 ± 19% interrupts.CPU22.CAL:Function_call_interrupts
42.50 ± 38% +364.7% 197.50 ± 85% interrupts.CPU35.TLB:TLB_shootdowns
7586 -49.5% 3828 ± 41% interrupts.CPU40.NMI:Non-maskable_interrupts
7586 -49.5% 3828 ± 41% interrupts.CPU40.PMI:Performance_monitoring_interrupts
7126 ± 11% -48.2% 3692 ± 35% interrupts.CPU44.NMI:Non-maskable_interrupts
7126 ± 11% -48.2% 3692 ± 35% interrupts.CPU44.PMI:Performance_monitoring_interrupts
157.75 ± 12% -23.1% 121.25 ± 33% interrupts.CPU46.RES:Rescheduling_interrupts
7205 ± 9% -19.7% 5788 interrupts.CPU47.NMI:Non-maskable_interrupts
7205 ± 9% -19.7% 5788 interrupts.CPU47.PMI:Performance_monitoring_interrupts
7615 -42.5% 4379 ± 34% interrupts.CPU48.NMI:Non-maskable_interrupts
7615 -42.5% 4379 ± 34% interrupts.CPU48.PMI:Performance_monitoring_interrupts
6642 ± 25% -53.8% 3072 ± 11% interrupts.CPU50.NMI:Non-maskable_interrupts
6642 ± 25% -53.8% 3072 ± 11% interrupts.CPU50.PMI:Performance_monitoring_interrupts
182.00 ± 5% -47.8% 95.00 ± 44% interrupts.CPU50.RES:Rescheduling_interrupts
80.75 ± 17% +40.9% 113.75 ± 6% interrupts.CPU51.RES:Rescheduling_interrupts
7619 -47.7% 3981 ± 27% interrupts.CPU57.NMI:Non-maskable_interrupts
7619 -47.7% 3981 ± 27% interrupts.CPU57.PMI:Performance_monitoring_interrupts
164.00 ± 12% -20.9% 129.75 ± 23% interrupts.CPU57.RES:Rescheduling_interrupts
7139 ± 11% -51.2% 3483 ± 53% interrupts.CPU62.NMI:Non-maskable_interrupts
7139 ± 11% -51.2% 3483 ± 53% interrupts.CPU62.PMI:Performance_monitoring_interrupts
6644 ± 24% -54.1% 3048 ± 5% interrupts.CPU66.NMI:Non-maskable_interrupts
6644 ± 24% -54.1% 3048 ± 5% interrupts.CPU66.PMI:Performance_monitoring_interrupts
174.25 ± 19% -51.5% 84.50 ± 53% interrupts.CPU66.RES:Rescheduling_interrupts
179.00 ± 3% -49.2% 91.00 ± 49% interrupts.CPU68.RES:Rescheduling_interrupts
6938 ± 11% -53.1% 3255 ± 48% interrupts.CPU69.NMI:Non-maskable_interrupts
6938 ± 11% -53.1% 3255 ± 48% interrupts.CPU69.PMI:Performance_monitoring_interrupts
6530 ± 16% -45.9% 3531 ± 43% interrupts.CPU72.NMI:Non-maskable_interrupts
6530 ± 16% -45.9% 3531 ± 43% interrupts.CPU72.PMI:Performance_monitoring_interrupts
5519 ± 27% -33.9% 3645 ± 31% interrupts.CPU91.NMI:Non-maskable_interrupts
5519 ± 27% -33.9% 3645 ± 31% interrupts.CPU91.PMI:Performance_monitoring_interrupts
518479 ± 11% -20.2% 413954 ± 15% interrupts.NMI:Non-maskable_interrupts
518479 ± 11% -20.2% 413954 ± 15% interrupts.PMI:Performance_monitoring_interrupts
42.36 +68.2% 71.24 perf-stat.i.MPKI
9.977e+09 ± 2% -54.9% 4.503e+09 perf-stat.i.branch-instructions
0.05 ± 4% +0.0 0.08 perf-stat.i.branch-miss-rate%
3907318 -12.5% 3419750 perf-stat.i.branch-misses
67.59 +10.0 77.64 perf-stat.i.cache-miss-rate%
1.722e+09 -13.1% 1.497e+09 perf-stat.i.cache-misses
2.539e+09 ± 2% -24.4% 1.92e+09 perf-stat.i.cache-references
1658 -5.6% 1565 perf-stat.i.context-switches
2.27 +120.1% 5.00 perf-stat.i.cpi
98.73 -1.4% 97.38 perf-stat.i.cpu-migrations
87.68 +12.0% 98.23 perf-stat.i.cycles-between-cache-misses
1.003e+10 -54.9% 4.525e+09 perf-stat.i.dTLB-loads
0.00 ± 14% +0.0 0.00 ± 10% perf-stat.i.dTLB-store-miss-rate%
9.922e+09 ± 2% -55.1% 4.454e+09 perf-stat.i.dTLB-stores
45.47 +4.5 49.94 ± 2% perf-stat.i.iTLB-load-miss-rate%
2640557 ± 2% -13.6% 2280694 ± 3% perf-stat.i.iTLB-load-misses
3175197 -28.0% 2286190 perf-stat.i.iTLB-loads
5.964e+10 ± 2% -55.0% 2.682e+10 perf-stat.i.instructions
22561 -47.8% 11788 ± 4% perf-stat.i.instructions-per-iTLB-miss
0.44 -54.4% 0.20 perf-stat.i.ipc
339.51 -50.7% 167.36 perf-stat.i.metric.M/sec
1.352e+08 ± 10% +39.4% 1.885e+08 ± 10% perf-stat.i.node-load-misses
1.1e+08 ± 10% +70.1% 1.871e+08 ± 10% perf-stat.i.node-loads
2.496e+08 +15.5% 2.884e+08 perf-stat.i.node-stores
42.58 +68.2% 71.63 perf-stat.overall.MPKI
0.04 +0.0 0.08 perf-stat.overall.branch-miss-rate%
67.82 +10.1 77.94 perf-stat.overall.cache-miss-rate%
2.26 +121.0% 5.00 perf-stat.overall.cpi
78.40 +14.3% 89.61 perf-stat.overall.cycles-between-cache-misses
0.00 ± 21% +0.0 0.00 ± 10% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 18% +0.0 0.00 ± 16% perf-stat.overall.dTLB-store-miss-rate%
45.42 +4.5 49.93 ± 2% perf-stat.overall.iTLB-load-miss-rate%
22604 -47.8% 11788 ± 4% perf-stat.overall.instructions-per-iTLB-miss
0.44 -54.7% 0.20 perf-stat.overall.ipc
1588901 +1.9% 1619160 perf-stat.overall.path-length
9.822e+09 -54.4% 4.481e+09 perf-stat.ps.branch-instructions
3831180 -11.7% 3383789 perf-stat.ps.branch-misses
1.695e+09 -12.1% 1.49e+09 perf-stat.ps.cache-misses
2.5e+09 -23.5% 1.911e+09 perf-stat.ps.cache-references
1615 -4.2% 1548 perf-stat.ps.context-switches
9.878e+09 -54.4% 4.502e+09 perf-stat.ps.dTLB-loads
9.768e+09 -54.6% 4.433e+09 perf-stat.ps.dTLB-stores
2598003 ± 2% -12.7% 2267198 ± 3% perf-stat.ps.iTLB-load-misses
3121604 -27.2% 2272234 perf-stat.ps.iTLB-loads
5.871e+10 -54.5% 2.668e+10 perf-stat.ps.instructions
1.331e+08 ± 9% +40.9% 1.876e+08 ± 10% perf-stat.ps.node-load-misses
1.081e+08 ± 10% +72.2% 1.862e+08 ± 10% perf-stat.ps.node-loads
2.452e+08 +17.0% 2.868e+08 perf-stat.ps.node-stores
1.182e+13 -54.4% 5.39e+12 perf-stat.total.instructions
fio.read_bw_MBps
80000 +-------------------------------------------------------------------+
75000 |.. .+..+.+..+.. .+..+..+.+..+.+..+.. .+.. .+.. .+.. .+. .+.+..|
| + + + +.+. + +. +. |
70000 |-+ |
65000 |-+ |
| |
60000 |-+ |
55000 |-+ |
50000 |-+ |
| |
45000 |-+ |
40000 |-+ |
| |
35000 |-+O O O O O O O O O O O O O O O O |
30000 +-------------------------------------------------------------------+
fio.read_iops
40000 +-------------------------------------------------------------------+
|.. .+..+.+..+.. .+..+..+.+..+.+..+.. .+.. .+.. .+.. .+. .+.+..|
| + + + +.+. + +. +. |
35000 |-+ |
| |
| |
30000 |-+ |
| |
25000 |-+ |
| |
| |
20000 |-+ |
| |
| O O O O O O O O O O O O O O O O |
15000 +-------------------------------------------------------------------+
fio.read_clat_mean_us
3e+06 +-----------------------------------------------------------------+
| O O O O O O O O O O O O O O O O |
2.8e+06 |-+ |
2.6e+06 |-+ |
| |
2.4e+06 |-+ |
2.2e+06 |-+ |
| |
2e+06 |-+ |
1.8e+06 |-+ |
| |
1.6e+06 |-+ |
1.4e+06 |-+ |
| .+.+..+. .+.+..+.+..+.+.. .+..+..+.+..+.+..+.+.. .+.. .+..+.+..|
1.2e+06 +-----------------------------------------------------------------+
fio.read_clat_90__us
3.2e+06 +-----------------------------------------------------------------+
3e+06 |-+O O O O O O O O O O O O O O O O |
| |
2.8e+06 |-+ |
2.6e+06 |-+ |
| |
2.4e+06 |-+ |
2.2e+06 |-+ |
2e+06 |-+ |
| |
1.8e+06 |-+ |
1.6e+06 |-+ |
| |
1.4e+06 |..+.+..+.+..+.+..+.+..+.+..+.+..+..+.+..+.+..+.+..+.+..+.+..+.+..|
1.2e+06 +-----------------------------------------------------------------+
fio.read_clat_95__us
3.2e+06 +-----------------------------------------------------------------+
3e+06 |-+O O O O O O O O O O O O O O O O |
| |
2.8e+06 |-+ |
2.6e+06 |-+ |
| |
2.4e+06 |-+ |
2.2e+06 |-+ |
2e+06 |-+ |
| |
1.8e+06 |-+ |
1.6e+06 |-+ |
| .+.. .+.. .+. .+. .+.+..+.+..+.+.. |
1.4e+06 |..+.+..+.+..+.+..+ +.+..+ +. +. +. +.+..|
1.2e+06 +-----------------------------------------------------------------+
fio.read_clat_99__us
6.5e+06 +-----------------------------------------------------------------+
6e+06 |-+ O |
| |
5.5e+06 |-+ O O O O O |
5e+06 |-+O O O O O |
| O O O O |
4.5e+06 |-+ O |
4e+06 |-+ |
3.5e+06 |-+ |
| |
3e+06 |-+ |
2.5e+06 |-+ +.. .+..+.+.. .+.. .+.+.. +.+..+.+..+ |
| + + +.+..+ +. +. .. + .|
2e+06 |-.+.+..+ + +..+.+. |
1.5e+06 +-----------------------------------------------------------------+
fio.latency_2ms_
  [plot: bisect-good ("+") samples cluster near 97%, bisect-bad ("O") samples near 1%]
fio.latency_4ms_
  [plot: bisect-good ("+") samples cluster near 1%, bisect-bad ("O") samples near 97%]
fio.latency_10ms_
2.5 +---------------------------------------------------------------------+
| O |
| |
2 |-+ |
| O O |
| |
1.5 |-+ |
| O O O O O O O O O |
1 |-+ O O O O |
| |
| |
0.5 |-+ |
| |
| |
0 +---------------------------------------------------------------------+
fio.workload
8e+06 +-----------------------------------------------------------------+
7.5e+06 |.. .+..+.+..+. .+.+..+.+..+.+..+.. .+.. .+. .+. .+. .+.+..|
| + +. + +.+. +. +. +. |
7e+06 |-+ |
6.5e+06 |-+ |
| |
6e+06 |-+ |
5.5e+06 |-+ |
5e+06 |-+ |
| |
4.5e+06 |-+ |
4e+06 |-+ |
| |
3.5e+06 |-+O O O O O O O O O O O O O O O O |
3e+06 +-----------------------------------------------------------------+
fio.time.user_time
95 +----------------------------------------------------------------------+
90 |-+ + |
| +.. + + .+.. +.. + .+.. .+.. |
85 |.. + +.. + +. + .+..+.+.. .. + .+. + +.. .+. |
80 |-++ + +..+ +. + +. +. +..|
75 |-+ |
70 |-+ |
| |
65 |-+ |
60 |-+ O O |
55 |-+ |
50 |-+O O O O O |
| O O O O O |
45 |-+ O O O O |
40 +----------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-5.8.0-rc5-00002-gfb406088ce0e36" of type "text/plain" (158408 bytes)
View attachment "job-script" of type "text/plain" (8189 bytes)
View attachment "job.yaml" of type "text/plain" (5726 bytes)
View attachment "reproduce" of type "text/plain" (954 bytes)