Message-ID: <20210105032313.GA17523@xsang-OptiPlex-9020>
Date: Tue, 5 Jan 2021 11:23:13 +0800
From: kernel test robot <oliver.sang@...el.com>
To: John Hubbard <jhubbard@...dia.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Chris Wilson <chris@...is-wilson.co.uk>,
Daniel Vetter <daniel@...ll.ch>,
David Airlie <airlied@...ux.ie>,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Matthew Auld <matthew.auld@...el.com>,
Matthew Wilcox <willy@...radead.org>,
Rodrigo Vivi <rodrigo.vivi@...el.com>,
Souptick Joarder <jrdr.linux@...il.com>,
Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
zhengjun.xing@...el.com
Subject: [mm/gup] 376a34efa4: will-it-scale.per_process_ops -5.5% regression
Greetings,
FYI, we noticed a -5.5% regression of will-it-scale.per_process_ops due to commit:
commit: 376a34efa4eeb699d285c1a741b186d44b44c429 ("mm/gup: refactor and de-duplicate gup_fast() code")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:
nr_task: 50%
mode: process
test: futex1
cpufreq_governor: performance
ucode: 0x5003003
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based variant of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
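For orientation, the futex1 testcase is essentially a tight FUTEX_WAKE loop. Because it does not pass FUTEX_PRIVATE_FLAG, every call resolves the futex key through the fast-GUP path that the blamed commit reworked, which is why the profiles below are dominated by internal_get_user_pages_fast() and gup_pgd_range(). A minimal standalone sketch of the per-process loop follows (simplified from the test sources at the URL above; the real harness runs nr_task parallel copies indefinitely and samples a shared iteration counter to compute per_process_ops):

#include <linux/futex.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	/*
	 * No FUTEX_PRIVATE_FLAG: the kernel treats this as a shared
	 * futex, so get_futex_key() pins the backing page via fast GUP
	 * on every call -- the exact path changed by 376a34efa4.
	 */
	int futex = 0;
	unsigned long long iterations;

	for (iterations = 0; iterations < 100000000ULL; iterations++)
		syscall(SYS_futex, &futex, FUTEX_WAKE, 1, NULL, NULL, 0);

	printf("%llu FUTEX_WAKE calls\n", iterations);
	return 0;
}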
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/futex1/will-it-scale/0x5003003
commit:
9e1f0580d3 ("mm/gup: move __get_user_pages_fast() down a few lines in gup.c")
376a34efa4 ("mm/gup: refactor and de-duplicate gup_fast() code")
9e1f0580d37e0d3f            376a34efa4eeb699d285c1a741b
----------------            ---------------------------
         %stddev      %change          %stddev
             \            |                \
6.067e+08 -5.5% 5.733e+08 will-it-scale.96.processes
6319316 -5.5% 5972248 will-it-scale.per_process_ops
6.067e+08 -5.5% 5.733e+08 will-it-scale.workload
32.90 ± 4% -4.6% 31.38 boot-time.boot
5329 ± 4% -8.1% 4897 ± 5% boot-time.idle
8256765 ± 70% -68.3% 2614568 ± 43% cpuidle.C1.time
145939 ± 87% -64.3% 52167 ± 20% cpuidle.C1.usage
7722907 ± 20% -57.4% 3290523 ±103% sched_debug.cfs_rq:/.spread0.avg
12708331 ± 7% -39.1% 7733820 ± 46% sched_debug.cfs_rq:/.spread0.max
-1658498 +253.9% -5869460 sched_debug.cfs_rq:/.spread0.min
5890 ± 29% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
1.50 ± 33% -100.0% 0.00 perf-sched.wait_and_delay.count.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
7559 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
5890 ± 29% -100.0% 0.00 perf-sched.wait_time.avg.ms.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
7559 -100.0% 0.00 perf-sched.wait_time.max.ms.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
150049 +2.8% 154285 proc-vmstat.nr_active_anon
89133 +2.0% 90906 proc-vmstat.nr_anon_pages
2959 +3.0% 3048 proc-vmstat.nr_page_table_pages
150049 +2.8% 154285 proc-vmstat.nr_zone_active_anon
1412412 +1.5% 1432905 proc-vmstat.pgalloc_normal
1300962 +1.1% 1314985 proc-vmstat.pgfault
693.25 ± 23% -78.9% 146.25 ± 43% interrupts.CPU1.RES:Rescheduling_interrupts
1.75 ± 24% +10957.1% 193.50 ±159% interrupts.CPU1.TLB:TLB_shootdowns
17.75 ± 54% +181.7% 50.00 ± 84% interrupts.CPU104.RES:Rescheduling_interrupts
76.00 ± 46% -73.0% 20.50 ± 32% interrupts.CPU105.RES:Rescheduling_interrupts
121.00 ±115% -84.3% 19.00 ± 24% interrupts.CPU113.RES:Rescheduling_interrupts
2.00 ± 61% +4262.5% 87.25 ±156% interrupts.CPU118.TLB:TLB_shootdowns
3238 ± 20% +56.0% 5051 ± 48% interrupts.CPU162.NMI:Non-maskable_interrupts
3238 ± 20% +56.0% 5051 ± 48% interrupts.CPU162.PMI:Performance_monitoring_interrupts
4269 ± 62% +75.4% 7490 ± 24% interrupts.CPU171.NMI:Non-maskable_interrupts
4269 ± 62% +75.4% 7490 ± 24% interrupts.CPU171.PMI:Performance_monitoring_interrupts
2851 ± 35% +100.3% 5711 ± 35% interrupts.CPU174.NMI:Non-maskable_interrupts
2851 ± 35% +100.3% 5711 ± 35% interrupts.CPU174.PMI:Performance_monitoring_interrupts
5020 ± 42% +51.1% 7587 ± 24% interrupts.CPU26.NMI:Non-maskable_interrupts
5020 ± 42% +51.1% 7587 ± 24% interrupts.CPU26.PMI:Performance_monitoring_interrupts
75.75 ± 49% -76.2% 18.00 ± 57% interrupts.CPU28.RES:Rescheduling_interrupts
1.50 ± 74% +11966.7% 181.00 ±166% interrupts.CPU3.TLB:TLB_shootdowns
2883 ± 36% +75.2% 5051 ± 48% interrupts.CPU51.NMI:Non-maskable_interrupts
2883 ± 36% +75.2% 5051 ± 48% interrupts.CPU51.PMI:Performance_monitoring_interrupts
16.25 ± 44% +192.3% 47.50 ± 87% interrupts.CPU65.RES:Rescheduling_interrupts
1.25 ± 34% +12700.0% 160.00 ±163% interrupts.CPU7.TLB:TLB_shootdowns
2856 +50.5% 4298 ± 32% interrupts.CPU71.NMI:Non-maskable_interrupts
2856 +50.5% 4298 ± 32% interrupts.CPU71.PMI:Performance_monitoring_interrupts
110.75 ± 39% -58.2% 46.25 ±100% interrupts.CPU86.RES:Rescheduling_interrupts
127.75 ± 84% -60.5% 50.50 ± 83% interrupts.CPU90.RES:Rescheduling_interrupts
2.25 ± 57% +1388.9% 33.50 ±137% interrupts.CPU98.TLB:TLB_shootdowns
230.25 ± 3% +35.9% 313.00 ± 14% interrupts.IWI:IRQ_work_interrupts
1266 ± 55% +494.7% 7533 ± 68% interrupts.TLB:TLB_shootdowns
20.74 -3.2 17.57 ± 11% perf-profile.calltrace.cycles-pp.gup_pgd_range.internal_get_user_pages_fast.get_futex_key.futex_wake.do_futex
18.46 -2.7 15.78 ± 10% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
4.72 -0.7 4.06 ± 11% perf-profile.calltrace.cycles-pp.try_grab_compound_head.gup_pgd_range.internal_get_user_pages_fast.get_futex_key.futex_wake
1.89 -0.6 1.33 ± 11% perf-profile.calltrace.cycles-pp.testcase
2.99 -0.5 2.53 ± 11% perf-profile.calltrace.cycles-pp.hash_futex.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
1.65 -0.2 1.42 ± 11% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
21.01 -3.2 17.83 ± 11% perf-profile.children.cycles-pp.gup_pgd_range
12.26 -1.7 10.54 ± 10% perf-profile.children.cycles-pp.entry_SYSCALL_64
7.83 -1.0 6.81 ± 11% perf-profile.children.cycles-pp.syscall_return_via_sysret
4.72 -0.7 4.06 ± 11% perf-profile.children.cycles-pp.try_grab_compound_head
2.00 -0.6 1.37 ± 11% perf-profile.children.cycles-pp.testcase
2.99 -0.5 2.53 ± 11% perf-profile.children.cycles-pp.hash_futex
0.28 -0.1 0.17 ± 10% perf-profile.children.cycles-pp.syscall@plt
0.30 -0.1 0.24 ± 11% perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
0.22 -0.1 0.16 ± 14% perf-profile.children.cycles-pp.get_user_pages_fast
0.35 -0.1 0.30 ± 9% perf-profile.children.cycles-pp.pmd_huge
15.88 -2.5 13.41 ± 11% perf-profile.self.cycles-pp.gup_pgd_range
9.06 -1.5 7.57 ± 11% perf-profile.self.cycles-pp.syscall
4.91 -1.2 3.73 ± 11% perf-profile.self.cycles-pp.do_syscall_64
7.83 -1.0 6.81 ± 11% perf-profile.self.cycles-pp.syscall_return_via_sysret
4.30 -0.9 3.40 ± 11% perf-profile.self.cycles-pp.futex_wake
6.04 -0.8 5.28 ± 10% perf-profile.self.cycles-pp.entry_SYSCALL_64
4.57 -0.6 3.93 ± 11% perf-profile.self.cycles-pp.try_grab_compound_head
1.66 -0.6 1.08 ± 11% perf-profile.self.cycles-pp.testcase
2.96 -0.4 2.51 ± 11% perf-profile.self.cycles-pp.hash_futex
3.02 -0.4 2.58 ± 11% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.73 -0.2 1.49 ± 11% perf-profile.self.cycles-pp.__x64_sys_futex
1.29 -0.2 1.12 ± 11% perf-profile.self.cycles-pp.do_futex
0.31 -0.0 0.26 ± 8% perf-profile.self.cycles-pp.pmd_huge
0.16 -0.0 0.13 ± 10% perf-profile.self.cycles-pp.syscall@plt
3.13 +2.5 5.59 ± 11% perf-profile.self.cycles-pp.internal_get_user_pages_fast
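Reading the profile: self cycles move out of gup_pgd_range(), try_grab_compound_head() and the syscall entry/exit stubs, while internal_get_user_pages_fast() itself gains 2.5 points of self time. That matches the shape of the blamed commit, whose subject says it de-duplicates the gup_fast() code: both fast-GUP entry points funnel through one shared internal body selected by gup_flags. Below is a toy model of that wrapper structure, with stand-in flag values and stub bodies (illustrative C, not the kernel code):

#include <stdio.h>

/* Stand-in flag values, not the kernel's. */
#define FOLL_WRITE     0x1
#define FOLL_FAST_ONLY 0x2

/* One shared body instead of two near-duplicate fast-GUP loops. */
static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
					unsigned int gup_flags)
{
	printf("fast gup: start=%#lx nr=%d flags=%#x\n",
	       start, nr_pages, gup_flags);
	return nr_pages;
}

/* Full-featured entry point: may fall back to the slow path. */
int get_user_pages_fast(unsigned long start, int nr_pages,
			unsigned int gup_flags)
{
	return internal_get_user_pages_fast(start, nr_pages, gup_flags);
}

/* Opportunistic, IRQ-safe entry point: never falls back. */
int __get_user_pages_fast(unsigned long start, int nr_pages, int write)
{
	unsigned int gup_flags = (write ? FOLL_WRITE : 0) | FOLL_FAST_ONLY;

	return internal_get_user_pages_fast(start, nr_pages, gup_flags);
}

int main(void)
{
	get_user_pages_fast(0x1000, 1, FOLL_WRITE);
	__get_user_pages_fast(0x1000, 1, 1);
	return 0;
}

If that reading is right, the slowdown is plausibly the cost of the extra flag plumbing on what used to be a leaner dedicated loop; the perf-stat section below (branch-misses and iTLB-load-misses both down roughly 58% while throughput drops 5.5%) at least shows that the code layout changed substantially, not just one hot instruction.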
7.362e+10 -5.5% 6.959e+10 perf-stat.i.branch-instructions
0.41 -0.2 0.19 perf-stat.i.branch-miss-rate%
2.972e+08 -58.1% 1.245e+08 perf-stat.i.branch-misses
13.95 ± 2% +0.6 14.53 perf-stat.i.cache-miss-rate%
1608471 ± 3% +7.5% 1728582 ± 2% perf-stat.i.cache-misses
11874653 +2.1% 12120398 perf-stat.i.cache-references
0.59 +5.9% 0.63 perf-stat.i.cpi
373293 ± 8% +28.3% 478954 ± 10% perf-stat.i.dTLB-load-misses
1.202e+11 -4.5% 1.148e+11 perf-stat.i.dTLB-loads
7.829e+10 -4.0% 7.515e+10 perf-stat.i.dTLB-stores
96.94 -5.4 91.52 perf-stat.i.iTLB-load-miss-rate%
2.909e+08 -58.6% 1.204e+08 perf-stat.i.iTLB-load-misses
5.025e+11 -4.9% 4.778e+11 perf-stat.i.instructions
1730 +129.3% 3968 perf-stat.i.instructions-per-iTLB-miss
1.70 -5.4% 1.61 perf-stat.i.ipc
0.86 ± 6% -9.6% 0.78 perf-stat.i.metric.K/sec
1415 -4.7% 1348 perf-stat.i.metric.M/sec
93.21 +1.4 94.63 perf-stat.i.node-load-miss-rate%
36909 ± 4% -12.3% 32363 ± 10% perf-stat.i.node-loads
92.16 ± 6% +5.3 97.47 perf-stat.i.node-store-miss-rate%
0.02 +7.3% 0.03 perf-stat.overall.MPKI
0.40 -0.2 0.18 perf-stat.overall.branch-miss-rate%
13.61 ± 3% +0.7 14.30 ± 2% perf-stat.overall.cache-miss-rate%
0.59 +5.6% 0.62 perf-stat.overall.cpi
0.00 ± 5% +0.0 0.00 ± 5% perf-stat.overall.dTLB-load-miss-rate%
97.02 -5.4 91.63 perf-stat.overall.iTLB-load-miss-rate%
1727 +129.8% 3970 perf-stat.overall.instructions-per-iTLB-miss
1.70 -5.3% 1.61 perf-stat.overall.ipc
89.86 +1.6 91.50 perf-stat.overall.node-load-miss-rate%
7.326e+10 -5.5% 6.92e+10 perf-stat.ps.branch-instructions
2.958e+08 -58.1% 1.239e+08 perf-stat.ps.branch-misses
1612946 ± 4% +7.1% 1727850 ± 2% perf-stat.ps.cache-misses
386183 ± 5% +25.5% 484727 ± 5% perf-stat.ps.dTLB-load-misses
1.196e+11 -4.6% 1.142e+11 perf-stat.ps.dTLB-loads
15923 ± 4% -6.7% 14859 perf-stat.ps.dTLB-store-misses
7.791e+10 -4.1% 7.473e+10 perf-stat.ps.dTLB-stores
2.894e+08 -58.6% 1.197e+08 perf-stat.ps.iTLB-load-misses
5e+11 -5.0% 4.751e+11 perf-stat.ps.instructions
36225 ± 4% -12.8% 31590 ± 10% perf-stat.ps.node-loads
1.512e+14 -4.8% 1.439e+14 perf-stat.total.instructions
13958 ± 6% -10.1% 12554 ± 4% softirqs.CPU104.RCU
13815 ± 5% -10.9% 12315 ± 5% softirqs.CPU106.RCU
95620 ± 3% +31.2% 125489 ± 21% softirqs.CPU106.TIMER
98504 ± 4% +19.6% 117772 ± 18% softirqs.CPU11.TIMER
13971 ± 5% -8.3% 12815 ± 2% softirqs.CPU112.RCU
96603 ± 4% +19.4% 115377 ± 19% softirqs.CPU112.TIMER
14000 ± 5% -7.7% 12917 ± 5% softirqs.CPU120.RCU
95224 ± 3% +25.0% 119064 ± 17% softirqs.CPU122.TIMER
14002 ± 6% -12.2% 12290 ± 3% softirqs.CPU126.RCU
14028 ± 6% -9.6% 12674 ± 4% softirqs.CPU128.RCU
28439 ± 50% +37.5% 39098 ± 5% softirqs.CPU128.SCHED
94723 ± 3% +30.6% 123672 ± 20% softirqs.CPU128.TIMER
14626 ± 4% -13.0% 12732 softirqs.CPU13.RCU
19080 ± 61% +105.1% 39142 ± 4% softirqs.CPU13.SCHED
93419 ± 2% +32.9% 124191 ± 20% softirqs.CPU13.TIMER
14169 ± 5% -8.8% 12926 ± 3% softirqs.CPU140.RCU
14593 ± 6% -11.7% 12887 ± 6% softirqs.CPU158.RCU
20004 ± 81% +95.9% 39183 ± 4% softirqs.CPU158.SCHED
20928 ± 75% +86.5% 39023 ± 4% softirqs.CPU166.SCHED
92411 ± 2% +32.4% 122325 ± 20% softirqs.CPU166.TIMER
13632 ± 8% -12.7% 11905 ± 4% softirqs.CPU170.RCU
94847 ± 3% +31.3% 124502 ± 20% softirqs.CPU170.TIMER
13449 ± 6% -12.0% 11836 ± 5% softirqs.CPU172.RCU
95541 ± 3% +30.8% 124994 ± 19% softirqs.CPU172.TIMER
13559 ± 4% -10.9% 12079 ± 10% softirqs.CPU176.RCU
17177 ± 11% -14.8% 14642 ± 4% softirqs.CPU26.RCU
14994 ± 7% -7.3% 13895 ± 5% softirqs.CPU27.RCU
28432 ± 50% +38.7% 39440 ± 4% softirqs.CPU27.SCHED
94128 ± 3% +32.7% 124889 ± 20% softirqs.CPU27.TIMER
15261 ± 4% -10.7% 13632 ± 4% softirqs.CPU29.RCU
20787 ± 82% +89.0% 39293 ± 4% softirqs.CPU29.SCHED
93893 ± 3% +32.4% 124332 ± 20% softirqs.CPU29.TIMER
28710 ± 50% +36.7% 39256 ± 4% softirqs.CPU31.SCHED
20986 ± 75% +86.9% 39216 ± 4% softirqs.CPU63.SCHED
93462 ± 3% +31.1% 122523 ± 20% softirqs.CPU63.TIMER
14745 ± 6% -9.2% 13385 ± 5% softirqs.CPU69.RCU
28484 ± 50% +37.5% 39172 ± 4% softirqs.CPU69.SCHED
95765 ± 4% +29.0% 123517 ± 20% softirqs.CPU69.TIMER
13848 ± 6% -10.7% 12364 ± 4% softirqs.CPU73.RCU
96130 +28.6% 123605 ± 20% softirqs.CPU73.TIMER
17277 ± 89% -79.6% 3520 ± 9% softirqs.CPU74.SCHED
94638 ± 4% +27.8% 120924 ± 17% softirqs.CPU75.TIMER
14750 ± 6% -9.5% 13345 ± 5% softirqs.CPU96.RCU
16482 ± 12% -16.7% 13728 ± 3% softirqs.CPU97.RCU
will-it-scale.96.processes
6.1e+08 +-----------------------------------------------------------------+
| .+.+.+.. .+.+. +.+.+.+.+.|
6e+08 |.+.+.+.+ + +.+.+.+.+.+.+.+.+.+.+.+.+..+. + |
| +.+ |
5.9e+08 |-+ |
| |
5.8e+08 |-+ |
| O O O O O O |
5.7e+08 |-+ O O O O O O O O O O |
| O O O O |
5.6e+08 |-+ |
| O O |
5.5e+08 |-O O O O |
| |
5.4e+08 +-----------------------------------------------------------------+
will-it-scale.per_process_ops
6.4e+06 +-----------------------------------------------------------------+
| |
6.3e+06 |-+ .+.+.+.. .+.+. +.+.+.+.+.|
|.+.+.+.+ + +.+.+.+.+.+.+.+.+.+.+.+.+..+. + |
6.2e+06 |-+ +.+ |
| |
6.1e+06 |-+ |
| |
6e+06 |-+ O O O |
| O O O O O O O O O O O O O |
5.9e+06 |-+ O O O |
| O |
5.8e+06 |-+ |
| O O |
5.7e+06 +-----------------------------------------------------------------+
will-it-scale.workload
6.1e+08 +-----------------------------------------------------------------+
| .+.+.+.. .+.+. +.+.+.+.+.|
6e+08 |.+.+.+.+ + +.+.+.+.+.+.+.+.+.+.+.+.+..+. + |
| +.+ |
5.9e+08 |-+ |
| |
5.8e+08 |-+ |
| O O O O O O |
5.7e+08 |-+ O O O O O O O O O O |
| O O O O |
5.6e+08 |-+ |
| O O |
5.5e+08 |-O O O O |
| |
5.4e+08 +-----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Oliver Sang
View attachment "config-5.7.0-03831-g376a34efa4eeb" of type "text/plain" (157857 bytes)
View attachment "job-script" of type "text/plain" (7752 bytes)
View attachment "job.yaml" of type "text/plain" (5278 bytes)
View attachment "reproduce" of type "text/plain" (338 bytes)