[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200708072346.GL3874@shao2-debian>
Date: Wed, 8 Jul 2020 15:23:46 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Shaokun Zhang <zhangshaokun@...ilicon.com>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Shaokun Zhang <zhangshaokun@...ilicon.com>,
Will Deacon <will@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Peter Zijlstra <peterz@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Boqun Feng <boqun.feng@...il.com>,
Yuqi Jin <jinyuqi@...wei.com>, lkp@...ts.01.org
Subject: [fs] 936e92b615: unixbench.score 32.3% improvement
Greeting,
FYI, we noticed a 32.3% improvement of unixbench.score due to commit:
commit: 936e92b615e212d08eb74951324bef25ba564c34 ("[PATCH RESEND] fs: Move @f_count to different cacheline with @f_mode")
url: https://github.com/0day-ci/linux/commits/Shaokun-Zhang/fs-Move-f_count-to-different-cacheline-with-f_mode/20200624-163511
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 5e857ce6eae7ca21b2055cca4885545e29228fe2
in testcase: unixbench
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:
runtime: 300s
nr_task: 30%
test: syscall
cpufreq_governor: performance
ucode: 0x5002f01
test-description: UnixBench is the original BYTE UNIX benchmark suite aims to test performance of Unix-like system.
test-url: https://github.com/kdlucas/byte-unixbench
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-7.6/30%/debian-x86_64-20191114.cgz/300s/lkp-csl-2ap3/syscall/unixbench/0x5002f01
commit:
5e857ce6ea ("Merge branch 'hch' (maccess patches from Christoph Hellwig)")
936e92b615 ("fs: Move @f_count to different cacheline with @f_mode")
5e857ce6eae7ca21 936e92b615e212d08eb74951324
---------------- ---------------------------
%stddev %change %stddev
\ | \
2297 ± 2% +32.3% 3038 unixbench.score
171.74 +34.8% 231.55 unixbench.time.user_time
1.366e+09 +32.6% 1.812e+09 unixbench.workload
26472 ± 6% +1270.0% 362665 ±158% cpuidle.C1.usage
0.25 ± 2% +0.1 0.33 mpstat.cpu.all.usr%
8.32 ± 43% +129.7% 19.12 ± 63% sched_debug.cpu.clock.stddev
8.32 ± 43% +129.7% 19.12 ± 63% sched_debug.cpu.clock_task.stddev
2100 ± 2% -15.6% 1772 ± 9% sched_debug.cpu.nr_switches.min
373.34 ± 3% +12.4% 419.48 ± 6% sched_debug.cpu.ttwu_local.stddev
2740 ± 12% -72.3% 757.75 ±105% numa-vmstat.node0.nr_inactive_anon
3139 ± 8% -69.9% 946.25 ± 97% numa-vmstat.node0.nr_shmem
2740 ± 12% -72.3% 757.75 ±105% numa-vmstat.node0.nr_zone_inactive_anon
373.75 ± 51% +443.3% 2030 ± 26% numa-vmstat.node2.nr_inactive_anon
496.00 ± 19% +366.1% 2311 ± 29% numa-vmstat.node2.nr_shmem
373.75 ± 51% +443.3% 2030 ± 26% numa-vmstat.node2.nr_zone_inactive_anon
13728 ± 13% +148.1% 34056 ± 46% numa-vmstat.node3.nr_active_anon
78558 +11.3% 87431 ± 6% numa-vmstat.node3.nr_file_pages
9939 ± 8% +19.7% 11902 ± 13% numa-vmstat.node3.nr_shmem
13728 ± 13% +148.1% 34056 ± 46% numa-vmstat.node3.nr_zone_active_anon
11103 ± 13% -71.2% 3201 ± 99% numa-meminfo.node0.Inactive
10962 ± 12% -72.3% 3032 ±105% numa-meminfo.node0.Inactive(anon)
8551 ± 31% -29.4% 6034 ± 18% numa-meminfo.node0.Mapped
12560 ± 8% -69.9% 3786 ± 97% numa-meminfo.node0.Shmem
1596 ± 51% +415.6% 8230 ± 24% numa-meminfo.node2.Inactive
1496 ± 51% +442.8% 8122 ± 26% numa-meminfo.node2.Inactive(anon)
1984 ± 19% +366.1% 9248 ± 29% numa-meminfo.node2.Shmem
54929 ± 13% +148.0% 136212 ± 46% numa-meminfo.node3.Active
54929 ± 13% +148.0% 136206 ± 46% numa-meminfo.node3.Active(anon)
314216 +11.3% 349697 ± 6% numa-meminfo.node3.FilePages
747907 ± 2% +15.2% 861672 ± 9% numa-meminfo.node3.MemUsed
39744 ± 8% +19.7% 47580 ± 13% numa-meminfo.node3.Shmem
13.94 ± 6% -13.9 0.00 perf-profile.calltrace.cycles-pp.dnotify_flush.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.7 0.66 ± 8% perf-profile.calltrace.cycles-pp.__x64_sys_umask.do_syscall_64.entry_SYSCALL_64_after_hwframe
31.64 ± 8% +3.4 35.08 ± 5% perf-profile.calltrace.cycles-pp.__fget_files.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.82 ± 8% +5.6 12.41 ± 12% perf-profile.calltrace.cycles-pp.fput_many.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
23.54 ± 58% +12.7 36.27 ± 5% perf-profile.calltrace.cycles-pp.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
23.54 ± 58% +12.7 36.29 ± 5% perf-profile.calltrace.cycles-pp.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.98 ± 6% -14.0 0.00 perf-profile.children.cycles-pp.dnotify_flush
39.81 ± 6% -10.8 28.96 ± 9% perf-profile.children.cycles-pp.filp_close
40.13 ± 6% -10.7 29.44 ± 9% perf-profile.children.cycles-pp.__x64_sys_close
0.15 ± 10% -0.0 0.13 ± 8% perf-profile.children.cycles-pp.scheduler_tick
0.05 ± 8% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.__x64_sys_getuid
0.10 ± 7% +0.0 0.12 ± 8% perf-profile.children.cycles-pp.__prepare_exit_to_usermode
0.44 ± 7% +0.1 0.56 ± 6% perf-profile.children.cycles-pp.syscall_return_via_sysret
31.78 ± 8% +3.4 35.22 ± 5% perf-profile.children.cycles-pp.__fget_files
32.52 ± 8% +3.7 36.27 ± 5% perf-profile.children.cycles-pp.ksys_dup
32.54 ± 8% +3.8 36.30 ± 5% perf-profile.children.cycles-pp.__x64_sys_dup
6.86 ± 7% +5.6 12.45 ± 12% perf-profile.children.cycles-pp.fput_many
13.91 ± 6% -13.9 0.00 perf-profile.self.cycles-pp.dnotify_flush
18.05 ± 5% -1.6 16.41 ± 7% perf-profile.self.cycles-pp.filp_close
0.06 ± 6% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.__prepare_exit_to_usermode
0.09 ± 9% +0.0 0.11 ± 7% perf-profile.self.cycles-pp.do_syscall_64
0.16 ± 9% +0.0 0.20 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.30 ± 8% +0.1 0.36 ± 7% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.44 ± 7% +0.1 0.56 ± 6% perf-profile.self.cycles-pp.syscall_return_via_sysret
31.61 ± 8% +3.4 35.00 ± 5% perf-profile.self.cycles-pp.__fget_files
6.81 ± 7% +5.6 12.38 ± 12% perf-profile.self.cycles-pp.fput_many
36623 ± 3% +11.5% 40822 ± 7% softirqs.CPU100.SCHED
16499 ± 40% +27.8% 21088 ± 35% softirqs.CPU122.RCU
16758 ± 41% +30.0% 21781 ± 35% softirqs.CPU126.RCU
178.25 ± 11% +7718.2% 13936 ±168% softirqs.CPU13.NET_RX
40883 ± 4% -6.9% 38055 ± 2% softirqs.CPU132.SCHED
16029 ± 41% +35.9% 21789 ± 33% softirqs.CPU144.RCU
16220 ± 43% +32.4% 21484 ± 35% softirqs.CPU145.RCU
16393 ± 39% +29.9% 21301 ± 32% softirqs.CPU146.RCU
16217 ± 39% +29.8% 21055 ± 35% softirqs.CPU147.RCU
37011 ± 12% +12.4% 41589 ± 5% softirqs.CPU149.SCHED
16127 ± 41% +34.5% 21685 ± 34% softirqs.CPU150.RCU
16131 ± 41% +32.3% 21333 ± 35% softirqs.CPU151.RCU
16558 ± 37% +28.2% 21230 ± 34% softirqs.CPU152.RCU
15863 ± 40% +34.1% 21266 ± 32% softirqs.CPU153.RCU
16044 ± 41% +32.7% 21286 ± 34% softirqs.CPU154.RCU
16057 ± 40% +34.9% 21658 ± 33% softirqs.CPU155.RCU
16352 ± 39% +31.0% 21423 ± 33% softirqs.CPU156.RCU
16006 ± 39% +33.4% 21348 ± 32% softirqs.CPU158.RCU
16300 ± 41% +32.0% 21521 ± 34% softirqs.CPU161.RCU
37546 ± 4% +13.5% 42605 ± 3% softirqs.CPU161.SCHED
16411 ± 41% +33.4% 21894 ± 33% softirqs.CPU162.RCU
16329 ± 41% +32.9% 21704 ± 35% softirqs.CPU163.RCU
16517 ± 39% +29.8% 21441 ± 34% softirqs.CPU164.RCU
16227 ± 41% +32.3% 21471 ± 34% softirqs.CPU165.RCU
16347 ± 40% +31.4% 21481 ± 35% softirqs.CPU166.RCU
16360 ± 43% +32.2% 21631 ± 35% softirqs.CPU167.RCU
36986 +11.3% 41148 ± 6% softirqs.CPU167.SCHED
16218 ± 44% +34.7% 21843 ± 33% softirqs.CPU189.RCU
16501 ± 39% +32.0% 21783 ± 33% softirqs.CPU52.RCU
17101 ± 41% +29.4% 22121 ± 35% softirqs.CPU68.RCU
1.087e+09 +20.9% 1.314e+09 perf-stat.i.branch-instructions
19778787 +22.1% 24144895 ± 16% perf-stat.i.branch-misses
22.88 -17.7% 18.84 ± 2% perf-stat.i.cpi
1.635e+09 +23.6% 2.021e+09 perf-stat.i.dTLB-loads
20648 ± 2% +218.4% 65736 ±110% perf-stat.i.dTLB-store-misses
1.023e+09 +24.8% 1.276e+09 perf-stat.i.dTLB-stores
78.10 +1.4 79.54 perf-stat.i.iTLB-load-miss-rate%
16169669 +8.2% 17493234 perf-stat.i.iTLB-load-misses
5.364e+09 +21.3% 6.507e+09 perf-stat.i.instructions
369.33 +11.8% 413.03 ± 5% perf-stat.i.instructions-per-iTLB-miss
0.41 ± 2% +83.3% 0.76 ± 16% perf-stat.i.metric.K/sec
19.79 +23.2% 24.39 perf-stat.i.metric.M/sec
4460149 ± 2% -45.1% 2447884 ± 14% perf-stat.i.node-load-misses
241219 ± 2% -58.8% 99443 ± 47% perf-stat.i.node-loads
1679821 ± 2% -4.4% 1605611 ± 3% perf-stat.i.node-store-misses
25.91 -17.6% 21.36 perf-stat.overall.cpi
82.51 +1.7 84.17 perf-stat.overall.iTLB-load-miss-rate%
331.21 +12.2% 371.62 perf-stat.overall.instructions-per-iTLB-miss
0.04 +21.3% 0.05 perf-stat.overall.ipc
1566 -8.4% 1435 perf-stat.overall.path-length
1.089e+09 +21.0% 1.318e+09 perf-stat.ps.branch-instructions
19801099 +21.7% 24102537 ± 15% perf-stat.ps.branch-misses
1.641e+09 +23.6% 2.028e+09 perf-stat.ps.dTLB-loads
20512 ± 2% +212.7% 64142 ±109% perf-stat.ps.dTLB-store-misses
1.027e+09 +24.8% 1.282e+09 perf-stat.ps.dTLB-stores
16239916 +8.2% 17567773 perf-stat.ps.iTLB-load-misses
5.378e+09 +21.4% 6.527e+09 perf-stat.ps.instructions
4485062 ± 2% -45.2% 2458026 ± 14% perf-stat.ps.node-load-misses
242388 ± 2% -59.0% 99493 ± 47% perf-stat.ps.node-loads
1689890 ± 2% -4.5% 1614182 ± 3% perf-stat.ps.node-store-misses
2.139e+12 +21.5% 2.6e+12 perf-stat.total.instructions
288.00 ± 13% +8910.9% 25951 ±168% interrupts.34:PCI-MSI.524292-edge.eth0-TxRx-3
2042 ± 57% +190.2% 5927 ± 26% interrupts.CPU1.NMI:Non-maskable_interrupts
2042 ± 57% +190.2% 5927 ± 26% interrupts.CPU1.PMI:Performance_monitoring_interrupts
3.75 ± 34% +2373.3% 92.75 ±130% interrupts.CPU100.TLB:TLB_shootdowns
3510 ± 88% -85.1% 522.00 ±124% interrupts.CPU107.NMI:Non-maskable_interrupts
3510 ± 88% -85.1% 522.00 ±124% interrupts.CPU107.PMI:Performance_monitoring_interrupts
3813 ± 74% -73.3% 1018 ±150% interrupts.CPU110.NMI:Non-maskable_interrupts
3813 ± 74% -73.3% 1018 ±150% interrupts.CPU110.PMI:Performance_monitoring_interrupts
4536 ± 51% -97.1% 131.50 ± 8% interrupts.CPU111.NMI:Non-maskable_interrupts
4536 ± 51% -97.1% 131.50 ± 8% interrupts.CPU111.PMI:Performance_monitoring_interrupts
4476 ± 47% -97.5% 113.00 ± 19% interrupts.CPU112.NMI:Non-maskable_interrupts
4476 ± 47% -97.5% 113.00 ± 19% interrupts.CPU112.PMI:Performance_monitoring_interrupts
3522 ± 36% +92.7% 6787 ± 16% interrupts.CPU120.NMI:Non-maskable_interrupts
3522 ± 36% +92.7% 6787 ± 16% interrupts.CPU120.PMI:Performance_monitoring_interrupts
2888 ± 66% +117.5% 6283 ± 21% interrupts.CPU123.NMI:Non-maskable_interrupts
2888 ± 66% +117.5% 6283 ± 21% interrupts.CPU123.PMI:Performance_monitoring_interrupts
3109 ± 61% +132.5% 7230 ± 7% interrupts.CPU124.NMI:Non-maskable_interrupts
3109 ± 61% +132.5% 7230 ± 7% interrupts.CPU124.PMI:Performance_monitoring_interrupts
1067 ± 19% -21.6% 836.50 interrupts.CPU125.CAL:Function_call_interrupts
288.00 ± 13% +8910.9% 25951 ±168% interrupts.CPU13.34:PCI-MSI.524292-edge.eth0-TxRx-3
244.25 ± 96% -95.3% 11.50 ± 95% interrupts.CPU13.TLB:TLB_shootdowns
2056 ±117% +206.3% 6298 ± 20% interrupts.CPU130.NMI:Non-maskable_interrupts
2056 ±117% +206.3% 6298 ± 20% interrupts.CPU130.PMI:Performance_monitoring_interrupts
831.50 +21.4% 1009 ± 13% interrupts.CPU133.CAL:Function_call_interrupts
8.00 ± 29% +634.4% 58.75 ±119% interrupts.CPU133.RES:Rescheduling_interrupts
1629 ±159% +265.3% 5952 ± 29% interrupts.CPU139.NMI:Non-maskable_interrupts
1629 ±159% +265.3% 5952 ± 29% interrupts.CPU139.PMI:Performance_monitoring_interrupts
1660 ±159% +161.0% 4332 ± 61% interrupts.CPU141.NMI:Non-maskable_interrupts
1660 ±159% +161.0% 4332 ± 61% interrupts.CPU141.PMI:Performance_monitoring_interrupts
882.75 ±147% +542.5% 5671 ± 38% interrupts.CPU143.NMI:Non-maskable_interrupts
882.75 ±147% +542.5% 5671 ± 38% interrupts.CPU143.PMI:Performance_monitoring_interrupts
2600 ± 29% +68.8% 4389 ± 47% interrupts.CPU144.NMI:Non-maskable_interrupts
2600 ± 29% +68.8% 4389 ± 47% interrupts.CPU144.PMI:Performance_monitoring_interrupts
1494 ± 20% +91.3% 2859 ± 29% interrupts.CPU147.NMI:Non-maskable_interrupts
1494 ± 20% +91.3% 2859 ± 29% interrupts.CPU147.PMI:Performance_monitoring_interrupts
3657 ± 54% -96.3% 133.75 ± 8% interrupts.CPU15.NMI:Non-maskable_interrupts
3657 ± 54% -96.3% 133.75 ± 8% interrupts.CPU15.PMI:Performance_monitoring_interrupts
5165 ± 40% -97.8% 115.00 ± 26% interrupts.CPU16.NMI:Non-maskable_interrupts
5165 ± 40% -97.8% 115.00 ± 26% interrupts.CPU16.PMI:Performance_monitoring_interrupts
34.00 ±125% -84.6% 5.25 ± 49% interrupts.CPU186.RES:Rescheduling_interrupts
1033 ± 24% -19.0% 836.75 interrupts.CPU190.CAL:Function_call_interrupts
68.00 ± 28% +55.5% 105.75 ± 9% interrupts.CPU26.RES:Rescheduling_interrupts
882.25 ± 4% +6.3% 937.75 ± 7% interrupts.CPU32.CAL:Function_call_interrupts
139.25 ± 96% -74.0% 36.25 ± 72% interrupts.CPU32.TLB:TLB_shootdowns
848.25 ±130% +368.9% 3977 ± 56% interrupts.CPU35.NMI:Non-maskable_interrupts
848.25 ±130% +368.9% 3977 ± 56% interrupts.CPU35.PMI:Performance_monitoring_interrupts
958.25 ± 11% -10.6% 856.75 interrupts.CPU36.CAL:Function_call_interrupts
1903 ± 72% +127.9% 4337 ± 23% interrupts.CPU41.NMI:Non-maskable_interrupts
1903 ± 72% +127.9% 4337 ± 23% interrupts.CPU41.PMI:Performance_monitoring_interrupts
1320 ±158% +245.4% 4560 ± 32% interrupts.CPU47.NMI:Non-maskable_interrupts
1320 ±158% +245.4% 4560 ± 32% interrupts.CPU47.PMI:Performance_monitoring_interrupts
837.50 +5.2% 881.25 ± 4% interrupts.CPU61.CAL:Function_call_interrupts
1074 ± 28% -22.1% 836.50 interrupts.CPU69.CAL:Function_call_interrupts
1042 ± 12% -18.7% 847.50 ± 2% interrupts.CPU86.CAL:Function_call_interrupts
unixbench.score
3200 +--------------------------------------------------------------------+
| O O O |
3000 |-+ O O O O O O O O O |
| O O O O |
| O |
2800 |-+ |
| |
2600 |-+ |
| |
2400 |-+ |
| +.+.. .+.+..+. +..+. .+. .+. .+..+.+.+..+.+.+. .+.|
|.+.. + .+ +.+..+. + + +. + +. |
2200 |-+ + + + |
| |
2000 +--------------------------------------------------------------------+
unixbench.workload
1.9e+09 +-----------------------------------------------------------------+
| O O O O |
1.8e+09 |-+ O O O O O O O O |
| O O O O O |
1.7e+09 |-+ |
| |
1.6e+09 |-+ |
| |
1.5e+09 |-+ |
| |
1.4e+09 |-+ +.+ .+..+.+ +.+. .+.. .+. .+..+. .+. .+.. .|
|.+. .. : + + .+.+.. + + + +.+ + + +.+ |
1.3e+09 |-+ + : + + + |
| + |
1.2e+09 +-----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-5.8.0-rc1-00128-g936e92b615e21" of type "text/plain" (206161 bytes)
View attachment "job-script" of type "text/plain" (7442 bytes)
View attachment "job.yaml" of type "text/plain" (5044 bytes)
View attachment "reproduce" of type "text/plain" (293 bytes)
Powered by blists - more mailing lists