Message-ID: <202306041623.cdc379d-oliver.sang@intel.com>
Date: Sun, 4 Jun 2023 17:25:40 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
<linux-kernel@...r.kernel.org>, Eric Dumazet <edumazet@...gle.com>,
<ying.huang@...el.com>, <feng.tang@...el.com>,
<fengwei.yin@...el.com>, <oliver.sang@...el.com>
Subject: [linus:master] [x86] 47ee3f1dd9:
phoronix-test-suite.ior.2MB.DefaultTestDirectory.mb_s -21.7% regression
Hi Linus,

We previously reported "[linus:master] [x86] adfcf4231b: blogbench.read_score -10.9% regression" at
https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/
However, as you pointed out, blogbench is a "*horrifically* bad benchmark
for this case", and Feng Tang made a debug patch that confirmed this.

We have now noticed that 47ee3f1dd9 is a fix patch for adfcf4231b. As we found
for adfcf4231b, this commit can cause either a performance regression or an
improvement, depending on the case. In fact, only the case in the subject line
is a regression; all the others are improvements (we normally pick a regression
as the report title).

The detailed data is below. We hope it is useful as information about the
possible performance impact of this change.

Hello,
kernel test robot noticed a -21.7% regression of phoronix-test-suite.ior.2MB.DefaultTestDirectory.mb_s on:
commit: 47ee3f1dd93bcbe031539b1ecdaafb44b661c772 ("x86: re-introduce support for ERMS copies for user space accesses")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: phoronix-test-suite
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
parameters:
test: ior-1.1.1
cpufreq_governor: performance
In addition, the commit has a significant impact on the following tests:
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 11.8% improvement |
| test machine | 104 threads 2 sockets (Skylake) with 192G memory |
| test parameters | cpufreq_governor=performance |
| | mode=process |
| | nr_task=16 |
| | test=pread3 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | nepim: nepim.tcp.avg.kbps_out 9.5% improvement |
| test machine | 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell) with 16G memory |
| test parameters | cluster=cs-localhost |
| | cpufreq_governor=performance |
| | nr_threads=40% |
| | protocol=tcp6 |
| | runtime=300s |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.intel-mpi.IMB-MPI1Exchange.average_mbytes_sec 21.6% improvement |
| test machine | 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=IMB-MPI1 Exchange |
| | test=intel-mpi-1.0.1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 4.6% improvement |
| test machine | 104 threads 2 sockets (Skylake) with 192G memory |
| test parameters | cpufreq_governor=performance |
| | mode=process |
| | nr_task=50% |
| | test=readseek1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.tiobench.Write.64MB.8.mb_s 43.5% improvement |
| test machine | 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Write |
| | option_b=64MB |
| | option_c=8 |
| | test=tiobench-1.3.1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.tiobench.RandomWrite.32MB.32.mb_s 49.5% improvement |
| test machine | 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Random Write |
| | option_b=32MB |
| | option_c=4 |
| | test=tiobench-1.3.1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | lmbench3: lmbench3.AF_UNIX.sock.stream.bandwidth.MB/sec 9.2% improvement |
| test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory |
| test parameters | cpufreq_governor=performance |
| | mode=development |
| | nr_threads=1 |
| | test=UNIX |
| | test_memory_size=50% |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.tiobench.RandomWrite.64MB.32.mb_s 51.1% improvement |
| test machine | 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Random Write |
| | option_b=64MB |
| | option_c=4 |
| | test=tiobench-1.3.1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.tiobench.Write.64MB.32.mb_s 58.5% improvement |
| test machine | 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Write |
| | option_b=64MB |
| | option_c=4 |
| | test=tiobench-1.3.1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | nepim: nepim.tcp.avg.kbps_out 10.3% improvement |
| test machine | 8 threads 1 sockets Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (Haswell) with 8G memory |
| test parameters | cluster=cs-localhost |
| | cpufreq_governor=performance |
| | nr_threads=25% |
| | protocol=tcp |
| | runtime=300s |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.tiobench.Write.32MB.32.mb_s 57.3% improvement |
| test machine | 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Write |
| | option_b=32MB |
| | option_c=4 |
| | test=tiobench-1.3.1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.tiobench.RandomWrite.64MB.8.mb_s 45.1% improvement |
| test machine | 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Random Write |
| | option_b=64MB |
| | option_c=8 |
| | test=tiobench-1.3.1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.x11perf.500pxPutImageSquare.operations___second 8.4% improvement |
| test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 32G memory |
| test parameters | cpufreq_governor=performance |
| | need_x=true |
| | option_a=500px PutImage Square |
| | test=x11perf-1.1.1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 14.1% improvement |
| test machine | 104 threads 2 sockets (Skylake) with 192G memory |
| test parameters | cpufreq_governor=performance |
| | mode=thread |
| | nr_task=50% |
| | test=pread1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 7.8% improvement |
| test machine | 104 threads 2 sockets (Skylake) with 192G memory |
| test parameters | cpufreq_governor=performance |
| | mode=thread |
| | nr_task=100% |
| | test=readseek1 |
+------------------+-----------------------------------------------------------------------------------------------------------+
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202306041623.cdc379d-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/debian-x86_64-phoronix/lkp-csl-2sp7/ior-1.1.1/phoronix-test-suite
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
3626 -22.0% 2828 phoronix-test-suite.ior.2MB./opt/rootfs.mb_s
3586 -21.7% 2808 phoronix-test-suite.ior.2MB.DefaultTestDirectory.mb_s
4487378 -3.5% 4330977 perf-stat.i.cache-misses
76327545 -1.2% 75402170 perf-stat.i.cache-references
292483 ± 3% -16.0% 245543 ± 5% perf-stat.i.node-stores
5.88 -0.1 5.74 perf-stat.overall.cache-miss-rate%
1462 +3.5% 1513 perf-stat.overall.cycles-between-cache-misses
35.88 +4.9 40.77 ± 5% perf-stat.overall.node-store-miss-rate%
4387788 -3.6% 4229799 perf-stat.ps.cache-misses
285917 ± 4% -16.2% 239711 ± 5% perf-stat.ps.node-stores
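
For readers who want to check the %change column, here is a minimal sketch of how a row in the comparison above can be parsed and the relative change recomputed. The helper names `parse_row` and `pct_change` are hypothetical and not part of lkp-tests; the sketch also assumes the row layout seen here (base value, optional "± N%" stddev, change, new value, optional stddev, metric name).

```python
import re

# Hypothetical helper, not part of lkp-tests: matches one comparison row.
ROW = re.compile(
    r"^\s*(?P<base>[\d.]+(?:e[+-]?\d+)?)"   # value on the parent commit
    r"(?:\s*±\s*\d+%)?"                      # optional %stddev of the base value
    r"\s+(?P<change>[+-][\d.]+%?)"           # %change, or absolute points when no '%'
    r"\s+(?P<new>[\d.]+(?:e[+-]?\d+)?)"      # value on the tested commit
    r"(?:\s*±\s*\d+%)?"                      # optional %stddev of the new value
    r"\s+(?P<metric>\S+)\s*$"                # metric name
)


def parse_row(line):
    """Return base/new as floats plus the raw change string and metric name."""
    m = ROW.match(line)
    if m is None:
        raise ValueError("not a comparison row: %r" % line)
    return {
        "base": float(m.group("base")),
        "change": m.group("change"),
        "new": float(m.group("new")),
        "metric": m.group("metric"),
    }


def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


row = parse_row(
    "3586 -21.7% 2808 phoronix-test-suite.ior.2MB.DefaultTestDirectory.mb_s"
)
print(row["metric"], round(pct_change(row["base"], row["new"]), 1))
```

Applied to the two ior rows, this gives -21.7% for 3586 -> 2808 and -22.0% for 3626 -> 2828, which agrees with the reported column.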
***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/process/16/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/pread3/will-it-scale
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
6.01 +0.7 6.71 ± 2% mpstat.cpu.all.usr%
0.22 -42.4% 0.13 ± 3% turbostat.IPC
13556369 +11.8% 15162679 ± 2% will-it-scale.16.processes
847272 +11.8% 947666 ± 2% will-it-scale.per_process_ops
13556369 +11.8% 15162679 ± 2% will-it-scale.workload
1.30 ± 2% +75.8% 2.28 ± 3% perf-stat.i.MPKI
4.457e+09 -10.9% 3.969e+09 ± 2% perf-stat.i.branch-instructions
1.15 +0.3 1.41 perf-stat.i.branch-miss-rate%
51357767 +9.3% 56148787 perf-stat.i.branch-misses
1718 -0.8% 1704 perf-stat.i.context-switches
1.27 +76.6% 2.25 ± 2% perf-stat.i.cpi
0.12 +0.1 0.27 perf-stat.i.dTLB-load-miss-rate%
14889205 +10.7% 16475643 ± 2% perf-stat.i.dTLB-load-misses
1.248e+10 -50.3% 6.202e+09 ± 2% perf-stat.i.dTLB-loads
0.00 +0.0 0.00 ± 2% perf-stat.i.dTLB-store-miss-rate%
1.009e+10 -65.0% 3.534e+09 ± 2% perf-stat.i.dTLB-stores
14680606 +14.5% 16815386 ± 5% perf-stat.i.iTLB-load-misses
33983782 ± 5% +14.8% 39004142 ± 4% perf-stat.i.iTLB-loads
3.708e+10 -43.3% 2.101e+10 ± 2% perf-stat.i.instructions
2528 -50.4% 1254 ± 3% perf-stat.i.instructions-per-iTLB-miss
0.78 -43.3% 0.44 ± 2% perf-stat.i.ipc
790.06 ± 3% +5.7% 835.02 ± 2% perf-stat.i.metric.K/sec
259.91 -49.3% 131.77 ± 2% perf-stat.i.metric.M/sec
1.30 ± 2% +75.3% 2.27 ± 3% perf-stat.overall.MPKI
1.15 +0.3 1.41 perf-stat.overall.branch-miss-rate%
1.27 +76.4% 2.25 ± 2% perf-stat.overall.cpi
0.12 +0.1 0.26 perf-stat.overall.dTLB-load-miss-rate%
0.00 +0.0 0.00 ± 2% perf-stat.overall.dTLB-store-miss-rate%
2525 -50.4% 1251 ± 3% perf-stat.overall.instructions-per-iTLB-miss
0.78 -43.3% 0.44 ± 2% perf-stat.overall.ipc
822348 -49.3% 416945 perf-stat.overall.path-length
4.442e+09 -10.9% 3.956e+09 ± 2% perf-stat.ps.branch-instructions
51175390 +9.3% 55945707 perf-stat.ps.branch-misses
14839395 +10.7% 16420381 ± 2% perf-stat.ps.dTLB-load-misses
1.244e+10 -50.3% 6.181e+09 ± 2% perf-stat.ps.dTLB-loads
1.006e+10 -65.0% 3.522e+09 ± 2% perf-stat.ps.dTLB-stores
14630964 +14.5% 16757456 ± 5% perf-stat.ps.iTLB-load-misses
33871661 ± 5% +14.8% 38879673 ± 4% perf-stat.ps.iTLB-loads
3.695e+10 -43.3% 2.094e+10 ± 2% perf-stat.ps.instructions
1.115e+13 -43.3% 6.322e+12 ± 2% perf-stat.total.instructions
11.76 ± 4% -7.2 4.59 ± 36% perf-profile.calltrace.cycles-pp.rep_movs_alternative.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter
12.22 ± 4% -7.1 5.13 ± 31% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read
12.65 ± 4% -7.0 5.61 ± 28% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read.__x64_sys_pread64
12.94 ± 4% -7.0 5.94 ± 26% perf-profile.calltrace.cycles-pp.copy_page_to_iter.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
20.53 ± 3% -6.3 14.28 ± 9% perf-profile.calltrace.cycles-pp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
25.48 ± 4% -5.7 19.80 ± 6% perf-profile.calltrace.cycles-pp.vfs_read.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread
27.64 ± 3% -5.5 22.16 ± 5% perf-profile.calltrace.cycles-pp.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread
41.36 ± 4% -4.0 37.37 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread
1.09 ± 3% +0.1 1.22 ± 4% perf-profile.calltrace.cycles-pp.__fget_light.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread
1.60 ± 2% +0.1 1.74 ± 3% perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64
0.95 ± 3% +0.2 1.11 ± 10% perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_read.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.62 ± 4% +0.2 1.78 ± 3% perf-profile.calltrace.cycles-pp.touch_atime.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
1.97 ± 4% +0.2 2.18 ± 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_pread
0.35 ± 70% +0.2 0.58 ± 4% perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread
2.85 ± 2% +0.3 3.18 ± 3% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
6.65 ± 4% +0.7 7.35 ± 4% perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_pread
12.30 ± 5% +1.3 13.62 ± 4% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread
13.67 ± 5% +1.5 15.13 ± 4% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_pread
11.77 ± 4% -7.1 4.63 ± 35% perf-profile.children.cycles-pp.rep_movs_alternative
12.47 ± 4% -7.0 5.42 ± 29% perf-profile.children.cycles-pp.copyout
12.67 ± 4% -7.0 5.64 ± 28% perf-profile.children.cycles-pp._copy_to_iter
12.96 ± 4% -7.0 5.96 ± 26% perf-profile.children.cycles-pp.copy_page_to_iter
20.61 ± 3% -6.2 14.38 ± 9% perf-profile.children.cycles-pp.shmem_file_read_iter
25.58 ± 4% -5.7 19.90 ± 5% perf-profile.children.cycles-pp.vfs_read
27.65 ± 3% -5.5 22.18 ± 5% perf-profile.children.cycles-pp.__x64_sys_pread64
41.52 ± 4% -4.0 37.53 ± 2% perf-profile.children.cycles-pp.do_syscall_64
0.54 ± 5% +0.1 0.61 ± 4% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
1.01 ± 4% +0.1 1.12 ± 3% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.09 ± 4% +0.1 1.22 ± 4% perf-profile.children.cycles-pp.__fget_light
1.62 ± 2% +0.1 1.76 ± 3% perf-profile.children.cycles-pp.filemap_get_entry
0.95 ± 3% +0.2 1.12 ± 10% perf-profile.children.cycles-pp.__fsnotify_parent
1.64 ± 4% +0.2 1.82 ± 3% perf-profile.children.cycles-pp.touch_atime
2.89 ± 2% +0.3 3.23 ± 3% perf-profile.children.cycles-pp.shmem_get_folio_gfp
6.55 ± 4% +0.7 7.24 ± 4% perf-profile.children.cycles-pp.__entry_text_start
12.44 ± 5% +1.3 13.76 ± 4% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
13.81 ± 5% +1.5 15.28 ± 4% perf-profile.children.cycles-pp.syscall_return_via_sysret
11.60 ± 4% -7.1 4.46 ± 37% perf-profile.self.cycles-pp.rep_movs_alternative
0.48 ± 5% +0.1 0.55 ± 4% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.45 ± 7% +0.1 0.52 ± 2% perf-profile.self.cycles-pp.current_time
0.26 ± 5% +0.1 0.34 ± 11% perf-profile.self.cycles-pp.xas_load
0.87 ± 4% +0.1 0.97 ± 3% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
1.11 ± 4% +0.1 1.21 ± 3% perf-profile.self.cycles-pp.__libc_pread
0.82 ± 4% +0.1 0.92 ± 5% perf-profile.self.cycles-pp.copyout
1.09 ± 3% +0.1 1.21 ± 4% perf-profile.self.cycles-pp.__fget_light
1.13 ± 3% +0.2 1.29 ± 4% perf-profile.self.cycles-pp.shmem_get_folio_gfp
0.91 ± 4% +0.2 1.07 ± 10% perf-profile.self.cycles-pp.__fsnotify_parent
5.71 ± 4% +0.6 6.31 ± 4% perf-profile.self.cycles-pp.__entry_text_start
7.81 ± 5% +0.9 8.68 ± 5% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
11.95 ± 5% +1.3 13.21 ± 4% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
13.79 ± 5% +1.5 15.26 ± 4% perf-profile.self.cycles-pp.syscall_return_via_sysret
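
The pread3 rows above show instructions falling by 43.3% while will-it-scale.workload rises by 11.8%. The sketch below cross-checks two derived rows using only figures from this table. It assumes, which the report does not state, that perf-stat.overall.path-length is total instructions divided by workload operations and that CPI is the reciprocal of IPC. The function name `path_length` is made up for illustration.

```python
def path_length(total_instructions, workload_ops):
    """Instructions retired per benchmark operation (assumed definition)."""
    return total_instructions / workload_ops


# perf-stat.total.instructions and will-it-scale.workload, from the table above.
base = path_length(1.115e13, 13556369)   # parent commit 0d85b27b0c
new = path_length(6.322e12, 15162679)    # tested commit 47ee3f1dd9

# Relative change in instructions per operation, in percent.
change = (new - base) / base * 100.0

# CPI recomputed from the reported perf-stat.overall.ipc values.
cpi_base = 1 / 0.78
cpi_new = 1 / 0.44

print(round(base), round(new), round(change, 1))
print(round(cpi_base, 2), round(cpi_new, 2))
```

Under those assumptions the recomputed values land within rounding of the reported path-length (822348 and 416945) and CPI (1.27 and 2.25) rows. One plausible reading, not stated in the report, is that each pread now needs roughly half as many instructions, so IPC drops even though throughput improves.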
***************************************************************************************************
lkp-hsw-d03: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell) with 16G memory
=========================================================================================
cluster/compiler/cpufreq_governor/kconfig/nr_threads/protocol/rootfs/runtime/tbox_group/testcase:
cs-localhost/gcc-11/performance/x86_64-rhel-8.3/40%/tcp6/debian-11.1-x86_64-20220510.cgz/300s/lkp-hsw-d03/nepim
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
40382 +5.8% 42719 vmstat.system.cs
29.19 +1.1% 29.51 turbostat.CorWatt
0.28 -45.7% 0.15 turbostat.IPC
51340235 +9.5% 56202298 proc-vmstat.numa_hit
51317600 +9.5% 56198357 proc-vmstat.numa_local
4.088e+08 +9.5% 4.478e+08 proc-vmstat.pgalloc_normal
4.088e+08 +9.5% 4.478e+08 proc-vmstat.pgfree
7501256 +9.5% 8214645 nepim.tcp.avg.kbps_in
7501474 +9.5% 8214867 nepim.tcp.avg.kbps_out
28758 +9.4% 31464 nepim.tcp.avg.rcv_s
28615 +9.5% 31337 nepim.tcp.avg.snd_s
1895 ± 21% +21.9% 2309 ± 13% nepim.time.involuntary_context_switches
14.45 ± 5% -2.3 12.20 ± 5% perf-profile.calltrace.cycles-pp.skb_do_copy_data_nocache.tcp_sendmsg_locked.tcp_sendmsg.sock_write_iter.vfs_write
14.05 ± 4% -2.2 11.86 ± 5% perf-profile.calltrace.cycles-pp._copy_from_iter.skb_do_copy_data_nocache.tcp_sendmsg_locked.tcp_sendmsg.sock_write_iter
13.88 ± 5% -2.1 11.75 ± 5% perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.skb_do_copy_data_nocache.tcp_sendmsg_locked.tcp_sendmsg
13.84 ± 4% -2.1 11.71 ± 5% perf-profile.calltrace.cycles-pp.rep_movs_alternative.copyin._copy_from_iter.skb_do_copy_data_nocache.tcp_sendmsg_locked
34.14 ± 3% -3.4 30.78 ± 3% perf-profile.children.cycles-pp.rep_movs_alternative
14.46 ± 5% -2.3 12.20 ± 5% perf-profile.children.cycles-pp.skb_do_copy_data_nocache
14.06 ± 4% -2.2 11.86 ± 5% perf-profile.children.cycles-pp._copy_from_iter
13.92 ± 4% -2.2 11.76 ± 5% perf-profile.children.cycles-pp.copyin
0.07 ± 21% +0.0 0.11 ± 17% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.03 ±127% +0.1 0.09 ± 36% perf-profile.children.cycles-pp.rcu_all_qs
0.31 ± 6% +0.1 0.38 ± 8% perf-profile.children.cycles-pp.aa_sk_perm
33.97 ± 3% -3.5 30.51 ± 4% perf-profile.self.cycles-pp.rep_movs_alternative
0.15 ± 12% -0.0 0.10 ± 22% perf-profile.self.cycles-pp._copy_from_iter
0.10 ± 13% +0.0 0.14 ± 18% perf-profile.self.cycles-pp.process_backlog
0.07 ± 18% +0.0 0.11 ± 16% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.03 ±124% +0.1 0.09 ± 20% perf-profile.self.cycles-pp.__release_sock
0.32 ± 11% +0.1 0.38 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_bh
0.28 ± 6% +0.1 0.35 ± 9% perf-profile.self.cycles-pp.aa_sk_perm
0.77 ± 12% +0.4 1.19 ± 8% perf-profile.self.cycles-pp.tcp_sendmsg_locked
25.41 +57.6% 40.05 perf-stat.i.MPKI
8.761e+08 -13.6% 7.568e+08 perf-stat.i.branch-instructions
1.71 +0.3 2.03 perf-stat.i.branch-miss-rate%
13.05 ± 2% +3.7 16.74 perf-stat.i.cache-miss-rate%
23909839 +5.5% 25223669 ± 2% perf-stat.i.cache-misses
1.834e+08 -17.8% 1.507e+08 perf-stat.i.cache-references
40658 +5.9% 43055 perf-stat.i.context-switches
1.02 +91.3% 1.95 perf-stat.i.cpi
315.87 -5.2% 299.30 perf-stat.i.cycles-between-cache-misses
0.10 ± 3% +0.1 0.17 perf-stat.i.dTLB-load-miss-rate%
2.492e+09 ± 4% -39.8% 1.5e+09 perf-stat.i.dTLB-loads
0.04 +0.0 0.07 perf-stat.i.dTLB-store-miss-rate%
717270 +9.2% 783320 perf-stat.i.dTLB-store-misses
2.022e+09 -47.7% 1.057e+09 perf-stat.i.dTLB-stores
897381 +7.0% 959782 perf-stat.i.iTLB-loads
7.305e+09 -46.8% 3.888e+09 perf-stat.i.instructions
0.98 -47.3% 0.52 perf-stat.i.ipc
806.16 ± 2% -58.0% 338.57 ± 2% perf-stat.i.metric.K/sec
699.04 ± 2% -37.6% 435.98 perf-stat.i.metric.M/sec
18383434 +28.2% 23575415 perf-stat.i.node-loads
4791695 ± 3% -81.1% 904896 ± 6% perf-stat.i.node-stores
25.11 +54.4% 38.77 perf-stat.overall.MPKI
1.83 +0.3 2.17 perf-stat.overall.branch-miss-rate%
13.04 ± 2% +3.7 16.74 perf-stat.overall.cache-miss-rate%
1.02 +88.1% 1.91 perf-stat.overall.cpi
310.68 -5.1% 294.84 perf-stat.overall.cycles-between-cache-misses
0.10 ± 3% +0.1 0.17 perf-stat.overall.dTLB-load-miss-rate%
0.04 +0.0 0.07 perf-stat.overall.dTLB-store-miss-rate%
0.98 -46.8% 0.52 perf-stat.overall.ipc
8.732e+08 -13.6% 7.543e+08 perf-stat.ps.branch-instructions
23830434 +5.5% 25139918 ± 2% perf-stat.ps.cache-misses
1.828e+08 -17.8% 1.502e+08 perf-stat.ps.cache-references
40523 +5.9% 42913 perf-stat.ps.context-switches
2.484e+09 ± 4% -39.8% 1.495e+09 perf-stat.ps.dTLB-loads
714887 +9.2% 780718 perf-stat.ps.dTLB-store-misses
2.016e+09 -47.7% 1.053e+09 perf-stat.ps.dTLB-stores
894402 +7.0% 956596 perf-stat.ps.iTLB-loads
7.281e+09 -46.8% 3.875e+09 perf-stat.ps.instructions
18322375 +28.2% 23497100 perf-stat.ps.node-loads
4775792 ± 3% -81.1% 901896 ± 6% perf-stat.ps.node-stores
2.194e+12 -46.8% 1.167e+12 perf-stat.total.instructions
***************************************************************************************************
lkp-cfl-e1: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/IMB-MPI1 Exchange/debian-x86_64-phoronix/lkp-cfl-e1/intel-mpi-1.0.1/phoronix-test-suite
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
678.50 ± 17% -45.7% 368.50 ± 62% turbostat.C10
9064 +21.6% 11018 phoronix-test-suite.intel-mpi.IMB-MPI1Exchange.average_mbytes_sec
177.33 -7.6% 163.84 phoronix-test-suite.intel-mpi.IMB-MPI1Exchange.average_usec
8.86 ± 97% -7.8 1.00 ±142% perf-profile.calltrace.cycles-pp.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve
6.77 ± 79% -6.2 0.56 ±223% perf-profile.calltrace.cycles-pp.__mmput.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler
6.77 ± 79% -6.2 0.56 ±223% perf-profile.calltrace.cycles-pp.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm
7.08 ± 81% -4.1 2.96 ±158% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
7.08 ± 81% -4.1 2.96 ±158% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
5.80 ± 77% -2.8 2.96 ±158% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
6.14 ± 66% -2.3 3.84 ±142% perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
8.86 ± 97% -7.8 1.00 ±142% perf-profile.children.cycles-pp.begin_new_exec
6.77 ± 79% -6.2 0.56 ±223% perf-profile.children.cycles-pp.exec_mmap
9.06 ± 80% -5.2 3.84 ±142% perf-profile.children.cycles-pp.unmap_vmas
7.08 ± 81% -4.1 2.96 ±158% perf-profile.children.cycles-pp.unmap_page_range
7.08 ± 81% -4.1 2.96 ±158% perf-profile.children.cycles-pp.zap_pmd_range
7.08 ± 81% -4.1 2.96 ±158% perf-profile.children.cycles-pp.zap_pte_range
4.10 ± 62% -2.5 1.59 ±157% perf-profile.children.cycles-pp.release_pages
3.63 ± 85% -1.0 2.62 ±149% perf-profile.children.cycles-pp.tlb_finish_mmu
3.63 ± 85% -1.0 2.62 ±149% perf-profile.children.cycles-pp.tlb_batch_pages_flush
8776475 -7.0% 8158830 perf-stat.i.cache-misses
2.749e+08 -3.7% 2.647e+08 perf-stat.i.cache-references
1.55 +5.2% 1.63 ± 2% perf-stat.i.cpi
9.08e+09 -1.5% 8.944e+09 perf-stat.i.dTLB-loads
0.03 ± 3% +0.0 0.04 ± 4% perf-stat.i.dTLB-store-miss-rate%
4.758e+09 -2.5% 4.64e+09 perf-stat.i.dTLB-stores
216085 -2.3% 211119 perf-stat.i.iTLB-loads
2.562e+10 -1.5% 2.524e+10 perf-stat.i.instructions
1.13 -2.8% 1.10 perf-stat.i.ipc
983.28 -2.5% 958.80 perf-stat.i.metric.M/sec
750185 +203.4% 2276128 perf-stat.i.node-loads
1661980 +28.8% 2141116 perf-stat.i.node-stores
10.73 -2.3% 10.49 perf-stat.overall.MPKI
3.20 -0.1 3.08 perf-stat.overall.cache-miss-rate%
2088 +6.7% 2227 perf-stat.overall.cycles-between-cache-misses
0.00 ± 9% -0.0 0.00 ± 10% perf-stat.overall.node-load-miss-rate%
8684309 -7.1% 8069768 perf-stat.ps.cache-misses
2.718e+08 -3.7% 2.617e+08 perf-stat.ps.cache-references
8.975e+09 -1.5% 8.84e+09 perf-stat.ps.dTLB-loads
4.703e+09 -2.5% 4.587e+09 perf-stat.ps.dTLB-stores
213618 -2.3% 208713 perf-stat.ps.iTLB-loads
2.532e+10 -1.5% 2.495e+10 perf-stat.ps.instructions
742356 +203.3% 2251351 perf-stat.ps.node-loads
1645341 +28.7% 2117834 perf-stat.ps.node-stores
2.261e+12 -1.3% 2.232e+12 perf-stat.total.instructions
***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/readseek1/will-it-scale
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
5492754 ± 14% -17.6% 4524182 ± 14% sched_debug.cfs_rq:/.spread0.max
0.16 -43.8% 0.09 turbostat.IPC
14210 ± 2% -9.0% 12931 turbostat.POLL
26574376 +4.6% 27809208 will-it-scale.52.processes
511045 +4.6% 534792 will-it-scale.per_process_ops
26574376 +4.6% 27809208 will-it-scale.workload
0.06 +79.4% 0.11 ± 12% perf-stat.i.MPKI
1.024e+10 -13.3% 8.877e+09 perf-stat.i.branch-instructions
1.34 +0.3 1.62 perf-stat.i.branch-miss-rate%
1.374e+08 +4.5% 1.436e+08 perf-stat.i.branch-misses
1.80 +71.0% 3.07 perf-stat.i.cpi
0.20 +0.2 0.40 perf-stat.i.dTLB-load-miss-rate%
53209330 +4.6% 55668630 perf-stat.i.dTLB-load-misses
2.683e+10 -48.3% 1.388e+10 perf-stat.i.dTLB-loads
0.00 +0.0 0.00 perf-stat.i.dTLB-store-miss-rate%
37320 +2.7% 38319 perf-stat.i.dTLB-store-misses
2.138e+10 -61.8% 8.175e+09 perf-stat.i.dTLB-stores
58111654 +3.6% 60212884 perf-stat.i.iTLB-load-misses
8.1e+10 -41.5% 4.738e+10 perf-stat.i.instructions
1395 -43.6% 787.58 perf-stat.i.instructions-per-iTLB-miss
0.56 -41.5% 0.33 perf-stat.i.ipc
563.05 -47.0% 298.51 perf-stat.i.metric.M/sec
178317 +3.2% 183996 ± 2% perf-stat.i.node-load-misses
0.06 +79.8% 0.11 ± 11% perf-stat.overall.MPKI
1.34 +0.3 1.62 perf-stat.overall.branch-miss-rate%
1.80 +71.0% 3.07 perf-stat.overall.cpi
0.20 +0.2 0.40 perf-stat.overall.dTLB-load-miss-rate%
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
1394 -43.6% 786.84 perf-stat.overall.instructions-per-iTLB-miss
0.56 -41.5% 0.33 perf-stat.overall.ipc
916671 -44.1% 512426 perf-stat.overall.path-length
1.02e+10 -13.3% 8.847e+09 perf-stat.ps.branch-instructions
1.37e+08 +4.5% 1.431e+08 perf-stat.ps.branch-misses
53029639 +4.6% 55478925 perf-stat.ps.dTLB-load-misses
2.674e+10 -48.3% 1.383e+10 perf-stat.ps.dTLB-loads
37232 +2.6% 38204 perf-stat.ps.dTLB-store-misses
2.131e+10 -61.8% 8.147e+09 perf-stat.ps.dTLB-stores
57916551 +3.6% 60006718 perf-stat.ps.iTLB-load-misses
8.073e+10 -41.5% 4.721e+10 perf-stat.ps.instructions
177742 +3.2% 183382 ± 2% perf-stat.ps.node-load-misses
2.436e+13 -41.5% 1.425e+13 perf-stat.total.instructions
7.47 -3.1 4.40 perf-profile.calltrace.cycles-pp.rep_movs_alternative.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter
7.73 -3.0 4.68 perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read
8.07 -3.0 5.03 perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read.ksys_read
8.25 -3.0 5.22 perf-profile.calltrace.cycles-pp.copy_page_to_iter.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64
12.67 -2.8 9.90 perf-profile.calltrace.cycles-pp.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
15.82 -2.6 13.24 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
16.94 -2.5 14.43 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
26.01 -2.1 23.93 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
30.63 -1.8 28.83 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
45.06 -1.1 43.92 perf-profile.calltrace.cycles-pp.read
0.55 ± 2% +0.0 0.59 ± 2% perf-profile.calltrace.cycles-pp.shmem_file_llseek.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
0.67 ± 2% +0.0 0.71 perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.ksys_read
0.59 ± 2% +0.0 0.63 ± 2% perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.67 ± 3% +0.0 0.72 perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.76 +0.0 0.81 perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.shmem_file_read_iter.vfs_read.ksys_read
0.82 ± 3% +0.1 0.88 perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.97 ± 2% +0.1 1.03 perf-profile.calltrace.cycles-pp.touch_atime.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64
1.60 +0.1 1.70 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64
1.62 +0.1 1.72 perf-profile.calltrace.cycles-pp.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
0.78 +0.1 0.90 ± 22% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.read
0.76 ± 2% +0.1 0.90 ± 23% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.llseek
4.14 +0.2 4.33 perf-profile.calltrace.cycles-pp.__entry_text_start.read
4.08 +0.3 4.37 perf-profile.calltrace.cycles-pp.__entry_text_start.llseek
7.80 +0.4 8.15 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
8.57 +0.4 8.97 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.read
7.60 +0.5 8.05 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
10.43 +0.6 11.04 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
8.37 +0.6 9.00 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.llseek
15.05 +0.8 15.85 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.llseek
28.78 +2.2 30.95 perf-profile.calltrace.cycles-pp.llseek
7.47 -3.1 4.42 perf-profile.children.cycles-pp.rep_movs_alternative
8.08 -3.0 5.05 perf-profile.children.cycles-pp._copy_to_iter
7.90 -3.0 4.88 perf-profile.children.cycles-pp.copyout
8.26 -3.0 5.24 perf-profile.children.cycles-pp.copy_page_to_iter
12.73 -2.8 9.96 perf-profile.children.cycles-pp.shmem_file_read_iter
15.89 -2.6 13.31 perf-profile.children.cycles-pp.vfs_read
16.99 -2.5 14.49 perf-profile.children.cycles-pp.ksys_read
36.59 -1.5 35.12 perf-profile.children.cycles-pp.do_syscall_64
45.17 -1.1 44.06 perf-profile.children.cycles-pp.read
46.04 -1.0 45.07 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.10 ± 4% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.__cond_resched
0.20 ± 3% +0.0 0.22 perf-profile.children.cycles-pp.folio_test_hugetlb
0.32 +0.0 0.34 perf-profile.children.cycles-pp.__x64_sys_read
0.31 +0.0 0.33 ± 2% perf-profile.children.cycles-pp.__x64_sys_lseek
0.26 ± 3% +0.0 0.29 ± 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.35 ± 4% +0.0 0.39 perf-profile.children.cycles-pp.current_time
0.68 +0.0 0.72 ± 2% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.55 ± 2% +0.0 0.59 ± 2% perf-profile.children.cycles-pp.shmem_file_llseek
0.59 +0.0 0.63 ± 2% perf-profile.children.cycles-pp.__fsnotify_parent
0.79 +0.0 0.84 perf-profile.children.cycles-pp.atime_needs_update
0.68 ± 2% +0.0 0.73 perf-profile.children.cycles-pp.filemap_get_entry
1.26 +0.1 1.32 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.99 +0.1 1.05 perf-profile.children.cycles-pp.touch_atime
1.37 ± 3% +0.1 1.44 perf-profile.children.cycles-pp.__fget_light
1.66 ± 3% +0.1 1.76 perf-profile.children.cycles-pp.__fdget_pos
1.64 +0.1 1.73 perf-profile.children.cycles-pp.ksys_lseek
1.54 +0.1 1.63 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
1.62 +0.1 1.72 perf-profile.children.cycles-pp.shmem_get_folio_gfp
8.09 +0.5 8.58 perf-profile.children.cycles-pp.__entry_text_start
15.54 +0.8 16.34 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
17.10 +1.0 18.13 perf-profile.children.cycles-pp.syscall_return_via_sysret
29.24 +1.8 31.07 perf-profile.children.cycles-pp.llseek
7.34 -3.0 4.29 perf-profile.self.cycles-pp.rep_movs_alternative
0.21 +0.0 0.23 ± 3% perf-profile.self.cycles-pp.touch_atime
0.30 +0.0 0.32 perf-profile.self.cycles-pp.__x64_sys_lseek
0.66 +0.0 0.68 perf-profile.self.cycles-pp.llseek
0.20 ± 3% +0.0 0.22 perf-profile.self.cycles-pp.folio_test_hugetlb
0.30 ± 3% +0.0 0.33 ± 3% perf-profile.self.cycles-pp.__fdget_pos
0.17 ± 4% +0.0 0.20 ± 5% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
0.57 ± 2% +0.0 0.60 perf-profile.self.cycles-pp.filemap_get_entry
0.60 +0.0 0.63 ± 2% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.84 +0.0 0.88 perf-profile.self.cycles-pp.do_syscall_64
0.54 ± 4% +0.0 0.57 perf-profile.self.cycles-pp.copyout
0.56 +0.0 0.60 ± 2% perf-profile.self.cycles-pp.__fsnotify_parent
0.83 ± 2% +0.0 0.86 perf-profile.self.cycles-pp.shmem_get_folio_gfp
0.53 ± 2% +0.0 0.57 ± 2% perf-profile.self.cycles-pp.shmem_file_llseek
0.62 +0.0 0.66 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.71 +0.1 0.76 ± 2% perf-profile.self.cycles-pp.read
1.10 +0.1 1.15 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
1.29 +0.1 1.35 ± 2% perf-profile.self.cycles-pp.shmem_file_read_iter
1.32 ± 3% +0.1 1.39 perf-profile.self.cycles-pp.__fget_light
1.05 +0.1 1.14 perf-profile.self.cycles-pp.vfs_read
7.04 +0.4 7.47 perf-profile.self.cycles-pp.__entry_text_start
9.74 +0.5 10.26 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
14.94 +0.8 15.70 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
17.08 +1.0 18.10 perf-profile.self.cycles-pp.syscall_return_via_sysret
***************************************************************************************************
lkp-cfl-e1: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/Write/64MB/8/debian-x86_64-phoronix/lkp-cfl-e1/tiobench-1.3.1/phoronix-test-suite
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
12535 ± 2% +43.5% 17982 ± 2% phoronix-test-suite.tiobench.Write.64MB.8.mb_s
2132097 ± 11% -26.0% 1578355 ± 9% perf-stat.i.node-stores
313.06 ± 7% +19.9% 375.47 ± 2% perf-stat.overall.cycles-between-cache-misses
2086550 ± 12% -26.6% 1531685 ± 11% perf-stat.ps.node-stores
3.75 ±102% -3.7 0.00 perf-profile.calltrace.cycles-pp.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
5.83 ± 78% -3.2 2.68 ±171% perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
5.83 ± 78% -3.2 2.68 ±171% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
8.58 ±107% -8.0 0.60 ±223% perf-profile.children.cycles-pp.tlb_finish_mmu
8.58 ±107% -8.0 0.60 ±223% perf-profile.children.cycles-pp.tlb_batch_pages_flush
8.58 ±107% -6.5 2.11 ±160% perf-profile.children.cycles-pp.release_pages
3.75 ±102% -3.7 0.00 perf-profile.children.cycles-pp.do_cow_fault
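For anyone cross-checking the tables, here is a minimal sketch of how the middle %change column appears to relate to the two value columns. It assumes that rows shown with a percent sign are a relative change against the base commit, and that rows shown without one (metrics that are already percentages, such as the perf-profile entries) are a plain difference in percentage points. That reading is consistent with the rows above but is not taken from the robot's own source, and the helper names are hypothetical. The report works from per-run averages, so recomputing from the rounded values printed here may differ in the last digit.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change of the patched commit vs. the base commit, in percent."""
    return (patched - base) / base * 100.0


def point_change(base_pct: float, patched_pct: float) -> float:
    """Absolute difference for metrics that are already percentages."""
    return patched_pct - base_pct


# tiobench Write 64MB throughput row: 12535 -> 17982
print(f"{pct_change(12535, 17982):+.1f}%")

# perf-stat.i.node-stores row: 2132097 -> 1578355
print(f"{pct_change(2132097, 1578355):+.1f}%")

# perf-profile.children.cycles-pp.tlb_finish_mmu row: 8.58 -> 0.60
print(f"{point_change(8.58, 0.60):+.1f}")
```

Rows with a large %stddev (for example ±107% or ±223%) are dominated by run-to-run noise, so the computed difference for those rows says little on its own.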
***************************************************************************************************
lkp-cfl-e1: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/Random Write/32MB/4/debian-x86_64-phoronix/lkp-cfl-e1/tiobench-1.3.1/phoronix-test-suite
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.70 ± 2% -0.3 1.39 ± 3% mpstat.cpu.all.sys%
32.33 ± 4% -18.6% 26.33 ± 4% phoronix-test-suite.time.percent_of_cpu_this_job_got
107378 +49.5% 160566 ± 2% phoronix-test-suite.tiobench.RandomWrite.32MB.32.mb_s
10052508 ± 3% -16.7% 8372597 ± 4% perf-stat.i.cache-misses
4.595e+09 ± 2% -6.3% 4.304e+09 ± 2% perf-stat.i.cpu-cycles
3.647e+08 ± 2% -5.6% 3.444e+08 ± 2% perf-stat.i.dTLB-stores
0.29 ± 2% -6.3% 0.27 ± 2% perf-stat.i.metric.GHz
0.00 ± 22% +0.0 0.01 ± 11% perf-stat.i.node-load-miss-rate%
5.67 ± 19% +282.5% 21.70 ± 7% perf-stat.i.node-load-misses
6.24 ± 31% +232.9% 20.76 ± 9% perf-stat.i.node-store-misses
3545056 ± 3% -20.5% 2818290 ± 7% perf-stat.i.node-stores
457.85 ± 5% +12.5% 514.94 ± 3% perf-stat.overall.cycles-between-cache-misses
0.00 ± 18% +0.0 0.01 ± 11% perf-stat.overall.node-load-miss-rate%
0.00 ± 29% +0.0 0.00 ± 12% perf-stat.overall.node-store-miss-rate%
9684141 ± 3% -16.7% 8068153 ± 4% perf-stat.ps.cache-misses
4.427e+09 ± 2% -6.3% 4.15e+09 perf-stat.ps.cpu-cycles
3.514e+08 ± 3% -5.5% 3.32e+08 ± 2% perf-stat.ps.dTLB-stores
5.47 ± 19% +282.5% 20.91 ± 7% perf-stat.ps.node-load-misses
6.00 ± 31% +233.3% 20.01 ± 9% perf-stat.ps.node-store-misses
3414551 ± 3% -20.4% 2718058 ± 8% perf-stat.ps.node-stores
8.14 ± 97% -8.1 0.00 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
7.55 ± 83% -7.5 0.00 perf-profile.calltrace.cycles-pp.free_swap_cache.free_pages_and_swap_cache.tlb_batch_pages_flush.tlb_finish_mmu.exit_mmap
6.48 ±137% -6.5 0.00 perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
9.33 ± 62% -6.3 3.02 ±173% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.exit_mmap.__mmput.exit_mm.do_exit
9.33 ± 62% -6.3 3.02 ±173% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.exit_mmap.__mmput.exit_mm
6.82 ± 73% -4.3 2.49 ±164% perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
8.14 ± 97% -2.7 5.45 ±168% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
5.48 ±114% -2.5 3.02 ±173% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.exit_mmap.__mmput
9.34 ± 91% -2.3 7.04 ±141% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
5.44 ± 81% +0.0 5.45 ±168% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
5.44 ± 81% +0.0 5.45 ±168% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
5.44 ± 81% +0.0 5.45 ±168% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
7.65 ± 65% +0.1 7.78 ±202% perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.65 ± 65% +0.1 7.78 ±202% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.65 ± 65% +0.1 7.78 ±202% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
7.65 ± 65% +0.1 7.78 ±202% perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
7.65 ± 65% +0.1 7.78 ±202% perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.67 ± 91% +0.9 5.56 ±223% perf-profile.calltrace.cycles-pp.rep_movs_alternative.copyin.copy_page_from_iter_atomic.generic_perform_write.__generic_file_write_iter
4.67 ± 91% +0.9 5.56 ±223% perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.vfs_write
4.67 ± 91% +0.9 5.56 ±223% perf-profile.calltrace.cycles-pp.copyin.copy_page_from_iter_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
4.40 ±110% +1.0 5.45 ±168% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
8.14 ± 97% -8.1 0.00 perf-profile.children.cycles-pp.do_fault
10.72 ± 74% -7.7 3.02 ±173% perf-profile.children.cycles-pp.tlb_finish_mmu
10.72 ± 74% -7.7 3.02 ±173% perf-profile.children.cycles-pp.tlb_batch_pages_flush
6.48 ±137% -6.5 0.00 perf-profile.children.cycles-pp.do_read_fault
5.23 ± 84% -5.2 0.00 perf-profile.children.cycles-pp.free_pages_and_swap_cache
5.23 ± 84% -5.2 0.00 perf-profile.children.cycles-pp.free_swap_cache
6.82 ± 73% -4.3 2.49 ±164% perf-profile.children.cycles-pp.load_elf_binary
3.97 ± 78% -4.0 0.00 perf-profile.children.cycles-pp.page_cache_ra_unbounded
5.49 ±114% -2.5 3.02 ±173% perf-profile.children.cycles-pp.release_pages
9.34 ± 91% -2.3 7.04 ±141% perf-profile.children.cycles-pp.poll_idle
8.14 ± 97% -2.0 6.09 ±155% perf-profile.children.cycles-pp.__handle_mm_fault
7.65 ± 65% +0.1 7.78 ±202% perf-profile.children.cycles-pp.__x64_sys_exit_group
4.67 ± 91% +0.9 5.56 ±223% perf-profile.children.cycles-pp.rep_movs_alternative
4.67 ± 91% +0.9 5.56 ±223% perf-profile.children.cycles-pp.copy_page_from_iter_atomic
4.67 ± 91% +0.9 5.56 ±223% perf-profile.children.cycles-pp.copyin
9.34 ± 91% -2.3 7.04 ±141% perf-profile.self.cycles-pp.poll_idle
4.67 ± 91% +0.9 5.56 ±223% perf-profile.self.cycles-pp.rep_movs_alternative
***************************************************************************************************
lkp-ivb-2ep1: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_threads/rootfs/tbox_group/test/test_memory_size/testcase:
gcc-11/performance/x86_64-rhel-8.3/development/1/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/UNIX/50%/lmbench3
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
6375 +9.2% 6959 lmbench3.AF_UNIX.sock.stream.bandwidth.MB/sec
597101 ± 9% +65.9% 990600 ± 37% sched_debug.cpu.max_idle_balance_cost.max
16099 ± 65% +411.0% 82275 ± 79% sched_debug.cpu.max_idle_balance_cost.stddev
178220 ± 10% -65.7% 61065 ± 6% turbostat.C1
0.14 ± 4% -0.1 0.06 ± 7% turbostat.C1%
10200875 +9.2% 11144066 ± 2% proc-vmstat.numa_hit
10145413 +8.8% 11033665 ± 2% proc-vmstat.numa_local
46796092 ± 2% +6.2% 49719663 proc-vmstat.pgalloc_normal
46721572 ± 2% +6.3% 49645401 proc-vmstat.pgfree
0.24 ±223% +2.4 2.62 ± 73% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.74 ± 65% -1.9 0.82 ± 86% perf-profile.children.cycles-pp.__schedule
1.44 ± 58% -1.0 0.42 ±109% perf-profile.children.cycles-pp.newidle_balance
0.47 ±115% +1.0 1.44 ± 62% perf-profile.children.cycles-pp._raw_spin_lock
0.84 ± 75% +1.3 2.14 ± 19% perf-profile.children.cycles-pp.vsnprintf
0.72 ± 75% +1.4 2.08 ± 18% perf-profile.children.cycles-pp.seq_printf
1.32 ± 96% +1.6 2.96 ± 23% perf-profile.children.cycles-pp.link_path_walk
0.47 ±115% +1.0 1.44 ± 62% perf-profile.self.cycles-pp._raw_spin_lock
25.20 ± 16% +123.4% 56.28 ± 7% perf-stat.i.MPKI
1.081e+08 ± 4% +35.4% 1.463e+08 ± 6% perf-stat.i.cache-references
0.63 ± 5% +0.1 0.68 ± 2% perf-stat.i.dTLB-load-miss-rate%
1.648e+09 ± 4% -14.7% 1.406e+09 ± 6% perf-stat.i.dTLB-loads
1571133 ± 4% -8.6% 1435719 ± 5% perf-stat.i.dTLB-store-misses
1.187e+09 ± 4% -20.9% 9.395e+08 ± 6% perf-stat.i.dTLB-stores
6.106e+09 ± 4% -22.7% 4.719e+09 ± 5% perf-stat.i.instructions
8412 ± 4% -26.8% 6159 ± 3% perf-stat.i.instructions-per-iTLB-miss
0.66 ± 3% -23.5% 0.51 ± 4% perf-stat.i.ipc
17.70 ± 2% +75.1% 30.98 ± 2% perf-stat.overall.MPKI
4.73 +0.2 4.93 perf-stat.overall.branch-miss-rate%
20.30 ± 3% -5.6 14.73 ± 5% perf-stat.overall.cache-miss-rate%
1.46 ± 2% +30.4% 1.90 ± 2% perf-stat.overall.cpi
0.51 ± 8% +0.1 0.59 ± 5% perf-stat.overall.dTLB-load-miss-rate%
0.13 ± 2% +0.0 0.15 ± 3% perf-stat.overall.dTLB-store-miss-rate%
7715 ± 3% -23.8% 5877 ± 3% perf-stat.overall.instructions-per-iTLB-miss
0.69 ± 2% -23.3% 0.53 ± 2% perf-stat.overall.ipc
1.064e+08 ± 4% +35.4% 1.441e+08 ± 6% perf-stat.ps.cache-references
1.623e+09 ± 4% -14.6% 1.386e+09 ± 6% perf-stat.ps.dTLB-loads
1547538 ± 4% -8.6% 1414466 ± 5% perf-stat.ps.dTLB-store-misses
1.169e+09 ± 4% -20.8% 9.255e+08 ± 6% perf-stat.ps.dTLB-stores
6.015e+09 ± 4% -22.7% 4.651e+09 ± 5% perf-stat.ps.instructions
4.099e+11 ± 2% -21.3% 3.225e+11 ± 4% perf-stat.total.instructions
***************************************************************************************************
lkp-cfl-e1: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/Random Write/64MB/4/debian-x86_64-phoronix/lkp-cfl-e1/tiobench-1.3.1/phoronix-test-suite
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
2.40 -0.5 1.90 ± 5% mpstat.cpu.all.sys%
207080 +51.1% 312861 phoronix-test-suite.tiobench.RandomWrite.64MB.32.mb_s
15.34 ±126% -14.1 1.28 ±223% perf-profile.calltrace.cycles-pp.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
15.34 ±126% -14.1 1.28 ±223% perf-profile.children.cycles-pp.wp_page_copy
79033 ± 4% +15.2% 91012 ± 10% turbostat.C8
12.50 ± 29% -45.3% 6.83 ± 44% turbostat.C9
194443 ± 87% -79.8% 39236 ± 8% sched_debug.cfs_rq:/.load.stddev
12884 ± 8% +34.7% 17354 ± 28% sched_debug.cfs_rq:/.min_vruntime.max
2307 ± 10% +32.4% 3055 ± 11% sched_debug.cfs_rq:/.min_vruntime.stddev
518.00 ± 15% +21.4% 628.86 ± 7% sched_debug.cfs_rq:/.util_avg.avg
279473 ± 12% -31.4% 191767 ± 11% sched_debug.cpu.avg_idle.stddev
9.64 ± 4% -1.4 8.21 ± 11% perf-stat.i.cache-miss-rate%
14015078 ± 2% -21.6% 10993421 ± 4% perf-stat.i.cache-misses
5.125e+09 ± 2% -8.3% 4.699e+09 ± 2% perf-stat.i.cpu-cycles
4.042e+08 ± 2% -6.3% 3.788e+08 ± 2% perf-stat.i.dTLB-stores
1294820 ± 2% +12.8% 1460653 ± 12% perf-stat.i.iTLB-load-misses
1172786 ± 2% +16.2% 1362610 ± 15% perf-stat.i.iTLB-loads
0.32 ± 2% -8.3% 0.29 ± 2% perf-stat.i.metric.GHz
0.00 ± 31% +0.0 0.01 ± 15% perf-stat.i.node-load-miss-rate%
6.62 ± 16% +241.5% 22.62 ± 15% perf-stat.i.node-load-misses
5.72 ± 18% +313.5% 23.65 ± 16% perf-stat.i.node-store-misses
12.94 -3.2 9.77 ± 8% perf-stat.overall.cache-miss-rate%
365.74 +17.0% 427.97 ± 2% perf-stat.overall.cycles-between-cache-misses
2561 -15.3% 2169 ± 22% perf-stat.overall.instructions-per-iTLB-miss
0.00 ± 16% +0.0 0.01 ± 25% perf-stat.overall.node-load-miss-rate%
0.00 ± 17% +0.0 0.00 ± 19% perf-stat.overall.node-store-miss-rate%
13486705 ± 2% -20.9% 10664323 ± 3% perf-stat.ps.cache-misses
4.932e+09 ± 2% -7.5% 4.56e+09 perf-stat.ps.cpu-cycles
3.89e+08 ± 2% -5.5% 3.675e+08 ± 2% perf-stat.ps.dTLB-stores
1245754 ± 2% +14.0% 1419784 ± 14% perf-stat.ps.iTLB-load-misses
1128129 ± 2% +17.4% 1324815 ± 17% perf-stat.ps.iTLB-loads
6.37 ± 16% +244.6% 21.95 ± 15% perf-stat.ps.node-load-misses
5.50 ± 18% +317.3% 22.96 ± 16% perf-stat.ps.node-store-misses
***************************************************************************************************
lkp-cfl-e1: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/Write/64MB/4/debian-x86_64-phoronix/lkp-cfl-e1/tiobench-1.3.1/phoronix-test-suite
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
2.29 ± 2% -0.6 1.73 ± 2% mpstat.cpu.all.sys%
36.00 ± 2% -30.6% 25.00 ± 2% phoronix-test-suite.time.percent_of_cpu_this_job_got
13930 +58.5% 22083 phoronix-test-suite.tiobench.Write.64MB.32.mb_s
15352488 ± 3% -19.6% 12345834 perf-stat.i.cache-misses
67514100 ± 2% -6.2% 63302150 perf-stat.i.cache-references
3.915e+09 ± 3% -11.7% 3.455e+09 ± 3% perf-stat.i.cpu-cycles
2.867e+08 ± 4% -9.7% 2.588e+08 ± 3% perf-stat.i.dTLB-stores
0.46 ± 2% +7.4% 0.49 ± 3% perf-stat.i.ipc
0.24 ± 3% -11.8% 0.22 ± 3% perf-stat.i.metric.GHz
0.00 ± 15% +0.0 0.01 ± 16% perf-stat.i.node-load-miss-rate%
7.27 ± 16% +232.0% 24.13 ± 17% perf-stat.i.node-load-misses
282123 ± 5% +16.2% 327849 ± 4% perf-stat.i.node-loads
8.60 ± 23% +195.5% 25.42 ± 17% perf-stat.i.node-store-misses
6748445 ± 3% -20.7% 5351270 ± 4% perf-stat.i.node-stores
25.14 ± 3% -6.4% 23.54 ± 3% perf-stat.overall.MPKI
22.72 -3.2 19.49 perf-stat.overall.cache-miss-rate%
1.46 ± 2% -11.9% 1.28 perf-stat.overall.cpi
255.14 +9.7% 279.81 ± 2% perf-stat.overall.cycles-between-cache-misses
0.02 ± 3% +0.0 0.03 ± 2% perf-stat.overall.dTLB-store-miss-rate%
0.69 ± 2% +13.5% 0.78 perf-stat.overall.ipc
0.00 ± 15% +0.0 0.01 ± 18% perf-stat.overall.node-load-miss-rate%
0.00 ± 24% +0.0 0.00 ± 20% perf-stat.overall.node-store-miss-rate%
14688148 ± 3% -19.5% 11824921 perf-stat.ps.cache-misses
64639446 ± 2% -6.2% 60656902 perf-stat.ps.cache-references
3.747e+09 ± 3% -11.7% 3.309e+09 ± 2% perf-stat.ps.cpu-cycles
2.743e+08 ± 4% -9.6% 2.479e+08 ± 2% perf-stat.ps.dTLB-stores
6.95 ± 16% +232.7% 23.11 ± 17% perf-stat.ps.node-load-misses
269991 ± 5% +16.4% 314138 ± 4% perf-stat.ps.node-loads
8.22 ± 23% +196.0% 24.34 ± 16% perf-stat.ps.node-store-misses
6457429 ± 3% -20.6% 5127035 ± 4% perf-stat.ps.node-stores
***************************************************************************************************
lkp-hsw-d04: 8 threads 1 sockets Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (Haswell) with 8G memory
=========================================================================================
cluster/compiler/cpufreq_governor/kconfig/nr_threads/protocol/rootfs/runtime/tbox_group/testcase:
cs-localhost/gcc-11/performance/x86_64-rhel-8.3/25%/tcp/debian-11.1-x86_64-20220510.cgz/300s/lkp-hsw-d04/nepim
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
45715 +10.7% 50623 vmstat.system.cs
955743 +9.6% 1047078 sched_debug.cpu.nr_switches.avg
1531197 ± 6% +15.1% 1762126 ± 5% sched_debug.cpu.nr_switches.max
5782803 +12.6% 6513418 ± 2% turbostat.C1
2.44 ± 2% +0.2 2.62 ± 4% turbostat.C1%
0.28 -45.7% 0.15 turbostat.IPC
50588669 +10.4% 55825973 proc-vmstat.numa_hit
50591236 +10.4% 55832674 proc-vmstat.numa_local
25784 ± 3% +13.8% 29331 proc-vmstat.pgactivate
4.029e+08 +10.3% 4.445e+08 proc-vmstat.pgalloc_normal
4.028e+08 +10.3% 4.445e+08 proc-vmstat.pgfree
11101771 +10.3% 12243890 nepim.tcp.avg.kbps_in
11101986 +10.3% 12244118 nepim.tcp.avg.kbps_out
42677 +10.4% 47099 nepim.tcp.avg.rcv_s
42350 +10.3% 46707 nepim.tcp.avg.snd_s
1924 ± 27% -44.9% 1061 ± 40% nepim.time.involuntary_context_switches
76.00 ± 3% -6.8% 70.80 ± 3% nepim.time.percent_of_cpu_this_job_got
196.47 ± 5% -11.4% 174.00 ± 5% nepim.time.system_time
32.86 ± 7% +21.7% 40.00 ± 5% nepim.time.user_time
3702890 ± 27% +57.8% 5842771 ± 15% nepim.time.voluntary_context_switches
25.11 +59.0% 39.91 perf-stat.i.MPKI
8.617e+08 -12.8% 7.514e+08 perf-stat.i.branch-instructions
1.71 +0.2 1.95 perf-stat.i.branch-miss-rate%
12.34 +3.9 16.28 ± 2% perf-stat.i.cache-miss-rate%
21804011 ± 2% +10.2% 24021171 perf-stat.i.cache-misses
1.768e+08 -16.5% 1.477e+08 perf-stat.i.cache-references
46033 +10.7% 50946 perf-stat.i.context-switches
1.03 +90.8% 1.96 perf-stat.i.cpi
340.91 -9.0% 310.31 perf-stat.i.cycles-between-cache-misses
0.10 ± 7% +0.1 0.19 ± 7% perf-stat.i.dTLB-load-miss-rate%
2.379e+09 ± 2% -37.5% 1.486e+09 ± 3% perf-stat.i.dTLB-loads
0.04 +0.0 0.08 perf-stat.i.dTLB-store-miss-rate%
772508 +9.2% 843638 perf-stat.i.dTLB-store-misses
2.009e+09 -47.8% 1.049e+09 ± 2% perf-stat.i.dTLB-stores
1029493 +7.2% 1104044 perf-stat.i.iTLB-loads
7.126e+09 -46.3% 3.828e+09 perf-stat.i.instructions
5394 ± 12% -50.5% 2671 ± 17% perf-stat.i.instructions-per-iTLB-miss
0.97 -47.1% 0.51 perf-stat.i.ipc
654.57 ± 3% -47.7% 342.06 perf-stat.i.metric.K/sec
680.50 -36.5% 432.13 perf-stat.i.metric.M/sec
18110220 +24.9% 22615284 perf-stat.i.node-loads
3374408 ± 5% -78.6% 723088 ± 5% perf-stat.i.node-stores
24.81 +55.5% 38.59 perf-stat.overall.MPKI
1.83 +0.3 2.09 perf-stat.overall.branch-miss-rate%
12.33 +3.9 16.26 ± 2% perf-stat.overall.cache-miss-rate%
1.02 +87.4% 1.92 perf-stat.overall.cpi
335.05 -8.6% 306.10 perf-stat.overall.cycles-between-cache-misses
0.10 ± 7% +0.1 0.18 ± 7% perf-stat.overall.dTLB-load-miss-rate%
0.04 +0.0 0.08 perf-stat.overall.dTLB-store-miss-rate%
5293 ± 12% -51.6% 2561 ± 18% perf-stat.overall.instructions-per-iTLB-miss
0.98 -46.6% 0.52 perf-stat.overall.ipc
8.588e+08 -12.8% 7.489e+08 perf-stat.ps.branch-instructions
21731682 ± 2% +10.2% 23941402 perf-stat.ps.cache-misses
1.762e+08 -16.5% 1.472e+08 perf-stat.ps.cache-references
45881 +10.7% 50777 perf-stat.ps.context-switches
2.371e+09 ± 2% -37.5% 1.481e+09 ± 3% perf-stat.ps.dTLB-loads
769946 +9.2% 840836 perf-stat.ps.dTLB-store-misses
2.002e+09 -47.8% 1.046e+09 ± 2% perf-stat.ps.dTLB-stores
1026078 +7.2% 1100384 perf-stat.ps.iTLB-loads
7.102e+09 -46.3% 3.815e+09 perf-stat.ps.instructions
18050146 +24.9% 22540180 perf-stat.ps.node-loads
3363213 ± 5% -78.6% 720692 ± 5% perf-stat.ps.node-stores
2.143e+12 -46.3% 1.15e+12 perf-stat.total.instructions
20.63 ± 4% -2.7 17.94 ± 4% perf-profile.calltrace.cycles-pp.rep_movs_alternative.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
20.76 ± 4% -2.7 18.08 ± 4% perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg
20.67 ± 4% -2.7 17.99 ± 4% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked
29.55 ± 3% -2.3 27.28 ± 4% perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
29.67 ± 3% -2.3 27.40 ± 4% perf-profile.calltrace.cycles-pp.sock_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
30.11 ± 3% -2.2 27.87 ± 4% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
30.29 ± 3% -2.2 28.10 ± 4% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read.oop_sys_run_once
1.20 ± 3% +0.2 1.36 ± 6% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.oop_sys_run_once
1.09 ± 10% +0.2 1.29 ± 5% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__select
1.08 ± 10% +0.2 1.29 ± 9% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.sigprocmask.main
2.32 ± 10% +0.4 2.71 ± 5% perf-profile.calltrace.cycles-pp.sigprocmask.main.__libc_start_main
2.72 ± 10% +0.4 3.16 ± 4% perf-profile.calltrace.cycles-pp.__libc_start_main
2.69 ± 10% +0.4 3.13 ± 4% perf-profile.calltrace.cycles-pp.main.__libc_start_main
1.70 ± 12% +0.6 2.26 ± 4% perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
6.89 ± 8% +1.2 8.12 ± 9% perf-profile.calltrace.cycles-pp.__select
34.20 ± 2% -3.7 30.50 ± 4% perf-profile.children.cycles-pp.rep_movs_alternative
20.77 ± 4% -2.7 18.08 ± 4% perf-profile.children.cycles-pp._copy_to_iter
20.68 ± 4% -2.7 17.99 ± 4% perf-profile.children.cycles-pp.copyout
22.03 ± 4% -2.7 19.37 ± 4% perf-profile.children.cycles-pp.skb_copy_datagram_iter
22.00 ± 4% -2.6 19.35 ± 4% perf-profile.children.cycles-pp.__skb_datagram_iter
29.17 ± 3% -2.3 26.83 ± 4% perf-profile.children.cycles-pp.tcp_recvmsg
26.95 ± 3% -2.3 24.62 ± 4% perf-profile.children.cycles-pp.tcp_recvmsg_locked
29.55 ± 3% -2.3 27.29 ± 4% perf-profile.children.cycles-pp.sock_recvmsg
29.68 ± 3% -2.3 27.41 ± 4% perf-profile.children.cycles-pp.sock_read_iter
30.14 ± 3% -2.2 27.90 ± 4% perf-profile.children.cycles-pp.vfs_read
30.33 ± 3% -2.2 28.14 ± 4% perf-profile.children.cycles-pp.ksys_read
0.37 ± 9% -0.1 0.30 ± 8% perf-profile.children.cycles-pp.tcp_queue_rcv
0.30 ± 10% -0.1 0.24 ± 6% perf-profile.children.cycles-pp.tcp_try_coalesce
0.09 ± 23% -0.1 0.04 ± 87% perf-profile.children.cycles-pp.alloc_pages
0.21 ± 7% +0.0 0.25 ± 8% perf-profile.children.cycles-pp.tcp_current_mss
0.01 ±200% +0.1 0.07 ± 15% perf-profile.children.cycles-pp.perf_rotate_context
0.11 ± 15% +0.1 0.17 ± 10% perf-profile.children.cycles-pp.__fdelt_warn
0.29 ± 4% +0.1 0.35 ± 9% perf-profile.children.cycles-pp.security_socket_recvmsg
0.64 ± 5% +0.1 0.73 ± 3% perf-profile.children.cycles-pp.mem_cgroup_charge_skmem
0.48 ± 7% +0.1 0.58 ± 9% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.34 ± 16% +0.1 0.45 ± 20% perf-profile.children.cycles-pp.poll_freewait
1.70 ± 2% +0.2 1.90 ± 4% perf-profile.children.cycles-pp.syscall_return_via_sysret
2.39 ± 10% +0.4 2.79 ± 5% perf-profile.children.cycles-pp.sigprocmask
2.72 ± 10% +0.4 3.16 ± 4% perf-profile.children.cycles-pp.__libc_start_main
2.72 ± 10% +0.4 3.16 ± 4% perf-profile.children.cycles-pp.main
4.71 ± 6% +0.7 5.38 ± 5% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
6.98 ± 8% +1.2 8.20 ± 9% perf-profile.children.cycles-pp.__select
33.94 ± 2% -3.7 30.23 ± 4% perf-profile.self.cycles-pp.rep_movs_alternative
0.07 ± 14% +0.0 0.08 ± 9% perf-profile.self.cycles-pp.tcp_data_queue
0.06 ± 7% +0.0 0.08 ± 11% perf-profile.self.cycles-pp.apparmor_socket_sendmsg
0.21 ± 15% +0.0 0.25 ± 6% perf-profile.self.cycles-pp.vfs_write
0.29 ± 9% +0.0 0.33 ± 3% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.04 ± 50% +0.0 0.09 ± 42% perf-profile.self.cycles-pp.datagram_poll
0.04 ± 83% +0.0 0.09 ± 11% perf-profile.self.cycles-pp.__fdelt_chk@plt
0.26 ± 7% +0.1 0.31 ± 11% perf-profile.self.cycles-pp.aa_sk_perm
0.16 ± 20% +0.1 0.22 ± 12% perf-profile.self.cycles-pp.tcp_rcv_established
0.14 ± 27% +0.1 0.19 ± 13% perf-profile.self.cycles-pp.skb_page_frag_refill
0.05 ± 85% +0.1 0.12 ± 14% perf-profile.self.cycles-pp.enqueue_entity
1.70 ± 3% +0.2 1.89 ± 4% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.72 ± 6% +0.5 1.23 ± 7% perf-profile.self.cycles-pp.tcp_sendmsg_locked
4.59 ± 6% +0.7 5.26 ± 5% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
***************************************************************************************************
lkp-cfl-e1: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/Write/32MB/4/debian-x86_64-phoronix/lkp-cfl-e1/tiobench-1.3.1/phoronix-test-suite
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.57 ± 2% -0.3 1.25 ± 2% mpstat.cpu.all.sys%
273236 +1.5% 277363 proc-vmstat.pgfault
20.83 -26.4% 15.33 ± 3% phoronix-test-suite.time.percent_of_cpu_this_job_got
13836 +57.3% 21761 phoronix-test-suite.tiobench.Write.32MB.32.mb_s
4.99 ± 79% -3.9 1.04 ±223% perf-profile.calltrace.cycles-pp.next_uptodate_page.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
4.99 ± 79% -3.9 1.04 ±223% perf-profile.children.cycles-pp.next_uptodate_page
4.71 ± 77% -3.1 1.62 ±145% perf-profile.children.cycles-pp.__fput
4.71 ± 77% -3.1 1.62 ±145% perf-profile.children.cycles-pp.task_work_run
8.88 ± 48% +12.2 21.06 ± 44% perf-profile.children.cycles-pp.__mmput
8.88 ± 48% +12.2 21.06 ± 44% perf-profile.children.cycles-pp.exit_mmap
4.99 ± 79% -3.9 1.04 ±223% perf-profile.self.cycles-pp.next_uptodate_page
12.51 ± 2% -1.2 11.32 perf-stat.i.cache-miss-rate%
10967720 -14.1% 9423790 ± 2% perf-stat.i.cache-misses
62094269 -4.0% 59580926 perf-stat.i.cache-references
3.296e+09 ± 2% -7.4% 3.052e+09 ± 2% perf-stat.i.cpu-cycles
0.47 ± 2% +5.1% 0.49 ± 2% perf-stat.i.ipc
0.21 ± 2% -7.5% 0.19 ± 2% perf-stat.i.metric.GHz
0.00 ± 12% +0.0 0.01 ± 15% perf-stat.i.node-load-miss-rate%
6.58 ± 13% +230.0% 21.71 ± 15% perf-stat.i.node-load-misses
268579 ± 3% +7.6% 288964 ± 3% perf-stat.i.node-loads
7.18 ± 35% +215.4% 22.65 ± 12% perf-stat.i.node-store-misses
3675951 -23.7% 2805440 ± 2% perf-stat.i.node-stores
24.86 -3.2% 24.06 perf-stat.overall.MPKI
17.66 -1.9 15.81 perf-stat.overall.cache-miss-rate%
1.32 ± 2% -6.7% 1.23 perf-stat.overall.cpi
300.56 ± 2% +7.7% 323.83 perf-stat.overall.cycles-between-cache-misses
0.03 ± 2% +0.0 0.03 ± 2% perf-stat.overall.dTLB-store-miss-rate%
0.76 ± 2% +7.1% 0.81 perf-stat.overall.ipc
0.00 ± 13% +0.0 0.01 ± 13% perf-stat.overall.node-load-miss-rate%
0.00 ± 35% +0.0 0.00 ± 13% perf-stat.overall.node-store-miss-rate%
10494435 -14.0% 9020387 ± 2% perf-stat.ps.cache-misses
59412983 -4.0% 57045833 perf-stat.ps.cache-references
3.154e+09 ± 2% -7.4% 2.921e+09 ± 2% perf-stat.ps.cpu-cycles
6.29 ± 13% +230.1% 20.77 ± 15% perf-stat.ps.node-load-misses
256921 ± 3% +7.7% 276598 ± 3% perf-stat.ps.node-loads
6.87 ± 35% +215.5% 21.68 ± 12% perf-stat.ps.node-store-misses
3518377 -23.7% 2685938 ± 2% perf-stat.ps.node-stores
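
A note on reading the %change column in the table above: rows shown with a "%" suffix look like relative changes between the two commits, while rows for metrics that are already percentages (mpstat.cpu.all.sys%, cache-miss-rate%) look like plain differences in percentage points. The sketch below checks that reading against a few rows; the helper names are hypothetical, and the formulas are inferred from the numbers rather than taken from the lkp tooling.

```python
# Hypothetical helpers; the formulas are an assumption inferred from the
# table values, not taken from the lkp-tests source.

def relative_change_pct(base: float, patched: float) -> float:
    """Relative change in percent, for rows printed as e.g. '+57.3%'."""
    return (patched - base) / base * 100.0


def point_delta(base: float, patched: float) -> float:
    """Absolute difference, for rows whose metric is itself a percentage."""
    return patched - base


# phoronix-test-suite.tiobench.Write.32MB.32.mb_s: 13836 -> 21761
print(round(relative_change_pct(13836, 21761), 1))

# perf-stat.i.cache-misses: 10967720 -> 9423790
print(round(relative_change_pct(10967720, 9423790), 1))

# mpstat.cpu.all.sys%: 1.57 -> 1.25 (percentage points)
print(round(point_delta(1.57, 1.25), 1))
```

The printed values can be compared with the +57.3%, -14.1% and -0.3 entries above. Because the table shows rounded means, other rows may differ in the last digit.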
***************************************************************************************************
lkp-cfl-e1: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/Random Write/64MB/8/debian-x86_64-phoronix/lkp-cfl-e1/tiobench-1.3.1/phoronix-test-suite
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
203084 +45.1% 294676 ± 2% phoronix-test-suite.tiobench.RandomWrite.64MB.8.mb_s
1819666 ± 8% -19.4% 1466212 ± 9% perf-stat.i.node-stores
361.37 ± 3% +13.6% 410.37 ± 2% perf-stat.overall.cycles-between-cache-misses
1775294 ± 9% -19.6% 1427506 ± 10% perf-stat.ps.node-stores
36.61 ± 4% -23.8 12.85 ±143% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
34.62 ± 9% -22.8 11.80 ±141% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
34.08 ± 5% -22.3 11.80 ±141% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
34.08 ± 5% -22.3 11.80 ±141% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
34.87 ± 7% -22.0 12.85 ±143% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
34.87 ± 7% -22.0 12.85 ±143% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
34.87 ± 7% -22.0 12.85 ±143% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
31.16 ± 14% -19.4 11.80 ±141% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
5.67 ± 72% -5.7 0.00 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
4.14 ± 72% -4.1 0.00 perf-profile.calltrace.cycles-pp.rep_movs_alternative.copyin.copy_page_from_iter_atomic.generic_perform_write.__generic_file_write_iter
12.06 ± 49% -3.2 8.89 ±147% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
4.14 ± 72% -3.1 1.04 ±223% perf-profile.calltrace.cycles-pp.copyin.copy_page_from_iter_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
4.14 ± 72% -3.1 1.04 ±223% perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.vfs_write
7.45 ± 64% -3.1 4.38 ±168% perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
7.45 ± 64% -3.1 4.38 ±168% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
7.80 ± 61% -2.6 5.21 ±175% perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
11.40 ± 23% -2.0 9.38 ±195% perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
11.40 ± 23% -2.0 9.38 ±195% perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.98 ± 54% -1.1 8.89 ±147% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
9.98 ± 54% -1.1 8.89 ±147% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
9.98 ± 54% -1.1 8.89 ±147% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
10.21 ± 27% -0.8 9.38 ±195% perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.21 ± 27% -0.8 9.38 ±195% perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
10.21 ± 27% -0.8 9.38 ±195% perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve
4.14 ± 72% -0.8 3.33 ±223% perf-profile.calltrace.cycles-pp.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
9.46 ± 62% -0.6 8.89 ±147% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
5.72 ± 87% -0.5 5.21 ±175% perf-profile.calltrace.cycles-pp.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve
7.60 ± 63% +0.1 7.71 ±189% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
7.60 ± 63% +0.1 7.71 ±189% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
7.34 ± 50% +0.2 7.50 ±142% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
7.34 ± 50% +0.2 7.50 ±142% perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.get_signal
6.21 ± 70% +1.5 7.71 ±189% perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
6.21 ± 70% +1.5 7.71 ±189% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
35.81 ± 5% -24.0 11.80 ±141% perf-profile.children.cycles-pp.cpuidle_idle_call
35.81 ± 5% -24.0 11.80 ±141% perf-profile.children.cycles-pp.cpuidle_enter
35.81 ± 5% -24.0 11.80 ±141% perf-profile.children.cycles-pp.cpuidle_enter_state
36.61 ± 4% -23.8 12.85 ±143% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
36.61 ± 4% -23.8 12.85 ±143% perf-profile.children.cycles-pp.cpu_startup_entry
36.61 ± 4% -23.8 12.85 ±143% perf-profile.children.cycles-pp.do_idle
34.87 ± 7% -22.0 12.85 ±143% perf-profile.children.cycles-pp.start_secondary
31.16 ± 14% -19.4 11.80 ±141% perf-profile.children.cycles-pp.intel_idle
5.67 ± 72% -5.7 0.00 perf-profile.children.cycles-pp.do_fault
5.18 ± 83% -5.2 0.00 perf-profile.children.cycles-pp.rep_movs_alternative
12.58 ± 44% -3.7 8.89 ±147% perf-profile.children.cycles-pp.exc_page_fault
12.58 ± 44% -3.7 8.89 ±147% perf-profile.children.cycles-pp.do_user_addr_fault
12.06 ± 49% -3.2 8.89 ±147% perf-profile.children.cycles-pp.__handle_mm_fault
12.06 ± 49% -3.2 8.89 ±147% perf-profile.children.cycles-pp.handle_mm_fault
4.14 ± 72% -3.1 1.04 ±223% perf-profile.children.cycles-pp.copyin
4.14 ± 72% -3.1 1.04 ±223% perf-profile.children.cycles-pp.copy_page_from_iter_atomic
7.80 ± 61% -2.6 5.21 ±175% perf-profile.children.cycles-pp.load_elf_binary
11.40 ± 23% -2.0 9.38 ±195% perf-profile.children.cycles-pp.__x64_sys_execve
11.40 ± 23% -2.0 9.38 ±195% perf-profile.children.cycles-pp.do_execveat_common
10.21 ± 27% -0.8 9.38 ±195% perf-profile.children.cycles-pp.bprm_execve
10.21 ± 27% -0.8 9.38 ±195% perf-profile.children.cycles-pp.exec_binprm
10.21 ± 27% -0.8 9.38 ±195% perf-profile.children.cycles-pp.search_binary_handler
4.14 ± 72% -0.8 3.33 ±223% perf-profile.children.cycles-pp.wp_page_copy
5.72 ± 87% -0.5 5.21 ±175% perf-profile.children.cycles-pp.begin_new_exec
8.12 ± 54% -0.4 7.71 ±189% perf-profile.children.cycles-pp.zap_pte_range
8.12 ± 54% -0.4 7.71 ±189% perf-profile.children.cycles-pp.unmap_vmas
8.12 ± 54% -0.4 7.71 ±189% perf-profile.children.cycles-pp.unmap_page_range
8.12 ± 54% -0.4 7.71 ±189% perf-profile.children.cycles-pp.zap_pmd_range
4.26 ± 74% +1.3 5.56 ±223% perf-profile.children.cycles-pp.task_work_run
31.16 ± 14% -19.4 11.80 ±141% perf-profile.self.cycles-pp.intel_idle
4.14 ± 72% -4.1 0.00 perf-profile.self.cycles-pp.rep_movs_alternative
***************************************************************************************************
lkp-cfl-d2: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/need_x/option_a/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/true/500px PutImage Square/debian-x86_64-phoronix/lkp-cfl-d2/x11perf-1.1.1/phoronix-test-suite
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
59773 -3.3% 57826 proc-vmstat.pgreuse
9554 ± 11% -26.3% 7038 ± 20% sched_debug.cfs_rq:/.min_vruntime.min
237.81 ± 14% -15.9% 199.90 ± 2% uptime.boot
2378 ± 12% -14.6% 2031 ± 2% uptime.idle
1992 +6.1% 2114 vmstat.io.bi
18164 +2.3% 18590 vmstat.system.in
150.06 -5.7% 141.56 phoronix-test-suite.time.elapsed_time
150.06 -5.7% 141.56 phoronix-test-suite.time.elapsed_time.max
108.51 -9.2% 98.54 phoronix-test-suite.time.system_time
5626 +8.4% 6101 phoronix-test-suite.x11perf.500pxPutImageSquare.operations___second
1.90 ± 2% -0.2 1.66 ± 3% turbostat.C1%
2.85 +0.2 3.00 turbostat.C1E%
0.07 +0.0 0.08 turbostat.CPUGFX%
4.27 ± 4% -22.1% 3.33 turbostat.Pkg%pc2
1.72 +1.6% 1.75 turbostat.RAMWatt
40.01 ± 3% -3.4 36.60 ± 4% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
1.76 ± 4% -0.7 1.04 ± 21% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
1.71 ± 3% -0.7 1.00 ± 21% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
1.40 ± 2% -0.6 0.80 ± 19% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
42.14 ± 3% -3.6 38.57 ± 4% perf-profile.children.cycles-pp.cpuidle_idle_call
40.22 ± 3% -3.5 36.72 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
40.25 ± 3% -3.5 36.76 ± 4% perf-profile.children.cycles-pp.cpuidle_enter
2.08 ± 3% -0.8 1.30 ± 18% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
2.01 ± 4% -0.8 1.24 ± 18% perf-profile.children.cycles-pp.hrtimer_interrupt
1.66 ± 4% -0.6 1.01 ± 17% perf-profile.children.cycles-pp.__hrtimer_run_queues
1.05 ± 7% -0.4 0.60 ± 20% perf-profile.children.cycles-pp.tick_sched_timer
0.90 ± 9% -0.4 0.54 ± 19% perf-profile.children.cycles-pp.tick_sched_handle
0.78 ± 11% -0.3 0.50 ± 18% perf-profile.children.cycles-pp.update_process_times
0.44 ± 11% -0.2 0.27 ± 21% perf-profile.children.cycles-pp.scheduler_tick
0.26 ± 18% -0.1 0.14 ± 30% perf-profile.children.cycles-pp.tick_irq_enter
0.26 ± 18% -0.1 0.14 ± 28% perf-profile.children.cycles-pp.irq_enter_rcu
0.14 ± 13% -0.1 0.06 ± 54% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.11 ± 18% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.update_irq_load_avg
0.23 ± 18% -0.1 0.16 ± 13% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.11 ± 21% -0.1 0.04 ± 72% perf-profile.children.cycles-pp.rcu_pending
0.18 ± 16% -0.1 0.11 ± 31% perf-profile.children.cycles-pp.update_rq_clock_task
0.09 ± 17% -0.1 0.03 ±101% perf-profile.children.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.16 ± 17% -0.0 0.12 ± 17% perf-profile.children.cycles-pp.perf_rotate_context
0.10 ± 10% -0.0 0.06 ± 53% perf-profile.children.cycles-pp.tick_nohz_stop_idle
0.64 ± 17% +0.2 0.81 ± 8% perf-profile.children.cycles-pp._raw_spin_lock
0.34 ± 18% -0.1 0.24 ± 32% perf-profile.self.cycles-pp.cpuidle_enter_state
0.11 ± 18% -0.1 0.04 ± 71% perf-profile.self.cycles-pp.update_irq_load_avg
0.12 ± 33% -0.1 0.06 ± 52% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.08 ± 14% -0.1 0.03 ±101% perf-profile.self.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.62 ± 18% +0.2 0.80 ± 7% perf-profile.self.cycles-pp._raw_spin_lock
184.41 +40.9% 259.89 ± 2% perf-stat.i.MPKI
8.998e+08 -9.4% 8.148e+08 perf-stat.i.branch-instructions
1.83 ± 2% +0.1 1.94 ± 2% perf-stat.i.branch-miss-rate%
15153824 ± 5% +6.2% 16088855 ± 3% perf-stat.i.branch-misses
1.497e+09 -7.3% 1.388e+09 perf-stat.i.cache-references
1.29 ± 3% +33.7% 1.73 perf-stat.i.cpi
7.394e+09 +1.3% 7.488e+09 perf-stat.i.cpu-cycles
8076 ± 3% +6.3% 8588 perf-stat.i.cycles-between-cache-misses
2.321e+09 -42.5% 1.334e+09 perf-stat.i.dTLB-loads
1.992e+09 -50.7% 9.825e+08 perf-stat.i.dTLB-stores
7.709e+09 -32.2% 5.227e+09 perf-stat.i.instructions
15364 ± 14% -28.2% 11033 ± 11% perf-stat.i.instructions-per-iTLB-miss
0.99 -32.0% 0.68 perf-stat.i.ipc
2.37 ± 2% +6.3% 2.53 ± 2% perf-stat.i.major-faults
0.62 +1.3% 0.62 perf-stat.i.metric.GHz
559.06 -32.6% 376.59 perf-stat.i.metric.M/sec
2727 +3.0% 2810 perf-stat.i.minor-faults
176938 ± 4% +8.6% 192161 perf-stat.i.node-stores
2730 +3.0% 2813 perf-stat.i.page-faults
194.17 +36.8% 265.59 ± 2% perf-stat.overall.MPKI
1.68 ± 4% +0.3 1.97 ± 2% perf-stat.overall.branch-miss-rate%
0.17 ± 4% +0.0 0.19 ± 4% perf-stat.overall.cache-miss-rate%
0.96 +49.4% 1.43 perf-stat.overall.cpi
0.03 ± 5% +0.0 0.05 ± 5% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 17% +0.0 0.01 ± 28% perf-stat.overall.dTLB-store-miss-rate%
11594 ± 19% -31.3% 7967 ± 11% perf-stat.overall.instructions-per-iTLB-miss
1.04 -33.0% 0.70 perf-stat.overall.ipc
8.938e+08 -9.5% 8.09e+08 perf-stat.ps.branch-instructions
15056160 ± 5% +6.1% 15976861 ± 3% perf-stat.ps.branch-misses
1.487e+09 -7.3% 1.378e+09 perf-stat.ps.cache-references
7.345e+09 +1.2% 7.435e+09 perf-stat.ps.cpu-cycles
2.305e+09 -42.5% 1.325e+09 perf-stat.ps.dTLB-loads
1.978e+09 -50.7% 9.756e+08 perf-stat.ps.dTLB-stores
7.657e+09 -32.2% 5.19e+09 perf-stat.ps.instructions
2.36 ± 2% +6.2% 2.51 ± 2% perf-stat.ps.major-faults
2710 +3.0% 2791 perf-stat.ps.minor-faults
175787 ± 4% +8.6% 190826 perf-stat.ps.node-stores
2712 +3.0% 2793 perf-stat.ps.page-faults
1.158e+12 -36.1% 7.395e+11 perf-stat.total.instructions
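
The perf-stat.overall.ipc and perf-stat.overall.cpi rows above appear to follow from the per-second counters: instructions divided by cycles, and the inverse. A minimal sketch that checks this with the perf-stat.ps values from this table follows; the function names are hypothetical, and the relationship is an assumption based on the usual IPC/CPI definitions.

```python
# Hypothetical helpers using the usual definitions of IPC and CPI.
# That the "overall" rows are derived this way is an assumption.

def ipc(instructions: float, cycles: float) -> float:
    """Instructions retired per CPU cycle."""
    return instructions / cycles


def cpi(instructions: float, cycles: float) -> float:
    """CPU cycles per instruction retired (inverse of IPC)."""
    return cycles / instructions


# perf-stat.ps.instructions and perf-stat.ps.cpu-cycles, per commit
base = {"instructions": 7.657e9, "cycles": 7.345e9}     # 0d85b27b0c
patched = {"instructions": 5.19e9, "cycles": 7.435e9}   # 47ee3f1dd9

for name, c in (("base", base), ("patched", patched)):
    print(name,
          round(ipc(c["instructions"], c["cycles"]), 2),
          round(cpi(c["instructions"], c["cycles"]), 2))
```

Read this way, the drop in IPC comes from the instruction count falling by about a third while cycles stay nearly flat, so a lower IPC here does not by itself indicate slower execution; the x11perf operations/second figure above went up.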
***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/pread1/will-it-scale
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
19.54 +2.8 22.30 mpstat.cpu.all.usr%
117438 ± 69% -66.1% 39831 ± 4% numa-meminfo.node1.FilePages
29350 ± 69% -66.1% 9943 ± 4% numa-vmstat.node1.nr_file_pages
0.23 -43.5% 0.13 turbostat.IPC
43809356 +14.1% 49998501 will-it-scale.52.threads
842487 +14.1% 961509 will-it-scale.per_thread_ops
43809356 +14.1% 49998501 will-it-scale.workload
264681 ± 21% -23.5% 202416 ± 2% sched_debug.cpu.clock.avg
264689 ± 21% -23.5% 202424 ± 2% sched_debug.cpu.clock.max
264673 ± 21% -23.5% 202407 ± 2% sched_debug.cpu.clock.min
260519 ± 21% -23.0% 200690 ± 3% sched_debug.cpu.clock_task.avg
262320 ± 21% -23.3% 201218 ± 2% sched_debug.cpu.clock_task.max
566960 ± 12% -11.5% 501642 sched_debug.cpu.max_idle_balance_cost.max
8195 ±120% -98.0% 160.33 ±186% sched_debug.cpu.max_idle_balance_cost.stddev
264673 ± 21% -23.5% 202407 ± 2% sched_debug.cpu_clk
264068 ± 21% -23.6% 201802 ± 2% sched_debug.ktime
0.04 +68.4% 0.07 ± 2% perf-stat.i.MPKI
1.447e+10 -8.8% 1.321e+10 perf-stat.i.branch-instructions
0.94 +0.2 1.17 perf-stat.i.branch-miss-rate%
1.36e+08 +13.6% 1.544e+08 perf-stat.i.branch-misses
1.22 +73.6% 2.12 perf-stat.i.cpi
0.11 +0.1 0.24 perf-stat.i.dTLB-load-miss-rate%
43870070 +14.1% 50047738 perf-stat.i.dTLB-load-misses
4.026e+10 -49.3% 2.04e+10 perf-stat.i.dTLB-loads
0.00 +0.0 0.00 perf-stat.i.dTLB-store-miss-rate%
32603 +8.4% 35341 perf-stat.i.dTLB-store-misses
3.274e+10 -63.9% 1.183e+10 perf-stat.i.dTLB-stores
63088748 ± 2% +13.6% 71686582 perf-stat.i.iTLB-load-misses
1.19e+11 -42.4% 6.854e+10 perf-stat.i.instructions
1900 ± 2% -49.3% 963.97 perf-stat.i.instructions-per-iTLB-miss
0.82 -42.4% 0.47 perf-stat.i.ipc
841.01 -48.0% 436.92 perf-stat.i.metric.M/sec
0.04 +68.7% 0.07 perf-stat.overall.MPKI
0.94 +0.2 1.17 perf-stat.overall.branch-miss-rate%
1.22 +73.7% 2.12 perf-stat.overall.cpi
0.11 +0.1 0.24 perf-stat.overall.dTLB-load-miss-rate%
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
1888 ± 2% -49.3% 957.20 perf-stat.overall.instructions-per-iTLB-miss
0.82 -42.4% 0.47 perf-stat.overall.ipc
817740 -49.6% 412119 perf-stat.overall.path-length
1.442e+10 -8.8% 1.316e+10 perf-stat.ps.branch-instructions
1.355e+08 +13.6% 1.539e+08 perf-stat.ps.branch-misses
43722670 +14.1% 49880640 perf-stat.ps.dTLB-load-misses
4.013e+10 -49.3% 2.034e+10 perf-stat.ps.dTLB-loads
32528 +8.4% 35259 perf-stat.ps.dTLB-store-misses
3.263e+10 -63.9% 1.179e+10 perf-stat.ps.dTLB-stores
62833990 ± 2% +13.6% 71389669 perf-stat.ps.iTLB-load-misses
1.186e+11 -42.4% 6.832e+10 perf-stat.ps.instructions
3.582e+13 -42.5% 2.061e+13 perf-stat.total.instructions
12.31 -8.9 3.41 perf-profile.calltrace.cycles-pp.rep_movs_alternative.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter
12.74 -8.8 3.94 perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read
13.30 -8.7 4.58 perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read.__x64_sys_pread64
13.59 -8.7 4.90 perf-profile.calltrace.cycles-pp.copy_page_to_iter.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
20.90 -7.8 13.12 perf-profile.calltrace.cycles-pp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
25.72 -7.2 18.51 perf-profile.calltrace.cycles-pp.vfs_read.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread64
28.82 -6.8 22.05 perf-profile.calltrace.cycles-pp.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread64
43.04 -5.0 38.08 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread64
50.67 -3.9 46.78 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pread64
0.54 ± 2% +0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.folio_unlock.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
0.54 +0.1 0.62 ± 2% perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread64
1.13 ± 2% +0.1 1.26 perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64
0.98 +0.1 1.12 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_read.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.24 +0.1 1.38 ± 2% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.shmem_file_read_iter.vfs_read.__x64_sys_pread64
1.59 +0.2 1.77 ± 2% perf-profile.calltrace.cycles-pp.touch_atime.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
0.35 ± 70% +0.2 0.58 ± 3% perf-profile.calltrace.cycles-pp.current_time.atime_needs_update.touch_atime.shmem_file_read_iter.vfs_read
1.60 +0.3 1.87 ± 5% perf-profile.calltrace.cycles-pp.__fget_light.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread64
1.99 ± 2% +0.3 2.27 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_pread64
2.67 +0.3 3.00 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
0.00 +0.5 0.53 ± 2% perf-profile.calltrace.cycles-pp.fput.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread64
6.78 +0.9 7.70 perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_pread64
12.88 +1.7 14.56 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pread64
14.10 +1.9 15.96 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_pread64
12.31 -8.9 3.44 perf-profile.children.cycles-pp.rep_movs_alternative
13.04 -8.8 4.28 perf-profile.children.cycles-pp.copyout
13.32 -8.7 4.61 perf-profile.children.cycles-pp._copy_to_iter
13.61 -8.7 4.92 perf-profile.children.cycles-pp.copy_page_to_iter
21.00 -7.8 13.23 perf-profile.children.cycles-pp.shmem_file_read_iter
25.82 -7.2 18.62 perf-profile.children.cycles-pp.vfs_read
28.83 -6.8 22.07 perf-profile.children.cycles-pp.__x64_sys_pread64
43.17 -4.9 38.24 perf-profile.children.cycles-pp.do_syscall_64
50.98 -3.9 47.12 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.11 ± 3% +0.0 0.13 ± 3% perf-profile.children.cycles-pp.folio_mark_accessed
0.11 +0.0 0.12 ± 4% perf-profile.children.cycles-pp.__pthread_enable_asynccancel
0.09 ± 4% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.rw_verify_area
0.15 ± 3% +0.0 0.18 ± 2% perf-profile.children.cycles-pp.__cond_resched
0.34 ± 2% +0.0 0.38 ± 2% perf-profile.children.cycles-pp.folio_test_hugetlb
0.24 ± 12% +0.0 0.29 ± 2% perf-profile.children.cycles-pp.aa_file_perm
0.17 ± 10% +0.1 0.23 ± 21% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.47 +0.1 0.53 ± 2% perf-profile.children.cycles-pp.fput
0.54 ± 2% +0.1 0.61 ± 2% perf-profile.children.cycles-pp.folio_unlock
0.57 +0.1 0.65 ± 2% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.58 ± 2% +0.1 0.66 ± 4% perf-profile.children.cycles-pp.current_time
1.16 ± 2% +0.1 1.29 perf-profile.children.cycles-pp.filemap_get_entry
0.99 +0.1 1.13 perf-profile.children.cycles-pp.__fsnotify_parent
1.03 +0.1 1.18 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.28 +0.2 1.44 ± 2% perf-profile.children.cycles-pp.atime_needs_update
1.24 ± 2% +0.2 1.41 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
1.61 +0.2 1.80 ± 2% perf-profile.children.cycles-pp.touch_atime
1.60 +0.3 1.87 ± 5% perf-profile.children.cycles-pp.__fget_light
2.70 +0.3 3.04 perf-profile.children.cycles-pp.shmem_get_folio_gfp
6.68 +0.9 7.58 perf-profile.children.cycles-pp.__entry_text_start
12.97 +1.7 14.63 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
14.24 +1.9 16.11 perf-profile.children.cycles-pp.syscall_return_via_sysret
12.09 -8.9 3.23 perf-profile.self.cycles-pp.rep_movs_alternative
0.11 +0.0 0.12 ± 3% perf-profile.self.cycles-pp.__pthread_enable_asynccancel
0.09 ± 4% +0.0 0.10 ± 3% perf-profile.self.cycles-pp.rw_verify_area
0.14 ± 2% +0.0 0.16 ± 4% perf-profile.self.cycles-pp.testcase
0.36 ± 4% +0.0 0.39 ± 2% perf-profile.self.cycles-pp.touch_atime
0.34 ± 2% +0.0 0.38 ± 2% perf-profile.self.cycles-pp.folio_test_hugetlb
0.22 ± 14% +0.0 0.26 ± 3% perf-profile.self.cycles-pp.aa_file_perm
0.30 ± 4% +0.0 0.35 ± 2% perf-profile.self.cycles-pp._copy_to_iter
0.46 ± 2% +0.0 0.52 perf-profile.self.cycles-pp.current_time
0.47 +0.1 0.52 perf-profile.self.cycles-pp.fput
0.57 ± 3% +0.1 0.63 ± 2% perf-profile.self.cycles-pp.do_syscall_64
0.48 +0.1 0.55 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.50 ± 2% +0.1 0.57 ± 2% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.58 ± 2% +0.1 0.65 ± 2% perf-profile.self.cycles-pp.atime_needs_update
0.54 ± 2% +0.1 0.61 ± 2% perf-profile.self.cycles-pp.folio_unlock
0.13 ± 12% +0.1 0.20 ± 24% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
0.91 +0.1 1.01 perf-profile.self.cycles-pp.__x64_sys_pread64
0.95 ± 2% +0.1 1.06 perf-profile.self.cycles-pp.filemap_get_entry
0.90 +0.1 1.03 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.89 +0.1 1.02 ± 2% perf-profile.self.cycles-pp.copyout
0.94 +0.1 1.07 perf-profile.self.cycles-pp.__fsnotify_parent
1.24 +0.2 1.42 perf-profile.self.cycles-pp.__libc_pread64
1.37 +0.2 1.56 perf-profile.self.cycles-pp.shmem_get_folio_gfp
1.76 +0.2 2.00 ± 2% perf-profile.self.cycles-pp.vfs_read
2.13 ± 2% +0.3 2.39 perf-profile.self.cycles-pp.shmem_file_read_iter
1.59 +0.3 1.86 ± 5% perf-profile.self.cycles-pp.__fget_light
5.81 +0.8 6.60 perf-profile.self.cycles-pp.__entry_text_start
8.05 +1.1 9.16 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
12.47 +1.6 14.06 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
14.21 +1.9 16.08 perf-profile.self.cycles-pp.syscall_return_via_sysret
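
The perf-stat.overall.path-length row above is numerically close to perf-stat.total.instructions divided by will-it-scale.workload, i.e. instructions executed per benchmark operation. This definition is inferred from the numbers in this report and is not stated in it. A short sketch under that assumption, with a hypothetical function name:

```python
# Assumption: path-length = total instructions / total workload operations.
# Inferred from the table values; not confirmed by the report itself.

def path_length(total_instructions: float, workload_ops: float) -> float:
    """Instructions executed per benchmark operation."""
    return total_instructions / workload_ops


# perf-stat.total.instructions and will-it-scale.workload, per commit
base = path_length(3.582e13, 43809356)      # 0d85b27b0c
patched = path_length(2.061e13, 49998501)   # 47ee3f1dd9

print(round(base), round(patched))
print(round((patched - base) / base * 100.0, 1))  # relative change in percent
```

The results land within a fraction of a percent of the reported 817740 and 412119; the small gap would be expected from the rounded instruction totals printed in the table. Under this reading, each pread1 operation needs roughly half as many instructions with 47ee3f1dd9, which fits the lower rep_movs_alternative share in the profile.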
***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/readseek1/will-it-scale
commit:
0d85b27b0c ("Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
47ee3f1dd9 ("x86: re-introduce support for ERMS copies for user space accesses")
0d85b27b0cc6b5cf 47ee3f1dd93bcbe031539b1ecda
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.01 ± 3% +0.0 0.01 ± 4% mpstat.cpu.all.soft%
53591 -2.8% 52083 proc-vmstat.pgactivate
0.10 -40.0% 0.06 turbostat.IPC
5.12 ± 38% +1410.7% 77.33 ±138% sched_debug.cfs_rq:/.removed.load_avg.avg
28.32 ± 18% +2025.5% 601.98 ±136% sched_debug.cfs_rq:/.removed.load_avg.stddev
30834962 +7.8% 33231698 will-it-scale.104.threads
296489 +7.8% 319535 will-it-scale.per_thread_ops
30834962 +7.8% 33231698 will-it-scale.workload
0.04 ± 5% +295.1% 0.17 ±129% perf-stat.i.MPKI
1.331e+10 -8.9% 1.212e+10 perf-stat.i.branch-instructions
1.38 +0.3 1.71 perf-stat.i.branch-miss-rate%
1.834e+08 +12.5% 2.063e+08 perf-stat.i.branch-misses
2.90 +59.9% 4.64 perf-stat.i.cpi
165.59 +2.2% 169.19 perf-stat.i.cpu-migrations
0.19 +0.2 0.37 perf-stat.i.dTLB-load-miss-rate%
61714012 +7.5% 66344685 perf-stat.i.dTLB-load-misses
3.26e+10 -44.4% 1.812e+10 perf-stat.i.dTLB-loads
0.00 +0.0 0.00 ± 50% perf-stat.i.dTLB-store-miss-rate%
60456 +2.3% 61857 perf-stat.i.dTLB-store-misses
2.576e+10 -58.1% 1.079e+10 perf-stat.i.dTLB-stores
92796911 ± 4% +14.6% 1.063e+08 perf-stat.i.iTLB-load-misses
79032012 ± 3% +16.7% 92191583 ± 2% perf-stat.i.iTLB-loads
9.913e+10 -37.5% 6.199e+10 perf-stat.i.instructions
1073 ± 4% -45.4% 585.88 perf-stat.i.instructions-per-iTLB-miss
0.34 -37.4% 0.22 perf-stat.i.ipc
784.59 ± 4% +9.7% 860.49 ± 4% perf-stat.i.metric.K/sec
689.14 -42.7% 394.59 perf-stat.i.metric.M/sec
10907 ± 6% -8.6% 9974 ± 6% perf-stat.i.node-stores
0.04 ± 5% +68.8% 0.07 ± 7% perf-stat.overall.MPKI
1.38 +0.3 1.70 perf-stat.overall.branch-miss-rate%
2.90 +59.8% 4.64 perf-stat.overall.cpi
0.19 +0.2 0.36 perf-stat.overall.dTLB-load-miss-rate%
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
1070 ± 4% -45.5% 583.23 perf-stat.overall.instructions-per-iTLB-miss
0.34 -37.4% 0.22 perf-stat.overall.ipc
967334 -41.9% 562134 perf-stat.overall.path-length
1.326e+10 -8.9% 1.208e+10 perf-stat.ps.branch-instructions
1.828e+08 +12.5% 2.056e+08 perf-stat.ps.branch-misses
165.01 +2.1% 168.48 perf-stat.ps.cpu-migrations
61509195 +7.5% 66124018 perf-stat.ps.dTLB-load-misses
3.249e+10 -44.4% 1.806e+10 perf-stat.ps.dTLB-loads
60319 +2.3% 61694 perf-stat.ps.dTLB-store-misses
2.568e+10 -58.1% 1.075e+10 perf-stat.ps.dTLB-stores
92483996 ± 4% +14.6% 1.059e+08 perf-stat.ps.iTLB-load-misses
78775497 ± 3% +16.6% 91888236 ± 2% perf-stat.ps.iTLB-loads
9.88e+10 -37.5% 6.179e+10 perf-stat.ps.instructions
2.983e+13 -37.4% 1.868e+13 perf-stat.total.instructions
32.98 -7.8 25.20 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
43.20 -7.4 35.79 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
26.83 -6.8 20.03 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
48.41 -6.1 42.34 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
9.66 -5.8 3.90 ± 2% perf-profile.calltrace.cycles-pp.rep_movs_alternative.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter
20.63 -5.5 15.14 perf-profile.calltrace.cycles-pp.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.11 -5.1 4.96 ± 5% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read
10.70 -5.0 5.69 ± 2% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read.ksys_read
11.03 -5.0 6.07 ± 2% perf-profile.calltrace.cycles-pp.copy_page_to_iter.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64
64.33 -4.1 60.23 perf-profile.calltrace.cycles-pp.__libc_read
3.50 -0.6 2.92 ± 2% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64
1.24 ± 4% -0.6 0.67 ± 3% perf-profile.calltrace.cycles-pp.mutex_unlock.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
1.34 ± 2% -0.5 0.79 ± 3% perf-profile.calltrace.cycles-pp.fput.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
1.30 ± 2% -0.5 0.84 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.19 ± 2% -0.3 0.86 perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.ksys_read
12.58 -0.3 12.25 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_lseek64
1.60 ± 4% -0.2 1.44 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.41 ± 4% -0.1 1.26 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
2.35 +0.1 2.42 perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.88 +0.1 3.01 perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
4.08 +0.3 4.34 perf-profile.calltrace.cycles-pp.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_lseek64
9.05 +0.3 9.32 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
0.26 ±100% +0.3 0.56 ± 2% perf-profile.calltrace.cycles-pp.current_time.atime_needs_update.touch_atime.shmem_file_read_iter.vfs_read
3.28 +0.3 3.62 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_read
7.39 +0.7 8.06 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_lseek64
3.86 +0.8 4.68 perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_lseek64
3.09 +0.9 4.03 perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
12.47 +1.0 13.48 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_lseek64
9.73 +1.3 11.00 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
17.06 +1.4 18.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_lseek64
2.23 +4.1 6.30 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_lseek64
35.19 +4.1 39.27 perf-profile.calltrace.cycles-pp.__libc_lseek64
33.06 -7.8 25.28 perf-profile.children.cycles-pp.ksys_read
26.95 -6.8 20.14 perf-profile.children.cycles-pp.vfs_read
55.83 -6.4 49.44 perf-profile.children.cycles-pp.do_syscall_64
20.76 -5.5 15.26 perf-profile.children.cycles-pp.shmem_file_read_iter
9.68 -5.3 4.36 ± 2% perf-profile.children.cycles-pp.rep_movs_alternative
10.30 -5.2 5.09 ± 2% perf-profile.children.cycles-pp.copyout
10.71 -5.0 5.72 ± 2% perf-profile.children.cycles-pp._copy_to_iter
11.05 -5.0 6.10 ± 2% perf-profile.children.cycles-pp.copy_page_to_iter
65.89 -4.6 61.25 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
64.93 -4.1 60.84 perf-profile.children.cycles-pp.__libc_read
1.02 ± 5% -0.6 0.42 ± 3% perf-profile.children.cycles-pp.fsnotify_perm
3.54 -0.6 2.95 ± 2% perf-profile.children.cycles-pp.shmem_get_folio_gfp
1.77 ± 3% -0.5 1.25 ± 3% perf-profile.children.cycles-pp.fput
1.82 ± 4% -0.5 1.32 ± 4% perf-profile.children.cycles-pp.mutex_unlock
1.34 -0.5 0.87 perf-profile.children.cycles-pp.__fsnotify_parent
1.22 ± 2% -0.3 0.88 perf-profile.children.cycles-pp.filemap_get_entry
1.62 ± 3% -0.2 1.45 perf-profile.children.cycles-pp.security_file_permission
1.42 ± 4% -0.1 1.28 perf-profile.children.cycles-pp.apparmor_file_permission
0.24 ± 2% -0.1 0.16 ± 4% perf-profile.children.cycles-pp.aa_file_perm
0.22 ± 15% -0.1 0.16 ± 13% perf-profile.children.cycles-pp.make_vfsuid
0.26 -0.1 0.21 ± 3% perf-profile.children.cycles-pp.xas_load
0.16 ± 3% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.xas_start
0.32 -0.0 0.30 perf-profile.children.cycles-pp.__cond_resched
0.19 +0.0 0.20 perf-profile.children.cycles-pp.folio_test_hugetlb
0.28 ± 2% +0.0 0.30 perf-profile.children.cycles-pp.folio_unlock
0.09 ± 6% +0.0 0.11 ± 5% perf-profile.children.cycles-pp.make_vfsgid
0.29 +0.0 0.31 ± 2% perf-profile.children.cycles-pp.testcase
0.19 ± 3% +0.0 0.21 ± 6% perf-profile.children.cycles-pp.generic_file_llseek_size
0.08 ± 4% +0.0 0.11 ± 4% perf-profile.children.cycles-pp.read@plt
0.41 +0.0 0.45 ± 2% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.54 ± 3% +0.1 0.59 ± 2% perf-profile.children.cycles-pp.current_time
1.08 +0.1 1.17 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.64 ± 2% +0.2 0.80 perf-profile.children.cycles-pp.syscall_enter_from_user_mode
4.14 +0.3 4.40 perf-profile.children.cycles-pp.ksys_lseek
16.52 +1.0 17.47 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
22.43 +1.0 23.39 perf-profile.children.cycles-pp.syscall_return_via_sysret
3.03 +2.3 5.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
8.91 +3.3 12.23 perf-profile.children.cycles-pp.__entry_text_start
35.78 +4.2 39.95 perf-profile.children.cycles-pp.__libc_lseek64
9.53 -5.3 4.24 ± 2% perf-profile.self.cycles-pp.rep_movs_alternative
1.01 ± 5% -0.6 0.42 ± 3% perf-profile.self.cycles-pp.fsnotify_perm
1.73 ± 3% -0.5 1.22 ± 2% perf-profile.self.cycles-pp.fput
1.75 ± 4% -0.5 1.25 ± 4% perf-profile.self.cycles-pp.mutex_unlock
1.30 -0.5 0.83 ± 2% perf-profile.self.cycles-pp.__fsnotify_parent
0.96 ± 2% -0.3 0.67 perf-profile.self.cycles-pp.filemap_get_entry
2.13 ± 2% -0.2 1.90 ± 3% perf-profile.self.cycles-pp.shmem_get_folio_gfp
0.22 ± 4% -0.1 0.14 ± 3% perf-profile.self.cycles-pp.aa_file_perm
0.20 ± 14% -0.1 0.14 ± 11% perf-profile.self.cycles-pp.make_vfsuid
0.14 ± 2% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.xas_start
0.20 ± 4% -0.0 0.18 ± 3% perf-profile.self.cycles-pp.security_file_permission
0.10 ± 3% -0.0 0.08 perf-profile.self.cycles-pp.xas_load
0.22 ± 2% -0.0 0.20 ± 2% perf-profile.self.cycles-pp.__cond_resched
0.08 ± 5% +0.0 0.10 ± 3% perf-profile.self.cycles-pp.make_vfsgid
0.26 ± 3% +0.0 0.29 perf-profile.self.cycles-pp.folio_unlock
0.19 ± 3% +0.0 0.21 ± 4% perf-profile.self.cycles-pp.generic_file_llseek_size
0.33 ± 3% +0.0 0.36 perf-profile.self.cycles-pp.ksys_lseek
0.38 +0.0 0.41 perf-profile.self.cycles-pp.exit_to_user_mode_prepare
0.47 +0.0 0.51 ± 2% perf-profile.self.cycles-pp.__fdget_pos
0.42 ± 5% +0.1 0.48 ± 2% perf-profile.self.cycles-pp.current_time
0.29 ± 3% +0.1 0.35 ± 2% perf-profile.self.cycles-pp.copy_page_to_iter
1.03 +0.1 1.08 perf-profile.self.cycles-pp.__libc_read
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.read@plt
0.96 +0.1 1.03 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.72 +0.1 0.82 perf-profile.self.cycles-pp.copyout
0.70 +0.1 0.80 perf-profile.self.cycles-pp.__libc_lseek64
0.52 ± 2% +0.1 0.64 perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.54 ± 2% +0.2 0.70 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.42 +0.2 0.64 ± 7% perf-profile.self.cycles-pp._copy_to_iter
15.71 +0.9 16.60 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
22.41 +1.0 23.37 perf-profile.self.cycles-pp.syscall_return_via_sysret
10.40 +1.8 12.15 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
7.90 +3.2 11.13 perf-profile.self.cycles-pp.__entry_text_start
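The rows above can be post-processed mechanically. The following is a minimal sketch, not part of the lkp tooling: it assumes each row has the layout "base [± stddev%] delta new [± stddev%] metric", with the first value column taken as the parent commit and the third as 47ee3f1dd9. The name parse_row and the dictionary keys are hypothetical. The printed delta column may come from unrounded data, so a delta recomputed from the rounded values can differ in the last digit.

```python
import re

# Assumed row layout: base [± N%]  delta  new [± N%]  perf-profile.<kind>.cycles-pp.<symbol>
ROW = re.compile(
    r"^\s*(?P<base>\d+\.\d+)(?:\s*±\s*(?P<base_sd>\d+)%)?"
    r"\s+(?P<delta>[+-]\d+\.\d+)"
    r"\s+(?P<new>\d+\.\d+)(?:\s*±\s*(?P<new_sd>\d+)%)?"
    r"\s+perf-profile\.(?P<kind>\w+)\.cycles-pp\.(?P<symbol>\S+)\s*$"
)


def parse_row(line):
    """Parse one perf-profile row; return a dict, or None if the line does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    base = float(m["base"])
    new = float(m["new"])
    return {
        "kind": m["kind"],            # calltrace, children or self
        "symbol": m["symbol"],        # function name or dotted call chain
        "base": base,                 # percentage of cycles, first value column
        "new": new,                   # percentage of cycles, third value column
        "reported_delta": float(m["delta"]),
        "recomputed_delta": round(new - base, 2),
        # The stddev fields are None when the row has no "± N%" annotation.
        "base_stddev_pct": int(m["base_sd"]) if m["base_sd"] else None,
        "new_stddev_pct": int(m["new_sd"]) if m["new_sd"] else None,
    }


if __name__ == "__main__":
    sample = [
        "9.68 -5.3 4.36 ± 2% perf-profile.children.cycles-pp.rep_movs_alternative",
        "3.03 +2.3 5.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack",
    ]
    for line in sample:
        row = parse_row(line)
        print(row["symbol"], row["reported_delta"], row["recomputed_delta"])
```

Sorting the parsed rows by the absolute value of the delta gives a quick view of which symbols moved most between the two commits.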
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
View attachment "config-6.4.0-rc3-00191-g47ee3f1dd93b" of type "text/plain" (158675 bytes)
View attachment "job-script" of type "text/plain" (7681 bytes)
View attachment "job.yaml" of type "text/plain" (5208 bytes)
View attachment "reproduce" of type "text/plain" (254 bytes)