[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <202305041446.71d46724-yujie.liu@intel.com>
Date: Thu, 4 May 2023 16:09:12 +0800
From: kernel test robot <yujie.liu@...el.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
<linux-kernel@...r.kernel.org>, <ying.huang@...el.com>,
<feng.tang@...el.com>, <fengwei.yin@...el.com>
Subject: [linus:master] [x86] adfcf4231b: blogbench.read_score -10.9%
regression
Hello,
kernel test robot noticed a -10.9% regression of blogbench.read_score on:
commit: adfcf4231b8cbc2d9c1e7bfaa965b907e60639eb ("x86: don't use REP_GOOD or ERMS for user memory copies")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: blogbench
test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
parameters:
disk: 1SSD
fs: xfs
cpufreq_governor: performance
test-description: Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server.
test-url: https://www.pureftpd.org/project/blogbench
In addition to that, the commit also has impact on the following tests:
+------------------+------------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.aio-stress.RandomWrite.mb_s 21.4% improvement |
| test machine | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory |
| test parameters | cpufreq_governor=performance |
| | test=aio-stress-1.1.1 |
+------------------+------------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.incompact3d.input.i3d129CellsPerDirection.seconds 3.6% regression |
| test machine | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=input.i3d 129 Cells Per Direction [Intended For Small Systems] |
| | test=incompact3d-2.0.2 |
+------------------+------------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.ior.2MB.DefaultTestDirectory.mb_s 25.7% improvement |
| test machine | 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory |
| test parameters | cpufreq_governor=performance |
| | test=ior-1.1.1 |
+------------------+------------------------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s 20.4% improvement |
| test machine | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=1MB |
| | option_b=512MB |
| | option_c=Write Performance |
| | test=iozone-1.9.5 |
+------------------+------------------------------------------------------------------------------------------------------------+
Since there are both improvements and regressions in various micro
benchmarks, we are sending this report to provide some data and hope
this can be helpful for further investigation. Thanks.
If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <yujie.liu@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202305041446.71d46724-yujie.liu@intel.com
Details are as below:
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/rootfs/tbox_group/testcase:
gcc-11/performance/1SSD/xfs/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/blogbench
commit:
20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing")
adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96
---------------- ---------------------------
%stddev %change %stddev
\ | \
2071761 ± 4% -10.9% 1846004 blogbench.read_score
8916 ± 6% -11.0% 7932 ± 8% blogbench.time.minor_page_faults
215.23 ± 4% +12.6% 242.40 ± 2% blogbench.time.user_time
0.87 ± 4% +11.9% 0.98 ± 2% iostat.cpu.user
2.25 ± 10% -0.4 1.85 ± 6% mpstat.cpu.all.soft%
16.03 ± 8% -33.3% 10.69 ± 15% sched_debug.cfs_rq:/.load_avg.min
18.59 +8.4% 20.16 turbostat.RAMWatt
180697 -3.0% 175305 vmstat.io.bo
15025 ± 4% +6.8% 16048 ± 3% proc-vmstat.nr_active_anon
20720 ± 3% +6.0% 21962 ± 2% proc-vmstat.nr_shmem
306941 -2.6% 298909 proc-vmstat.nr_slab_unreclaimable
10016536 -3.3% 9689882 proc-vmstat.nr_written
15025 ± 4% +6.8% 16048 ± 3% proc-vmstat.nr_zone_active_anon
22736515 ± 2% -5.3% 21542328 proc-vmstat.numa_hit
22656328 ± 2% -5.3% 21460298 proc-vmstat.numa_local
31365890 ± 2% -7.0% 29177083 proc-vmstat.pgalloc_normal
18590396 ± 4% -10.5% 16641000 proc-vmstat.pgfree
54574610 -2.9% 52984224 proc-vmstat.pgpgout
15.65 ± 2% +66.7% 26.09 perf-stat.i.MPKI
6.909e+09 -8.9% 6.296e+09 perf-stat.i.branch-instructions
0.41 ± 2% +0.0 0.46 perf-stat.i.branch-miss-rate%
74.77 -14.7 60.06 perf-stat.i.cache-miss-rate%
3.627e+08 ± 3% +37.6% 4.992e+08 ± 2% perf-stat.i.cache-misses
4.842e+08 ± 3% +71.3% 8.297e+08 ± 2% perf-stat.i.cache-references
7.87 -2.9% 7.64 perf-stat.i.cpi
916.29 ± 2% -4.4% 875.61 perf-stat.i.cpu-migrations
673.07 ± 4% -27.6% 487.55 perf-stat.i.cycles-between-cache-misses
8.042e+09 +10.5% 8.884e+09 perf-stat.i.dTLB-loads
0.02 ± 25% -0.0 0.01 ± 11% perf-stat.i.dTLB-store-miss-rate%
2.369e+09 ± 3% +67.1% 3.958e+09 perf-stat.i.dTLB-stores
3.092e+10 +2.8% 3.179e+10 perf-stat.i.instructions
0.13 +3.1% 0.14 perf-stat.i.ipc
0.11 ± 16% -31.5% 0.08 ± 19% perf-stat.i.major-faults
202.45 +12.1% 226.90 perf-stat.i.metric.M/sec
73.58 +1.4 74.93 perf-stat.i.node-load-miss-rate%
83206825 ± 3% -56.1% 36533214 perf-stat.i.node-load-misses
29812010 ± 4% -59.1% 12206523 ± 2% perf-stat.i.node-loads
72.86 +20.2 93.08 perf-stat.i.node-store-miss-rate%
13767301 ± 3% +415.9% 71022712 ± 2% perf-stat.i.node-store-misses
15.66 ± 2% +66.7% 26.10 perf-stat.overall.MPKI
0.41 ± 2% +0.0 0.46 perf-stat.overall.branch-miss-rate%
74.90 -14.7 60.16 perf-stat.overall.cache-miss-rate%
7.87 -2.8% 7.65 perf-stat.overall.cpi
672.29 ± 4% -27.5% 487.55 perf-stat.overall.cycles-between-cache-misses
0.02 ± 25% -0.0 0.01 ± 11% perf-stat.overall.dTLB-store-miss-rate%
0.13 +2.9% 0.13 perf-stat.overall.ipc
73.62 +1.3 74.95 perf-stat.overall.node-load-miss-rate%
72.98 +20.3 93.32 perf-stat.overall.node-store-miss-rate%
6.887e+09 -8.9% 6.277e+09 perf-stat.ps.branch-instructions
3.615e+08 ± 3% +37.6% 4.975e+08 ± 2% perf-stat.ps.cache-misses
4.827e+08 ± 3% +71.3% 8.27e+08 ± 2% perf-stat.ps.cache-references
913.07 ± 2% -4.4% 872.48 perf-stat.ps.cpu-migrations
8.017e+09 +10.5% 8.857e+09 perf-stat.ps.dTLB-loads
2.362e+09 ± 3% +67.1% 3.946e+09 perf-stat.ps.dTLB-stores
3.082e+10 +2.8% 3.169e+10 perf-stat.ps.instructions
0.11 ± 16% -31.7% 0.08 ± 19% perf-stat.ps.major-faults
82933317 ± 3% -56.1% 36413295 perf-stat.ps.node-load-misses
29716685 ± 4% -59.0% 12171311 ± 2% perf-stat.ps.node-loads
13721351 ± 3% +415.9% 70784762 ± 2% perf-stat.ps.node-store-misses
9.266e+12 +3.0% 9.547e+12 perf-stat.total.instructions
50.60 -6.5 44.11 perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__open64
50.58 -6.5 44.10 perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__open64
50.69 -6.5 44.22 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__open64
50.68 -6.5 44.20 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__open64
50.78 -6.4 44.33 perf-profile.calltrace.cycles-pp.__open64
41.42 -5.8 35.60 perf-profile.calltrace.cycles-pp._raw_spin_lock.alloc_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64
41.75 -5.8 35.93 perf-profile.calltrace.cycles-pp.alloc_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
41.20 -5.8 35.40 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.alloc_fd.do_sys_openat2.__x64_sys_openat
36.32 -5.5 30.85 perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
35.94 -5.5 30.47 perf-profile.calltrace.cycles-pp._raw_spin_lock.close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
36.11 -5.5 30.64 perf-profile.calltrace.cycles-pp.close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
35.74 -5.4 30.30 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.close_fd.__x64_sys_close.do_syscall_64
37.31 -5.4 31.87 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
37.32 -5.4 31.89 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
37.39 -5.4 31.96 perf-profile.calltrace.cycles-pp.__close
5.56 ± 2% -0.7 4.84 perf-profile.calltrace.cycles-pp._raw_spin_lock.put_unused_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64
5.58 ± 2% -0.7 4.86 perf-profile.calltrace.cycles-pp.put_unused_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.52 ± 2% -0.7 4.81 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.put_unused_fd.do_sys_openat2.__x64_sys_openat
2.47 ± 6% -0.4 2.07 ± 6% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork
2.46 ± 6% -0.4 2.06 ± 6% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
2.45 ± 6% -0.4 2.06 ± 6% perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
2.45 ± 6% -0.4 2.06 ± 6% perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
2.44 ± 6% -0.4 2.05 ± 6% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
1.84 ± 5% -0.3 1.54 ± 5% perf-profile.calltrace.cycles-pp.file_free_rcu.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd
0.00 +2.9 2.87 ± 5% perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_to_iter.copy_page_to_iter.filemap_read
0.00 +2.9 2.94 ± 5% perf-profile.calltrace.cycles-pp.__might_fault._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read
4.73 ± 3% +9.0 13.76 ± 4% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read
5.22 ± 4% +11.7 16.96 ± 4% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
5.27 ± 4% +11.8 17.06 ± 4% perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
6.44 ± 4% +12.2 18.63 ± 4% perf-profile.calltrace.cycles-pp.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64
6.28 ± 4% +12.2 18.48 ± 4% perf-profile.calltrace.cycles-pp.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read
6.60 ± 4% +12.2 18.81 ± 4% perf-profile.calltrace.cycles-pp.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.16 ± 3% +12.2 19.38 ± 4% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
7.20 ± 3% +12.2 19.42 ± 4% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
7.22 ± 3% +12.2 19.44 ± 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
6.87 ± 3% +12.2 19.12 ± 4% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
7.32 ± 3% +12.3 19.59 ± 4% perf-profile.calltrace.cycles-pp.__libc_read
0.00 +13.6 13.63 ± 4% perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled.copyout._copy_to_iter.copy_page_to_iter.filemap_read
83.12 -12.0 71.10 perf-profile.children.cycles-pp._raw_spin_lock
82.49 -12.0 70.54 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
50.60 -6.5 44.11 perf-profile.children.cycles-pp.__x64_sys_openat
50.58 -6.5 44.10 perf-profile.children.cycles-pp.do_sys_openat2
50.80 -6.5 44.34 perf-profile.children.cycles-pp.__open64
41.75 -5.8 35.93 perf-profile.children.cycles-pp.alloc_fd
36.32 -5.5 30.85 perf-profile.children.cycles-pp.__x64_sys_close
36.11 -5.5 30.64 perf-profile.children.cycles-pp.close_fd
37.40 -5.4 31.97 perf-profile.children.cycles-pp.__close
5.58 ± 2% -0.7 4.86 perf-profile.children.cycles-pp.put_unused_fd
2.80 ± 5% -0.4 2.40 ± 5% perf-profile.children.cycles-pp.__do_softirq
2.74 ± 6% -0.4 2.35 ± 5% perf-profile.children.cycles-pp.rcu_core
2.47 ± 6% -0.4 2.07 ± 6% perf-profile.children.cycles-pp.smpboot_thread_fn
2.46 ± 6% -0.4 2.06 ± 6% perf-profile.children.cycles-pp.run_ksoftirqd
2.73 ± 6% -0.4 2.34 ± 6% perf-profile.children.cycles-pp.rcu_do_batch
2.06 ± 5% -0.3 1.74 ± 5% perf-profile.children.cycles-pp.file_free_rcu
0.11 ± 28% -0.1 0.04 ± 75% perf-profile.children.cycles-pp.record__finish_output
0.65 -0.0 0.62 perf-profile.children.cycles-pp.hrtimer_interrupt
0.65 -0.0 0.63 perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.20 ± 3% -0.0 0.18 ± 4% perf-profile.children.cycles-pp.down_read
0.12 ± 4% -0.0 0.11 ± 5% perf-profile.children.cycles-pp.xfs_ilock
0.58 -0.0 0.57 perf-profile.children.cycles-pp.__hrtimer_run_queues
0.07 ± 6% +0.0 0.09 ± 4% perf-profile.children.cycles-pp.touch_atime
0.06 ± 9% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.__call_rcu_common
0.06 ± 14% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.get_obj_cgroup_from_current
0.00 +0.1 0.05 perf-profile.children.cycles-pp.walk_component
0.00 +0.1 0.05 perf-profile.children.cycles-pp.atime_needs_update
0.00 +0.1 0.05 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.05 ± 7% perf-profile.children.cycles-pp.__snprintf_chk
0.47 ± 4% +0.1 0.54 ± 4% perf-profile.children.cycles-pp.kmem_cache_alloc
0.00 +0.1 0.10 ± 9% perf-profile.children.cycles-pp.folio_mark_accessed
0.00 +0.1 0.10 ± 4% perf-profile.children.cycles-pp.copy_user_short_string
0.06 ± 9% +0.2 0.23 ± 3% perf-profile.children.cycles-pp.__might_sleep
95.96 +0.3 96.30 perf-profile.children.cycles-pp.do_syscall_64
96.00 +0.3 96.34 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.12 ± 4% +2.8 2.96 ± 5% perf-profile.children.cycles-pp.__might_resched
0.08 ± 8% +2.9 2.96 ± 5% perf-profile.children.cycles-pp.__might_fault
4.73 ± 4% +9.0 13.76 ± 4% perf-profile.children.cycles-pp.copyout
5.23 ± 4% +11.7 16.96 ± 4% perf-profile.children.cycles-pp._copy_to_iter
5.27 ± 4% +11.8 17.07 ± 4% perf-profile.children.cycles-pp.copy_page_to_iter
6.44 ± 4% +12.2 18.63 ± 4% perf-profile.children.cycles-pp.xfs_file_buffered_read
6.28 ± 4% +12.2 18.49 ± 4% perf-profile.children.cycles-pp.filemap_read
6.60 ± 4% +12.2 18.81 ± 4% perf-profile.children.cycles-pp.xfs_file_read_iter
7.18 ± 3% +12.2 19.40 ± 4% perf-profile.children.cycles-pp.ksys_read
6.89 ± 3% +12.2 19.14 ± 4% perf-profile.children.cycles-pp.vfs_read
7.36 ± 3% +12.3 19.63 ± 4% perf-profile.children.cycles-pp.__libc_read
0.00 +13.7 13.70 ± 4% perf-profile.children.cycles-pp.copy_user_generic_unrolled
81.72 -11.8 69.91 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
2.05 ± 5% -0.3 1.74 ± 5% perf-profile.self.cycles-pp.file_free_rcu
0.43 ± 4% -0.2 0.26 ± 2% perf-profile.self.cycles-pp._copy_to_iter
0.62 ± 2% -0.1 0.56 ± 4% perf-profile.self.cycles-pp._raw_spin_lock
0.20 ± 6% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.fput
0.10 ± 7% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.alloc_empty_file
0.05 +0.0 0.07 ± 5% perf-profile.self.cycles-pp.kmem_cache_free
0.16 ± 3% +0.0 0.18 ± 4% perf-profile.self.cycles-pp.xfs_file_read_iter
0.09 ± 7% +0.0 0.12 ± 4% perf-profile.self.cycles-pp.kmem_cache_alloc
0.00 +0.1 0.05 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.folio_mark_accessed
0.02 ± 99% +0.1 0.10 ± 4% perf-profile.self.cycles-pp.copy_page_to_iter
0.00 +0.1 0.10 ± 4% perf-profile.self.cycles-pp.copy_user_short_string
0.05 ± 7% +0.2 0.22 ± 4% perf-profile.self.cycles-pp.__might_sleep
0.34 ± 5% +0.3 0.64 ± 3% perf-profile.self.cycles-pp.filemap_read
0.12 ± 4% +2.6 2.76 ± 5% perf-profile.self.cycles-pp.__might_resched
0.00 +13.5 13.52 ± 4% perf-profile.self.cycles-pp.copy_user_generic_unrolled
***************************************************************************************************
lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/debian-x86_64-phoronix/lkp-csl-2sp7/aio-stress-1.1.1/phoronix-test-suite
commit:
20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing")
adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96
---------------- ---------------------------
%stddev %change %stddev
\ | \
3087 +21.4% 3747 phoronix-test-suite.aio-stress.RandomWrite.mb_s
1415581 ± 3% -3.9% 1360489 proc-vmstat.numa_local
18259 ± 34% -86.3% 2501 ±135% proc-vmstat.numa_pte_updates
434049 -2.1% 425149 proc-vmstat.pgfault
5.43 +2.0 7.44 ± 4% perf-stat.i.cache-miss-rate%
3626266 +66.6% 6042976 ± 6% perf-stat.i.cache-misses
69186100 +3.7% 71773613 perf-stat.i.cache-references
1917 +3.1% 1977 perf-stat.i.context-switches
2.524e+08 +12.4% 2.837e+08 ± 2% perf-stat.i.dTLB-stores
69.45 -1.9 67.54 perf-stat.i.iTLB-load-miss-rate%
906278 -2.9% 879960 perf-stat.i.iTLB-load-misses
387815 +5.4% 408574 perf-stat.i.iTLB-loads
2.204e+09 +3.6% 2.283e+09 ± 2% perf-stat.i.instructions
2397 +5.2% 2522 ± 3% perf-stat.i.instructions-per-iTLB-miss
132455 ± 3% -9.0% 120495 ± 9% perf-stat.i.node-loads
245539 ± 4% +623.2% 1775744 ± 13% perf-stat.i.node-stores
3.81 ±109% -3.8 0.00 perf-profile.calltrace.cycles-pp.vsnprintf.seq_printf.show_interrupts.seq_read_iter.proc_reg_read_iter
4.98 ±126% -3.1 1.88 ±142% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
4.98 ± 81% -3.1 1.88 ±142% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
4.00 ±110% -2.1 1.88 ±142% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
4.00 ±110% -2.1 1.88 ±142% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
4.00 ±110% -2.1 1.88 ±142% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
3.81 ±109% -2.1 1.76 ±141% perf-profile.calltrace.cycles-pp.seq_printf.show_interrupts.seq_read_iter.proc_reg_read_iter.vfs_read
6.94 ± 90% -5.1 1.88 ±142% perf-profile.children.cycles-pp.asm_exc_page_fault
5.96 ± 98% -4.1 1.88 ±142% perf-profile.children.cycles-pp.exc_page_fault
3.81 ±109% -3.8 0.00 perf-profile.children.cycles-pp.vsnprintf
4.98 ±126% -3.1 1.88 ±142% perf-profile.children.cycles-pp.do_user_addr_fault
4.98 ±126% -3.1 1.88 ±142% perf-profile.children.cycles-pp.__handle_mm_fault
4.98 ±126% -3.1 1.88 ±142% perf-profile.children.cycles-pp.handle_mm_fault
3.32 ±101% -2.5 0.83 ±223% perf-profile.children.cycles-pp.dput
3.81 ±109% -2.1 1.76 ±141% perf-profile.children.cycles-pp.seq_printf
3.24 ±101% -3.2 0.00 perf-profile.self.cycles-pp.release_pages
***************************************************************************************************
lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/input.i3d 129 Cells Per Direction [Intended For Small Systems]/debian-x86_64-phoronix/lkp-csl-2sp7/incompact3d-2.0.2/phoronix-test-suite
commit:
20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing")
adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96
---------------- ---------------------------
%stddev %change %stddev
\ | \
10.21 +3.6% 10.58 phoronix-test-suite.incompact3d.input.i3d129CellsPerDirection.seconds
2634 +1.5% 2675 phoronix-test-suite.time.percent_of_cpu_this_job_got
125.60 +18.0% 148.24 phoronix-test-suite.time.system_time
1365 +2.2% 1395 phoronix-test-suite.time.user_time
2.45 +0.4 2.80 mpstat.cpu.all.sys%
4716584 +1.3% 4778690 proc-vmstat.numa_hint_faults
591104 +1.3% 598784 proc-vmstat.unevictable_pgs_scanned
1484 ± 2% -3.1% 1438 vmstat.io.bi
6485 -2.7% 6310 vmstat.system.cs
59.03 ± 13% -26.8% 43.20 ± 15% sched_debug.cfs_rq:/.load_avg.avg
203.26 ± 9% -17.4% 167.94 ± 11% sched_debug.cfs_rq:/.load_avg.stddev
41.33 ± 17% -33.5% 27.48 ± 29% sched_debug.cfs_rq:/.removed.load_avg.avg
195.00 ± 9% -17.4% 161.00 ± 14% sched_debug.cfs_rq:/.removed.load_avg.stddev
1256 ± 8% +13.9% 1430 ± 4% sched_debug.cpu.nr_switches.stddev
11.80 ± 76% -8.1 3.68 ±157% perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.70 ± 86% -5.0 3.68 ±157% perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
8.70 ± 86% -5.0 3.68 ±157% perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve
8.70 ± 86% -5.0 3.68 ±157% perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
6.32 ±101% -3.8 2.56 ±223% perf-profile.calltrace.cycles-pp.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve
4.08 ±106% -2.8 1.28 ±223% perf-profile.calltrace.cycles-pp.__mmput.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler
4.08 ±106% -2.8 1.28 ±223% perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exec_mmap.begin_new_exec.load_elf_binary
3.82 ±109% -2.5 1.28 ±223% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
3.82 ±109% -2.5 1.28 ±223% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
3.88 ± 73% -2.0 1.91 ±141% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
3.88 ± 73% -2.0 1.91 ±141% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
4.08 ±106% -1.5 2.56 ±223% perf-profile.calltrace.cycles-pp.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm
11.80 ± 76% -8.1 3.68 ±157% perf-profile.children.cycles-pp.bprm_execve
8.70 ± 86% -5.0 3.68 ±157% perf-profile.children.cycles-pp.exec_binprm
8.70 ± 86% -5.0 3.68 ±157% perf-profile.children.cycles-pp.search_binary_handler
8.70 ± 86% -5.0 3.68 ±157% perf-profile.children.cycles-pp.load_elf_binary
6.32 ±101% -3.8 2.56 ±223% perf-profile.children.cycles-pp.begin_new_exec
3.88 ± 73% -3.0 0.93 ±223% perf-profile.children.cycles-pp.seq_read_iter
3.82 ±109% -2.5 1.28 ±223% perf-profile.children.cycles-pp.zap_pte_range
3.82 ±109% -2.5 1.28 ±223% perf-profile.children.cycles-pp.unmap_vmas
3.82 ±109% -2.5 1.28 ±223% perf-profile.children.cycles-pp.unmap_page_range
3.82 ±109% -2.5 1.28 ±223% perf-profile.children.cycles-pp.zap_pmd_range
3.88 ± 73% -2.0 1.91 ±141% perf-profile.children.cycles-pp.ksys_read
3.88 ± 73% -2.0 1.91 ±141% perf-profile.children.cycles-pp.vfs_read
4.08 ±106% -1.5 2.56 ±223% perf-profile.children.cycles-pp.exec_mmap
4.167e+09 +5.0% 4.374e+09 ± 3% perf-stat.i.branch-instructions
36.75 +1.1 37.81 perf-stat.i.cache-miss-rate%
3.508e+08 +3.8% 3.643e+08 perf-stat.i.cache-misses
6338 -1.7% 6231 perf-stat.i.context-switches
7.944e+10 +1.4% 8.056e+10 perf-stat.i.cpu-cycles
1.236e+10 +2.9% 1.272e+10 perf-stat.i.dTLB-loads
0.09 -0.0 0.09 perf-stat.i.dTLB-store-miss-rate%
4.764e+09 +7.2% 5.109e+09 perf-stat.i.dTLB-stores
57.38 -1.1 56.31 perf-stat.i.iTLB-load-miss-rate%
1477373 +3.2% 1525290 perf-stat.i.iTLB-loads
4.633e+10 +2.8% 4.761e+10 perf-stat.i.instructions
831488 +1.3% 842562 perf-stat.i.metric.GHz
2.307e+08 +4.1% 2.401e+08 perf-stat.i.metric.M/sec
42.72 ± 2% -5.6 37.15 ± 2% perf-stat.i.node-load-miss-rate%
6830797 -47.3% 3600340 perf-stat.i.node-load-misses
20259060 -11.1% 18007910 perf-stat.i.node-loads
5175098 +4.9% 5428994 perf-stat.i.node-store-misses
64485672 +13.0% 72841822 perf-stat.i.node-stores
13.91 -2.4% 13.58 perf-stat.overall.MPKI
0.79 -0.0 0.75 ± 3% perf-stat.overall.branch-miss-rate%
54.45 +1.9 56.35 perf-stat.overall.cache-miss-rate%
226.37 -2.3% 221.12 perf-stat.overall.cycles-between-cache-misses
0.02 ± 2% -0.0 0.02 perf-stat.overall.dTLB-load-miss-rate%
0.07 -0.0 0.07 perf-stat.overall.dTLB-store-miss-rate%
53.87 -1.0 52.88 perf-stat.overall.iTLB-load-miss-rate%
26883 +3.5% 27836 perf-stat.overall.instructions-per-iTLB-miss
25.21 -8.6 16.66 perf-stat.overall.node-load-miss-rate%
7.43 -0.5 6.94 perf-stat.overall.node-store-miss-rate%
4.104e+09 +5.0% 4.309e+09 ± 3% perf-stat.ps.branch-instructions
3.456e+08 +3.8% 3.589e+08 perf-stat.ps.cache-misses
6219 -1.6% 6121 perf-stat.ps.context-switches
7.825e+10 +1.4% 7.936e+10 perf-stat.ps.cpu-cycles
1.218e+10 +2.9% 1.253e+10 perf-stat.ps.dTLB-loads
4.692e+09 +7.3% 5.033e+09 perf-stat.ps.dTLB-stores
1453670 +3.3% 1501333 perf-stat.ps.iTLB-loads
4.564e+10 +2.8% 4.691e+10 perf-stat.ps.instructions
6728891 -47.3% 3546712 perf-stat.ps.node-load-misses
19959395 -11.1% 17743546 perf-stat.ps.node-loads
5098250 +4.9% 5348852 perf-stat.ps.node-store-misses
63535396 +13.0% 71772875 perf-stat.ps.node-stores
2.592e+12 +4.8% 2.717e+12 perf-stat.total.instructions
***************************************************************************************************
lkp-icl-2sp5: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
cpu/gcc-11/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp5/getrandom/stress-ng/60s
commit:
20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing")
adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
2:3 -67% :3 dmesg.RIP:_copy_to_iter
1:3 -33% :3 dmesg.RIP:chacha_block_generic
:3 33% 1:3 dmesg.RIP:chacha_permute
:3 67% 2:3 dmesg.RIP:copy_user_fast_string
1:3 -33% :3 dmesg.WARNING:CPU:#PID:#at_arch/x86/evperf:interrupt_took_too_long(#>#),lowering_kernel.perf_event_max_sample_rate_to
%stddev %change %stddev
\ | \
7725056 ± 12% -14.5% 6605482 meminfo.DirectMap2M
3025 ± 3% +8.0% 3268 proc-vmstat.nr_page_table_pages
48.73 ± 3% -3.6% 46.99 ± 2% turbostat.RAMWatt
0.40 -4.2% 0.39 perf-stat.i.MPKI
173.78 +1.4% 176.18 perf-stat.i.cpu-migrations
0.00 -0.0 0.00 perf-stat.i.dTLB-store-miss-rate%
169.53 +1.0% 171.29 perf-stat.ps.cpu-migrations
7.99 -8.0 0.00 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom
19.23 -0.1 19.09 perf-profile.calltrace.cycles-pp._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.66 -0.1 10.54 perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64
75.51 +0.1 75.64 perf-profile.calltrace.cycles-pp.chacha_block_generic.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe
66.37 +0.2 66.52 perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64
0.00 +7.9 7.87 perf-profile.calltrace.cycles-pp.copy_user_fast_string.copyout._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom
9.17 -9.2 0.00 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
20.05 -0.1 19.91 perf-profile.children.cycles-pp._copy_to_iter
10.11 -0.1 9.99 perf-profile.children.cycles-pp.copyout
0.09 ± 5% +0.0 0.10 perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.37 ± 3% +0.0 0.40 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
77.15 +0.1 77.26 perf-profile.children.cycles-pp.chacha_block_generic
67.64 +0.2 67.79 perf-profile.children.cycles-pp.chacha_permute
0.00 +9.0 9.05 perf-profile.children.cycles-pp.copy_user_fast_string
8.92 -8.9 0.00 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
6.23 -0.0 6.20 perf-profile.self.cycles-pp._copy_to_iter
0.08 ± 5% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.31 +0.0 0.33 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
67.15 +0.1 67.29 perf-profile.self.cycles-pp.chacha_permute
0.00 +8.8 8.79 perf-profile.self.cycles-pp.copy_user_fast_string
0.25 ± 78% +117.4% 0.55 ± 2% sched_debug.cfs_rq:/.h_nr_running.avg
3187 ± 35% +183.7% 9043 ± 38% sched_debug.cfs_rq:/.load.avg
1296 ± 29% -40.1% 775.67 ± 22% sched_debug.cfs_rq:/.load_avg.max
205.63 ± 22% -33.8% 136.07 ± 4% sched_debug.cfs_rq:/.load_avg.stddev
1405113 ±137% +195.0% 4144508 sched_debug.cfs_rq:/.min_vruntime.avg
1433314 ±135% +192.5% 4191972 sched_debug.cfs_rq:/.min_vruntime.max
977377 ±137% +194.8% 2880861 sched_debug.cfs_rq:/.min_vruntime.min
0.25 ± 79% +117.7% 0.54 sched_debug.cfs_rq:/.nr_running.avg
853.67 ± 28% -40.0% 512.00 sched_debug.cfs_rq:/.removed.load_avg.max
163.07 ± 30% -36.6% 103.40 ± 23% sched_debug.cfs_rq:/.removed.load_avg.stddev
372.81 ± 48% +71.4% 638.84 sched_debug.cfs_rq:/.runnable_avg.avg
1161 ± 6% +36.9% 1589 ± 9% sched_debug.cfs_rq:/.runnable_avg.max
132.33 ±141% +244.1% 455.33 ± 11% sched_debug.cfs_rq:/.runnable_avg.min
4861 ±272% +1135.9% 60082 ± 20% sched_debug.cfs_rq:/.spread0.avg
32740 ± 92% +230.0% 108041 ± 11% sched_debug.cfs_rq:/.spread0.max
371.90 ± 48% +70.2% 632.85 sched_debug.cfs_rq:/.util_avg.avg
99.71 ±110% +155.3% 254.61 sched_debug.cfs_rq:/.util_est_enqueued.avg
88286 ±137% +215.2% 278294 ± 4% sched_debug.cpu.avg_idle.min
6.10 ± 37% +58.2% 9.65 ± 17% sched_debug.cpu.clock.stddev
1268 ± 99% +147.4% 3137 sched_debug.cpu.curr->pid.avg
4681 ± 15% +22.9% 5754 sched_debug.cpu.curr->pid.max
972.83 ±141% +200.4% 2922 sched_debug.cpu.curr->pid.min
644790 ± 5% -10.0% 580100 sched_debug.cpu.max_idle_balance_cost.max
21765 ± 26% -40.9% 12856 ± 11% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 26% +51.9% 0.00 ± 13% sched_debug.cpu.next_balance.stddev
0.25 ± 80% +123.8% 0.55 ± 2% sched_debug.cpu.nr_running.avg
1.00 +50.0% 1.50 sched_debug.cpu.nr_running.max
232.33 ± 21% +56.1% 362.67 sched_debug.cpu.nr_switches.min
***************************************************************************************************
lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/debian-x86_64-phoronix/lkp-csl-2sp7/ior-1.1.1/phoronix-test-suite
commit:
20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing")
adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96
---------------- ---------------------------
%stddev %change %stddev
\ | \
2762 +24.7% 3444 phoronix-test-suite.ior.2MB./opt/rootfs.mb_s
2737 +25.7% 3441 phoronix-test-suite.ior.2MB.DefaultTestDirectory.mb_s
40826 ± 13% -21.1% 32218 ± 11% sched_debug.cfs_rq:/.min_vruntime.avg
51961 ± 10% -18.8% 42191 ± 7% sched_debug.cfs_rq:/.min_vruntime.max
4451 ± 11% -32.1% 3022 ± 18% sched_debug.cfs_rq:/.min_vruntime.stddev
4452 ± 11% -32.0% 3026 ± 18% sched_debug.cfs_rq:/.spread0.stddev
811283 ± 26% -22.1% 631725 ± 6% sched_debug.cpu.max_idle_balance_cost.max
5.24 ±124% -3.6 1.67 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
5.24 ±124% -3.6 1.67 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
5.24 ±124% -3.6 1.67 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
5.24 ±124% -3.6 1.67 ±223% perf-profile.children.cycles-pp.asm_exc_page_fault
5.24 ±124% -3.6 1.67 ±223% perf-profile.children.cycles-pp.exc_page_fault
5.24 ±124% -3.6 1.67 ±223% perf-profile.children.cycles-pp.do_user_addr_fault
3939447 ± 2% +5.8% 4169072 ± 3% perf-stat.i.cache-misses
1787 -3.2% 1730 perf-stat.i.cycles-between-cache-misses
67.42 -1.9 65.54 perf-stat.i.iTLB-load-miss-rate%
1009539 -3.7% 972226 perf-stat.i.iTLB-load-misses
644541 ± 2% +9.1% 703307 ± 4% perf-stat.i.iTLB-loads
142633 ± 3% +13.0% 161225 ± 8% perf-stat.i.node-store-misses
205884 ± 4% +17.7% 242357 ± 10% perf-stat.i.node-stores
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
View attachment "config-6.3.0-rc7-00003-gadfcf4231b8c" of type "text/plain" (157283 bytes)
View attachment "job-script" of type "text/plain" (7974 bytes)
View attachment "job.yaml" of type "text/plain" (5645 bytes)
View attachment "reproduce" of type "text/plain" (443 bytes)
Powered by blists - more mailing lists