Message-ID: <202504211337.3c954295-lkp@intel.com>
Date: Mon, 21 Apr 2025 13:48:27 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Josh Poimboeuf <jpoimboe@...nel.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
	Amit Shah <amit.shah@....com>,
	Nikolay Borisov <nik.borisov@...e.com>, <oliver.sang@...el.com>
Subject: [linus:master] [x86/bugs] 27ce8299bc: netperf.Throughput_tps 23.1%
improvement
Hello,
kernel test robot noticed a 23.1% improvement of netperf.Throughput_tps on:
commit: 27ce8299bc1ec6df8306073785ff82b30b3cc5ee ("x86/bugs: Don't fill RSB on context switch with eIBRS")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
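For background: with eIBRS the Return Stack Buffer no longer needs to be
stuffed with benign entries on every context switch, so the commit drops that
stuffing from the switch path. The kernel's real implementation is the
FILL_RETURN_BUFFER asm macro in arch/x86/include/asm/nospec-branch.h; what
follows is only a rough, hypothetical userspace sketch of the per-slot trick
(a call that plants an RSB entry pointing at a speculation trap, then discards
the architectural return address), to show why executing 32 of these on every
context switch is measurable on a switch-heavy workload:

/* rsb_stuff_sketch.c: illustrative only, NOT the kernel's code.
 * Each iteration mimics one RSB "slot": the call pushes a
 * return-address prediction into the RSB, the int3 is a speculation
 * trap that is never reached architecturally, and the add discards
 * the real return address so only the stale RSB entry remains.
 * x86-64 + GCC/Clang; build with -mno-red-zone to be safe, since the
 * call briefly writes below the compiler's stack pointer. */
#include <stdio.h>

static void stuff_rsb(void)
{
	for (int i = 0; i < 32; i++)	/* RSB is 32 entries on many parts */
		asm volatile("call 1f\n\t"
			     "int3\n\t"
			     "1: add $8, %%rsp\n\t"
			     ::: "cc", "memory");
}

int main(void)
{
	stuff_rsb();
	puts("RSB overwritten with 32 benign entries");
	return 0;
}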
testcase: netperf
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
parameters:
ip: ipv4
runtime: 300s
nr_threads: 50%
cluster: cs-localhost
test: TCP_RR
cpufreq_governor: performance
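For context: TCP_RR counts round-trip request/response transactions, and with
cluster=cs-localhost each transaction is two tiny messages over loopback, so
the cost is dominated by syscalls and scheduler context switches (note the
+23.3% in vmstat.system.cs below), exactly the path the removed RSB stuffing
sat on. Here is a minimal, hypothetical sketch of that ping-pong pattern; it
uses a Unix socketpair for brevity where netperf uses a real TCP connection:

/* mini_rr.c: one 1-byte request + 1-byte response per transaction,
 * so each pass forces at least two context switches. Not netperf. */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
	int sv[2];
	char b = 'x';
	long i, n = 200000;
	struct timeval t0, t1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 1;
	if (fork() == 0) {		/* child: 1-byte echo server */
		close(sv[0]);
		while (read(sv[1], &b, 1) == 1)
			write(sv[1], &b, 1);
		_exit(0);
	}
	close(sv[1]);
	gettimeofday(&t0, NULL);
	for (i = 0; i < n; i++) {	/* one "transaction" per pass */
		write(sv[0], &b, 1);
		read(sv[0], &b, 1);
	}
	gettimeofday(&t1, NULL);
	printf("%.0f transactions/sec\n",
	       n / ((t1.tv_sec - t0.tv_sec) +
		    (t1.tv_usec - t0.tv_usec) / 1e6));
	close(sv[0]);			/* EOF lets the child exit */
	return 0;
}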
In addition, the commit also has a significant impact on the following tests:
+------------------+--------------------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.sem.sem_wait_calls_per_sec 10.0% improvement |
| test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | nr_threads=100% |
| | test=sem |
| | testtime=60s |
+------------------+--------------------------------------------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_Mbps 21.7% improvement |
| test machine | 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V CPU @ 2.4GHz (Sapphire Rapids) with 384G memory |
| test parameters | cluster=cs-localhost |
| | cpufreq_governor=performance |
| | ip=ipv4 |
| | nr_threads=50% |
| | runtime=300s |
| | send_size=10K |
| | test=SCTP_STREAM_MANY |
+------------------+--------------------------------------------------------------------------------------------------------+
Details are as follows:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250421/202504211337.3c954295-lkp@intel.com
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-9.4/50%/debian-12-x86_64-20240206.cgz/300s/igk-spr-2sp3/TCP_RR/netperf
commit:
18bae0dfec ("x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline")
27ce8299bc ("x86/bugs: Don't fill RSB on context switch with eIBRS")
18bae0dfec15b24e 27ce8299bc1ec6df8306073785f
---------------- ---------------------------
%stddev %change %stddev
\ | \
3.017e+09 +23.0% 3.712e+09 cpuidle..usage
2.10 -0.3 1.78 ± 3% mpstat.cpu.all.usr%
59598 ± 76% +94.3% 115800 ± 48% numa-numastat.node0.other_node
59598 ± 76% +94.3% 115800 ± 48% numa-vmstat.node0.numa_other
19701282 +23.3% 24284882 vmstat.system.cs
18.71 ± 71% +340.8% 82.47 ± 55% perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
1008 ± 12% -32.2% 683.24 ± 19% perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
12.40 ± 23% -62.4% 4.67 ±101% perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
63.40 ± 18% -33.0% 42.50 ± 21% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
70.40 ± 13% -30.4% 49.00 ± 10% perf-sched.wait_and_delay.count.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
152.80 ± 15% -35.0% 99.33 ± 19% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
69.16 ± 47% +167.8% 185.23 ± 40% perf-sched.wait_and_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
5018731 +23.1% 6177264 netperf.ThroughputBoth_total_tps
52278 +23.1% 64346 netperf.ThroughputBoth_tps
5018731 +23.1% 6177264 netperf.Throughput_total_tps
52278 +23.1% 64346 netperf.Throughput_tps
424880 ± 4% +56.5% 665128 ± 7% netperf.time.involuntary_context_switches
2787 +4.7% 2916 netperf.time.percent_of_cpu_this_job_got
7940 +5.9% 8407 netperf.time.system_time
453.82 -17.1% 376.38 netperf.time.user_time
1.504e+09 +23.1% 1.852e+09 netperf.time.voluntary_context_switches
1.506e+09 +23.1% 1.853e+09 netperf.workload
3837554 ± 3% +14.4% 4388386 ± 6% sched_debug.cfs_rq:/.avg_vruntime.max
0.43 ± 2% +7.4% 0.46 ± 3% sched_debug.cfs_rq:/.h_nr_queued.avg
0.43 ± 2% +7.2% 0.46 ± 3% sched_debug.cfs_rq:/.h_nr_runnable.avg
318469 ± 8% +21.0% 385456 ± 5% sched_debug.cfs_rq:/.left_deadline.avg
3424819 +11.4% 3813991 ± 2% sched_debug.cfs_rq:/.left_deadline.max
947183 ± 3% +14.5% 1084809 ± 2% sched_debug.cfs_rq:/.left_deadline.stddev
318450 ± 8% +21.0% 385434 ± 5% sched_debug.cfs_rq:/.left_vruntime.avg
3424623 +11.4% 3813781 ± 2% sched_debug.cfs_rq:/.left_vruntime.max
947126 ± 3% +14.5% 1084745 ± 2% sched_debug.cfs_rq:/.left_vruntime.stddev
3837554 ± 3% +14.4% 4388386 ± 6% sched_debug.cfs_rq:/.min_vruntime.max
0.42 ± 2% +7.0% 0.45 ± 3% sched_debug.cfs_rq:/.nr_queued.avg
318450 ± 8% +21.0% 385434 ± 5% sched_debug.cfs_rq:/.right_vruntime.avg
3424623 +11.4% 3813781 ± 2% sched_debug.cfs_rq:/.right_vruntime.max
947126 ± 3% +14.5% 1084745 ± 2% sched_debug.cfs_rq:/.right_vruntime.stddev
131.48 ± 2% +15.7% 152.16 ± 2% sched_debug.cfs_rq:/.util_est.avg
145.54 +9.8% 159.78 ± 2% sched_debug.cfs_rq:/.util_est.stddev
4847 ± 11% -25.5% 3613 ± 9% sched_debug.cpu.avg_idle.min
15457316 +22.7% 18969638 sched_debug.cpu.nr_switches.avg
16517179 +24.8% 20609615 sched_debug.cpu.nr_switches.max
13827971 ± 3% +15.0% 15900842 ± 5% sched_debug.cpu.nr_switches.min
362618 ± 7% +75.8% 637634 ± 10% sched_debug.cpu.nr_switches.stddev
4.113e+10 +19.5% 4.916e+10 perf-stat.i.branch-instructions
0.58 -0.3 0.27 ± 2% perf-stat.i.branch-miss-rate%
2.3e+08 -45.7% 1.248e+08 ± 2% perf-stat.i.branch-misses
1.565e+09 +15.1% 1.8e+09 perf-stat.i.cache-references
19900050 +23.2% 24525361 perf-stat.i.context-switches
1.56 -12.6% 1.36 perf-stat.i.cpi
3.248e+11 +6.4% 3.454e+11 perf-stat.i.cpu-cycles
115466 ± 3% -11.7% 101909 ± 6% perf-stat.i.cpu-migrations
2.098e+11 +21.3% 2.545e+11 perf-stat.i.instructions
0.65 +14.1% 0.74 perf-stat.i.ipc
103.90 +23.1% 127.90 perf-stat.i.metric.K/sec
0.56 -0.3 0.25 ± 2% perf-stat.overall.branch-miss-rate%
1.55 -12.3% 1.36 perf-stat.overall.cpi
0.65 +14.1% 0.74 perf-stat.overall.ipc
42083 -1.7% 41384 perf-stat.overall.path-length
4.1e+10 +19.5% 4.9e+10 perf-stat.ps.branch-instructions
2.292e+08 -45.7% 1.244e+08 ± 2% perf-stat.ps.branch-misses
1.559e+09 +15.1% 1.794e+09 perf-stat.ps.cache-references
19830610 +23.3% 24442385 perf-stat.ps.context-switches
3.237e+11 +6.4% 3.443e+11 perf-stat.ps.cpu-cycles
115199 ± 3% -11.7% 101743 ± 6% perf-stat.ps.cpu-migrations
2.091e+11 +21.4% 2.537e+11 perf-stat.ps.instructions
6.336e+13 +21.0% 7.669e+13 perf-stat.total.instructions
***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sem/stress-ng/60s
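For reference, the sem stressor is a sem_wait()/sem_post() ping-pong between
a poster and blocked waiters, reported as *_calls_per_sec; every blocking
sem_wait() is a sleep/wake pair and therefore a context switch, which is why
it reacts to a cheaper switch path. A rough sketch of the pattern (not
stress-ng's actual code; link with -pthread):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t sem;
static long wakeups;	/* racy read at the end; fine for a sketch */

static void *waiter(void *arg)
{
	(void)arg;
	for (;;) {
		sem_wait(&sem);	/* sleep until the poster wakes us */
		wakeups++;
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	sem_init(&sem, 0, 0);
	pthread_create(&t, NULL, waiter, NULL);
	for (long i = 0; i < 1000000; i++)
		sem_post(&sem);	/* each post releases one sem_wait() */
	/* stress-ng runs this for testtime=60s and reports calls/sec;
	 * this sketch only demonstrates the wait/post pattern */
	printf("posted 1000000, waiter saw ~%ld wakeups so far\n", wakeups);
	return 0;
}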
commit:
18bae0dfec ("x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline")
27ce8299bc ("x86/bugs: Don't fill RSB on context switch with eIBRS")
18bae0dfec15b24e 27ce8299bc1ec6df8306073785f
---------------- ---------------------------
%stddev %change %stddev
\ | \
3242 ±197% +548.0% 21014 ± 81% numa-meminfo.node0.AnonHugePages
1.729e+08 +10.3% 1.908e+08 ± 2% vmstat.system.cs
3.00 ± 50% +122.1% 6.67 ± 37% sched_debug.cfs_rq:/.removed.runnable_avg.avg
25.78 ± 25% +55.0% 39.95 ± 18% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
3.00 ± 50% +122.1% 6.67 ± 37% sched_debug.cfs_rq:/.removed.util_avg.avg
25.78 ± 25% +55.0% 39.95 ± 18% sched_debug.cfs_rq:/.removed.util_avg.stddev
26219071 +11.9% 29350573 sched_debug.cpu.nr_switches.max
1.83 ±208% +1375.8% 27.06 ± 57% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
106.79 ±219% +711.3% 866.45 ± 39% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
3.68 ±212% +1369.9% 54.11 ± 57% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
213.70 ±222% +710.9% 1732 ± 39% perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
1.90 ±208% +1321.6% 27.05 ± 57% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
109.49 ±219% +691.3% 866.45 ± 39% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
1.296e+10 ± 2% +8.7% 1.409e+10 stress-ng.sem.ops
2.16e+08 ± 2% +8.7% 2.348e+08 stress-ng.sem.ops_per_sec
302558 +10.0% 332730 stress-ng.sem.sem_timedwait_calls_per_sec
301686 +10.0% 331735 stress-ng.sem.sem_trywait_calls_per_sec
302524 +10.0% 332698 stress-ng.sem.sem_wait_calls_per_sec
1.09e+10 +10.3% 1.202e+10 ± 2% stress-ng.time.involuntary_context_switches
2765 +2.8% 2842 stress-ng.time.user_time
1.399e+11 +2.0% 1.426e+11 perf-stat.i.branch-instructions
1.04 -0.7 0.30 perf-stat.i.branch-miss-rate%
1.42e+09 -71.5% 4.04e+08 perf-stat.i.branch-misses
31.24 ± 11% +3.4 34.63 ± 5% perf-stat.i.cache-miss-rate%
1.796e+08 +10.2% 1.979e+08 ± 2% perf-stat.i.context-switches
0.99 -5.9% 0.93 perf-stat.i.cpi
6.499e+11 +6.1% 6.894e+11 perf-stat.i.instructions
1.02 +6.1% 1.08 perf-stat.i.ipc
1.01 -0.7 0.28 perf-stat.overall.branch-miss-rate%
0.98 -5.7% 0.93 perf-stat.overall.cpi
1.02 +6.0% 1.08 perf-stat.overall.ipc
1.376e+11 +2.0% 1.403e+11 perf-stat.ps.branch-instructions
1.39e+09 -71.5% 3.964e+08 perf-stat.ps.branch-misses
1.757e+08 +10.3% 1.939e+08 ± 2% perf-stat.ps.context-switches
6.396e+11 +6.0% 6.783e+11 perf-stat.ps.instructions
3.925e+13 +5.9% 4.156e+13 perf-stat.total.instructions
***************************************************************************************************
igk-spr-2sp3: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/send_size/tbox_group/test/testcase:
cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-9.4/50%/debian-12-x86_64-20240206.cgz/300s/10K/igk-spr-2sp3/SCTP_STREAM_MANY/netperf
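This SCTP variant is a bulk-throughput test: associations streaming
send_size=10K writes over loopback. A hedged sketch of that send pattern
follows (a single association, using the one-to-one SOCK_STREAM SCTP API for
brevity, whereas netperf's *_MANY tests use one-to-many sockets; requires the
kernel's sctp module; not netperf's code):

#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT		12866
#define SEND_SIZE	(10 * 1024)	/* matches send_size=10K */

int main(void)
{
	struct sockaddr_in a = {
		.sin_family = AF_INET,
		.sin_port = htons(PORT),
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	char buf[SEND_SIZE];
	int ls, s;

	ls = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
	if (ls < 0) {
		perror("socket (is the sctp module loaded?)");
		return 1;
	}
	bind(ls, (struct sockaddr *)&a, sizeof(a));
	listen(ls, 1);
	if (fork() == 0) {			/* child: drain the stream */
		int c = accept(ls, NULL, NULL);
		while (read(c, buf, sizeof(buf)) > 0)
			;
		_exit(0);
	}
	s = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
	connect(s, (struct sockaddr *)&a, sizeof(a));
	memset(buf, 0, sizeof(buf));
	for (int i = 0; i < 10000; i++)		/* bulk 10 KB sends */
		if (write(s, buf, sizeof(buf)) < 0) {
			perror("write");
			break;
		}
	close(s);				/* EOF lets the child exit */
	return 0;
}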
commit:
18bae0dfec ("x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline")
27ce8299bc ("x86/bugs: Don't fill RSB on context switch with eIBRS")
18bae0dfec15b24e 27ce8299bc1ec6df8306073785f
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.383e+08 ± 3% +24.4% 1.721e+08 ± 2% cpuidle..usage
154471 ± 4% +30.1% 200987 ± 6% meminfo.Shmem
146073 ± 5% +30.0% 189902 ± 8% numa-meminfo.node1.Shmem
5743 ± 4% +20.0% 6889 ± 6% perf-c2c.HITM.local
3.143e+08 ± 7% +28.4% 4.035e+08 ± 6% numa-numastat.node1.local_node
3.145e+08 ± 7% +28.4% 4.037e+08 ± 6% numa-numastat.node1.numa_hit
0.03 +0.0 0.03 mpstat.cpu.all.irq%
0.66 ± 3% +0.1 0.80 ± 2% mpstat.cpu.all.soft%
3.31 ± 3% +0.7 3.97 ± 2% mpstat.cpu.all.sys%
36537 ± 5% +30.0% 47492 ± 7% numa-vmstat.node1.nr_shmem
3.145e+08 ± 7% +28.4% 4.037e+08 ± 6% numa-vmstat.node1.numa_hit
3.143e+08 ± 7% +28.4% 4.035e+08 ± 6% numa-vmstat.node1.numa_local
9.41 ± 4% +10.3% 10.37 ± 6% vmstat.procs.r
890417 ± 3% +24.6% 1109662 ± 2% vmstat.system.cs
24697 ± 3% +12.3% 27728 ± 2% vmstat.system.in
227082 +5.2% 238914 proc-vmstat.nr_active_anon
917466 +1.3% 929058 proc-vmstat.nr_file_pages
38644 ± 4% +30.0% 50238 ± 6% proc-vmstat.nr_shmem
227082 +5.2% 238914 proc-vmstat.nr_zone_active_anon
6.489e+08 ± 3% +21.7% 7.896e+08 ± 2% proc-vmstat.numa_hit
6.487e+08 ± 3% +21.7% 7.892e+08 ± 2% proc-vmstat.numa_local
3.735e+09 ± 3% +21.7% 4.547e+09 ± 2% proc-vmstat.pgalloc_normal
3.735e+09 ± 3% +21.7% 4.547e+09 ± 2% proc-vmstat.pgfree
68149 ± 6% +31.8% 89840 ± 3% sched_debug.cfs_rq:/.avg_vruntime.avg
28973 ± 16% +42.0% 41141 ± 11% sched_debug.cfs_rq:/.avg_vruntime.min
19941 ± 8% +22.6% 24454 ± 4% sched_debug.cfs_rq:/.avg_vruntime.stddev
68149 ± 6% +31.8% 89840 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
28973 ± 16% +42.0% 41141 ± 11% sched_debug.cfs_rq:/.min_vruntime.min
19941 ± 8% +22.6% 24454 ± 4% sched_debug.cfs_rq:/.min_vruntime.stddev
697844 ± 3% +24.1% 866113 ± 2% sched_debug.cpu.nr_switches.avg
1574988 ± 4% +18.8% 1870629 ± 6% sched_debug.cpu.nr_switches.max
281842 ± 4% +16.0% 327055 ± 6% sched_debug.cpu.nr_switches.stddev
1432 ± 3% +21.7% 1742 ± 2% netperf.ThroughputBoth_Mbps
137495 ± 3% +21.7% 167304 ± 2% netperf.ThroughputBoth_total_Mbps
1432 ± 3% +21.7% 1742 ± 2% netperf.Throughput_Mbps
137495 ± 3% +21.7% 167304 ± 2% netperf.Throughput_total_Mbps
27575 ± 2% +22.0% 33655 ± 2% netperf.time.involuntary_context_switches
363.00 ± 3% +21.7% 441.67 ± 2% netperf.time.percent_of_cpu_this_job_got
1072 ± 3% +21.6% 1304 ± 2% netperf.time.system_time
23.66 ± 2% +20.2% 28.44 netperf.time.user_time
933935 -9.5% 844837 netperf.time.voluntary_context_switches
5.035e+08 ± 3% +21.7% 6.127e+08 ± 2% netperf.workload
0.04 ± 19% +45.1% 0.05 ± 15% perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
20.08 ± 6% -28.0% 14.45 ± 7% perf-sched.total_wait_and_delay.average.ms
130158 ± 6% +39.6% 181663 ± 7% perf-sched.total_wait_and_delay.count.ms
20.06 ± 6% -28.0% 14.43 ± 7% perf-sched.total_wait_time.average.ms
349.52 ± 19% -32.0% 237.78 ± 7% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
4.13 ± 7% -31.8% 2.82 ± 8% perf-sched.wait_and_delay.avg.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
46.67 ± 16% +42.5% 66.50 ± 11% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
112061 ± 7% +46.7% 164429 ± 8% perf-sched.wait_and_delay.count.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
349.50 ± 19% -32.0% 237.77 ± 7% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
4.12 ± 7% -31.8% 2.81 ± 8% perf-sched.wait_time.avg.ms.schedule_timeout.sctp_skb_recv_datagram.sctp_recvmsg.inet_recvmsg
5.554e+09 ± 3% +20.2% 6.679e+09 ± 2% perf-stat.i.branch-instructions
0.48 -0.1 0.37 perf-stat.i.branch-miss-rate%
27633900 ± 2% -9.7% 24960530 perf-stat.i.branch-misses
7520666 ± 3% +26.9% 9545462 ± 4% perf-stat.i.cache-misses
7.246e+08 ± 3% +21.5% 8.805e+08 ± 2% perf-stat.i.cache-references
899481 ± 3% +24.6% 1120839 ± 2% perf-stat.i.context-switches
1.18 -1.9% 1.16 perf-stat.i.cpi
3.456e+10 ± 3% +18.9% 4.111e+10 perf-stat.i.cpu-cycles
376.24 +3.5% 389.52 perf-stat.i.cpu-migrations
4676 -6.5% 4374 ± 4% perf-stat.i.cycles-between-cache-misses
2.928e+10 ± 3% +21.0% 3.542e+10 ± 2% perf-stat.i.instructions
0.85 +1.9% 0.86 perf-stat.i.ipc
4.68 ± 3% +24.7% 5.84 ± 2% perf-stat.i.metric.K/sec
0.50 -0.1 0.37 perf-stat.overall.branch-miss-rate%
1.18 -1.7% 1.16 perf-stat.overall.cpi
4598 -6.1% 4315 ± 3% perf-stat.overall.cycles-between-cache-misses
0.85 +1.7% 0.86 perf-stat.overall.ipc
5.536e+09 ± 3% +20.2% 6.657e+09 ± 2% perf-stat.ps.branch-instructions
27543972 ± 2% -9.7% 24877296 perf-stat.ps.branch-misses
7492755 ± 3% +26.9% 9510891 ± 4% perf-stat.ps.cache-misses
7.222e+08 ± 3% +21.5% 8.776e+08 ± 2% perf-stat.ps.cache-references
896552 ± 3% +24.6% 1117213 ± 2% perf-stat.ps.context-switches
3.445e+10 ± 3% +18.9% 4.097e+10 perf-stat.ps.cpu-cycles
374.91 +3.5% 388.20 perf-stat.ps.cpu-migrations
2.918e+10 ± 3% +21.0% 3.53e+10 ± 2% perf-stat.ps.instructions
8.832e+12 ± 3% +21.0% 1.069e+13 perf-stat.total.instructions
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki