Message-ID: <CANn89iLoBdbV55Ws0KJaizgmmcG1YYXKzT9iTM+y07bBTQ9SvQ@mail.gmail.com>
Date: Mon, 31 Jul 2023 16:00:17 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com,
Linux Memory Management List <linux-mm@...ck.org>, Jakub Kicinski <kuba@...nel.org>,
Soheil Hassas Yeganeh <soheil@...gle.com>, netdev@...r.kernel.org, ying.huang@...el.com,
feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [linux-next:master] [tcp] dfa2f04833: stress-ng.sock.ops_per_sec -7.3% regression
On Mon, Jul 31, 2023 at 3:35 PM kernel test robot <oliver.sang@...el.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -7.3% regression of stress-ng.sock.ops_per_sec on:
>
>
> commit: dfa2f0483360d4d6f2324405464c9f281156bd87 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: stress-ng
> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> parameters:
>
TCP 'performance' on some tests depends on the initial values of
tcp_rmem[] (and many other sysctls).

The commit changed the initial RWIN values for some MTU/MSS settings;
it is next to impossible to make a change that is a win for all cases.

If you care about a particular real workload, not a synthetic benchmark,
I think you should give us more details.

Thanks.
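For context on how the commit changes the advertised window: both the old
tcp_adv_win_scale heuristic and the new per-socket scaling ratio derive the
window from the receive buffer "space". The sketch below is a simplified
Python illustration of the two formulas, not the exact kernel code; the
shift of 8 and the default ratio of 128 are assumptions based on the
commit's description (the new code measures the ratio from real skb
payload/truesize, so it varies with MTU/MSS).

```python
# Illustration of tcp_win_from_space() before and after the commit
# under test. Simplified sketch; constants are assumptions.

TCP_RMEM_TO_WIN_SCALE = 8  # fixed-point shift assumed for the new scheme

def win_from_space_old(space: int, tcp_adv_win_scale: int = 1) -> int:
    """Old heuristic: a fixed fraction of space from a global sysctl."""
    if tcp_adv_win_scale <= 0:
        return space >> -tcp_adv_win_scale
    return space - (space >> tcp_adv_win_scale)

def win_from_space_new(space: int, scaling_ratio: int = 1 << 7) -> int:
    """New scheme: per-socket ratio measured from payload/truesize."""
    return (space * scaling_ratio) >> TCP_RMEM_TO_WIN_SCALE

# With these defaults both yield half of the buffer space ...
print(win_from_space_old(131072))  # 65536
print(win_from_space_new(131072))  # 65536
# ... but a measured ratio shifts the initial RWIN for some MTU/MSS mixes,
# which is how the commit can change benchmark behaviour either way.
print(win_from_space_new(131072, scaling_ratio=192))  # 98304
```

With the default parameters the two schemes agree; the regression comes from
cases where the measured ratio differs from the old one-half default.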
> nr_threads: 1
> disk: 1HDD
> testtime: 60s
> fs: ext4
> class: os
> test: sock
> cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202307312121.d8479e5e-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> sudo bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> sudo bin/lkp run generated-yaml-file
>
> # if you come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/sock/stress-ng/60s
>
> commit:
> 63c8778d91 ("Merge branch 'net-mana-fix-doorbell-access-for-receive-queues'")
> dfa2f04833 ("tcp: get rid of sysctl_tcp_adv_win_scale")
>
> 63c8778d9149d5df dfa2f0483360d4d6f2324405464
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 8094125 +21.5% 9832824 ± 18% cpuidle..usage
> 5.04 -6.1% 4.73 ± 10% iostat.cpu.system
> 330990 ± 2% -32.3% 223958 ± 3% turbostat.C1
> 4685666 +22.3% 5729557 turbostat.POLL
> 23600 ± 8% +51.9% 35849 ± 25% sched_debug.cfs_rq:/.min_vruntime.max
> 4907 ± 7% +44.2% 7073 ± 45% sched_debug.cfs_rq:/.min_vruntime.stddev
> 4911 ± 7% +44.1% 7075 ± 45% sched_debug.cfs_rq:/.spread0.stddev
> 43.08 ± 15% -41.0% 25.42 ± 32% perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 269948 ± 2% +8.1% 291932 ± 2% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
> 43.08 ± 15% -41.0% 25.42 ± 32% perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 0.02 ± 31% +35.0% 0.03 ± 5% perf-sched.wait_time.max.ms.__cond_resched.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.__sys_sendto
> 93552 -7.3% 86706 stress-ng.sock.ops
> 1559 -7.3% 1445 stress-ng.sock.ops_per_sec
> 139.17 -3.4% 134.50 stress-ng.time.percent_of_cpu_this_job_got
> 5092570 +18.6% 6039727 stress-ng.time.voluntary_context_switches
> 1.45 +1.4 2.83 ±105% perf-stat.i.branch-miss-rate%
> 1620951 ± 30% -39.7% 977769 ± 37% perf-stat.i.dTLB-store-misses
> 911.68 -3.6% 878.55 perf-stat.i.instructions-per-iTLB-miss
> 1.54 +0.2 1.69 ± 15% perf-stat.overall.branch-miss-rate%
> 0.16 ± 30% -0.1 0.10 ± 22% perf-stat.overall.dTLB-store-miss-rate%
> 742.16 -4.3% 710.16 perf-stat.overall.instructions-per-iTLB-miss
> 1595258 ± 30% -39.6% 962800 ± 37% perf-stat.ps.dTLB-store-misses
> 67709 +12.6% 76211 ± 14% proc-vmstat.nr_active_anon
> 73849 +11.0% 81975 ± 11% proc-vmstat.nr_shmem
> 67709 +12.6% 76211 ± 14% proc-vmstat.nr_zone_active_anon
> 6320969 -6.7% 5895784 proc-vmstat.numa_hit
> 6314894 -6.8% 5885708 proc-vmstat.numa_local
> 102508 +5.9% 108525 proc-vmstat.pgactivate
> 48068383 -7.3% 44558110 proc-vmstat.pgalloc_normal
> 47937851 -7.3% 44421205 proc-vmstat.pgfree
> 0.70 ± 14% +0.2 0.88 ± 14% perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
> 0.48 ± 47% +0.2 0.70 ± 14% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 2.76 ± 9% +0.5 3.30 ± 2% perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
> 0.39 ± 72% +0.6 0.95 ± 24% perf-profile.calltrace.cycles-pp.try_to_wake_up.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_data_queue
> 3.32 ± 10% +0.7 4.00 perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
> 6.88 ± 7% +0.8 7.71 ± 2% perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__do_softirq
> 7.18 ± 7% +0.8 8.02 ± 2% perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
> 7.16 ± 7% +0.9 8.02 ± 2% perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq
> 8.90 ± 6% +1.0 9.89 perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
> 9.37 ± 6% +1.0 10.40 perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb
> 9.33 ± 6% +1.0 10.37 perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit
> 9.26 ± 6% +1.0 10.30 perf-profile.calltrace.cycles-pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2
> 2.48 ± 17% +1.3 3.82 ± 2% perf-profile.calltrace.cycles-pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_sendmsg_locked
> 2.61 ± 17% +1.3 3.96 ± 2% perf-profile.calltrace.cycles-pp.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_sendmsg_locked.tcp_sendmsg
> 0.80 ± 15% -0.4 0.43 ± 10% perf-profile.children.cycles-pp.tcp_rcv_space_adjust
> 1.35 ± 5% -0.2 1.19 ± 6% perf-profile.children.cycles-pp.__entry_text_start
> 0.56 ± 15% -0.2 0.40 ± 11% perf-profile.children.cycles-pp.__x64_sys_connect
> 0.56 ± 15% -0.2 0.40 ± 11% perf-profile.children.cycles-pp.__sys_connect
> 0.55 ± 14% -0.2 0.40 ± 12% perf-profile.children.cycles-pp.inet_stream_connect
> 0.55 ± 15% -0.1 0.40 ± 12% perf-profile.children.cycles-pp.__inet_stream_connect
> 0.38 ± 11% -0.1 0.28 ± 21% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> 0.44 ± 9% -0.1 0.33 ± 13% perf-profile.children.cycles-pp.__close
> 0.37 ± 12% -0.1 0.27 ± 20% perf-profile.children.cycles-pp.task_work_run
> 0.77 ± 5% -0.1 0.68 ± 8% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 0.34 ± 12% -0.1 0.26 ± 21% perf-profile.children.cycles-pp.__fput
> 0.31 ± 11% -0.1 0.23 ± 18% perf-profile.children.cycles-pp.tcp_v4_connect
> 0.22 ± 14% -0.1 0.16 ± 22% perf-profile.children.cycles-pp.__sock_release
> 0.22 ± 14% -0.1 0.16 ± 22% perf-profile.children.cycles-pp.sock_close
> 0.23 ± 19% -0.1 0.16 ± 14% perf-profile.children.cycles-pp.tcp_try_coalesce
> 0.09 ± 14% -0.0 0.05 ± 48% perf-profile.children.cycles-pp.new_inode_pseudo
> 0.07 ± 12% -0.0 0.04 ± 72% perf-profile.children.cycles-pp.__ns_get_path
> 0.17 ± 8% +0.0 0.22 ± 8% perf-profile.children.cycles-pp.ip_send_check
> 0.23 ± 7% +0.0 0.28 ± 7% perf-profile.children.cycles-pp.ip_local_out
> 0.09 ± 22% +0.0 0.14 ± 10% perf-profile.children.cycles-pp.available_idle_cpu
> 0.22 ± 9% +0.1 0.26 ± 7% perf-profile.children.cycles-pp.__ip_local_out
> 0.46 ± 11% +0.1 0.56 ± 4% perf-profile.children.cycles-pp.ttwu_queue_wakelist
> 0.92 ± 3% +0.1 1.06 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 7.10 ± 2% +0.7 7.76 ± 3% perf-profile.children.cycles-pp.tcp_v4_rcv
> 7.21 ± 2% +0.7 7.90 ± 3% perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
> 7.42 ± 2% +0.7 8.12 ± 3% perf-profile.children.cycles-pp.ip_local_deliver_finish
> 8.00 ± 2% +0.7 8.71 ± 2% perf-profile.children.cycles-pp.__netif_receive_skb_one_core
> 8.34 ± 2% +0.7 9.06 ± 2% perf-profile.children.cycles-pp.__napi_poll
> 8.32 ± 2% +0.7 9.05 ± 2% perf-profile.children.cycles-pp.process_backlog
> 11.71 ± 3% +0.9 12.63 ± 2% perf-profile.children.cycles-pp.__dev_queue_xmit
> 13.86 ± 2% +0.9 14.78 ± 2% perf-profile.children.cycles-pp.__tcp_transmit_skb
> 11.92 ± 2% +0.9 12.86 ± 2% perf-profile.children.cycles-pp.ip_finish_output2
> 10.05 ± 3% +0.9 10.99 ± 2% perf-profile.children.cycles-pp.net_rx_action
> 12.66 ± 2% +1.0 13.62 ± 2% perf-profile.children.cycles-pp.__ip_queue_xmit
> 10.56 ± 3% +1.0 11.53 ± 2% perf-profile.children.cycles-pp.do_softirq
> 10.82 ± 3% +1.0 11.80 ± 2% perf-profile.children.cycles-pp.__local_bh_enable_ip
> 10.94 ± 4% +1.0 11.94 ± 2% perf-profile.children.cycles-pp.__do_softirq
> 0.52 ± 21% -0.4 0.16 ± 16% perf-profile.self.cycles-pp.tcp_rcv_space_adjust
> 0.62 ± 7% -0.1 0.48 ± 10% perf-profile.self.cycles-pp.tcp_sendmsg
> 0.63 ± 5% -0.1 0.55 ± 7% perf-profile.self.cycles-pp.__entry_text_start
> 0.10 ± 15% +0.0 0.14 ± 13% perf-profile.self.cycles-pp.schedule_timeout
> 0.10 ± 20% +0.0 0.14 ± 16% perf-profile.self.cycles-pp.enqueue_entity
> 0.08 ± 22% +0.1 0.14 ± 11% perf-profile.self.cycles-pp.available_idle_cpu
> 0.37 ± 8% +0.1 0.44 ± 4% perf-profile.self.cycles-pp.net_rx_action
> 0.92 ± 3% +0.1 1.06 ± 5% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
>