Message-ID: <CANn89iLoBdbV55Ws0KJaizgmmcG1YYXKzT9iTM+y07bBTQ9SvQ@mail.gmail.com>
Date: Mon, 31 Jul 2023 16:00:17 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, 
	Linux Memory Management List <linux-mm@...ck.org>, Jakub Kicinski <kuba@...nel.org>, 
	Soheil Hassas Yeganeh <soheil@...gle.com>, netdev@...r.kernel.org, ying.huang@...el.com, 
	feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [linux-next:master] [tcp] dfa2f04833: stress-ng.sock.ops_per_sec
 -7.3% regression

On Mon, Jul 31, 2023 at 3:35 PM kernel test robot <oliver.sang@...el.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -7.3% regression of stress-ng.sock.ops_per_sec on:
>
>
> commit: dfa2f0483360d4d6f2324405464c9f281156bd87 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: stress-ng
> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> parameters:
>

TCP 'performance' in some tests depends on the initial values of
tcp_rmem[] (and many other sysctls).

The commit changed the initial RWIN values for some MTU/MSS settings;
it is next to impossible to make a change that is a win for all cases.

If you care about a particular real workload, rather than a synthetic
benchmark, I think you should give us more details.

Thanks.

>         nr_threads: 1
>         disk: 1HDD
>         testtime: 60s
>         fs: ext4
>         class: os
>         test: sock
>         cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202307312121.d8479e5e-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>         sudo bin/lkp run generated-yaml-file
>
>         # if come across any failure that blocks the test,
>         # please remove ~/.lkp and /lkp dir to run from a clean state.
>
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/sock/stress-ng/60s
>
> commit:
>   63c8778d91 ("Merge branch 'net-mana-fix-doorbell-access-for-receive-queues'")
>   dfa2f04833 ("tcp: get rid of sysctl_tcp_adv_win_scale")
>
> 63c8778d9149d5df dfa2f0483360d4d6f2324405464
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    8094125           +21.5%    9832824 ± 18%  cpuidle..usage
>       5.04            -6.1%       4.73 ± 10%  iostat.cpu.system
>     330990 ±  2%     -32.3%     223958 ±  3%  turbostat.C1
>    4685666           +22.3%    5729557        turbostat.POLL
>      23600 ±  8%     +51.9%      35849 ± 25%  sched_debug.cfs_rq:/.min_vruntime.max
>       4907 ±  7%     +44.2%       7073 ± 45%  sched_debug.cfs_rq:/.min_vruntime.stddev
>       4911 ±  7%     +44.1%       7075 ± 45%  sched_debug.cfs_rq:/.spread0.stddev
>      43.08 ± 15%     -41.0%      25.42 ± 32%  perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>     269948 ±  2%      +8.1%     291932 ±  2%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
>      43.08 ± 15%     -41.0%      25.42 ± 32%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       0.02 ± 31%     +35.0%       0.03 ±  5%  perf-sched.wait_time.max.ms.__cond_resched.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.__sys_sendto
>      93552            -7.3%      86706        stress-ng.sock.ops
>       1559            -7.3%       1445        stress-ng.sock.ops_per_sec
>     139.17            -3.4%     134.50        stress-ng.time.percent_of_cpu_this_job_got
>    5092570           +18.6%    6039727        stress-ng.time.voluntary_context_switches
>       1.45            +1.4        2.83 ±105%  perf-stat.i.branch-miss-rate%
>    1620951 ± 30%     -39.7%     977769 ± 37%  perf-stat.i.dTLB-store-misses
>     911.68            -3.6%     878.55        perf-stat.i.instructions-per-iTLB-miss
>       1.54            +0.2        1.69 ± 15%  perf-stat.overall.branch-miss-rate%
>       0.16 ± 30%      -0.1        0.10 ± 22%  perf-stat.overall.dTLB-store-miss-rate%
>     742.16            -4.3%     710.16        perf-stat.overall.instructions-per-iTLB-miss
>    1595258 ± 30%     -39.6%     962800 ± 37%  perf-stat.ps.dTLB-store-misses
>      67709           +12.6%      76211 ± 14%  proc-vmstat.nr_active_anon
>      73849           +11.0%      81975 ± 11%  proc-vmstat.nr_shmem
>      67709           +12.6%      76211 ± 14%  proc-vmstat.nr_zone_active_anon
>    6320969            -6.7%    5895784        proc-vmstat.numa_hit
>    6314894            -6.8%    5885708        proc-vmstat.numa_local
>     102508            +5.9%     108525        proc-vmstat.pgactivate
>   48068383            -7.3%   44558110        proc-vmstat.pgalloc_normal
>   47937851            -7.3%   44421205        proc-vmstat.pgfree
>       0.70 ± 14%      +0.2        0.88 ± 14%  perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
>       0.48 ± 47%      +0.2        0.70 ± 14%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       2.76 ±  9%      +0.5        3.30 ±  2%  perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
>       0.39 ± 72%      +0.6        0.95 ± 24%  perf-profile.calltrace.cycles-pp.try_to_wake_up.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_data_queue
>       3.32 ± 10%      +0.7        4.00        perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
>       6.88 ±  7%      +0.8        7.71 ±  2%  perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__do_softirq
>       7.18 ±  7%      +0.8        8.02 ±  2%  perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
>       7.16 ±  7%      +0.9        8.02 ±  2%  perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq
>       8.90 ±  6%      +1.0        9.89        perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
>       9.37 ±  6%      +1.0       10.40        perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb
>       9.33 ±  6%      +1.0       10.37        perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit
>       9.26 ±  6%      +1.0       10.30        perf-profile.calltrace.cycles-pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2
>       2.48 ± 17%      +1.3        3.82 ±  2%  perf-profile.calltrace.cycles-pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_sendmsg_locked
>       2.61 ± 17%      +1.3        3.96 ±  2%  perf-profile.calltrace.cycles-pp.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_sendmsg_locked.tcp_sendmsg
>       0.80 ± 15%      -0.4        0.43 ± 10%  perf-profile.children.cycles-pp.tcp_rcv_space_adjust
>       1.35 ±  5%      -0.2        1.19 ±  6%  perf-profile.children.cycles-pp.__entry_text_start
>       0.56 ± 15%      -0.2        0.40 ± 11%  perf-profile.children.cycles-pp.__x64_sys_connect
>       0.56 ± 15%      -0.2        0.40 ± 11%  perf-profile.children.cycles-pp.__sys_connect
>       0.55 ± 14%      -0.2        0.40 ± 12%  perf-profile.children.cycles-pp.inet_stream_connect
>       0.55 ± 15%      -0.1        0.40 ± 12%  perf-profile.children.cycles-pp.__inet_stream_connect
>       0.38 ± 11%      -0.1        0.28 ± 21%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>       0.44 ±  9%      -0.1        0.33 ± 13%  perf-profile.children.cycles-pp.__close
>       0.37 ± 12%      -0.1        0.27 ± 20%  perf-profile.children.cycles-pp.task_work_run
>       0.77 ±  5%      -0.1        0.68 ±  8%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       0.34 ± 12%      -0.1        0.26 ± 21%  perf-profile.children.cycles-pp.__fput
>       0.31 ± 11%      -0.1        0.23 ± 18%  perf-profile.children.cycles-pp.tcp_v4_connect
>       0.22 ± 14%      -0.1        0.16 ± 22%  perf-profile.children.cycles-pp.__sock_release
>       0.22 ± 14%      -0.1        0.16 ± 22%  perf-profile.children.cycles-pp.sock_close
>       0.23 ± 19%      -0.1        0.16 ± 14%  perf-profile.children.cycles-pp.tcp_try_coalesce
>       0.09 ± 14%      -0.0        0.05 ± 48%  perf-profile.children.cycles-pp.new_inode_pseudo
>       0.07 ± 12%      -0.0        0.04 ± 72%  perf-profile.children.cycles-pp.__ns_get_path
>       0.17 ±  8%      +0.0        0.22 ±  8%  perf-profile.children.cycles-pp.ip_send_check
>       0.23 ±  7%      +0.0        0.28 ±  7%  perf-profile.children.cycles-pp.ip_local_out
>       0.09 ± 22%      +0.0        0.14 ± 10%  perf-profile.children.cycles-pp.available_idle_cpu
>       0.22 ±  9%      +0.1        0.26 ±  7%  perf-profile.children.cycles-pp.__ip_local_out
>       0.46 ± 11%      +0.1        0.56 ±  4%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
>       0.92 ±  3%      +0.1        1.06 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       7.10 ±  2%      +0.7        7.76 ±  3%  perf-profile.children.cycles-pp.tcp_v4_rcv
>       7.21 ±  2%      +0.7        7.90 ±  3%  perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
>       7.42 ±  2%      +0.7        8.12 ±  3%  perf-profile.children.cycles-pp.ip_local_deliver_finish
>       8.00 ±  2%      +0.7        8.71 ±  2%  perf-profile.children.cycles-pp.__netif_receive_skb_one_core
>       8.34 ±  2%      +0.7        9.06 ±  2%  perf-profile.children.cycles-pp.__napi_poll
>       8.32 ±  2%      +0.7        9.05 ±  2%  perf-profile.children.cycles-pp.process_backlog
>      11.71 ±  3%      +0.9       12.63 ±  2%  perf-profile.children.cycles-pp.__dev_queue_xmit
>      13.86 ±  2%      +0.9       14.78 ±  2%  perf-profile.children.cycles-pp.__tcp_transmit_skb
>      11.92 ±  2%      +0.9       12.86 ±  2%  perf-profile.children.cycles-pp.ip_finish_output2
>      10.05 ±  3%      +0.9       10.99 ±  2%  perf-profile.children.cycles-pp.net_rx_action
>      12.66 ±  2%      +1.0       13.62 ±  2%  perf-profile.children.cycles-pp.__ip_queue_xmit
>      10.56 ±  3%      +1.0       11.53 ±  2%  perf-profile.children.cycles-pp.do_softirq
>      10.82 ±  3%      +1.0       11.80 ±  2%  perf-profile.children.cycles-pp.__local_bh_enable_ip
>      10.94 ±  4%      +1.0       11.94 ±  2%  perf-profile.children.cycles-pp.__do_softirq
>       0.52 ± 21%      -0.4        0.16 ± 16%  perf-profile.self.cycles-pp.tcp_rcv_space_adjust
>       0.62 ±  7%      -0.1        0.48 ± 10%  perf-profile.self.cycles-pp.tcp_sendmsg
>       0.63 ±  5%      -0.1        0.55 ±  7%  perf-profile.self.cycles-pp.__entry_text_start
>       0.10 ± 15%      +0.0        0.14 ± 13%  perf-profile.self.cycles-pp.schedule_timeout
>       0.10 ± 20%      +0.0        0.14 ± 16%  perf-profile.self.cycles-pp.enqueue_entity
>       0.08 ± 22%      +0.1        0.14 ± 11%  perf-profile.self.cycles-pp.available_idle_cpu
>       0.37 ±  8%      +0.1        0.44 ±  4%  perf-profile.self.cycles-pp.net_rx_action
>       0.92 ±  3%      +0.1        1.06 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
>
