Date:   Wed, 17 Mar 2021 21:32:14 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Stanislav Fomichev <sdf@...gle.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Song Liu <songliubraving@...com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...el.com
Subject: [bpf]  a9ed15dae0:  netperf.Throughput_tps 3.9% improvement



Greetings,

FYI, we noticed a 3.9% improvement in netperf.Throughput_tps due to commit:


commit: a9ed15dae0755a0368735e0556a462d8519bdb05 ("bpf: Split cgroup_bpf_enabled per attach type")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: netperf
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with following parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 25%
	cluster: cs-localhost
	test: UDP_RR
	cpufreq_governor: performance
	ucode: 0x5003006
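
Note that nr_threads is given as a percentage rather than an absolute count. A minimal sketch of the arithmetic, assuming the percentage is taken relative to the 88 hardware threads of the test machine (the helper name is hypothetical and not part of lkp-tests):

```python
def threads_from_percent(percent: float, cpu_threads: int) -> int:
    """Benchmark thread count for a percentage of the machine's hardware threads.

    Assumption: the percentage is relative to the total hardware thread count.
    """
    return int(cpu_threads * percent / 100)


n = threads_from_percent(25, 88)
print(n)  # 22
```

Under this assumption the run uses 22 netperf instances, which is consistent with the base-commit results reported below: 93152 tps per instance times 22 equals 2049344 total tps.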

test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached to this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run                    compatible-job.yaml

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase/ucode:
  cs-localhost/gcc-9/performance/ipv4/x86_64-rhel-8.3/25%/debian-10.4-x86_64-20200603.cgz/300s/lkp-csl-2sp9/UDP_RR/netperf/0x5003006

commit: 
  20f2505fb4 ("bpf: Try to avoid kzalloc in cgroup/{s,g}etsockopt")
  a9ed15dae0 ("bpf: Split cgroup_bpf_enabled per attach type")

20f2505fb436cfa6 a9ed15dae0755a0368735e0556a 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   2049344            +3.9%    2129386        netperf.Throughput_total_tps
     93152            +3.9%      96790        netperf.Throughput_tps
      9796 ±  2%      +7.2%      10501        netperf.time.involuntary_context_switches
 6.147e+08            +3.9%  6.387e+08        netperf.time.voluntary_context_switches
 6.148e+08            +3.9%  6.388e+08        netperf.workload
     10.42            -1.1        9.36        turbostat.C1%
   8089495            +3.9%    8405213        vmstat.system.cs
 2.774e+09           -10.2%  2.492e+09        cpuidle.C1.time
 1.754e+09           +10.7%  1.941e+09        cpuidle.POLL.time
 5.579e+08           +17.4%  6.552e+08        cpuidle.POLL.usage
      3.00 ±223%    +686.6%      23.56 ±105%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     15.59 ± 74%    +109.0%      32.59 ± 12%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork
      0.01 ± 71%   +1483.3%       0.11 ± 56%  perf-sched.sch_delay.max.ms.preempt_schedule_common._cond_resched.stop_one_cpu.affine_move_task.__set_cpus_allowed_ptr
      1182 ± 73%     +82.1%       2152 ±  6%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork
      0.01 ± 71%     +65.8%       0.01 ±  4%  perf-sched.total_sch_delay.average.ms
    833.13 ± 82%    +100.0%       1666 ± 28%  perf-sched.wait_and_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    833.12 ± 82%    +100.0%       1666 ± 28%  perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
   3550136 ±  7%     +17.9%    4185038 ±  5%  softirqs.CPU0.NET_RX
   7292845 ±  8%     +11.0%    8091559 ±  4%  softirqs.CPU11.NET_RX
     16054 ± 11%     -13.2%      13942 ±  5%  softirqs.CPU16.RCU
     15910 ±  9%     -11.6%      14064 ±  6%  softirqs.CPU22.RCU
     15262 ± 17%     -15.3%      12920 ±  4%  softirqs.CPU3.RCU
     15230 ± 13%     -16.3%      12751 ±  5%  softirqs.CPU49.RCU
     14806 ± 13%     -13.5%      12811 ±  3%  softirqs.CPU50.RCU
     14895 ± 14%     -15.5%      12580 ±  6%  softirqs.CPU55.RCU
     14851 ± 12%     -13.7%      12810 ±  4%  softirqs.CPU59.RCU
   5330188 ±  3%      +8.8%    5801721 ±  5%  softirqs.CPU63.NET_RX
   5480730 ±  8%     +15.0%    6302697 ±  4%  softirqs.CPU80.NET_RX
    731732 ±  3%     +22.3%     894653 ±  5%  interrupts.CAL:Function_call_interrupts
      6652 ± 31%     +58.2%      10526 ± 29%  interrupts.CPU25.CAL:Function_call_interrupts
      9123 ± 24%     +57.8%      14392 ±  9%  interrupts.CPU3.CAL:Function_call_interrupts
      4532 ± 10%     +19.6%       5418 ±  7%  interrupts.CPU34.NMI:Non-maskable_interrupts
      4532 ± 10%     +19.6%       5418 ±  7%  interrupts.CPU34.PMI:Performance_monitoring_interrupts
     12587 ±  4%     -18.6%      10245 ±  6%  interrupts.CPU36.RES:Rescheduling_interrupts
      3988 ± 30%     +43.1%       5706 ±  9%  interrupts.CPU37.NMI:Non-maskable_interrupts
      3988 ± 30%     +43.1%       5706 ±  9%  interrupts.CPU37.PMI:Performance_monitoring_interrupts
      4298 ± 20%     +23.8%       5321 ± 10%  interrupts.CPU38.NMI:Non-maskable_interrupts
      4298 ± 20%     +23.8%       5321 ± 10%  interrupts.CPU38.PMI:Performance_monitoring_interrupts
      2638 ± 29%     +83.9%       4851 ± 20%  interrupts.CPU43.NMI:Non-maskable_interrupts
      2638 ± 29%     +83.9%       4851 ± 20%  interrupts.CPU43.PMI:Performance_monitoring_interrupts
      9514 ± 19%     +37.5%      13080 ± 12%  interrupts.CPU47.CAL:Function_call_interrupts
      8019 ± 25%     +47.6%      11836 ± 34%  interrupts.CPU50.CAL:Function_call_interrupts
      3317 ± 37%     +54.5%       5125 ±  9%  interrupts.CPU63.NMI:Non-maskable_interrupts
      3317 ± 37%     +54.5%       5125 ±  9%  interrupts.CPU63.PMI:Performance_monitoring_interrupts
      6959 ± 46%     +86.9%      13006 ± 30%  interrupts.CPU73.CAL:Function_call_interrupts
     19.24            +3.9%      20.00        perf-stat.i.MPKI
 2.521e+08            +1.0%  2.547e+08        perf-stat.i.branch-misses
  10735680 ±  8%     +54.9%   16625059 ± 29%  perf-stat.i.cache-misses
 1.401e+09            +4.4%  1.462e+09        perf-stat.i.cache-references
   8146058            +3.9%    8463745        perf-stat.i.context-switches
      1.47            +1.5%       1.49        perf-stat.i.cpi
 1.072e+11            +1.9%  1.093e+11        perf-stat.i.cpu-cycles
     14335 ± 11%     -35.3%       9272 ± 31%  perf-stat.i.cycles-between-cache-misses
     83.51            +0.9       84.43        perf-stat.i.iTLB-load-miss-rate%
  38402582            -6.7%   35845533        perf-stat.i.iTLB-loads
      0.69            -1.4%       0.68        perf-stat.i.ipc
      1.22            +1.9%       1.24        perf-stat.i.metric.GHz
   1939336 ±  3%     +11.7%    2166223 ±  6%  perf-stat.i.node-store-misses
     18.96            +3.9%      19.69        perf-stat.overall.MPKI
      1.45            +1.4%       1.47        perf-stat.overall.cpi
     10050 ±  8%     -29.4%       7096 ± 25%  perf-stat.overall.cycles-between-cache-misses
     83.62            +0.9       84.55        perf-stat.overall.iTLB-load-miss-rate%
      0.69            -1.4%       0.68        perf-stat.overall.ipc
     36166            -3.3%      34988        perf-stat.overall.path-length
 2.512e+08            +1.0%  2.538e+08        perf-stat.ps.branch-misses
  10702292 ±  8%     +54.9%   16573593 ± 29%  perf-stat.ps.cache-misses
 1.396e+09            +4.4%  1.457e+09        perf-stat.ps.cache-references
   8118168            +3.9%    8434896        perf-stat.ps.context-switches
 1.069e+11            +1.9%  1.089e+11        perf-stat.ps.cpu-cycles
  38273181            -6.7%   35723998        perf-stat.ps.iTLB-loads
   1932777 ±  3%     +11.7%    2159007 ±  6%  perf-stat.ps.node-store-misses
     14.95            -0.9       14.04 ±  6%  perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
     14.81            -0.9       13.92 ±  6%  perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
     13.33            -0.9       12.45 ±  6%  perf-profile.calltrace.cycles-pp.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
     13.28            -0.9       12.40 ±  6%  perf-profile.calltrace.cycles-pp.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
      0.72 ±  2%      -0.1        0.67 ±  7%  perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.__kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
     14.97            -0.9       14.06 ±  6%  perf-profile.children.cycles-pp.__x64_sys_recvfrom
     14.82            -0.9       13.93 ±  6%  perf-profile.children.cycles-pp.__sys_recvfrom
     13.34            -0.9       12.45 ±  6%  perf-profile.children.cycles-pp.inet_recvmsg
     13.29            -0.9       12.41 ±  6%  perf-profile.children.cycles-pp.udp_recvmsg
      1.01            -0.6        0.43 ±  9%  perf-profile.children.cycles-pp._raw_spin_lock_bh
      0.27 ±  4%      -0.1        0.18 ±  5%  perf-profile.children.cycles-pp.__netif_receive_skb_core
      0.55 ±  2%      -0.1        0.48 ±  7%  perf-profile.children.cycles-pp.__slab_free
      0.44 ±  2%      -0.1        0.38 ±  9%  perf-profile.children.cycles-pp.skb_set_owner_w
      0.74 ±  2%      -0.1        0.68 ±  7%  perf-profile.children.cycles-pp.__kmalloc_node_track_caller
      0.29            -0.1        0.24 ±  5%  perf-profile.children.cycles-pp.___perf_sw_event
      0.21            -0.0        0.16 ±  8%  perf-profile.children.cycles-pp.migrate_enable
      0.25            -0.0        0.20 ±  9%  perf-profile.children.cycles-pp.__might_sleep
      0.12 ±  5%      -0.0        0.08 ± 10%  perf-profile.children.cycles-pp._cond_resched
      0.07 ±  5%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.raw_local_deliver
      0.24 ±  3%      -0.0        0.20 ±  6%  perf-profile.children.cycles-pp.security_socket_recvmsg
      0.18 ±  4%      -0.0        0.15 ±  7%  perf-profile.children.cycles-pp.__ip_finish_output
      0.12 ±  4%      -0.0        0.09 ± 10%  perf-profile.children.cycles-pp.rcu_read_unlock_strict
      0.26 ±  2%      -0.0        0.23 ±  7%  perf-profile.children.cycles-pp.sock_recvmsg
      0.24 ±  2%      -0.0        0.22 ±  4%  perf-profile.children.cycles-pp.ipv4_mtu
      0.00            +0.1        0.05 ±  8%  perf-profile.children.cycles-pp.rcu_note_context_switch
      0.28 ±  2%      +0.1        0.34 ±  4%  perf-profile.children.cycles-pp.skb_release_data
      0.39 ±  3%      +0.1        0.53 ±  7%  perf-profile.children.cycles-pp.__cgroup_bpf_run_filter_skb
      0.27 ±  4%      +0.2        0.43 ±  7%  perf-profile.children.cycles-pp.ip_finish_output
      0.99            -0.6        0.42 ±  9%  perf-profile.self.cycles-pp._raw_spin_lock_bh
      0.27 ±  3%      -0.1        0.18 ±  7%  perf-profile.self.cycles-pp.__netif_receive_skb_core
      0.22 ±  4%      -0.1        0.14 ±  8%  perf-profile.self.cycles-pp.validate_xmit_skb
      0.59 ±  2%      -0.1        0.52 ±  5%  perf-profile.self.cycles-pp.udp_sendmsg
      0.22 ±  3%      -0.1        0.15 ±  9%  perf-profile.self.cycles-pp.__local_bh_enable_ip
      0.54 ±  2%      -0.1        0.47 ±  7%  perf-profile.self.cycles-pp.__slab_free
      0.43 ±  2%      -0.1        0.37 ±  9%  perf-profile.self.cycles-pp.skb_set_owner_w
      0.67 ±  2%      -0.1        0.61 ±  7%  perf-profile.self.cycles-pp.__skb_wait_for_more_packets
      0.29 ±  3%      -0.0        0.25 ±  6%  perf-profile.self.cycles-pp.__kmalloc_node_track_caller
      0.19 ±  3%      -0.0        0.15 ±  6%  perf-profile.self.cycles-pp.migrate_enable
      0.07 ± 11%      -0.0        0.03 ± 70%  perf-profile.self.cycles-pp.raw_local_deliver
      0.21 ±  3%      -0.0        0.18 ±  8%  perf-profile.self.cycles-pp.__might_sleep
      0.13 ± 11%      -0.0        0.10 ± 10%  perf-profile.self.cycles-pp.udp_unicast_rcv_skb
      0.21 ±  4%      -0.0        0.18 ±  6%  perf-profile.self.cycles-pp.___perf_sw_event
      0.09 ±  5%      -0.0        0.07 ± 10%  perf-profile.self.cycles-pp.rcu_read_unlock_strict
      0.23 ±  2%      -0.0        0.21 ±  4%  perf-profile.self.cycles-pp.ipv4_mtu
      0.12 ±  4%      -0.0        0.10 ±  9%  perf-profile.self.cycles-pp.migrate_disable
      0.08 ±  6%      -0.0        0.06 ± 11%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.06 ±  7%      +0.0        0.10 ±  8%  perf-profile.self.cycles-pp.asm_call_sysvec_on_stack
      0.27 ±  3%      +0.0        0.31 ± 10%  perf-profile.self.cycles-pp.__udp_enqueue_schedule_skb
      0.28 ±  2%      +0.1        0.34 ±  4%  perf-profile.self.cycles-pp.skb_release_data
      0.01 ±223%      +0.1        0.07 ±  6%  perf-profile.self.cycles-pp.__ip_select_ident
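
The %change column above appears to mix two conventions: entries with a % sign are relative changes, while entries without one (for example turbostat.C1% and the perf-profile rows, whose values are already percentages) look like absolute differences in percentage points. The sketch below recomputes a few rows from the printed base and new values under that reading. The function names are illustrative only, and because the table values are rounded, a recomputed figure may differ in the last digit for other rows.

```python
def relative_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_difference(base: float, new: float) -> float:
    """Absolute difference, for metrics that are already percentages."""
    return new - base


# netperf.Throughput_tps: 93152 -> 96790
print(f"{relative_change(93152, 96790):+.1f}%")      # +3.9%
# cpuidle.C1.time: 2.774e+09 -> 2.492e+09
print(f"{relative_change(2.774e9, 2.492e9):+.1f}%")  # -10.2%
# turbostat.C1%: 10.42 -> 9.36 (percentage points)
print(f"{point_difference(10.42, 9.36):+.1f}")       # -1.1
```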


                                                                                
                               netperf.Throughput_tps                           
                                                                                
  97500 +-------------------------------------------------------------------+   
  97000 |-+                                                                 |   
        |             O                                                     |   
  96500 |-+                                      O            O             |   
  96000 |-+                                                                 |   
        |                                                                   |   
  95500 |-+                                                                 |   
  95000 |-+                                                                 |   
  94500 |-+                                                                 |   
        |                                                                   |   
  94000 |-+                                                                 |   
  93500 |-+                  ......+.............+............              |   
        |          ...+......                                 +.............|   
  93000 |-+ .......                                                         |   
  92500 +-------------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                             netperf.Throughput_total_tps                       
                                                                                
  2.16e+06 +----------------------------------------------------------------+   
           |                                                                |   
  2.14e+06 |-+                       O                                      |   
           |            O                                                   |   
  2.12e+06 |-+                                    O            O            |   
           |                                                                |   
   2.1e+06 |-+                                                              |   
           |                                                                |   
  2.08e+06 |-+                                                              |   
           |                                                                |   
  2.06e+06 |-+                 ......+............                          |   
           |      ......+......                   +............+............|   
  2.04e+06 |......                                                          |   
           |                                                                |   
  2.02e+06 +----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                   netperf.workload                             
                                                                                
  6.45e+08 +----------------------------------------------------------------+   
           |                         O                                      |   
   6.4e+08 |-+          O                                                   |   
           |                                                   O            |   
  6.35e+08 |-+                                    O                         |   
           |                                                                |   
   6.3e+08 |-+                                                              |   
           |                                                                |   
  6.25e+08 |-+                                                              |   
           |                                                                |   
   6.2e+08 |-+                                                              |   
           |                   ......+............+............             |   
  6.15e+08 |-+    ......+......                                +............|   
           |......                                                          |   
   6.1e+08 +----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                        netperf.time.voluntary_context_switches                 
                                                                                
  6.45e+08 +----------------------------------------------------------------+   
           |                         O                                      |   
   6.4e+08 |-+                                                              |   
           |            O                                      O            |   
  6.35e+08 |-+                                    O                         |   
           |                                                                |   
   6.3e+08 |-+                                                              |   
           |                                                                |   
  6.25e+08 |-+                                                              |   
           |                                                                |   
   6.2e+08 |-+                                                              |   
           |                   ......+............+............             |   
  6.15e+08 |-+    ......+......                                +............|   
           |......                                                          |   
   6.1e+08 +----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.11.0-rc4-00516-ga9ed15dae075" of type "text/plain" (172414 bytes)

View attachment "job-script" of type "text/plain" (8237 bytes)

View attachment "job.yaml" of type "text/plain" (5525 bytes)

View attachment "reproduce" of type "text/plain" (1375 bytes)
