Date:   Mon, 12 Apr 2021 15:52:44 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Song Liu <songliubraving@...com>
Cc:     Alexei Starovoitov <ast@...nel.org>, KP Singh <kpsingh@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
        feng.tang@...el.com, zhengjun.xing@...el.com
Subject: [bpf]  a10787e6d5:  will-it-scale.per_process_ops 3.5% improvement



Greetings,

FYI, we noticed a 3.5% improvement of will-it-scale.per_process_ops due to commit:


commit: a10787e6d58c24b51e91c19c6d16c5da89fcaa4b ("bpf: Enable task local storage for tracing programs")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with the following parameters:

	nr_task: 16
	mode: process
	test: mmap2
	cpufreq_governor: performance
	ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based test in order to show any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
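
For orientation, the kind of loop this testcase exercises can be sketched as below. This is an illustration in Python only, not the will-it-scale source (which is written in C). It assumes, based on the shmem_mmap and __munmap frames in the profile further down, that mmap2 repeatedly maps and unmaps a file-backed region. The function name, mapping size, and iteration count are hypothetical.

```python
import mmap
import os
import tempfile


def map_unmap_loop(iterations=1000, size=1 << 20):
    """Map and unmap a file-backed region `iterations` times; return the op count.

    Hypothetical stand-in for one benchmark process; the real test counts
    operations completed per unit of time rather than running a fixed number.
    """
    ops = 0
    # An unnamed temporary file serves as the backing store (assumption).
    with tempfile.TemporaryFile() as backing:
        fd = backing.fileno()
        os.ftruncate(fd, size)  # give the file the length we intend to map
        for _ in range(iterations):
            region = mmap.mmap(
                fd,
                size,
                flags=mmap.MAP_PRIVATE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE,
            )
            region.close()  # unmaps the region (munmap)
            ops += 1
    return ops


if __name__ == "__main__":
    print(map_unmap_loop())
```

With nr_task: 16 and mode: process, sixteen independent processes would each run such a loop, and per_process_ops is the per-process rate.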





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached to this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run                    compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/mmap2/will-it-scale/0x5003006

commit: 
  9c8f21e6f8 ("xsk: Build skb by page (aka generic zerocopy xmit)")
  a10787e6d5 ("bpf: Enable task local storage for tracing programs")

9c8f21e6f8856a96 a10787e6d58c24b51e91c19c6d1 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   8990002            +3.5%    9304107        will-it-scale.16.processes
    561874            +3.5%     581506        will-it-scale.per_process_ops
   8990002            +3.5%    9304107        will-it-scale.workload
    112185 ± 23%     +46.6%     164508 ± 22%  numa-numastat.node0.local_node
     63.33 ± 93%     -80.8%      12.17 ±130%  numa-vmstat.node0.nr_inactive_file
     63.33 ± 93%     -80.8%      12.17 ±130%  numa-vmstat.node0.nr_zone_inactive_file
     14212 ± 23%     +41.7%      20144 ± 14%  softirqs.CPU15.SCHED
     30141 ± 13%     -22.5%      23370 ± 14%  softirqs.CPU59.SCHED
     66.17 ± 88%     -90.7%       6.17 ± 48%  interrupts.CPU60.RES:Rescheduling_interrupts
    500.00           +86.1%     930.33 ± 60%  interrupts.CPU69.CAL:Function_call_interrupts
    396.17 ±  6%     -18.8%     321.50 ± 21%  interrupts.CPU87.NMI:Non-maskable_interrupts
    396.17 ±  6%     -18.8%     321.50 ± 21%  interrupts.CPU87.PMI:Performance_monitoring_interrupts
      5.45 ± 46%     -98.5%       0.08 ± 73%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
    176.51 ± 36%     -61.2%      68.51 ± 77%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
      5.45 ± 46%     -98.5%       0.08 ± 73%  perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
    176.50 ± 36%     -61.2%      68.50 ± 77%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
 2.304e+10            +3.4%  2.383e+10        perf-stat.i.branch-instructions
  72536156            +4.1%   75492267        perf-stat.i.branch-misses
      0.48            -3.3%       0.47        perf-stat.i.cpi
      0.00 ± 15%      -0.0        0.00 ±  9%  perf-stat.i.dTLB-load-miss-rate%
 2.404e+10            +3.4%  2.487e+10        perf-stat.i.dTLB-loads
 1.096e+10            +3.4%  1.133e+10        perf-stat.i.dTLB-stores
  47654226           +12.8%   53744349        perf-stat.i.iTLB-load-misses
 9.562e+10            +3.4%  9.889e+10        perf-stat.i.instructions
      2015            -8.4%       1847        perf-stat.i.instructions-per-iTLB-miss
      2.06            +3.5%       2.14        perf-stat.i.ipc
    659.67            +3.4%     682.32        perf-stat.i.metric.M/sec
      0.48            -3.4%       0.47        perf-stat.overall.cpi
      0.00 ± 18%      -0.0        0.00 ± 14%  perf-stat.overall.dTLB-load-miss-rate%
      2006            -8.3%       1840        perf-stat.overall.instructions-per-iTLB-miss
      2.07            +3.5%       2.14        perf-stat.overall.ipc
 2.297e+10            +3.4%  2.375e+10        perf-stat.ps.branch-instructions
  72285805            +4.1%   75236431        perf-stat.ps.branch-misses
 2.396e+10            +3.4%  2.479e+10        perf-stat.ps.dTLB-loads
 1.092e+10            +3.4%   1.13e+10        perf-stat.ps.dTLB-stores
  47489125           +12.8%   53563329        perf-stat.ps.iTLB-load-misses
 9.529e+10            +3.4%  9.856e+10        perf-stat.ps.instructions
 2.876e+13            +3.5%  2.976e+13        perf-stat.total.instructions
     44.75            -7.7       37.01 ± 11%  perf-profile.calltrace.cycles-pp.__munmap
     42.13            -7.2       34.95 ± 11%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     41.64            -7.1       34.53 ± 11%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     41.21            -7.1       34.11 ± 11%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     41.45            -7.1       34.36 ± 11%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     39.74            -6.9       32.83 ± 11%  perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     33.92            -6.2       27.75 ± 11%  perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     25.32            -5.7       19.64 ± 11%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     24.74            -5.7       19.08 ± 11%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
     10.59            -3.7        6.89 ± 11%  perf-profile.calltrace.cycles-pp.___might_sleep.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
      1.60            -0.5        1.06 ± 32%  perf-profile.calltrace.cycles-pp.__entry_text_start.__mmap
      2.94            -0.4        2.56 ± 10%  perf-profile.calltrace.cycles-pp.d_path.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      2.85 ±  2%      -0.4        2.47 ± 11%  perf-profile.calltrace.cycles-pp.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.66 ±  6%      -0.4        0.29 ±101%  perf-profile.calltrace.cycles-pp.strlen.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      2.39 ±  3%      -0.3        2.10 ± 11%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap.vm_mmap_pgoff
      1.30 ±  3%      -0.2        1.08 ± 11%  perf-profile.calltrace.cycles-pp.security_mmap_file.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.97 ±  2%      -0.2        0.78 ± 11%  perf-profile.calltrace.cycles-pp.find_vma.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      0.67 ±  3%      -0.2        0.49 ± 45%  perf-profile.calltrace.cycles-pp.touch_atime.shmem_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      0.90 ±  5%      -0.2        0.73 ±  8%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
      0.78 ±  5%      -0.1        0.63 ±  8%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle
     26.40 ±  4%     +10.3       36.72 ± 17%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.40 ±  4%     +10.3       36.72 ± 17%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     26.40 ±  4%     +10.3       36.72 ± 17%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.11 ±  5%     +10.4       36.49 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.00 ±  5%     +10.4       36.40 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
     27.39 ±  4%     +11.1       38.45 ± 18%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     25.93 ±  4%     +11.4       37.32 ± 18%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     67.97           -10.6       57.41 ± 11%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     66.99           -10.4       56.56 ± 11%  perf-profile.children.cycles-pp.do_syscall_64
     44.75            -7.4       37.31 ± 11%  perf-profile.children.cycles-pp.__munmap
     41.23            -7.1       34.12 ± 11%  perf-profile.children.cycles-pp.__vm_munmap
     41.47            -7.1       34.38 ± 11%  perf-profile.children.cycles-pp.__x64_sys_munmap
     39.79            -6.9       32.88 ± 11%  perf-profile.children.cycles-pp.__do_munmap
     33.98            -6.2       27.81 ± 11%  perf-profile.children.cycles-pp.unmap_region
     25.35            -5.7       19.67 ± 11%  perf-profile.children.cycles-pp.unmap_vmas
     24.73            -5.6       19.12 ± 11%  perf-profile.children.cycles-pp.unmap_page_range
     11.68            -3.9        7.83 ± 11%  perf-profile.children.cycles-pp.___might_sleep
      2.98            -0.4        2.59 ± 10%  perf-profile.children.cycles-pp.d_path
      2.87 ±  2%      -0.4        2.49 ± 11%  perf-profile.children.cycles-pp.get_unmapped_area
      2.49 ±  2%      -0.3        2.18 ± 11%  perf-profile.children.cycles-pp.kmem_cache_alloc
      2.09            -0.3        1.80 ± 11%  perf-profile.children.cycles-pp.__entry_text_start
      2.31            -0.3        2.02 ± 10%  perf-profile.children.cycles-pp.zap_pte_range
      1.31 ±  3%      -0.2        1.09 ± 11%  perf-profile.children.cycles-pp.security_mmap_file
      1.24 ±  2%      -0.2        1.05 ± 10%  perf-profile.children.cycles-pp.down_write
      1.00            -0.2        0.81 ± 10%  perf-profile.children.cycles-pp.find_vma
      0.66 ±  6%      -0.1        0.52 ± 15%  perf-profile.children.cycles-pp.strlen
      0.66 ±  3%      -0.1        0.53 ± 12%  perf-profile.children.cycles-pp.common_file_perm
      0.69 ±  3%      -0.1        0.58 ± 10%  perf-profile.children.cycles-pp.touch_atime
      0.36 ±  4%      -0.1        0.29 ±  8%  perf-profile.children.cycles-pp.sync_mm_rss
      0.40 ±  3%      -0.1        0.34 ±  8%  perf-profile.children.cycles-pp.downgrade_write
      0.19 ± 12%      -0.1        0.13 ± 21%  perf-profile.children.cycles-pp.cap_capable
      0.25 ±  4%      -0.1        0.20 ± 10%  perf-profile.children.cycles-pp.vmacache_find
      0.18 ±  7%      -0.0        0.14 ± 10%  perf-profile.children.cycles-pp.tlb_flush_mmu
      0.19 ±  7%      -0.0        0.15 ± 13%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.13 ± 11%      -0.0        0.10 ± 15%  perf-profile.children.cycles-pp.__libc_start_main
      0.13 ± 11%      -0.0        0.10 ± 15%  perf-profile.children.cycles-pp.main
      0.13 ± 11%      -0.0        0.10 ± 15%  perf-profile.children.cycles-pp.run_builtin
      0.12 ± 10%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.timestamp_truncate
      0.09 ±  5%      -0.0        0.06 ± 20%  perf-profile.children.cycles-pp.common_mmap
      0.19 ±  9%      -0.0        0.16 ±  5%  perf-profile.children.cycles-pp.may_expand_vm
      0.19 ±  6%      -0.0        0.16 ±  5%  perf-profile.children.cycles-pp.userfaultfd_unmap_complete
      0.09 ± 12%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.vm_pgprot_modify
      0.08 ±  6%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.get_align_mask
      0.10 ±  7%      +0.0        0.13 ± 14%  perf-profile.children.cycles-pp.blocking_notifier_call_chain
      0.08 ± 22%      +0.0        0.13 ± 12%  perf-profile.children.cycles-pp.munmap@plt
     26.40 ±  4%     +10.3       36.72 ± 17%  perf-profile.children.cycles-pp.start_secondary
     27.39 ±  4%     +11.1       38.45 ± 18%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     27.39 ±  4%     +11.1       38.45 ± 18%  perf-profile.children.cycles-pp.cpu_startup_entry
     27.39 ±  4%     +11.1       38.45 ± 18%  perf-profile.children.cycles-pp.do_idle
     27.10 ±  4%     +11.1       38.21 ± 18%  perf-profile.children.cycles-pp.cpuidle_enter
     27.09 ±  4%     +11.1       38.21 ± 18%  perf-profile.children.cycles-pp.cpuidle_enter_state
     26.00 ±  4%     +11.3       37.32 ± 18%  perf-profile.children.cycles-pp.intel_idle
     11.56            -3.8        7.71 ± 11%  perf-profile.self.cycles-pp.___might_sleep
      1.28 ±  4%      -0.2        1.07 ± 10%  perf-profile.self.cycles-pp.perf_event_mmap
      1.01            -0.2        0.84 ± 11%  perf-profile.self.cycles-pp.__entry_text_start
      1.08 ±  4%      -0.2        0.92 ±  9%  perf-profile.self.cycles-pp.kmem_cache_alloc
      0.66 ±  6%      -0.1        0.51 ± 14%  perf-profile.self.cycles-pp.strlen
      0.67            -0.1        0.54 ± 11%  perf-profile.self.cycles-pp.find_vma
      0.50 ±  4%      -0.1        0.40 ± 12%  perf-profile.self.cycles-pp.common_file_perm
      0.50 ±  6%      -0.1        0.41 ± 11%  perf-profile.self.cycles-pp.get_obj_cgroup_from_current
      0.34 ±  4%      -0.1        0.28 ±  9%  perf-profile.self.cycles-pp.sync_mm_rss
      0.39 ±  3%      -0.1        0.33 ±  8%  perf-profile.self.cycles-pp.downgrade_write
      0.17 ± 13%      -0.1        0.11 ± 21%  perf-profile.self.cycles-pp.cap_capable
      0.24 ±  3%      -0.0        0.20 ± 10%  perf-profile.self.cycles-pp.vmacache_find
      0.15 ±  7%      -0.0        0.11 ± 25%  perf-profile.self.cycles-pp.menu_select
      0.39 ±  3%      -0.0        0.34 ±  7%  perf-profile.self.cycles-pp.__vm_munmap
      0.08 ±  8%      -0.0        0.04 ± 73%  perf-profile.self.cycles-pp.common_mmap
      0.13 ± 11%      -0.0        0.09 ±  6%  perf-profile.self.cycles-pp.tlb_flush_mmu
      0.15 ±  6%      -0.0        0.12 ± 12%  perf-profile.self.cycles-pp.touch_atime
      0.13 ± 10%      -0.0        0.10 ± 10%  perf-profile.self.cycles-pp.remove_vma
      0.11 ± 11%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.timestamp_truncate
      0.18 ± 10%      -0.0        0.15 ±  8%  perf-profile.self.cycles-pp.may_expand_vm
      0.16 ±  4%      -0.0        0.13 ±  6%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.18 ±  5%      -0.0        0.15 ± 11%  perf-profile.self.cycles-pp.get_unmapped_area
      0.19 ±  6%      -0.0        0.16 ±  5%  perf-profile.self.cycles-pp.userfaultfd_unmap_complete
      0.13 ±  5%      -0.0        0.11 ± 10%  perf-profile.self.cycles-pp.prepend
      0.10 ±  7%      +0.0        0.13 ± 14%  perf-profile.self.cycles-pp.blocking_notifier_call_chain
     26.00 ±  4%     +11.3       37.32 ± 18%  perf-profile.self.cycles-pp.intel_idle
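
Some of the figures above can be recomputed from the raw columns as a sanity check. The sketch below is an illustration only: the helper names are made up here, and it assumes that %change is the relative difference between the two commits' mean values and that instructions-per-iTLB-miss is instructions divided by iTLB-load-misses.

```python
def pct_change(base, new):
    """Relative change from `base` to `new`, in percent."""
    return (new - base) / base * 100.0


def instructions_per_itlb_miss(instructions, itlb_misses):
    """Average number of instructions retired per iTLB load miss."""
    return instructions / itlb_misses


if __name__ == "__main__":
    # will-it-scale.workload and will-it-scale.per_process_ops
    print(f"workload:          {pct_change(8990002, 9304107):+.1f}%")
    print(f"per_process_ops:   {pct_change(561874, 581506):+.1f}%")
    # perf-stat.ps.iTLB-load-misses
    print(f"iTLB-load-misses:  {pct_change(47489125, 53563329):+.1f}%")
    # perf-stat.overall.instructions-per-iTLB-miss, from the perf-stat.ps rows
    print(f"insn/iTLB-miss (base): {instructions_per_itlb_miss(9.529e10, 47489125):.0f}")
    print(f"insn/iTLB-miss (new):  {instructions_per_itlb_miss(9.856e10, 53563329):.0f}")
```

The inputs are the rounded values printed in the table, so derived numbers may differ from the reported ones in the last digit.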


                                                                                
                            will-it-scale.per_process_ops                       
                                                                                
  585000 +------------------------------------------------------------------+   
         |                                              O O O OO O   O OO O |   
  580000 |-+                                                       O        |   
         |        O O OO O O O O O                                          |   
  575000 |-O OO O                                                           |   
         |                                     O O                          |   
  570000 |-+                      O O O O OO O                              |   
         |                                         OO                       |   
  565000 |-+                                          O                     |   
         |                  .+.   +.   .+.++.+.+.+.++.+.+.+.                |   
  560000 |-+           +.+.+   +.+  +.+                     +               |   
         |             :                                                    |   
  555000 |.+.++.+. .+.+                                                     |   
         |        +                                                         |   
  550000 +------------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.11.0-04580-ga10787e6d58c" of type "text/plain" (172553 bytes)

View attachment "job-script" of type "text/plain" (7803 bytes)

View attachment "job.yaml" of type "text/plain" (5143 bytes)

View attachment "reproduce" of type "text/plain" (337 bytes)
