Open Source and information security mailing list archives
Message-ID: <20200620143735.GF5535@shao2-debian>
Date:   Sat, 20 Jun 2020 22:37:35 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Stanislav Fomichev <sdf@...gle.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andriin@...com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: [bpf] 6890896bd7: will-it-scale.per_process_ops -5.2% regression

Greetings,

FYI, we noticed a -5.2% regression of will-it-scale.per_process_ops due to commit:


commit: 6890896bd765b0504761c61901c9804fca23bfb2 ("bpf: Fix missing bpf_base_func_proto in cgroup_base_func_proto for CGROUP_NET=n")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 8G memory
with following parameters:

	nr_task: 16
	mode: process
	test: mmap2
	cpufreq_governor: performance
	ucode: 0x21

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
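To make the methodology concrete, here is a minimal, illustrative sketch of what the mmap2 testcase measures (this is not the actual will-it-scale harness; region size, duration, and the averaging are simplified assumptions): each worker process repeatedly maps and unmaps an anonymous region for a fixed interval, and the mean iteration count per worker is the per-process ops figure compared across task counts.

```python
# Illustrative sketch of a will-it-scale-style mmap/munmap loop.
# Assumptions: 16 MB region (the real mmap2 test uses a larger one),
# a short fixed duration, and simple mean-of-workers reporting.
import mmap
import time
from multiprocessing import Process, Queue

REGION = 16 * 1024 * 1024  # size of the anonymous mapping (assumed)
DURATION = 0.2             # seconds each worker runs (assumed)

def worker(q):
    ops = 0
    deadline = time.monotonic() + DURATION
    while time.monotonic() < deadline:
        m = mmap.mmap(-1, REGION)  # anonymous mmap
        m.close()                  # munmap
        ops += 1
    q.put(ops)

def per_process_ops(nr_task):
    """Run nr_task parallel copies; return mean ops per process."""
    q = Queue()
    procs = [Process(target=worker, args=(q,)) for _ in range(nr_task)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    counts = [q.get() for _ in procs]
    return sum(counts) / len(counts)

if __name__ == "__main__":
    # If the testcase scaled perfectly, per-process ops would stay flat
    # as nr_task grows; contention shows up as a declining curve.
    for n in (1, 2, 4):
        print(n, per_process_ops(n))
```

A drop in this per-process figure between two kernels at the same nr_task, as in the table below, is what the robot flags as a regression.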



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <rong.a.chen@...el.com>


Details are as follows:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-7.6/process/16/debian-x86_64-20191114.cgz/lkp-ivb-d02/mmap2/will-it-scale/0x21

commit: 
  745abfaa9e ("bpf, riscv: Fix tail call count off by one in RV32 BPF JIT")
  6890896bd7 ("bpf: Fix missing bpf_base_func_proto in cgroup_base_func_proto for CGROUP_NET=n")

745abfaa9eafa597 6890896bd765b0504761c61901c 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     63928            -5.2%      60630        will-it-scale.per_process_ops
   1022864            -5.2%     970096        will-it-scale.workload
     63.20            +2.4%      64.71        boot-time.idle
   2439913 ± 19%     +35.0%    3294976 ± 10%  cpuidle.C6.time
    202.25 ± 30%     -31.0%     139.50        slabinfo.fsnotify_mark_connector.active_objs
     16.38 ± 19%     +45.3%      23.79 ± 10%  sched_debug.cfs_rq:/.nr_spread_over.max
      6.61 ± 36%     +53.3%      10.13 ± 12%  sched_debug.cfs_rq:/.nr_spread_over.stddev
      4230 ±125%    +371.5%      19944 ± 64%  sched_debug.cfs_rq:/.spread0.max
    238.04 ± 13%     -44.2%     132.83 ± 28%  sched_debug.cfs_rq:/.util_est_enqueued.min
     59416 ±  4%      -9.2%      53953 ±  2%  sched_debug.cpu.sched_count.min
     16080 ± 11%     -21.6%      12609 ±  7%  sched_debug.cpu.ttwu_count.min
     13342 ± 10%     -24.3%      10099 ±  6%  sched_debug.cpu.ttwu_local.min
 2.661e+09            -5.1%  2.525e+09        perf-stat.i.branch-instructions
  14914488            -4.1%   14298109        perf-stat.i.branch-misses
      4.04            +0.4        4.46 ±  3%  perf-stat.i.cache-miss-rate%
   4438980            -5.8%    4183284        perf-stat.i.cache-references
      1.15            +5.4%       1.21        perf-stat.i.cpi
 3.355e+09            -4.8%  3.194e+09        perf-stat.i.dTLB-loads
   4442441 ±  4%      -9.6%    4014009        perf-stat.i.dTLB-store-misses
 1.579e+09            -4.9%  1.502e+09        perf-stat.i.dTLB-stores
     45.02           +12.4       57.47        perf-stat.i.iTLB-load-miss-rate%
   1224412 ±  3%      -3.7%    1178714        perf-stat.i.iTLB-load-misses
   1500623           -41.9%     872114        perf-stat.i.iTLB-loads
 1.142e+10            -5.1%  1.083e+10        perf-stat.i.instructions
      0.87            -5.1%       0.83        perf-stat.i.ipc
      0.91 ±  2%      -2.5%       0.89        perf-stat.i.metric.K/sec
      1901            -4.9%       1807        perf-stat.i.metric.M/sec
      4.32 ±  2%      +0.3        4.64 ±  3%  perf-stat.overall.cache-miss-rate%
      1.15            +5.4%       1.21        perf-stat.overall.cpi
     44.92           +12.5       57.47        perf-stat.overall.iTLB-load-miss-rate%
      0.87            -5.1%       0.83        perf-stat.overall.ipc
 2.652e+09            -5.1%  2.517e+09        perf-stat.ps.branch-instructions
  14865210            -4.1%   14250821        perf-stat.ps.branch-misses
   4424338            -5.8%    4169457        perf-stat.ps.cache-references
 3.344e+09            -4.8%  3.183e+09        perf-stat.ps.dTLB-loads
   4427694 ±  4%      -9.6%    4000716        perf-stat.ps.dTLB-store-misses
 1.574e+09            -4.9%  1.497e+09        perf-stat.ps.dTLB-stores
   1220351 ±  3%      -3.7%    1174812        perf-stat.ps.iTLB-load-misses
   1495642           -41.9%     869227        perf-stat.ps.iTLB-loads
 1.138e+10            -5.1%   1.08e+10        perf-stat.ps.instructions
 3.437e+12            -5.1%  3.262e+12        perf-stat.total.instructions
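The derived metrics in the table (cpi, ipc, iTLB-load-miss-rate%) follow arithmetically from the raw counters. A small sketch of that arithmetic, using rounded base-commit figures copied from the rows above (the exact denominator perf uses for the miss rate is an assumption here):

```python
# Recompute derived perf-stat metrics from the raw counters above.
# Values are rounded copies from the table, so results match to rounding.
instructions = 1.142e10          # perf-stat.i.instructions (745abfaa9e)
ipc = 0.87                       # reported instructions per cycle
cycles = instructions / ipc      # cycle count is not listed; back it out

cpi = cycles / instructions      # cycles per instruction
print(f"cpi ~ {cpi:.2f}")        # table reports 1.15

# iTLB miss rate, assuming misses / (misses + loads) as the definition.
itlb_misses = 1220351            # perf-stat.ps.iTLB-load-misses
itlb_loads = 1495642             # perf-stat.ps.iTLB-loads
miss_rate = 100 * itlb_misses / (itlb_misses + itlb_loads)
print(f"iTLB-load-miss-rate ~ {miss_rate:.2f}%")  # table reports 44.92%
```

Note how the -41.9% drop in iTLB-loads with near-flat iTLB-load-misses is what pushes the miss rate from ~45% to ~57% after the commit, alongside the +5.4% cpi increase.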
      3.19 ± 13%      -0.9        2.25 ±  3%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     23.75            -0.8       22.98        perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     41.40            -0.8       40.63        perf-profile.calltrace.cycles-pp.mmap64
     18.86            -0.6       18.21        perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
     27.02            -0.6       26.40        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
     26.71            -0.5       26.17        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
      0.74 ±  3%      -0.1        0.62 ±  8%  perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
      2.66 ±  2%      -0.1        2.55 ±  2%  perf-profile.calltrace.cycles-pp.d_path.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      2.60            -0.1        2.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.munmap
      1.96            -0.1        1.89 ±  2%  perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.shmem_get_unmapped_area.get_unmapped_area.do_mmap.vm_mmap_pgoff
      0.90 ±  5%      +0.1        0.98 ±  4%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap
      1.04 ±  9%      +0.1        1.19 ±  6%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_trace.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      0.70 ±  6%      +0.3        1.01 ± 12%  perf-profile.calltrace.cycles-pp.kmem_cache_free.remove_vma.__do_munmap.__vm_munmap.__x64_sys_munmap
      1.40 ±  6%      +0.4        1.84 ±  5%  perf-profile.calltrace.cycles-pp.remove_vma.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      0.27 ±100%      +0.5        0.73 ± 16%  perf-profile.calltrace.cycles-pp.up_read.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     44.30            +0.5       44.77        perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.82 ±  2%      +0.6        5.40 ±  3%  perf-profile.calltrace.cycles-pp.free_pgd_range.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
      4.42 ±  2%      +0.6        5.02 ±  3%  perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.unmap_region.__do_munmap.__vm_munmap
     58.12            +0.8       58.91        perf-profile.calltrace.cycles-pp.munmap
     45.84            +0.8       46.64        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     46.34            +0.9       47.21        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     51.57            +0.9       52.47        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.munmap
     51.28            +0.9       52.21        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     41.75            -0.8       40.95        perf-profile.children.cycles-pp.mmap64
     23.84            -0.8       23.05        perf-profile.children.cycles-pp.do_mmap
     18.98            -0.6       18.35        perf-profile.children.cycles-pp.mmap_region
     27.09            -0.6       26.48        perf-profile.children.cycles-pp.vm_mmap_pgoff
      1.41 ± 10%      -0.5        0.88 ±  4%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      4.18 ±  2%      -0.4        3.77 ±  3%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      5.20            -0.2        5.02        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.56 ±  3%      -0.1        0.44 ± 13%  perf-profile.children.cycles-pp.cap_vm_enough_memory
      2.69            -0.1        2.57 ±  2%  perf-profile.children.cycles-pp.d_path
      0.75 ±  4%      -0.1        0.64 ±  7%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.38 ±  6%      -0.1        0.28 ± 30%  perf-profile.children.cycles-pp.security_mmap_addr
      0.27 ± 13%      -0.1        0.18 ± 26%  perf-profile.children.cycles-pp.may_expand_vm
      0.20 ±  9%      -0.1        0.14 ± 16%  perf-profile.children.cycles-pp.cap_capable
      0.21 ±  7%      +0.1        0.26 ±  9%  perf-profile.children.cycles-pp.cap_mmap_file
      0.00            +0.1        0.05 ±  9%  perf-profile.children.cycles-pp.profile_munmap
      0.45            +0.1        0.51 ±  5%  perf-profile.children.cycles-pp.lru_add_drain
      0.83 ±  3%      +0.1        0.97 ±  4%  perf-profile.children.cycles-pp.__might_sleep
      0.43 ±  2%      +0.1        0.57 ±  7%  perf-profile.children.cycles-pp.fpregs_assert_state_consistent
      1.10 ±  8%      +0.2        1.26 ±  7%  perf-profile.children.cycles-pp.kmem_cache_alloc_trace
      0.52 ± 10%      +0.3        0.78 ± 16%  perf-profile.children.cycles-pp.up_read
      0.70 ±  6%      +0.3        1.02 ± 12%  perf-profile.children.cycles-pp.kmem_cache_free
      1.44 ±  6%      +0.4        1.89 ±  5%  perf-profile.children.cycles-pp.remove_vma
     44.38            +0.5       44.86        perf-profile.children.cycles-pp.__do_munmap
     86.47            +0.5       86.96        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     85.91            +0.5       86.41        perf-profile.children.cycles-pp.do_syscall_64
      4.86 ±  2%      +0.6        5.42 ±  3%  perf-profile.children.cycles-pp.free_pgd_range
      4.43 ±  2%      +0.6        5.02 ±  3%  perf-profile.children.cycles-pp.free_p4d_range
     58.50            +0.8       59.29        perf-profile.children.cycles-pp.munmap
     45.89            +0.8       46.71        perf-profile.children.cycles-pp.__vm_munmap
     46.38            +0.9       47.27        perf-profile.children.cycles-pp.__x64_sys_munmap
      1.37 ± 10%      -0.5        0.84 ±  3%  perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      0.26 ± 12%      -0.1        0.16 ± 25%  perf-profile.self.cycles-pp.may_expand_vm
      0.19 ± 10%      -0.1        0.13 ± 21%  perf-profile.self.cycles-pp.cap_capable
      0.11 ±  9%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.security_mmap_addr
      0.07 ± 12%      +0.0        0.10 ± 11%  perf-profile.self.cycles-pp.lru_add_drain
      0.23 ±  7%      +0.0        0.27 ±  5%  perf-profile.self.cycles-pp.userfaultfd_unmap_prep
      0.19 ± 10%      +0.0        0.23 ± 11%  perf-profile.self.cycles-pp.cap_mmap_file
      0.75 ±  3%      +0.1        0.86 ±  4%  perf-profile.self.cycles-pp.__might_sleep
      0.41 ±  2%      +0.1        0.53 ± 10%  perf-profile.self.cycles-pp.fpregs_assert_state_consistent
      1.23 ±  2%      +0.2        1.40 ±  2%  perf-profile.self.cycles-pp.__do_munmap
      0.63 ± 17%      +0.2        0.83 ± 16%  perf-profile.self.cycles-pp.common_file_perm
      0.50 ± 10%      +0.3        0.75 ± 18%  perf-profile.self.cycles-pp.up_read
      0.69 ±  6%      +0.3        1.01 ± 13%  perf-profile.self.cycles-pp.kmem_cache_free
      4.40 ±  2%      +0.6        4.98 ±  3%  perf-profile.self.cycles-pp.free_p4d_range


                                                                                
                            will-it-scale.per_process_ops                       
                                                                                
  64500 +-------------------------------------------------------------------+   
        |            +.+                                .+                  |   
  64000 |-+.+.+..   +   +                     +.  .+.+.+                    |   
  63500 |.+      +.+     +.+.+.  .+. .+.     :  +.                          |   
        |                      +.   +   +.+. :                              |   
  63000 |-+                                 +                               |   
  62500 |-+                       O   O                          O          |   
        |                      O    O                               O       |   
  62000 |-+          O                                                      |   
  61500 |-O O O            O                                                |   
        |        O O   O     O                           O O                |   
  61000 |-+              O                           O                      |   
  60500 |-+                             O   O O        O     O O      O O O |   
        |                                 O     O  O                        |   
  60000 +-------------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


View attachment "config-5.7.0-rc2-00635-g6890896bd765b0" of type "text/plain" (202662 bytes)

View attachment "job-script" of type "text/plain" (7675 bytes)

View attachment "job.yaml" of type "text/plain" (5293 bytes)

View attachment "reproduce" of type "text/plain" (337 bytes)
