lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181102021408.GJ24195@shao2-debian>
Date:   Fri, 2 Nov 2018 10:14:08 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Joe Stringer <joe@...d.net.nz>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>, lkp@...org
Subject: [LKP] [bpf]  fd978bf7fd:  will-it-scale.per_process_ops -4.0%
 regression

Greeting,

FYI, we noticed a -4.0% regression of will-it-scale.per_process_ops due to commit:


commit: fd978bf7fd312581a7ca454a991f0ffb34c4204b ("bpf: Add reference tracking to verifier")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 80 threads Skylake with 64G memory
with following parameters:

	nr_task: 100%
	mode: process
	test: mmap1
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

In addition to that, the commit also has significant impact on the following tests:

+------------------+---------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -3.8% regression |
| test machine     | 80 threads Skylake with 64G memory                            |
| test parameters  | cpufreq_governor=performance                                  |
|                  | mode=process                                                  |
|                  | nr_task=100%                                                  |
|                  | test=mmap2                                                    |
+------------------+---------------------------------------------------------------+


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-7/performance/x86_64-rhel-7.2/process/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-2sp2/mmap1/will-it-scale

commit: 
  84dbf35073 ("bpf: Macrofy stack state copy")
  fd978bf7fd ("bpf: Add reference tracking to verifier")

84dbf3507349696b fd978bf7fd312581a7ca454a99 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     16811            -4.0%      16140        will-it-scale.per_process_ops
   1344946            -4.0%    1291230        will-it-scale.workload
    107.75 ± 38%    +252.4%     379.75 ± 93%  cpuidle.POLL.usage
    121.70 ± 18%     +18.9%     144.70 ±  4%  sched_debug.cfs_rq:/.exec_clock.stddev
      4933            +2.0%       5031        proc-vmstat.nr_inactive_anon
      4933            +2.0%       5031        proc-vmstat.nr_zone_inactive_anon
      9874            +9.0%      10765 ±  7%  slabinfo.proc_inode_cache.active_objs
      9874            +9.0%      10765 ±  7%  slabinfo.proc_inode_cache.num_objs
     12248 ± 50%     +52.2%      18640 ±  2%  numa-meminfo.node0.Inactive
     33943 ±  8%     +16.2%      39453 ±  6%  numa-meminfo.node0.SReclaimable
     91549 ±  9%      -9.9%      82530 ±  7%  numa-meminfo.node1.Slab
     18091 ± 15%     +29.2%      23382 ± 17%  numa-vmstat.node0
      3027 ± 52%     +52.6%       4620 ±  3%  numa-vmstat.node0.nr_inactive_anon
      8485 ±  8%     +16.2%       9862 ±  6%  numa-vmstat.node0.nr_slab_reclaimable
      3027 ± 52%     +52.6%       4620 ±  3%  numa-vmstat.node0.nr_zone_inactive_anon
   1.4e+12            -2.5%  1.364e+12        perf-stat.branch-instructions
     41.42            +0.7       42.15        perf-stat.cache-miss-rate%
 2.166e+10            -2.1%   2.12e+10        perf-stat.cache-references
     12.16            +2.7%      12.49        perf-stat.cpi
 1.741e+12            -2.8%  1.692e+12        perf-stat.dTLB-loads
      0.00 ±  3%      +0.0        0.00 ±  9%  perf-stat.dTLB-store-miss-rate%
 5.713e+11            -3.9%   5.49e+11        perf-stat.dTLB-stores
 6.103e+12            -2.6%  5.943e+12        perf-stat.instructions
      0.08            -2.6%       0.08        perf-stat.ipc
 1.954e+09            -1.8%  1.919e+09        perf-stat.node-load-misses
   4538060            +1.4%    4602862        perf-stat.path-length
     49.62            -0.5       49.14        perf-profile.calltrace.cycles-pp.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     47.64            -0.5       47.17        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.do_munmap.vm_munmap.__x64_sys_munmap
     47.49            -0.5       47.02        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.do_munmap.vm_munmap
     49.99            -0.5       49.53        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.96            -0.5       49.51        perf-profile.calltrace.cycles-pp.vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     48.02            -0.4       47.58        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
      1.41            -0.0        1.37        perf-profile.calltrace.cycles-pp.unmap_region.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
     47.73            +0.4       48.11        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region
     47.85            +0.4       48.25        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap
     48.28            +0.4       48.68        perf-profile.calltrace.cycles-pp.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
     48.23            +0.4       48.63        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff
     48.96            +0.4       49.41        perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
     49.11            +0.5       49.56        perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.24            +0.5       49.70        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.25            +0.5       49.72        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.62            -0.5       49.15        perf-profile.children.cycles-pp.do_munmap
     49.99            -0.5       49.53        perf-profile.children.cycles-pp.__x64_sys_munmap
     49.97            -0.5       49.51        perf-profile.children.cycles-pp.vm_munmap
      0.51 ±  2%      -0.0        0.46        perf-profile.children.cycles-pp.___might_sleep
      1.16            -0.0        1.11        perf-profile.children.cycles-pp.unmap_vmas
      1.15            -0.0        1.10        perf-profile.children.cycles-pp.unmap_page_range
      1.41            -0.0        1.37        perf-profile.children.cycles-pp.unmap_region
      0.32 ±  2%      +0.0        0.34 ±  2%  perf-profile.children.cycles-pp.up_write
      0.32 ±  2%      +0.0        0.34        perf-profile.children.cycles-pp.vm_area_alloc
      0.29            +0.0        0.32        perf-profile.children.cycles-pp.kmem_cache_alloc
     48.28            +0.4       48.68        perf-profile.children.cycles-pp.__vm_enough_memory
     48.96            +0.4       49.41        perf-profile.children.cycles-pp.mmap_region
     49.11            +0.5       49.56        perf-profile.children.cycles-pp.do_mmap
     49.25            +0.5       49.71        perf-profile.children.cycles-pp.vm_mmap_pgoff
     49.25            +0.5       49.72        perf-profile.children.cycles-pp.ksys_mmap_pgoff
      0.47 ±  3%      -0.0        0.43        perf-profile.self.cycles-pp.___might_sleep
      0.32 ±  3%      +0.0        0.34 ±  2%  perf-profile.self.cycles-pp.up_write
      0.27            +0.0        0.30        perf-profile.self.cycles-pp.kmem_cache_alloc
      0.49            +0.0        0.53        perf-profile.self.cycles-pp.percpu_counter_add_batch


                                                                                
                            will-it-scale.per_process_ops                       
                                                                                
  18000 +-+-----------------------------------------------------------------+   
        |                                                                   |   
  17500 +-+   +.+                                                           |   
        |+.+++  :            +.++++.+++           ++++.++++.++              |   
        |        :++. +      :        :          :            :             |   
  17000 +-+      +   + ++.++:          :  ++.+++ :            ++.+ ++.   +. |   
        |                   +          +.+      +                 +   +++  +|   
  16500 +-+                                                                 |   
        |              O  OOOO OOOO O O                                     |   
  16000 +-+           O O            O O O                                  |   
        |                                                                   |   
        O     O  OOO O                                                      |   
  15500 +O+OOO  O                                                           |   
        |                                                                   |   
  15000 +-+-----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                will-it-scale.workload                          
                                                                                
  1.42e+06 +-+--------------------------------------------------------------+   
   1.4e+06 +-+   ++                                                         |   
           |++.++ :                 ++.                   +++.+             |   
  1.38e+06 +-+     :           +.+++   ++          ++++.++    :             |   
  1.36e+06 +-+     +.+++++.    :         :         :           :+           |   
           |               ++++          ++.+++++.+            + ++.+++++.++|   
  1.34e+06 +-+                                                              |   
  1.32e+06 +-+                                                              |   
   1.3e+06 +-+               O                                              |   
           |            OO OO OO OOOOO  OOO                                 |   
  1.28e+06 +-+                         O                                    |   
  1.26e+06 +-+       O                                                      |   
           O     O O  OO                                                    |   
  1.24e+06 +OO OO O                                                         |   
  1.22e+06 +-+--------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample

***************************************************************************************************
lkp-skl-2sp2: 80 threads Skylake with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-7/performance/x86_64-rhel-7.2/process/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-2sp2/mmap2/will-it-scale

commit: 
  84dbf35073 ("bpf: Macrofy stack state copy")
  fd978bf7fd ("bpf: Add reference tracking to verifier")

84dbf3507349696b fd978bf7fd312581a7ca454a99 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     16832            -3.8%      16186        will-it-scale.per_process_ops
   1346634            -3.8%    1294984        will-it-scale.workload
    390809 ± 21%     +51.6%     592424 ± 27%  cpuidle.C1.time
      6897            +2.7%       7085        proc-vmstat.nr_mapped
    936.00 ±  7%     +15.6%       1082 ±  5%  slabinfo.Acpi-ParseExt.active_objs
    936.00 ±  7%     +15.6%       1082 ±  5%  slabinfo.Acpi-ParseExt.num_objs
    968.00 ±  9%     +27.5%       1233 ± 16%  slabinfo.pool_workqueue.active_objs
    968.00 ±  9%     +29.7%       1255 ± 16%  slabinfo.pool_workqueue.num_objs
      8430           -14.1%       7244 ±  2%  numa-meminfo.node0.KernelStack
      4283 ± 14%     -22.4%       3325 ± 10%  numa-meminfo.node0.PageTables
     73929 ±  3%     -10.6%      66061 ±  6%  numa-meminfo.node0.SUnreclaim
      5569 ±  2%     +21.0%       6738 ±  3%  numa-meminfo.node1.KernelStack
     55085 ±  5%     +17.5%      64739 ±  5%  numa-meminfo.node1.SUnreclaim
     81155 ±  6%     +16.2%      94292 ±  7%  numa-meminfo.node1.Slab
    230.00          -100.0%       0.00        numa-vmstat.node0.nr_active_file
    100.25 ±  3%     -88.8%      11.25 ±173%  numa-vmstat.node0.nr_inactive_file
      8431           -14.1%       7245 ±  2%  numa-vmstat.node0.nr_kernel_stack
      1071 ± 14%     -22.4%     831.25 ± 10%  numa-vmstat.node0.nr_page_table_pages
     18482 ±  3%     -10.6%      16515 ±  6%  numa-vmstat.node0.nr_slab_unreclaimable
    230.00          -100.0%       0.00        numa-vmstat.node0.nr_zone_active_file
    100.25 ±  3%     -88.8%      11.25 ±173%  numa-vmstat.node0.nr_zone_inactive_file
      5569 ±  2%     +21.0%       6738 ±  3%  numa-vmstat.node1.nr_kernel_stack
      2778 ±  3%     +28.4%       3567 ± 16%  numa-vmstat.node1.nr_mapped
     13771 ±  5%     +17.5%      16184 ±  5%  numa-vmstat.node1.nr_slab_unreclaimable
 1.506e+12            -2.5%  1.468e+12        perf-stat.branch-instructions
     41.41            +0.8       42.20        perf-stat.cache-miss-rate%
 2.165e+10            -1.7%  2.129e+10        perf-stat.cache-references
     11.25            +2.8%      11.57        perf-stat.cpi
 1.891e+12            -2.8%  1.838e+12        perf-stat.dTLB-loads
 6.543e+11            -3.7%    6.3e+11        perf-stat.dTLB-stores
 6.591e+12            -2.6%  6.419e+12        perf-stat.instructions
      0.09            -2.7%       0.09        perf-stat.ipc
 1.967e+09            -1.3%  1.941e+09        perf-stat.node-load-misses
   4894750            +1.3%    4956596        perf-stat.path-length
     40.37 ± 12%     -16.2%      33.81 ±  7%  sched_debug.cfs_rq:/.load_avg.stddev
      0.05 ±  2%     +18.7%       0.06 ±  3%  sched_debug.cfs_rq:/.nr_running.stddev
      6.37 ± 40%     -50.2%       3.17 ± 32%  sched_debug.cfs_rq:/.removed.load_avg.avg
     31.64 ± 18%     -28.5%      22.63 ± 16%  sched_debug.cfs_rq:/.removed.load_avg.stddev
    293.89 ± 40%     -50.1%     146.61 ± 32%  sched_debug.cfs_rq:/.removed.runnable_sum.avg
      1459 ± 18%     -28.3%       1045 ± 16%  sched_debug.cfs_rq:/.removed.runnable_sum.stddev
      2.46 ± 43%     -60.9%       0.96 ± 66%  sched_debug.cfs_rq:/.removed.util_avg.avg
     12.42 ± 26%     -46.5%       6.64 ± 59%  sched_debug.cfs_rq:/.removed.util_avg.stddev
    385.92 ±  6%     +12.8%     435.46 ±  2%  sched_debug.cpu.nr_switches.min
    -14.21           -31.4%      -9.75        sched_debug.cpu.nr_uninterruptible.min
     47.54            -0.2       47.31        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.do_munmap.vm_munmap
     47.67            -0.2       47.45        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.do_munmap.vm_munmap.__x64_sys_munmap
     48.04            -0.2       47.86        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
     99.36            -0.0       99.34        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.47            +0.0        1.51        perf-profile.calltrace.cycles-pp.unmap_region.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
     94.77            -0.3       94.52        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     95.04            -0.2       94.81        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     95.77            -0.2       95.60        perf-profile.children.cycles-pp.percpu_counter_add_batch
     49.72            -0.1       49.58        perf-profile.children.cycles-pp.do_munmap
      0.53 ±  2%      -0.1        0.47        perf-profile.children.cycles-pp.___might_sleep
      0.30 ±  2%      +0.0        0.33        perf-profile.children.cycles-pp.perf_event_mmap
      0.30 ±  3%      +0.0        0.33 ±  2%  perf-profile.children.cycles-pp.vm_area_alloc
      0.33 ±  2%      +0.0        0.36 ±  2%  perf-profile.children.cycles-pp.up_write
      1.48            +0.0        1.51        perf-profile.children.cycles-pp.unmap_region
     94.77            -0.3       94.52        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.48 ±  2%      -0.0        0.44        perf-profile.self.cycles-pp.___might_sleep
      0.33 ±  2%      +0.0        0.36 ±  2%  perf-profile.self.cycles-pp.up_write
      0.53            +0.0        0.57        perf-profile.self.cycles-pp.unmap_page_range
      0.47            +0.0        0.52 ±  2%  perf-profile.self.cycles-pp.percpu_counter_add_batch





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen

View attachment "config-4.19.0-rc5-00874-gfd978bf7" of type "text/plain" (167752 bytes)

View attachment "job.yaml" of type "text/plain" (4844 bytes)

View attachment "reproduce" of type "text/plain" (309 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ