Date:   Tue, 6 Aug 2019 15:05:47 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Minchan Kim <minchan@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>, Minchan Kim <minchan@...nel.org>,
        Miguel de Dios <migueldedios@...gle.com>,
        Wei Wang <wvw@...gle.com>, Michal Hocko <mhocko@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...hsingularity.net>, lkp@...org
Subject: [mm]  755d6edc1a:  will-it-scale.per_process_ops -4.1% regression

Greetings,

FYI, we noticed a -4.1% regression of will-it-scale.per_process_ops due to the following commit:


commit: 755d6edc1aee4489c90975ec093d724d5492cecd ("[PATCH] mm: release the spinlock on zap_pte_range")
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-release-the-spinlock-on-zap_pte_range/20190730-010638


in testcase: will-it-scale
on test machine: 8 threads Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz with 16G memory
with the following parameters:

	nr_task: 100%
	mode: process
	test: malloc1
	cpufreq_governor: performance
	ucode: 0x21

test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of the test in order to show any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
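
For orientation, the profile further down is dominated by munmap and page-fault paths, which suggests the malloc1 workload repeatedly maps, touches, and unmaps anonymous memory. The following is only a rough Python sketch of that pattern under that assumption. It is not the will-it-scale C source, and map_touch_unmap and REGION_SIZE are hypothetical names and values chosen for illustration.

```python
import mmap

# Illustrative region size; an assumption, not a value taken from this report.
REGION_SIZE = 16 * 1024 * 1024


def map_touch_unmap(iterations, size=REGION_SIZE):
    """Map an anonymous region, touch its first page, and unmap it again.

    Returns the number of completed iterations, which plays the role of
    the per-process operation count in the benchmark.
    """
    completed = 0
    for _ in range(iterations):
        region = mmap.mmap(-1, size)  # anonymous mapping (fileno -1)
        region[0] = 1                 # first write triggers a page fault
        region.close()                # releases the mapping (munmap on Linux)
        completed += 1
    return completed


if __name__ == "__main__":
    print(map_touch_unmap(1000))
```

Running one copy of such a loop per CPU corresponds to the nr_task: 100% and mode: process parameters above.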



Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-2019-05-14.cgz/lkp-ivb-d01/malloc1/will-it-scale/0x21

commit: 
  v5.3-rc2
  755d6edc1a ("mm: release the spinlock on zap_pte_range")

        v5.3-rc2 755d6edc1aee4489c90975ec093 
---------------- --------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
          1:5          -20%            :4     dmesg.RIP:__d_lookup_rcu
          1:5          -20%            :4     dmesg.RIP:mnt_drop_write
           :5           20%           1:4     kmsg.ab33a8>]usb_hcd_irq
           :5           20%           1:4     kmsg.b445f28>]usb_hcd_irq
           :5           20%           1:4     kmsg.cdf63ef>]usb_hcd_irq
          1:5          -20%            :4     kmsg.d4af11>]usb_hcd_irq
          1:5          -20%            :4     kmsg.d9>]usb_hcd_irq
           :5           20%           1:4     kmsg.f805d78>]usb_hcd_irq
          5:5           -7%           4:4     perf-profile.calltrace.cycles-pp.error_entry
          7:5          -39%           5:4     perf-profile.children.cycles-pp.error_entry
          0:5           -1%           0:4     perf-profile.children.cycles-pp.error_exit
          5:5          -30%           4:4     perf-profile.self.cycles-pp.error_entry
         %stddev     %change         %stddev
             \          |                \  
    119757            -4.1%     114839        will-it-scale.per_process_ops
    958059            -4.1%     918718        will-it-scale.workload
      2429 ± 16%     -34.5%       1591 ± 32%  cpuidle.C1.usage
      0.97 ± 88%      -0.7        0.26        mpstat.cpu.all.idle%
     78.40            +2.0%      80.00        vmstat.cpu.sy
     45.42            +2.1%      46.38        turbostat.CorWatt
     50.46            +2.0%      51.45        turbostat.PkgWatt
      6641 ±  4%      +8.6%       7215 ±  8%  slabinfo.anon_vma_chain.num_objs
      1327 ±  3%     +23.0%       1632 ±  5%  slabinfo.kmalloc-96.active_objs
      1327 ±  3%     +23.0%       1632 ±  5%  slabinfo.kmalloc-96.num_objs
      1235 ± 30%     +37.7%       1700 ± 18%  interrupts.29:PCI-MSI.409600-edge.eth0
      4361 ± 81%    +149.4%      10876 ± 32%  interrupts.CPU0.NMI:Non-maskable_interrupts
      4361 ± 81%    +149.4%      10876 ± 32%  interrupts.CPU0.PMI:Performance_monitoring_interrupts
      1235 ± 30%     +37.7%       1700 ± 18%  interrupts.CPU7.29:PCI-MSI.409600-edge.eth0
     93196            +9.1%     101723 ±  6%  sched_debug.cfs_rq:/.load.min
     15.37 ± 11%     +13.6%      17.46 ±  3%  sched_debug.cfs_rq:/.nr_spread_over.max
      5.01 ± 11%     +14.5%       5.74 ±  4%  sched_debug.cfs_rq:/.nr_spread_over.stddev
     53.80 ± 15%     +41.6%      76.21 ±  7%  sched_debug.cfs_rq:/.util_avg.stddev
     60098            +1.6%      61056        proc-vmstat.nr_active_anon
      6867            -1.2%       6781        proc-vmstat.nr_slab_unreclaimable
     60098            +1.6%      61056        proc-vmstat.nr_zone_active_anon
 5.757e+08            -4.2%  5.517e+08        proc-vmstat.numa_hit
 5.757e+08            -4.2%  5.517e+08        proc-vmstat.numa_local
 5.758e+08            -4.1%   5.52e+08        proc-vmstat.pgalloc_normal
 2.881e+08            -4.1%  2.762e+08        proc-vmstat.pgfault
 5.758e+08            -4.1%   5.52e+08        proc-vmstat.pgfree
 2.861e+09 ± 41%     +41.1%  4.038e+09        perf-stat.i.branch-instructions
  41921318 ± 38%     +34.9%   56552695 ±  2%  perf-stat.i.cache-references
 2.173e+10 ± 41%     +34.9%  2.931e+10        perf-stat.i.cpu-cycles
  2.26e+09 ± 41%     +41.3%  3.194e+09        perf-stat.i.dTLB-stores
     57813 ± 26%     +66.7%      96370 ±  6%  perf-stat.i.iTLB-loads
 1.365e+10 ± 41%     +37.9%  1.882e+10        perf-stat.i.instructions
    661.20 ± 40%     +45.4%     961.52        perf-stat.i.instructions-per-iTLB-miss
      0.47 ± 41%     +37.3%       0.64        perf-stat.i.ipc
    948620            -3.5%     915067        perf-stat.i.minor-faults
    948620            -3.5%     915067        perf-stat.i.page-faults
      0.51 ±  7%      -0.1        0.45        perf-stat.overall.branch-miss-rate%
      1.59            -2.4%       1.56        perf-stat.overall.cpi
      0.38            -0.0        0.35 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
    875.11            +8.7%     950.89        perf-stat.overall.instructions-per-iTLB-miss
      0.63            +2.4%       0.64        perf-stat.overall.ipc
   4337585 ± 41%     +42.3%    6173557        perf-stat.overall.path-length
 2.855e+09 ± 41%     +41.0%  4.028e+09        perf-stat.ps.branch-instructions
  41833739 ± 38%     +34.8%   56408902 ±  2%  perf-stat.ps.cache-references
 2.255e+09 ± 41%     +41.2%  3.186e+09        perf-stat.ps.dTLB-stores
     57677 ± 26%     +66.7%      96124 ±  6%  perf-stat.ps.iTLB-loads
 1.362e+10 ± 41%     +37.8%  1.877e+10        perf-stat.ps.instructions
    946368            -3.6%     912714        perf-stat.ps.minor-faults
    946368            -3.6%     912714        perf-stat.ps.page-faults
 4.155e+12 ± 41%     +36.5%  5.672e+12        perf-stat.total.instructions
     20.10            -0.7       19.42        perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
     17.83            -0.7       17.17        perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      5.47 ±  2%      -0.5        4.92 ±  4%  perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
      5.75 ±  2%      -0.5        5.20 ±  4%  perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
      5.69 ±  2%      -0.5        5.17 ±  4%  perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
      2.61            -0.5        2.16 ± 12%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
      2.09 ±  2%      -0.4        1.67 ± 15%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
      2.81 ±  2%      -0.2        2.56 ±  2%  perf-profile.calltrace.cycles-pp.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault.__do_page_fault
      2.62 ±  2%      -0.2        2.45 ±  2%  perf-profile.calltrace.cycles-pp.flush_tlb_func_common.flush_tlb_mm_range.tlb_flush_mmu.tlb_finish_mmu.unmap_region
      1.89 ±  2%      -0.2        1.73        perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.unmap_region.__do_munmap.__vm_munmap
      3.05 ±  2%      -0.1        2.91        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
      1.07 ±  3%      -0.1        0.95 ±  2%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      0.91 ±  3%      -0.1        0.84 ±  4%  perf-profile.calltrace.cycles-pp.native_flush_tlb.flush_tlb_func_common.flush_tlb_mm_range.tlb_flush_mmu.tlb_finish_mmu
      1.94 ±  3%      +0.1        2.06        perf-profile.calltrace.cycles-pp.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      1.31 ±  8%      +0.1        1.45        perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
      0.31 ± 81%      +0.2        0.54 ±  3%  perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault.__do_page_fault
      2.27 ± 50%      +0.7        2.97 ±  3%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
     43.67            +2.4       46.10        perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     39.41 ±  2%      +2.7       42.07        perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     18.28 ±  2%      +3.7       21.95        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     17.43 ±  2%      +3.7       21.12        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
     35.89 ± 50%     +11.0       46.92        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.13 ± 50%     +11.1       47.22        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     51.68 ± 50%     +14.5       66.17        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     51.90 ± 50%     +14.5       66.42        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     17.89            -0.7       17.20        perf-profile.children.cycles-pp.handle_mm_fault
     20.13            -0.7       19.45        perf-profile.children.cycles-pp.__do_page_fault
      5.25 ±  2%      -0.6        4.62 ±  8%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      5.50 ±  2%      -0.6        4.95 ±  4%  perf-profile.children.cycles-pp.pagevec_lru_move_fn
      5.93 ±  2%      -0.5        5.39 ±  4%  perf-profile.children.cycles-pp.lru_add_drain
      5.86 ±  2%      -0.5        5.33 ±  4%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      2.80 ±  2%      -0.3        2.55 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      2.86 ±  2%      -0.3        2.60 ±  2%  perf-profile.children.cycles-pp.__anon_vma_prepare
      1.92 ±  3%      -0.2        1.75        perf-profile.children.cycles-pp.unlink_anon_vmas
      1.88 ±  4%      -0.2        1.72 ±  2%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      2.03 ±  3%      -0.2        1.88        perf-profile.children.cycles-pp.free_pgtables
      3.06 ±  2%      -0.1        2.92        perf-profile.children.cycles-pp.flush_tlb_mm_range
      0.89 ±  5%      -0.1        0.76 ±  6%  perf-profile.children.cycles-pp.__might_sleep
      1.58 ±  2%      -0.1        1.45        perf-profile.children.cycles-pp.native_flush_tlb
      1.97            -0.1        1.85 ±  2%  perf-profile.children.cycles-pp.flush_tlb_func_common
      0.41 ±  8%      -0.1        0.32 ±  8%  perf-profile.children.cycles-pp.___pte_free_tlb
      0.10 ± 14%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.should_fail_alloc_page
      0.55 ±  3%      -0.1        0.49 ±  4%  perf-profile.children.cycles-pp.down_write
      0.10 ± 19%      -0.1        0.05 ± 58%  perf-profile.children.cycles-pp.should_failslab
      0.28 ± 10%      -0.0        0.23        perf-profile.children.cycles-pp.anon_vma_interval_tree_remove
      0.11 ± 19%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.policy_nodemask
      0.10 ± 11%      -0.0        0.06 ± 14%  perf-profile.children.cycles-pp.__vma_link_file
      0.11 ±  9%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.anon_vma_chain_link
      0.13 ±  8%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.try_charge
      0.18 ±  6%      -0.0        0.16 ±  5%  perf-profile.children.cycles-pp.inc_zone_page_state
      0.14 ±  2%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.anon_vma_interval_tree_insert
      0.10 ± 17%      +0.0        0.14 ±  7%  perf-profile.children.cycles-pp.strlen
      0.52 ±  2%      +0.0        0.56 ±  3%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      0.17 ± 16%      +0.0        0.21 ±  6%  perf-profile.children.cycles-pp.uncharge_page
      0.08 ± 16%      +0.0        0.13 ±  7%  perf-profile.children.cycles-pp.__vma_link_list
      0.26 ±  6%      +0.1        0.31 ±  6%  perf-profile.children.cycles-pp.mem_cgroup_charge_statistics
      0.00            +0.1        0.06 ± 22%  perf-profile.children.cycles-pp.__get_vma_policy
      0.13 ±  9%      +0.1        0.19 ±  9%  perf-profile.children.cycles-pp.vma_merge
      0.02 ±122%      +0.1        0.09 ± 11%  perf-profile.children.cycles-pp.kthread_blkcg
      0.25 ± 11%      +0.1        0.33 ±  6%  perf-profile.children.cycles-pp.get_task_policy
      0.00            +0.1        0.08 ±  5%  perf-profile.children.cycles-pp.memcpy
      0.25 ±  9%      +0.1        0.35 ±  2%  perf-profile.children.cycles-pp.memcpy_erms
      1.97 ±  2%      +0.1        2.09        perf-profile.children.cycles-pp.get_unmapped_area
      1.34 ±  7%      +0.1        1.47 ±  2%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      0.38 ±  5%      +0.1        0.52 ±  5%  perf-profile.children.cycles-pp.alloc_pages_current
      3.08 ±  2%      +0.2        3.24 ±  2%  perf-profile.children.cycles-pp.syscall_return_via_sysret
     64.46            +2.0       66.45        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     64.19            +2.0       66.19        perf-profile.children.cycles-pp.do_syscall_64
     43.77            +2.4       46.18        perf-profile.children.cycles-pp.__do_munmap
     44.49            +2.5       46.95        perf-profile.children.cycles-pp.__vm_munmap
     44.77            +2.5       47.24        perf-profile.children.cycles-pp.__x64_sys_munmap
     39.43 ±  2%      +2.7       42.10        perf-profile.children.cycles-pp.unmap_region
     18.07 ±  2%      +3.7       21.73        perf-profile.children.cycles-pp.unmap_page_range
     18.29 ±  2%      +3.7       21.97        perf-profile.children.cycles-pp.unmap_vmas
      6.02 ±  3%      -0.5        5.57 ±  3%  perf-profile.self.cycles-pp.do_syscall_64
      1.73            -0.1        1.59        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      1.56 ±  2%      -0.1        1.44        perf-profile.self.cycles-pp.native_flush_tlb
      0.34 ± 11%      -0.1        0.24 ±  7%  perf-profile.self.cycles-pp.strlcpy
      0.57 ±  5%      -0.1        0.49 ±  6%  perf-profile.self.cycles-pp.unlink_anon_vmas
      0.68 ±  4%      -0.1        0.60 ±  8%  perf-profile.self.cycles-pp._raw_spin_lock
      0.37 ±  5%      -0.1        0.31 ±  6%  perf-profile.self.cycles-pp.cpumask_any_but
      0.42 ±  7%      -0.1        0.36 ±  6%  perf-profile.self.cycles-pp.handle_mm_fault
      0.23 ±  7%      -0.1        0.18 ±  4%  perf-profile.self.cycles-pp.__perf_sw_event
      0.10 ± 23%      -0.0        0.06 ±  9%  perf-profile.self.cycles-pp.policy_nodemask
      0.09 ± 11%      -0.0        0.04 ± 59%  perf-profile.self.cycles-pp.__vma_link_file
      0.13 ±  6%      -0.0        0.10 ±  8%  perf-profile.self.cycles-pp.try_charge
      0.14 ±  2%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
      0.10 ± 15%      +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.strlen
      0.09 ± 17%      +0.0        0.12 ±  5%  perf-profile.self.cycles-pp.memcg_check_events
      0.07 ± 19%      +0.0        0.10 ±  7%  perf-profile.self.cycles-pp.__vma_link_list
      0.16 ± 16%      +0.0        0.20 ±  5%  perf-profile.self.cycles-pp.uncharge_page
      0.24 ±  7%      +0.0        0.28 ±  2%  perf-profile.self.cycles-pp.memcpy_erms
      0.04 ± 53%      +0.0        0.09 ±  8%  perf-profile.self.cycles-pp.do_page_fault
      0.42 ±  9%      +0.1        0.48 ±  7%  perf-profile.self.cycles-pp.find_next_bit
      0.13 ± 10%      +0.1        0.19 ±  8%  perf-profile.self.cycles-pp.vma_merge
      0.02 ±122%      +0.1        0.09 ± 11%  perf-profile.self.cycles-pp.kthread_blkcg
      0.25 ± 10%      +0.1        0.32 ±  7%  perf-profile.self.cycles-pp.get_task_policy
      0.00            +0.1        0.08 ±  6%  perf-profile.self.cycles-pp.memcpy
      0.14 ±  5%      +0.1        0.25 ± 15%  perf-profile.self.cycles-pp.alloc_pages_current
      3.08 ±  2%      +0.2        3.23 ±  2%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.43 ± 10%      +0.2        0.58 ±  6%  perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
     11.00 ±  2%      +3.6       14.56        perf-profile.self.cycles-pp.unmap_page_range
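
As a reading aid for the table above: rows with a "%" sign in the %change column appear to report the relative change between the two means, while perf-profile rows, whose values are already percentages of cycles, appear to report a plain difference in percentage points. The sketch below checks that interpretation against three rows copied from the table. The helper names are hypothetical, and the interpretation itself is an assumption inferred from the numbers rather than from the lkp-tests documentation.

```python
def relative_change_pct(base, patched):
    """Relative change from base to patched, in percent."""
    return (patched - base) / base * 100.0


def point_delta(base, patched):
    """Absolute difference, used here for values that are already percentages."""
    return patched - base


# will-it-scale.per_process_ops: 119757 (v5.3-rc2) vs 114839 (755d6edc1a)
ops_change = round(relative_change_pct(119757, 114839), 1)

# will-it-scale.workload: 958059 vs 918718
workload_change = round(relative_change_pct(958059, 918718), 1)

# perf-profile.self.cycles-pp.unmap_page_range: 11.00 vs 14.56
unmap_delta = round(point_delta(11.00, 14.56), 1)

if __name__ == "__main__":
    print(ops_change, workload_change, unmap_delta)
```

The same numbers also show that will-it-scale.workload is roughly eight times will-it-scale.per_process_ops, consistent with one process per hardware thread on the 8-thread test machine.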


                                                                                
                            will-it-scale.per_process_ops                       
                                                                                
  120000 +-+----------------------------------------------------------------+   
         |  +.      +.    +..         ..   +       +.   +.       +          |   
  119000 +-+                 +.+..+..+                                      |   
  118000 +-+                                                                |   
         |                                                                  |   
  117000 +-+                                                                |   
         |       O                                                          |   
  116000 O-+O  O    O          O                                            |   
         |             O  O  O                          O           O       |   
  115000 +-+                      O  O                     O  O  O    O  O  O   
  114000 +-+                                                                |   
         |                              O  O O  O     O                     |   
  113000 +-+                                       O                        |   
         |                                                                  |   
  112000 +-+----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


View attachment "config-5.3.0-rc2-00001-g755d6edc1aee44" of type "text/plain" (199591 bytes)

View attachment "job-script" of type "text/plain" (7364 bytes)

View attachment "job.yaml" of type "text/plain" (4989 bytes)

View attachment "reproduce" of type "text/plain" (310 bytes)
