Message-ID: <20181009093534.GB13396@shao2-debian>
Date:   Tue, 9 Oct 2018 17:35:34 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...riel.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Jirka Hladky <jhladky@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [LKP] [mm, sched/numa]  efaffc5e40:
 perf-bench-numa-mem.GB_per_thread 38.7% improvement

Greetings,

FYI, we noticed a 38.7% improvement of perf-bench-numa-mem.GB_per_thread due to commit:


commit: efaffc5e40aeced0bcb497ed7a0a5b8c14abfcdf ("mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: perf-bench-numa-mem
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
with following parameters:

	nr_threads: 2t
	mem_proc: 300M
	cpufreq_governor: performance
	ucode: 0x42d




Details are as follows:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mem_proc/nr_threads/rootfs/tbox_group/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.2/300M/2t/debian-x86_64-2018-04-03.cgz/ivb44/perf-bench-numa-mem/0x42d

commit: 
  6fd98e775f ("sched/numa: Avoid task migration for small NUMA improvement")
  efaffc5e40 ("mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration")

6fd98e775f24fd41 efaffc5e40aeced0bcb497ed7a 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.85 ±  5%     +38.7%       1.18 ±  6%  perf-bench-numa-mem.GB_per_thread
      0.15           +36.6%       0.20 ±  5%  perf-bench-numa-mem.GB_sec_thread
     14.04           +36.6%      19.18 ±  6%  perf-bench-numa-mem.GB_sec_total
     81.51 ±  5%     +38.7%     113.07 ±  6%  perf-bench-numa-mem.GB_total
      6.84           -26.5%       5.02 ±  6%  perf-bench-numa-mem.nsecs_byte_thread
     34.74 ±  5%     +13.4%      39.39 ±  5%  perf-bench-numa-mem.time.system_time
      1799 ±  6%     +11.6%       2008 ±  5%  perf-bench-numa-mem.time.voluntary_context_switches
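As a sanity check on the headline number, the %change column can be recomputed as the relative difference between the two kernels' mean values. A minimal sketch, assuming the inputs are the rounded means printed in the table above (the robot computes from unrounded data, so the rounded inputs land at ~38.8% and ~-26.6% rather than the reported 38.7% and -26.5%):

```python
def pct_change(base, patched):
    """Relative change of the patched mean vs. the base mean, in percent."""
    return (patched - base) / base * 100.0

# GB_per_thread: 0.85 -> 1.18, ~+38.8% from the rounded means
print(round(pct_change(0.85, 1.18), 1))
# nsecs_byte_thread: 6.84 -> 5.02, ~-26.6% from the rounded means
print(round(pct_change(6.84, 5.02), 1))
```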
     53165 ±  4%     +22.2%      64991 ± 18%  interrupts.CAL:Function_call_interrupts
     91155            +2.0%      92949        vmstat.system.in
    410.00 ±  4%     +34.1%     550.00 ±  7%  slabinfo.file_lock_cache.active_objs
    410.00 ±  4%     +34.1%     550.00 ±  7%  slabinfo.file_lock_cache.num_objs
     17863 ±  5%     -14.8%      15221 ± 11%  numa-meminfo.node0.Mapped
    457399 ± 33%    +120.0%    1006181 ± 35%  numa-meminfo.node1.Active
    457399 ± 33%    +120.0%    1006091 ± 35%  numa-meminfo.node1.Active(anon)
    308898 ± 35%    +145.0%     756724 ± 36%  numa-meminfo.node1.AnonHugePages
    456984 ± 33%    +120.1%    1006048 ± 35%  numa-meminfo.node1.AnonPages
   1101857 ± 14%     +51.7%    1671401 ± 21%  numa-meminfo.node1.MemUsed
      4566 ±  5%     -15.2%       3872 ± 11%  numa-vmstat.node0.nr_mapped
    367556 ±  3%      -6.6%     343176 ±  4%  numa-vmstat.node0.numa_local
    116715 ± 33%    +118.3%     254735 ± 33%  numa-vmstat.node1.nr_active_anon
    116570 ± 33%    +118.4%     254606 ± 33%  numa-vmstat.node1.nr_anon_pages
    151.25 ± 35%    +136.5%     357.75 ± 34%  numa-vmstat.node1.nr_anon_transparent_hugepages
    116715 ± 33%    +118.2%     254724 ± 33%  numa-vmstat.node1.nr_zone_active_anon
    272539           +10.6%     301362 ±  7%  numa-vmstat.node1.numa_hit
     23658            +2.7%      24302        proc-vmstat.nr_slab_unreclaimable
    614708 ±  6%    +113.3%    1310981 ±  7%  proc-vmstat.numa_pages_migrated
   4541241           +17.7%    5343559        proc-vmstat.pgalloc_normal
   3936210 ± 25%     +35.1%    5318885        proc-vmstat.pgfree
     52096 ± 15%     +95.6%     101888 ± 11%  proc-vmstat.pgmigrate_fail
    614708 ±  6%    +113.3%    1310981 ±  7%  proc-vmstat.pgmigrate_success
      7267 ± 26%     +34.3%       9757 ±  2%  proc-vmstat.thp_deferred_split_page
 1.556e+10 ±  5%     +33.6%  2.078e+10 ± 14%  perf-stat.branch-instructions
 8.441e+08 ±  3%     +19.0%  1.004e+09 ±  5%  perf-stat.cache-misses
 1.737e+09 ±  6%     +24.0%  2.154e+09 ±  5%  perf-stat.cache-references
     10.77           -20.3%       8.58 ± 15%  perf-stat.cpi
 1.486e+10 ±  5%     +37.2%  2.038e+10 ± 13%  perf-stat.dTLB-loads
 1.729e+10 ±  9%     +26.7%   2.19e+10 ±  3%  perf-stat.dTLB-stores
 9.068e+10 ±  5%     +32.3%    1.2e+11 ± 11%  perf-stat.instructions
      0.09           +28.4%       0.12 ± 15%  perf-stat.ipc
     31.61            -5.0       26.61 ±  6%  perf-stat.node-load-miss-rate%
  4.24e+08 ±  8%      +9.1%  4.627e+08 ±  4%  perf-stat.node-load-misses
 9.165e+08 ±  6%     +39.6%   1.28e+09 ±  6%  perf-stat.node-loads
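The cpi and ipc rows above should roughly invert each other, since perf-stat's cpi is cycles-per-instruction and ipc is its reciprocal, instructions-per-cycle. A quick consistency sketch against the rounded table values:

```python
def ipc_from_cpi(cpi):
    """Instructions-per-cycle is the reciprocal of cycles-per-instruction."""
    return 1.0 / cpi

# Base kernel: cpi 10.77 -> ipc ~0.093; patched: cpi 8.58 -> ipc ~0.117,
# consistent with the rounded 0.09 and 0.12 in the table.
print(ipc_from_cpi(10.77))
print(ipc_from_cpi(8.58))
```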
      7007 ±  4%     -52.6%       3324 ±  3%  sched_debug.cfs_rq:/.min_vruntime.avg
     23357 ± 13%     -18.8%      18961 ±  4%  sched_debug.cfs_rq:/.min_vruntime.max
      2988 ±  7%     -76.8%     693.68 ± 23%  sched_debug.cfs_rq:/.min_vruntime.min
      4011 ±  8%    -128.9%      -1159        sched_debug.cfs_rq:/.spread0.avg
     20353 ± 15%     -28.9%      14466 ± 25%  sched_debug.cfs_rq:/.spread0.max
    115.17 ±  4%     -14.4%      98.57 ± 12%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
  15213474 ± 84%     -83.1%    2564813 ± 91%  sched_debug.cpu.avg_idle.max
   2126334 ± 85%     -80.3%     419030 ± 65%  sched_debug.cpu.avg_idle.stddev
      0.00 ±  3%     -11.0%       0.00 ±  7%  sched_debug.cpu.next_balance.stddev
      5690 ±  8%     -19.7%       4568 ±  4%  sched_debug.cpu.nr_switches.max
      1001 ±  6%     -12.8%     872.62 ±  3%  sched_debug.cpu.nr_switches.stddev
      8.17 ± 14%      -8.2        0.00        perf-profile.calltrace.cycles-pp.waitid
      4.42 ±101%      -4.4        0.00        perf-profile.calltrace.cycles-pp.pipe_write.__vfs_write.vfs_write.ksys_write.do_syscall_64
      6.25 ± 60%      -4.3        1.92 ±173%  perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.25 ± 60%      -4.3        1.92 ±173%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.25 ± 60%      -4.3        1.92 ±173%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.25 ± 60%      -4.3        1.92 ±173%  perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      6.25 ± 60%      -4.2        2.08 ±173%  perf-profile.calltrace.cycles-pp.filemap_map_pages.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
      4.00 ±100%      -4.0        0.00        perf-profile.calltrace.cycles-pp.arch_show_interrupts.seq_read.proc_reg_read.__vfs_read.vfs_read
      4.00 ±100%      -4.0        0.00        perf-profile.calltrace.cycles-pp.proc_reg_read.__vfs_read.vfs_read.ksys_read.do_syscall_64
      4.00 ±100%      -4.0        0.00        perf-profile.calltrace.cycles-pp.seq_read.proc_reg_read.__vfs_read.vfs_read.ksys_read
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.waitid
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.waitid
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.__do_sys_waitid.do_syscall_64.entry_SYSCALL_64_after_hwframe.waitid
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.kernel_waitid.__do_sys_waitid.do_syscall_64.entry_SYSCALL_64_after_hwframe.waitid
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.do_wait.kernel_waitid.__do_sys_waitid.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.d_invalidate.proc_flush_task.release_task.wait_consider_task.do_wait
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.proc_flush_task.release_task.wait_consider_task.do_wait.kernel_waitid
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.shrink_dcache_parent.d_invalidate.proc_flush_task.release_task.wait_consider_task
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.wait_consider_task.do_wait.kernel_waitid.__do_sys_waitid.do_syscall_64
      3.75 ±101%      -3.8        0.00        perf-profile.calltrace.cycles-pp.release_task.wait_consider_task.do_wait.kernel_waitid.__do_sys_waitid
      4.42 ±101%      -2.5        1.92 ±173%  perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.17 ±103%      -2.2        1.92 ±173%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.00 ±100%      -1.9        2.08 ±173%  perf-profile.calltrace.cycles-pp.__vfs_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.17 ± 14%      -8.2        0.00        perf-profile.children.cycles-pp.waitid
      4.42 ±101%      -4.4        0.00        perf-profile.children.cycles-pp.pipe_write
      6.25 ± 60%      -4.3        1.92 ±173%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
      6.25 ± 60%      -4.3        1.92 ±173%  perf-profile.children.cycles-pp.vm_mmap_pgoff
      6.25 ± 60%      -4.3        1.92 ±173%  perf-profile.children.cycles-pp.do_mmap
      6.25 ± 60%      -4.3        1.92 ±173%  perf-profile.children.cycles-pp.mmap_region
      6.25 ± 60%      -4.2        2.08 ±173%  perf-profile.children.cycles-pp.filemap_map_pages
      4.00 ±100%      -4.0        0.00        perf-profile.children.cycles-pp.arch_show_interrupts
      4.00 ±100%      -4.0        0.00        perf-profile.children.cycles-pp.proc_reg_read
      3.75 ±101%      -3.8        0.00        perf-profile.children.cycles-pp.__do_sys_waitid
      3.75 ±101%      -3.8        0.00        perf-profile.children.cycles-pp.kernel_waitid
      3.75 ±101%      -3.8        0.00        perf-profile.children.cycles-pp.do_wait
      3.75 ±101%      -3.8        0.00        perf-profile.children.cycles-pp.d_invalidate
      3.75 ±101%      -3.8        0.00        perf-profile.children.cycles-pp.proc_flush_task
      3.75 ±101%      -3.8        0.00        perf-profile.children.cycles-pp.shrink_dcache_parent
      3.75 ±101%      -3.8        0.00        perf-profile.children.cycles-pp.wait_consider_task
      3.75 ±101%      -3.8        0.00        perf-profile.children.cycles-pp.release_task
      4.42 ±101%      -2.5        1.92 ±173%  perf-profile.children.cycles-pp.ksys_write
      4.42 ±101%      -2.5        1.92 ±173%  perf-profile.children.cycles-pp.vfs_write
      4.42 ±101%      -2.5        1.92 ±173%  perf-profile.children.cycles-pp.__vfs_write
      4.17 ±103%      -2.2        1.92 ±173%  perf-profile.children.cycles-pp.path_openat
      4.00 ±100%      -1.9        2.08 ±173%  perf-profile.children.cycles-pp.__vfs_read
      4.00 ±100%      -1.9        2.08 ±173%  perf-profile.children.cycles-pp.seq_read
      4.17 ±103%      -2.1        2.08 ±173%  perf-profile.self.cycles-pp.filemap_map_pages


                                                                                
                       perf-bench-numa-mem.nsecs_byte_thread                    
                                                                                
  7.5 +-+-------------------------------------------------------------------+   
      |     .+      .+.  +          .+         .+         .+                |   
    7 +-+.+.  +   .+      +.+. .+.+.  +   .+.+.  + .+.+.+.  +   .+.+.      .|   
      |        +.+            +        +.+        +          +.+     +..+.+ |   
  6.5 +-+                                                                   |   
      |                                                                     |   
    6 +-+                                                                   |   
      |                                                                     |   
  5.5 +-+                                  O                                |   
      | O O  O O   O      O                                                 |   
    5 +-+            O                 O                                    |   
      |          O      O       O    O   O   O                              |   
  4.5 O-+                   O O   O                                         |   
      |                                                                     |   
    4 +-+-------------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                         perf-bench-numa-mem.GB_sec_thread                      
                                                                                
  0.23 +-+------------------------------------------------------------------+   
  0.22 +-+                   O O   O                                        |   
       O          O              O   O                                      |   
  0.21 +-+              O                     O                             |   
   0.2 +-+            O                O  O                                 |   
       |   O O            O                                                 |   
  0.19 +-O      O   O                                                       |   
  0.18 +-+                                  O                               |   
  0.17 +-+                                                                  |   
       |                                                                    |   
  0.16 +-+                                                                  |   
  0.15 +-+                                                   +.+.        .+.|   
       |.+.+.  .+.+.+.   .+..+.+.+.+. .+..+.+.+. .+.+.+..+. +    +.+..+.+   |   
  0.14 +-+   +.       +.+            +          +          +                |   
  0.13 +-+------------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                        perf-bench-numa-mem.GB_sec_total                        
                                                                                
  22 +-+--------------------------------------------------------------------+   
  21 +-+                   O O    O                                         |   
     O          O               O   O        O                              |   
  20 +-+               O                 O                                  |   
  19 +-+             O                O                                     |   
     | O O  O O   O      O                                                  |   
  18 +-+                                                                    |   
  17 +-+                                   O                                |   
  16 +-+                                                                    |   
     |                                                                      |   
  15 +-+                                                                    |   
  14 +-+     .+.+.          .+..     .+..+.     .+..+.     .+..+.+. .+..+.+.|   
     | +.+..+     +..+. .+.+    +.+.+      +.+.+      +.+.+        +        |   
  13 +-+               +                                                    |   
  12 +-+--------------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen

View attachment "config-4.19.0-rc5-00246-gefaffc5" of type "text/plain" (167709 bytes)

View attachment "job-script" of type "text/plain" (7002 bytes)

View attachment "job.yaml" of type "text/plain" (4588 bytes)

View attachment "reproduce" of type "text/plain" (323 bytes)
