Message-ID: <20170814060507.GE23258@yexl-desktop>
Date:   Mon, 14 Aug 2017 14:05:07 +0800
From:   kernel test robot <xiaolong.ye@...el.com>
To:     Guillaume Knispel <guillaume.knispel@...ersonicimagine.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Manfred Spraul <manfred@...orfullife.com>,
        Kees Cook <keescook@...omium.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        Alexey Dobriyan <adobriyan@...il.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Serge Hallyn <serge@...lyn.com>,
        Andrey Vagin <avagin@...nvz.org>,
        Guillaume Knispel <guillaume.knispel@...ersonicimagine.com>,
        Marc Pardo <marc.pardo@...ersonicimagine.com>,
        linux-kernel@...r.kernel.org, lkp@...org
Subject: [lkp-robot] [ipc]  cb6268f05d:  reaim.jobs_per_min 865% improvement


Greetings,

FYI, we noticed an 865% improvement in reaim.jobs_per_min due to commit:


commit: cb6268f05df684e00607762fd8ad95d515e2407f ("ipc: optimize semget/shmget/msgget for lots of keys")
url: https://github.com/0day-ci/linux/commits/Guillaume-Knispel/ipc-optimize-semget-shmget-msgget-for-lots-of-keys/20170731-170031
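
The commit subject mentions semget/shmget/msgget and "lots of keys", but this message does not describe how the commit works. As a rough illustration only (not the kernel's code), the sketch below contrasts two generic ways of finding an object by key: scanning every object versus consulting a key index. The dictionaries, the key range and the function names are all hypothetical.

```python
def find_key_linear(objects, key):
    """Scan every object until the key matches: cost grows with the object count."""
    for obj in objects:
        if obj["key"] == key:
            return obj
    return None


def find_key_hashed(index, key):
    """Look the key up in a dict: roughly constant cost on average."""
    return index.get(key)


# Hypothetical population of IPC-like objects, one per key (assumption, not from the report).
objects = [{"key": k, "id": i} for i, k in enumerate(range(1000, 6000))]
index = {obj["key"]: obj for obj in objects}

probe = 5999  # last key inserted: the worst case for the linear scan
assert find_key_linear(objects, probe) is find_key_hashed(index, probe)
```

Both functions return the same object; only the amount of work per lookup differs as the number of keys grows.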


in testcase: reaim
on test machine: 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory
with the following parameters:

	runtime: 300s
	nr_task: 5000
	test: shared_memory
	cpufreq_governor: performance

test-description: REAIM is an updated and improved version of the AIM 7 benchmark.
test-url: https://sourceforge.net/projects/re-aim-7/


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/01org/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

testcase/path_params/tbox_group/run: reaim/300s-5000-shared_memory-performance/lkp-hsw-ep5

fd2b2c57ec2020ae  cb6268f05df684e00607762fd8  
----------------  --------------------------  
    634033 ±  4%       865%    6120365        reaim.jobs_per_min
       126 ±  4%       865%       1224        reaim.jobs_per_min_child
    672755 ±  5%       831%    6263184        reaim.max_jobs_per_min
     34.14 ±  3%        11%      37.82        reaim.std_dev_percent
     65.40              -6%      61.66        reaim.jti
     13.53              -7%      12.60        reaim.child_utime
     47.48 ±  4%       -90%       4.90        reaim.parent_time
      1981 ±  4%       -90%        204        reaim.child_systime
     12.19             -90%       1.24        reaim.std_dev_time
   5160570 ±  7%       517%   31838188        reaim.time.minor_page_faults
     88.17 ±  7%       474%     505.69        reaim.time.user_time
    125994 ± 11%       244%     433632        reaim.time.involuntary_context_switches
   4053795 ±  7%        79%    7256949        reaim.time.voluntary_context_switches
      3974             -28%       2864        reaim.time.percent_of_cpu_this_job_got
     12855 ±  4%       -36%       8249        reaim.time.system_time
     27741 ±  3%        86%      51490        vmstat.system.cs
    214773 ±  5%        44%     308264        interrupts.CAL:Function_call_interrupts
       fail:runs  %reproduction    fail:runs
           |             |             |    
           :4           25%           1:4     stderr.create_shared_memory():can't_create_shared_memory,pausing
         0            6e+05     551946 ±104%  latency_stats.avg.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.do_syscall_64.return_from_SYSCALL_64
    240707 ±  7%     -2e+05      81933        latency_stats.avg.call_rwsem_down_write_failed.ipcget.SyS_shmget.entry_SYSCALL_64_fastpath
    282714 ±  4%     -2e+05      76327        latency_stats.avg.call_rwsem_down_write_failed.shm_close.remove_vma.do_munmap.SyS_shmdt.entry_SYSCALL_64_fastpath
    313951 ±  5%     -2e+05      78957        latency_stats.avg.call_rwsem_down_write_failed.do_shmat.SyS_shmat.entry_SYSCALL_64_fastpath
    341015 ±  4%     -3e+05      78091        latency_stats.avg.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.entry_SYSCALL_64_fastpath
         0            6e+05     551946 ±104%  latency_stats.max.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.do_syscall_64.return_from_SYSCALL_64
  21599230 ±  3%     -2e+07    3153822 ±  6%  latency_stats.max.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.entry_SYSCALL_64_fastpath
  21608679 ±  3%     -2e+07    3152519 ±  6%  latency_stats.max.call_rwsem_down_write_failed.ipcget.SyS_shmget.entry_SYSCALL_64_fastpath
  21612440 ±  3%     -2e+07    3153028 ±  6%  latency_stats.max.call_rwsem_down_write_failed.do_shmat.SyS_shmat.entry_SYSCALL_64_fastpath
  21613940 ±  3%     -2e+07    3154107 ±  6%  latency_stats.max.call_rwsem_down_write_failed.shm_close.remove_vma.do_munmap.SyS_shmdt.entry_SYSCALL_64_fastpath
  21615866 ±  3%     -2e+07    3154900 ±  6%  latency_stats.max.max
 3.835e+10 ±  4%      9e+10  1.254e+11 ±  7%  latency_stats.sum.io_schedule.__lock_page.do_wp_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
 1.672e+09 ± 22%      5e+09  6.765e+09 ± 86%  latency_stats.sum.call_rwsem_down_write_failed.ipcget.SyS_semget.entry_SYSCALL_64_fastpath
   2757771 ± 40%      2e+08  2.425e+08 ± 26%  latency_stats.sum.io_schedule.wait_on_page_bit.__migration_entry_wait.migration_entry_wait.do_swap_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
         0            6e+05     551946 ±104%  latency_stats.sum.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.do_syscall_64.return_from_SYSCALL_64
     24449 ± 67%      2e+05     200503 ±  8%  latency_stats.sum.io_schedule.__lock_page_or_retry.filemap_fault.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
     40790 ± 10%      2e+05     191446 ± 11%  latency_stats.sum.ep_poll.SyS_epoll_wait.do_syscall_64.return_from_SYSCALL_64
     27684 ± 19%      1e+05     172543 ± 13%  latency_stats.sum.devkmsg_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
     52252 ±  4%      1e+05     189551 ±  5%  latency_stats.sum.wait_woken.inotify_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
      6530 ± 95%      9e+04      91966 ± 21%  latency_stats.sum.io_schedule.__lock_page_killable.__lock_page_or_retry.filemap_fault.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      5976 ± 20%      3e+04      34861 ±  4%  latency_stats.sum.pipe_wait.pipe_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
 2.659e+11 ±  3%     -2e+11  8.371e+10        latency_stats.sum.call_rwsem_down_write_failed.do_shmat.SyS_shmat.entry_SYSCALL_64_fastpath
   5864993 ±  7%       454%   32495687        perf-stat.page-faults
   5864993 ±  7%       454%   32495687        perf-stat.minor-faults
      0.11 ±  3%       422%       0.60        perf-stat.branch-miss-rate%
 1.501e+08 ±  7%       356%  6.847e+08        perf-stat.node-store-misses
 4.444e+11 ±  6%       316%  1.849e+12        perf-stat.dTLB-stores
 1.077e+08 ±  5%       314%   4.46e+08        perf-stat.node-loads
 1.067e+09 ±  7%       287%  4.133e+09        perf-stat.cache-misses
 6.282e+08 ±  7%       275%  2.355e+09        perf-stat.node-load-misses
 1.757e+08 ±  7%       261%  6.339e+08        perf-stat.node-stores
  4.56e+09 ±  6%       232%  1.516e+10        perf-stat.branch-misses
      3.20             212%       9.98        perf-stat.cache-miss-rate%
  71703040 ±  6%       163%  1.884e+08        perf-stat.dTLB-store-misses
 2.831e+08 ± 11%       111%  5.964e+08        perf-stat.iTLB-loads
   9086070 ±  7%        74%   15819624        perf-stat.context-switches
   1340441 ±  8%        72%    2308738        perf-stat.cpu-migrations
 2.164e+09 ±  7%        32%  2.847e+09        perf-stat.iTLB-load-misses
 3.337e+10 ±  5%        24%   4.14e+10        perf-stat.cache-references
     46.07              13%      51.93        perf-stat.node-store-miss-rate%
      0.66               9%       0.72        perf-stat.ipc
     85.35                       84.08        perf-stat.node-load-miss-rate%
     88.46              -7%      82.67        perf-stat.iTLB-load-miss-rate%
      1.51              -8%       1.39        perf-stat.cpi
 5.134e+12 ±  4%       -28%  3.698e+12        perf-stat.dTLB-loads
 1.972e+13 ±  4%       -32%  1.342e+13        perf-stat.instructions
 3.976e+12 ±  4%       -36%  2.533e+12        perf-stat.branch-instructions
      0.02 ±  6%       -37%       0.01        perf-stat.dTLB-store-miss-rate%
 2.977e+13 ±  4%       -37%  1.865e+13        perf-stat.cpu-cycles
      9132 ±  3%       -48%       4715        perf-stat.instructions-per-iTLB-miss
      0.06 ± 27%       -68%       0.02        perf-stat.dTLB-load-miss-rate%
 3.271e+09 ± 27%       -77%  7.473e+08        perf-stat.dTLB-load-misses
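
For readers who want to check the percentage column, here is a minimal sketch that parses one comparison row and recomputes the relative change as (new - base) / base. It assumes the row layout seen above (base mean, optional "± N%", percent change, new mean, metric name). The regular expression and function names are illustrative and are not part of lkp-tests. Rows that report an absolute change (the latency_stats lines) or no change are not handled. Because the table prints rounded means, a recomputed figure can differ slightly from the reported one.

```python
import re

# One comparison row: base mean, optional "± N%", percent change, new mean, metric name.
ROW = re.compile(
    r"^\s*(?P<base>[\d.e+]+)"          # mean on the base commit
    r"(?:\s*±\s*(?P<base_sd>\d+)%)?"   # optional relative standard deviation
    r"\s+(?P<change>-?\d+)%"           # reported percent change
    r"\s+(?P<new>[\d.e+]+)"            # mean on the tested commit
    r"(?:\s*±\s*(?P<new_sd>\d+)%)?"    # optional relative standard deviation
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Return the parsed fields of a percent-change row, or None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m.group("metric"),
        "base": float(m.group("base")),
        "new": float(m.group("new")),
        "reported_change": int(m.group("change")),
    }


def percent_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


row = parse_row("    634033 ±  4%       865%    6120365        reaim.jobs_per_min")
print(row["metric"], round(percent_change(row["base"], row["new"])))
# prints: reaim.jobs_per_min 865
```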



                                reaim.parent_time

  55 ++---------------------------------------------------------------------+
  50 ++.*.     .*..                                                      *..|
     *.   *..*.    *.*..*..*..*.*..*..*..*.*..*..*.*..*..  .*.*..*..*.. +   *
  45 ++                                                  *.            *    |
  40 ++                                                                     |
  35 ++                                                                     |
  30 ++                                                                     |
     |                                                                      |
  25 ++                                                                     |
  20 ++                                                                     |
  15 ++                                                                     |
  10 ++                                                                     |
     |                                                                      |
   5 O+ O O  O  O  O O  O  O  O O  O  O  O O  O  O O  O  O  O O  O          |
   0 ++---------------------------------------------------------------------+


                                reaim.child_systime

  2200 ++-*-*-----*---------------------------------------------------------+
  2000 *+      *.  :                           *.*..*..*.          .*.. .*..|
       |           :     .*.*..    .*..      ..          *..*..*.*.    *    *
  1800 ++           *..*.      *..*    *..*.*                               |
  1600 ++                                                                   |
  1400 ++                                                                   |
  1200 ++                                                                   |
       |                                                                    |
  1000 ++                                                                   |
   800 ++                                                                   |
   600 ++                                                                   |
   400 ++                                                                   |
       |                                                                    |
   200 O+ O O  O  O O  O  O O  O  O O  O  O O  O O  O  O O  O  O O          |
     0 ++-------------------------------------------------------------------+


                                 reaim.jobs_per_min

  7e+06 ++------------------------------------------------------------------+
        O  O O  O O  O  O O  O  O O  O O  O  O                              |
  6e+06 ++                                     O  O O  O  O O  O  O         |
        |                                                                   |
  5e+06 ++                                                                  |
        |                                                                   |
  4e+06 ++                                                                  |
        |                                                                   |
  3e+06 ++                                                                  |
        |                                                                   |
  2e+06 ++                                                                  |
        |                                                                   |
  1e+06 ++                                                                  |
        *..*.*..*.*..*..*.*..*..*.*..*.*..*..*.*..*.*..*..*.*..*..*.*..*.*..*
      0 ++------------------------------------------------------------------+


                             reaim.jobs_per_min_child

  1400 ++-------------------------------------------------------------------+
       O  O O  O  O O  O  O O  O  O O  O  O O                               |
  1200 ++                                      O O  O  O O  O  O O          |
       |                                                                    |
  1000 ++                                                                   |
       |                                                                    |
   800 ++                                                                   |
       |                                                                    |
   600 ++                                                                   |
       |                                                                    |
   400 ++                                                                   |
       |                                                                    |
   200 ++                                                                   |
       *..*.*..*..*.*..*..*.*..*..*.*..*..*.*..*.*..*..*.*..*..*.*..*..*.*..*
     0 ++-------------------------------------------------------------------+


  [*] bisect-good sample
  [O] bisect-bad  sample


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong

View attachment "config-4.13.0-rc2-00023-gcb6268f0" of type "text/plain" (160961 bytes)

View attachment "job-script" of type "text/plain" (6767 bytes)

View attachment "job.yaml" of type "text/plain" (4438 bytes)

View attachment "reproduce" of type "text/plain" (4906 bytes)
