Message-ID: <20210831025623.GC4286@xsang-OptiPlex-9020>
Date:   Tue, 31 Aug 2021 10:56:23 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Amir Goldstein <amir73il@...il.com>
Cc:     Jan Kara <jack@...e.cz>, Matthew Bobrowski <repnop@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...ux.intel.com
Subject: [fsnotify]  e43de7f086:  will-it-scale.per_thread_ops 10.2%
 improvement



Greetings,

FYI, we noticed a 10.2% improvement in will-it-scale.per_thread_ops due to commit:


commit: e43de7f0862b8598cd1ef440e3b4701cd107ea40 ("fsnotify: optimize the case of no marks of any type")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

	nr_task: 100%
	mode: thread
	test: eventfd1
	cpufreq_governor: performance
	ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
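
For orientation, the perf profile in this report lists eventfd_write under __libc_write and eventfd_read under __libc_read, which suggests that each eventfd1 thread alternates a write and a read on an eventfd. The sketch below shows one such loop in Python. It is an assumption about the shape of the workload, not the will-it-scale source, and the helper name eventfd_roundtrips is made up for illustration.

```python
import os
import sys


def eventfd_roundtrips(n):
    """Do n write+read pairs on one eventfd.

    Returns the number of completed pairs, or None where os.eventfd is
    unavailable (it needs Python 3.10+ on Linux).
    """
    if not hasattr(os, "eventfd"):
        return None
    fd = os.eventfd(0)
    try:
        one = (1).to_bytes(8, sys.byteorder)  # eventfd values are 8-byte integers
        done = 0
        for _ in range(n):
            os.write(fd, one)  # adds 1 to the eventfd counter
            os.read(fd, 8)     # returns the counter and resets it to 0
            done += 1
        return done
    finally:
        os.close(fd)


print(eventfd_roundtrips(1000))
```

Each pair is two system calls that pass through vfs_write and vfs_read, so any per-call overhead in those paths is paid on every iteration.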





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached in this email
        bin/lkp split-job --compatible job.yaml  # generate the yaml file for lkp run
        bin/lkp run                    generated-yaml-file

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/eventfd1/will-it-scale/0x5003006

commit: 
  ec44610fe2 ("fsnotify: count all objects with attached connectors")
  e43de7f086 ("fsnotify: optimize the case of no marks of any type")

ec44610fe2b86dae e43de7f0862b8598cd1ef440e3b 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 3.057e+08           +10.2%  3.368e+08        will-it-scale.192.threads
   1592331           +10.2%    1754346        will-it-scale.per_thread_ops
 3.057e+08           +10.2%  3.368e+08        will-it-scale.workload
     18.46            +1.9       20.39        mpstat.cpu.all.usr%
      0.04 ± 25%     -53.1%       0.02 ± 63%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
    401780 ±  4%      -6.1%     377284 ±  5%  proc-vmstat.numa_pte_updates
      2561 ± 18%     -19.0%       2075 ±  6%  interrupts.CPU110.TLB:TLB_shootdowns
      4743 ± 28%     -30.0%       3318 ± 23%  interrupts.CPU122.CAL:Function_call_interrupts
      4311 ± 38%     -27.9%       3110 ±  9%  interrupts.CPU126.CAL:Function_call_interrupts
      2586 ± 14%     -18.3%       2112 ±  5%  interrupts.CPU126.TLB:TLB_shootdowns
      2558 ± 16%     -16.7%       2131 ±  4%  interrupts.CPU140.TLB:TLB_shootdowns
      2581 ± 14%     -18.0%       2117 ±  6%  interrupts.CPU178.TLB:TLB_shootdowns
      3670 ± 10%     -14.0%       3155 ± 10%  interrupts.CPU183.CAL:Function_call_interrupts
   1.1e+11            -3.6%   1.06e+11        perf-stat.i.branch-instructions
      0.43 ±  3%      +0.2        0.65        perf-stat.i.branch-miss-rate%
 4.684e+08 ±  2%     +46.4%  6.859e+08        perf-stat.i.branch-misses
   1352675 ±  4%      -7.9%    1246174        perf-stat.i.cache-misses
      1.00            +3.3%       1.04        perf-stat.i.cpi
    571300 ±  5%     +14.4%     653738        perf-stat.i.cycles-between-cache-misses
 1.619e+11            -1.0%  1.604e+11        perf-stat.i.dTLB-loads
    161216            +8.5%     174902        perf-stat.i.dTLB-store-misses
 1.113e+11            -7.0%  1.035e+11        perf-stat.i.dTLB-stores
  4.36e+08 ±  2%     +50.0%  6.541e+08 ±  2%  perf-stat.i.iTLB-load-misses
    600928 ± 11%    +173.9%    1645842 ± 34%  perf-stat.i.iTLB-loads
 5.529e+11            -3.1%  5.356e+11        perf-stat.i.instructions
      1268 ±  2%     -35.3%     820.45 ±  2%  perf-stat.i.instructions-per-iTLB-miss
      1.00            -3.2%       0.96        perf-stat.i.ipc
      1995            -3.5%       1926        perf-stat.i.metric.M/sec
    245924 ±  5%      -8.5%     225135        perf-stat.i.node-load-misses
      0.43 ±  3%      +0.2        0.65        perf-stat.overall.branch-miss-rate%
      1.00            +3.3%       1.04        perf-stat.overall.cpi
    397012 ±  4%      +8.1%     429116 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.00            +0.0        0.00        perf-stat.overall.dTLB-store-miss-rate%
      1269 ±  2%     -35.4%     819.49 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
      1.00            -3.2%       0.96        perf-stat.overall.ipc
    545194           -12.1%     479465        perf-stat.overall.path-length
 1.096e+11            -3.6%  1.056e+11        perf-stat.ps.branch-instructions
 4.668e+08 ±  2%     +46.4%  6.836e+08        perf-stat.ps.branch-misses
   1396634 ±  4%      -7.6%    1290738        perf-stat.ps.cache-misses
   9527800 ± 26%     -16.0%    7998925 ±  3%  perf-stat.ps.cache-references
 1.614e+11            -1.0%  1.598e+11        perf-stat.ps.dTLB-loads
    160900            +8.9%     175191        perf-stat.ps.dTLB-store-misses
 1.109e+11            -7.0%  1.032e+11        perf-stat.ps.dTLB-stores
 4.345e+08 ±  2%     +50.0%  6.519e+08 ±  2%  perf-stat.ps.iTLB-load-misses
    601410 ± 12%    +172.8%    1640540 ± 34%  perf-stat.ps.iTLB-loads
 5.511e+11            -3.1%  5.338e+11        perf-stat.ps.instructions
    245861 ±  6%      -8.7%     224556        perf-stat.ps.node-load-misses
 1.667e+14            -3.1%  1.615e+14        perf-stat.total.instructions
     28.69            -3.6       25.08        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     35.83            -3.3       32.53        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     40.48            -3.0       37.49        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     42.00            -2.8       39.16        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
      7.75 ±  4%      -1.8        6.00 ±  2%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     56.76            -1.4       55.37        perf-profile.calltrace.cycles-pp.__libc_read
      1.11            +0.0        1.15        perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault._copy_to_iter.eventfd_read.new_sync_read
      0.82            +0.0        0.86 ±  4%  perf-profile.calltrace.cycles-pp.__x64_sys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
      0.55            +0.0        0.60        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_read
      0.55            +0.0        0.60        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_write
      0.66            +0.1        0.71        perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_from_user.eventfd_write.vfs_write
      0.56            +0.1        0.62        perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
      0.56            +0.1        0.62        perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
      2.13            +0.1        2.19        perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.eventfd_write.vfs_write.ksys_write
      0.62            +0.1        0.71        perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_to_iter.eventfd_read.new_sync_read
      1.09            +0.1        1.19        perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.eventfd_read.new_sync_read
      1.32            +0.1        1.42        perf-profile.calltrace.cycles-pp.fput_many.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
      1.52            +0.1        1.64        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
      2.32            +0.1        2.45        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
      1.52            +0.1        1.65        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
      2.30            +0.1        2.44        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
      1.11            +0.1        1.25        perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.eventfd_write.vfs_write.ksys_write
      1.43            +0.1        1.57        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
      1.42            +0.1        1.56 ±  2%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
      1.34            +0.1        1.48        perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled.copyout._copy_to_iter.eventfd_read.new_sync_read
      1.74            +0.2        1.89        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.eventfd_read.new_sync_read.vfs_read.ksys_read
      1.34            +0.2        1.50        perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled._copy_from_user.eventfd_write.vfs_write.ksys_write
      1.78            +0.2        1.95        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.eventfd_write.vfs_write.ksys_write.do_syscall_64
      0.34 ± 70%      +0.2        0.52        perf-profile.calltrace.cycles-pp.iov_iter_init.new_sync_read.vfs_read.ksys_read.do_syscall_64
      1.94            +0.2        2.16        perf-profile.calltrace.cycles-pp.__might_fault._copy_to_iter.eventfd_read.new_sync_read.vfs_read
     30.86            +0.2       31.10        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
      2.99            +0.3        3.30        perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.eventfd_read.new_sync_read.vfs_read
      0.18 ±141%      +0.4        0.55 ±  2%  perf-profile.calltrace.cycles-pp.testcase
     32.36            +0.4       32.76        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
      5.51            +0.4        5.92        perf-profile.calltrace.cycles-pp._copy_from_user.eventfd_write.vfs_write.ksys_write.do_syscall_64
      3.47 ±  6%      +0.5        3.98 ±  3%  perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
      3.52 ±  6%      +0.5        4.04 ±  3%  perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
      7.81            +0.8        8.59        perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
      7.84            +0.8        8.64        perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
      7.51            +0.8        8.31        perf-profile.calltrace.cycles-pp._copy_to_iter.eventfd_read.new_sync_read.vfs_read.ksys_read
      4.39 ±  6%      +1.0        5.36 ±  3%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.06            +1.1       16.15        perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11.67            +1.2       12.84        perf-profile.calltrace.cycles-pp.eventfd_read.new_sync_read.vfs_read.ksys_read.do_syscall_64
      9.32            +1.3       10.63        perf-profile.calltrace.cycles-pp.eventfd_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     47.12            +1.8       48.95        perf-profile.calltrace.cycles-pp.__libc_write
      7.16            -7.2        0.00        perf-profile.children.cycles-pp.fsnotify
     28.98            -3.8       25.18        perf-profile.children.cycles-pp.vfs_read
     36.02            -3.3       32.74        perf-profile.children.cycles-pp.ksys_read
     71.72            -2.7       68.99        perf-profile.children.cycles-pp.do_syscall_64
     74.56            -2.4       72.14        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     56.97            -1.4       55.60        perf-profile.children.cycles-pp.__libc_read
      0.35            -0.1        0.30        perf-profile.children.cycles-pp.fput
      0.50            +0.0        0.52        perf-profile.children.cycles-pp.__pthread_disable_asynccancel
      0.52            +0.0        0.54        perf-profile.children.cycles-pp.iov_iter_init
      0.38 ±  2%      +0.0        0.41 ±  2%  perf-profile.children.cycles-pp.rcu_read_unlock_strict
      0.85            +0.0        0.89 ±  3%  perf-profile.children.cycles-pp.__x64_sys_read
      2.25            +0.0        2.29        perf-profile.children.cycles-pp.___might_sleep
      0.83            +0.1        0.88 ±  3%  perf-profile.children.cycles-pp.__x64_sys_write
      0.76 ±  2%      +0.1        0.84 ±  2%  perf-profile.children.cycles-pp.testcase
      1.17            +0.1        1.28        perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      1.17            +0.1        1.28        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      2.70            +0.1        2.81        perf-profile.children.cycles-pp.fput_many
      1.31            +0.1        1.45        perf-profile.children.cycles-pp.__might_sleep
      3.16            +0.3        3.41        perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      2.38            +0.3        2.63        perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      4.98            +0.3        5.25        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      4.37            +0.3        4.68        perf-profile.children.cycles-pp.__might_fault
      3.12            +0.3        3.44        perf-profile.children.cycles-pp.copyout
      3.63            +0.3        3.96        perf-profile.children.cycles-pp._raw_spin_lock_irq
      3.32            +0.4        3.69        perf-profile.children.cycles-pp.copy_user_generic_unrolled
      5.80            +0.4        6.24        perf-profile.children.cycles-pp._copy_from_user
      0.30 ±  5%      +0.7        1.01 ±  3%  perf-profile.children.cycles-pp.apparmor_file_permission
      7.62            +0.8        8.43        perf-profile.children.cycles-pp._copy_to_iter
      8.50            +0.8        9.31        perf-profile.children.cycles-pp.syscall_return_via_sysret
     10.10            +1.0       11.09        perf-profile.children.cycles-pp.__entry_text_start
      7.20 ±  5%      +1.1        8.26 ±  3%  perf-profile.children.cycles-pp.common_file_perm
     15.25            +1.1       16.36        perf-profile.children.cycles-pp.new_sync_read
     11.87            +1.2       13.05        perf-profile.children.cycles-pp.eventfd_read
      9.49            +1.3       10.80        perf-profile.children.cycles-pp.eventfd_write
     47.33            +1.8       49.18        perf-profile.children.cycles-pp.__libc_write
      6.85            -6.9        0.00        perf-profile.self.cycles-pp.fsnotify
      2.97 ±  3%      -0.8        2.16 ±  5%  perf-profile.self.cycles-pp.vfs_read
      2.69 ±  3%      -0.6        2.05 ±  7%  perf-profile.self.cycles-pp.vfs_write
      1.56            -0.1        1.50 ±  2%  perf-profile.self.cycles-pp.ksys_write
      0.25 ±  4%      +0.0        0.27        perf-profile.self.cycles-pp.rcu_read_unlock_strict
      0.77            +0.0        0.80 ±  3%  perf-profile.self.cycles-pp.__x64_sys_read
      2.20            +0.0        2.23        perf-profile.self.cycles-pp.___might_sleep
      0.41            +0.0        0.45        perf-profile.self.cycles-pp.copyout
      0.75            +0.0        0.80 ±  4%  perf-profile.self.cycles-pp.__x64_sys_write
      0.94            +0.1        1.00        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.62            +0.1        0.68        perf-profile.self.cycles-pp.__fdget_pos
      0.67 ±  3%      +0.1        0.74 ±  2%  perf-profile.self.cycles-pp.testcase
      0.87            +0.1        0.94        perf-profile.self.cycles-pp._copy_from_user
      2.58            +0.1        2.66        perf-profile.self.cycles-pp.fput_many
      1.39            +0.1        1.47 ±  2%  perf-profile.self.cycles-pp.ksys_read
      0.99            +0.1        1.08        perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      1.16            +0.1        1.26        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      1.15            +0.1        1.27        perf-profile.self.cycles-pp.__might_sleep
      0.85            +0.1        0.97 ±  2%  perf-profile.self.cycles-pp.__might_fault
      2.50            +0.2        2.70        perf-profile.self.cycles-pp.eventfd_read
      2.14            +0.2        2.37        perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      3.02            +0.2        3.26        perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      2.42            +0.3        2.69        perf-profile.self.cycles-pp._copy_to_iter
      2.88            +0.3        3.19        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      3.46            +0.3        3.78        perf-profile.self.cycles-pp._raw_spin_lock_irq
      3.12            +0.4        3.47        perf-profile.self.cycles-pp.copy_user_generic_unrolled
      4.53            +0.4        4.93        perf-profile.self.cycles-pp.__entry_text_start
      4.59            +0.4        5.00        perf-profile.self.cycles-pp.__libc_write
      4.58            +0.4        5.01 ±  2%  perf-profile.self.cycles-pp.__libc_read
      1.92 ±  2%      +0.7        2.61 ±  2%  perf-profile.self.cycles-pp.eventfd_write
      0.18 ±  8%      +0.7        0.90        perf-profile.self.cycles-pp.apparmor_file_permission
      8.43            +0.8        9.24        perf-profile.self.cycles-pp.syscall_return_via_sysret
      5.64 ±  6%      +0.9        6.58 ±  5%  perf-profile.self.cycles-pp.common_file_perm
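
The %change column above can be checked by hand. The sketch below recomputes a few rows under two assumptions that the report does not state but the numbers support: rows whose metric is already a percentage (for example mpstat.cpu.all.usr% and the perf-profile rows) show an absolute difference in points, and perf-stat.overall.path-length is close to total instructions divided by workload. The function names are illustrative.

```python
def pct_change(base, new):
    """Relative change in percent, as used for count-like metrics."""
    return (new - base) / base * 100.0


def point_delta(base, new):
    """Absolute difference, as used for metrics that are already percentages."""
    return new - base


# will-it-scale.per_thread_ops, before and after the commit
per_thread = pct_change(1592331, 1754346)

# mpstat.cpu.all.usr%
usr_delta = point_delta(18.46, 20.39)

# perf-stat.total.instructions / will-it-scale.workload (rounded table values)
path_before = 1.667e14 / 3.057e08
path_after = 1.615e14 / 3.368e08

print(f"{per_thread:+.1f}%")                           # +10.2%
print(f"{usr_delta:+.1f}")                             # +1.9
print(f"{pct_change(path_before, path_after):+.1f}%")  # -12.1%
```

The recomputed path lengths differ slightly from the table (545194 and 479465) because the inputs here are the rounded values as printed. On that reading, the commit lowers the instruction count per operation by about 12%, even though IPC drops from 1.00 to 0.96.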


                                                                                
                             will-it-scale.per_thread_ops                       
                                                                                
  1.8e+06 +-----------------------------------------------------------------+   
          |O  O         O   O    O   OO                                     |   
  1.6e+06 |-+    .+++.+ +.++     +.+++   ++. ++.+ ++. ++.++ +.+ +.   .+ ++.+|   
          |+.++++      +    ++.++     +.+   +    +   +     +   +  +++  +    |   
          |                                                                 |   
  1.4e+06 |-+                                                               |   
          |                                                                 |   
  1.2e+06 |-+                                                               |   
          |                                                                 |   
    1e+06 |-+                                                               |   
          |                                                                 |   
          |                                                                 |   
   800000 |-+                                                               |   
          |         O                                                       |   
   600000 +-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.14.0-rc4-00116-ge43de7f0862b" of type "text/plain" (175472 bytes)

View attachment "job-script" of type "text/plain" (7902 bytes)

View attachment "job.yaml" of type "text/plain" (5349 bytes)

View attachment "reproduce" of type "text/plain" (340 bytes)
