Message-ID: <202310251458.48b4452d-oliver.sang@intel.com>
Date:   Wed, 25 Oct 2023 15:18:18 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Jan Kara <jack@...e.cz>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        Yury Norov <yury.norov@...il.com>, Jan Kara <jack@...e.cz>,
        <linux-kernel@...r.kernel.org>, <ying.huang@...el.com>,
        <feng.tang@...el.com>, <fengwei.yin@...el.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Mirsad Todorovac <mirsad.todorovac@....unizg.hr>,
        Matthew Wilcox <willy@...radead.org>,
        <linux-fsdevel@...r.kernel.org>, <oliver.sang@...el.com>
Subject: Re: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps



Hello,

kernel test robot noticed a 3.7% improvement of will-it-scale.per_thread_ops on:


commit: df671b17195cd6526e029c70d04dfb72561082d7 ("[PATCH 1/2] lib/find: Make functions safe on changing bitmaps")
url: https://github.com/intel-lab-lkp/linux/commits/Jan-Kara/lib-find-Make-functions-safe-on-changing-bitmaps/20231011-230553
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 1c8b86a3799f7e5be903c3f49fcdaee29fd385b5
patch link: https://lore.kernel.org/all/20231011150252.32737-1-jack@suse.cz/
patch subject: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 50%
	mode: thread
	test: tlb_flush3
	cpufreq_governor: performance
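
As a cross-check of the parameters above: nr_task is given as a percentage of the machine's hardware threads, so 50% of 104 gives the 52 threads that appear in the will-it-scale.52.threads row of the results. A minimal Python sketch of that arithmetic follows. The function name is hypothetical and is not part of lkp-tests.

```python
def tasks_from_percent(hw_threads: int, percent: float) -> int:
    """Task count for an nr_task value expressed as a percentage of hardware threads."""
    return int(hw_threads * percent / 100)


HW_THREADS = 104   # from "test machine: 104 threads 2 sockets (Skylake)"
NR_TASK_PCT = 50   # from "nr_task: 50%"

nr_threads = tasks_from_percent(HW_THREADS, NR_TASK_PCT)
print(nr_threads)
```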






Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231025/202310251458.48b4452d-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush3/will-it-scale

commit: 
  1c8b86a379 ("Merge tag 'xsa441-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip")
  df671b1719 ("lib/find: Make functions safe on changing bitmaps")

1c8b86a3799f7e5b df671b17195cd6526e029c70d04 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.14 ± 19%     +36.9%       0.19 ± 17%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
  2.26e+08            +3.6%  2.343e+08        proc-vmstat.pgfault
      0.04           +25.0%       0.05        turbostat.IPC
     32666           -15.5%      27605 ±  2%  turbostat.POLL
      7856            +2.2%       8025        vmstat.system.cs
   6331931            +2.3%    6478704        vmstat.system.in
    700119            +3.7%     725931        will-it-scale.52.threads
     13463            +3.7%      13959        will-it-scale.per_thread_ops
    700119            +3.7%     725931        will-it-scale.workload
      8.36            -7.3%       7.74        perf-stat.i.MPKI
 4.591e+09            +3.4%  4.747e+09        perf-stat.i.branch-instructions
 1.832e+08            +2.8%  1.883e+08        perf-stat.i.branch-misses
     26.70            -0.3       26.40        perf-stat.i.cache-miss-rate%
      7852            +2.2%       8021        perf-stat.i.context-switches
      6.43            -7.2%       5.97        perf-stat.i.cpi
    769.61            +1.8%     783.29        perf-stat.i.cpu-migrations
  6.39e+09            +3.4%  6.606e+09        perf-stat.i.dTLB-loads
  2.94e+09            +3.2%  3.035e+09        perf-stat.i.dTLB-stores
     78.29            -0.9       77.44        perf-stat.i.iTLB-load-miss-rate%
  18959450            +3.5%   19621273        perf-stat.i.iTLB-load-misses
   5254435            +8.7%    5713444        perf-stat.i.iTLB-loads
 2.236e+10            +7.7%  2.408e+10        perf-stat.i.instructions
      1181            +4.0%       1228        perf-stat.i.instructions-per-iTLB-miss
      0.16            +7.7%       0.17        perf-stat.i.ipc
      0.02 ± 36%     -49.6%       0.01 ± 53%  perf-stat.i.major-faults
    485.08            +3.0%     499.67        perf-stat.i.metric.K/sec
    141.71            +3.2%     146.25        perf-stat.i.metric.M/sec
    747997            +3.7%     775416        perf-stat.i.minor-faults
   3127957           -13.9%    2693728        perf-stat.i.node-loads
  26089697            +3.4%   26965335        perf-stat.i.node-store-misses
    767569            +3.7%     796095        perf-stat.i.node-stores
    747997            +3.7%     775416        perf-stat.i.page-faults
      8.35            -7.3%       7.74        perf-stat.overall.MPKI
     26.70            -0.3       26.40        perf-stat.overall.cache-miss-rate%
      6.43            -7.1%       5.97        perf-stat.overall.cpi
     78.30            -0.9       77.45        perf-stat.overall.iTLB-load-miss-rate%
      1179            +4.0%       1226        perf-stat.overall.instructions-per-iTLB-miss
      0.16            +7.7%       0.17        perf-stat.overall.ipc
   9644584            +3.8%   10011125        perf-stat.overall.path-length
 4.575e+09            +3.4%  4.731e+09        perf-stat.ps.branch-instructions
 1.825e+08            +2.8%  1.876e+08        perf-stat.ps.branch-misses
      7825            +2.2%       7995        perf-stat.ps.context-switches
    767.16            +1.8%     780.76        perf-stat.ps.cpu-migrations
 6.368e+09            +3.4%  6.583e+09        perf-stat.ps.dTLB-loads
  2.93e+09            +3.2%  3.025e+09        perf-stat.ps.dTLB-stores
  18896725            +3.5%   19555325        perf-stat.ps.iTLB-load-misses
   5236456            +8.7%    5693636        perf-stat.ps.iTLB-loads
 2.229e+10            +7.6%  2.399e+10        perf-stat.ps.instructions
    745423            +3.7%     772705        perf-stat.ps.minor-faults
   3117663           -13.9%    2684861        perf-stat.ps.node-loads
  26002765            +3.4%   26875267        perf-stat.ps.node-store-misses
    764789            +3.7%     793098        perf-stat.ps.node-stores
    745423            +3.7%     772705        perf-stat.ps.page-faults
 6.752e+12            +7.6%  7.267e+12        perf-stat.total.instructions
     19.21            -1.0       18.18        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
     17.00            -0.9       16.09        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
     65.30            -0.6       64.69        perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
     65.34            -0.6       64.75        perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
     65.98            -0.5       65.45        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     65.96            -0.5       65.42        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      9.72 ±  2%      -0.5        9.20        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
     66.33            -0.5       65.81        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     66.46            -0.5       65.95        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
     31.88            -0.4       31.43        perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
     67.72            -0.4       67.28        perf-profile.calltrace.cycles-pp.__madvise
     32.15            -0.4       31.73        perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
     32.60            -0.4       32.21        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
     32.93            -0.3       32.58        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
     31.07            -0.3       30.74        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
     31.58            -0.3       31.28        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
     31.61            -0.3       31.30        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
     31.80            -0.3       31.51        perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      8.34            -0.1        8.22        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
      8.06            -0.1        7.95        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
      7.98            -0.1        7.87        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
      0.59 ±  3%      +0.1        0.65 ±  2%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.testcase
      1.46            +0.1        1.53        perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.48            +0.1        1.55        perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.53            +0.1        1.62        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      2.92            +0.1        3.02        perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
      1.26 ±  2%      +0.1        1.36        perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
      1.84            +0.1        1.96        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      7.87            +0.1        8.00        perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
      2.03 ±  2%      +0.1        2.17        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      2.90            +0.2        3.06        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
      2.62 ±  3%      +0.2        2.80        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
      2.58 ±  3%      +0.2        2.76        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      2.95 ±  3%      +0.2        3.14        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
      2.75            +0.2        2.94        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
      4.96            +0.3        5.29        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
      4.92            +0.3        5.25        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
      5.13            +0.3        5.46        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
      5.08            +0.4        5.44        perf-profile.calltrace.cycles-pp.testcase
     37.25            -2.0       35.24        perf-profile.children.cycles-pp.llist_add_batch
     62.82            -0.8       62.04        perf-profile.children.cycles-pp.on_each_cpu_cond_mask
     62.82            -0.8       62.04        perf-profile.children.cycles-pp.smp_call_function_many_cond
     63.70            -0.7       62.98        perf-profile.children.cycles-pp.flush_tlb_mm_range
     65.30            -0.6       64.70        perf-profile.children.cycles-pp.zap_page_range_single
     65.34            -0.6       64.75        perf-profile.children.cycles-pp.madvise_vma_behavior
     65.98            -0.5       65.45        perf-profile.children.cycles-pp.__x64_sys_madvise
     65.96            -0.5       65.43        perf-profile.children.cycles-pp.do_madvise
     66.52            -0.5       66.01        perf-profile.children.cycles-pp.do_syscall_64
     66.65            -0.5       66.16        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     67.79            -0.4       67.36        perf-profile.children.cycles-pp.__madvise
     32.94            -0.3       32.60        perf-profile.children.cycles-pp.tlb_finish_mmu
     31.74            -0.3       31.43        perf-profile.children.cycles-pp.zap_pte_range
     31.76            -0.3       31.46        perf-profile.children.cycles-pp.zap_pmd_range
     31.95            -0.3       31.66        perf-profile.children.cycles-pp.unmap_page_range
      0.42 ±  2%      +0.0        0.46        perf-profile.children.cycles-pp.error_entry
      0.20 ±  3%      +0.0        0.24 ±  5%  perf-profile.children.cycles-pp.up_read
      0.69            +0.0        0.74        perf-profile.children.cycles-pp.native_flush_tlb_local
      1.47            +0.1        1.55        perf-profile.children.cycles-pp.filemap_map_pages
      1.48            +0.1        1.56        perf-profile.children.cycles-pp.do_read_fault
      1.54            +0.1        1.62        perf-profile.children.cycles-pp.do_fault
      2.75            +0.1        2.86        perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
      1.85            +0.1        1.98        perf-profile.children.cycles-pp.__handle_mm_fault
      2.04 ±  2%      +0.1        2.18        perf-profile.children.cycles-pp.handle_mm_fault
      2.63 ±  3%      +0.2        2.81        perf-profile.children.cycles-pp.exc_page_fault
      2.62 ±  3%      +0.2        2.80        perf-profile.children.cycles-pp.do_user_addr_fault
      3.24 ±  3%      +0.2        3.44        perf-profile.children.cycles-pp.asm_exc_page_fault
      3.83            +0.2        4.04        perf-profile.children.cycles-pp.flush_tlb_func
      0.69 ±  2%      +0.2        0.92        perf-profile.children.cycles-pp._find_next_bit
      9.92            +0.3       10.23        perf-profile.children.cycles-pp.llist_reverse_order
      5.45            +0.4        5.81        perf-profile.children.cycles-pp.testcase
     18.42            +0.5       18.96        perf-profile.children.cycles-pp.asm_sysvec_call_function
     16.24            +0.5       16.78        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
     15.78            +0.5       16.32        perf-profile.children.cycles-pp.__sysvec_call_function
     16.36            +0.5       16.90        perf-profile.children.cycles-pp.sysvec_call_function
     27.92            -1.9       26.04        perf-profile.self.cycles-pp.llist_add_batch
      0.16 ±  2%      +0.0        0.18 ±  4%  perf-profile.self.cycles-pp.up_read
      0.42 ±  2%      +0.0        0.45        perf-profile.self.cycles-pp.error_entry
      0.21 ±  4%      +0.0        0.24 ±  5%  perf-profile.self.cycles-pp.down_read
      0.26 ±  2%      +0.0        0.29 ±  3%  perf-profile.self.cycles-pp.tlb_finish_mmu
      2.01            +0.0        2.05        perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
      0.68            +0.0        0.73        perf-profile.self.cycles-pp.native_flush_tlb_local
      3.10            +0.2        3.26        perf-profile.self.cycles-pp.flush_tlb_func
      0.50 ±  2%      +0.2        0.68        perf-profile.self.cycles-pp._find_next_bit
      9.92            +0.3       10.22        perf-profile.self.cycles-pp.llist_reverse_order
     16.10            +0.5       16.64        perf-profile.self.cycles-pp.smp_call_function_many_cond
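
For readers unfamiliar with the table layout: the middle column appears to be a relative change (with a % sign) for plain counters, and an absolute difference in percentage points (no % sign) for rows that are already percentages, such as cache-miss-rate% and the perf-profile entries. The Python sketch below recomputes a few rows from the two value columns under that assumption. The helper names are hypothetical, and the inputs are the rounded values printed above, so small rounding differences against the report are possible on other rows.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change in percent, as in the '+3.7%' style entries."""
    return (patched - base) / base * 100.0


def point_change(base: float, patched: float) -> float:
    """Absolute difference, as in the '-0.3' style entries for percentage rows."""
    return patched - base


# (base commit value, patched commit value), copied from the table above
workload = (700119, 725931)        # will-it-scale.workload
poll = (32666, 27605)              # turbostat.POLL
node_loads = (3127957, 2693728)    # perf-stat.i.node-loads
cache_miss_rate = (26.70, 26.40)   # perf-stat.i.cache-miss-rate%
cpi = (6.43, 5.97)                 # perf-stat.i.cpi

print(f"workload        {pct_change(*workload):+.1f}%")
print(f"turbostat.POLL  {pct_change(*poll):+.1f}%")
print(f"node-loads      {pct_change(*node_loads):+.1f}%")
print(f"cache-miss-rate {point_change(*cache_miss_rate):+.1f}")

# IPC is the reciprocal of CPI by definition, which ties the two rows together.
ipc = tuple(round(1.0 / c, 2) for c in cpi)
print(f"ipc             {ipc[0]} -> {ipc[1]}")
```

Read this way, the headline 3.7% improvement is the relative change in will-it-scale.workload between the two commits.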




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
