linux-kernel - Re: [PATCH] maple_tree: Remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <202301242012.efecb3a0-oliver.sang@intel.com>
Date:   Tue, 24 Jan 2023 21:02:42 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Liam Howlett <liam.howlett@...cle.com>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        Jirka Hladky <jhladky@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Liam Howlett <Liam.Howlett@...cle.com>, <linux-mm@...ck.org>,
        <ying.huang@...el.com>, <feng.tang@...el.com>,
        <zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>,
        "maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] maple_tree: Remove GFP_ZERO from kmem_cache_alloc() and
 kmem_cache_alloc_bulk()


Greetings,

FYI, we noticed a 3.8% improvement in will-it-scale.per_process_ops due to commit:


commit: 180c117d46be304a08b14fb080010773faf50788 ("[PATCH] maple_tree: Remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()")
url: https://github.com/intel-lab-lkp/linux/commits/Liam-Howlett/maple_tree-Remove-GFP_ZERO-from-kmem_cache_alloc-and-kmem_cache_alloc_bulk/20230106-000849
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 41c03ba9beea760bd2d2ac9250b09a2e192da2dc
patch link: https://lore.kernel.org/all/20230105160427.2988454-1-Liam.Howlett@oracle.com/
patch subject: [PATCH] maple_tree: Remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()
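The patch subject says the maple tree stops passing __GFP_ZERO to kmem_cache_alloc() and kmem_cache_alloc_bulk(). In the profile further down, memset_erms falls from 1.96% to 0.21% of cycles while mas_pop_node appears at 0.22%. That is consistent with preallocated nodes no longer being zeroed in bulk and instead being zeroed one at a time when taken for use, though this is an inference from the profile, not something the report states. The userspace C sketch below illustrates that general "allocate unzeroed, zero on pop" pattern. It is not the maple tree code: struct node, struct node_pool, pool_fill(), pool_pop() and pool_drain() are hypothetical, and malloc() merely stands in for an allocation made without __GFP_ZERO.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tree node; not the real maple node layout. */
struct node {
	struct node *parent;
	void *slot[16];
};

/* Hypothetical preallocation pool: an array of node pointers plus a count. */
struct node_pool {
	struct node **nodes;
	size_t count;
};

/*
 * Preallocate 'want' nodes up front. malloc() is used instead of calloc(),
 * mirroring an allocation without __GFP_ZERO: node contents are left
 * uninitialized. On failure the pool keeps whatever was already allocated,
 * and pool_drain() releases it.
 */
static int pool_fill(struct node_pool *pool, size_t want)
{
	assert(pool->nodes == NULL && pool->count == 0);
	if (want == 0)
		return 0;
	pool->nodes = malloc(want * sizeof(*pool->nodes));
	if (pool->nodes == NULL)
		return -1;
	for (size_t i = 0; i < want; i++) {
		struct node *n = malloc(sizeof(*n));
		if (n == NULL)
			return -1;
		pool->nodes[pool->count++] = n;
	}
	return 0;
}

/* Hand out one node and zero only that node, just before it is used. */
static struct node *pool_pop(struct node_pool *pool)
{
	if (pool->count == 0)
		return NULL;
	struct node *n = pool->nodes[--pool->count];
	memset(n, 0, sizeof(*n));
	return n;
}

/* Free the leftovers; nodes that were never popped were never zeroed. */
static void pool_drain(struct node_pool *pool)
{
	while (pool->count > 0)
		free(pool->nodes[--pool->count]);
	free(pool->nodes);
	pool->nodes = NULL;
}
```

In this sketch the zeroing cost scales with the number of nodes actually popped rather than with the worst-case number preallocated. How much that saves depends on how many preallocated nodes go unused, and the sketch makes no claim about that ratio in the kernel.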

in testcase: will-it-scale
on test machine: 104 threads, 2 sockets (Skylake), 192G memory
with the following parameters:

	nr_task: 16
	mode: process
	test: mmap1
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it with 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
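The mmap1 testcase exercises the mmap and munmap system calls: the call traces in the profile below show both mmap_region (under do_mmap) and do_mas_align_munmap (under __vm_munmap) reaching mas_preallocate(). The C sketch below approximates one process's loop, assuming each iteration maps and then unmaps a single anonymous private region. It is not the will-it-scale source; REGION_SIZE and the helpers mmap_munmap_once() and run_iterations() are hypothetical.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumed region size for illustration; not taken from the report. */
#define REGION_SIZE ((size_t)1 << 20)

/* One iteration: map an anonymous private region, then unmap it. */
static int mmap_munmap_once(void)
{
	void *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	return munmap(p, REGION_SIZE);
}

/* Run up to n iterations; the count of completed ones plays the role of "ops". */
static unsigned long run_iterations(unsigned long n)
{
	unsigned long done = 0;

	while (done < n && mmap_munmap_once() == 0)
		done++;
	return done;
}
```

With nr_task: 16 and mode: process, 16 such processes run in parallel, which is why will-it-scale.16.processes in the table (2604393 and 2702107) is roughly 16 times will-it-scale.per_process_ops (162774 and 168881).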





Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/process/16/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/mmap1/will-it-scale

commit: 
  41c03ba9be ("Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost")
  180c117d46 ("maple_tree: Remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()")

41c03ba9beea760b 180c117d46be304a08b14fb0800 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   2604393            +3.8%    2702107        will-it-scale.16.processes
    162774            +3.8%     168881        will-it-scale.per_process_ops
   2604393            +3.8%    2702107        will-it-scale.workload
 1.254e+10            +2.6%  1.286e+10        perf-stat.i.branch-instructions
 1.251e+10            +2.2%  1.279e+10        perf-stat.i.dTLB-loads
 6.283e+09            +1.3%  6.367e+09        perf-stat.i.dTLB-stores
 5.838e+10            +2.5%  5.987e+10        perf-stat.i.instructions
    301.29            +2.2%     307.85        perf-stat.i.metric.M/sec
      0.81            -2.1%       0.79        perf-stat.overall.cpi
   6756492            -1.0%    6686622        perf-stat.overall.path-length
  1.25e+10            +2.6%  1.282e+10        perf-stat.ps.branch-instructions
 1.247e+10            +2.2%  1.275e+10        perf-stat.ps.dTLB-loads
 6.263e+09            +1.3%  6.346e+09        perf-stat.ps.dTLB-stores
 5.819e+10            +2.5%  5.967e+10        perf-stat.ps.instructions
  1.76e+13            +2.7%  1.807e+13        perf-stat.total.instructions
      2.31 ± 10%      -1.0        1.29 ±  7%  perf-profile.calltrace.cycles-pp.mas_preallocate.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      2.24 ± 10%      -1.0        1.23 ±  7%  perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap
      2.28 ± 11%      -1.0        1.27 ±  8%  perf-profile.calltrace.cycles-pp.mas_preallocate.mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
      2.24 ± 10%      -1.0        1.23 ±  8%  perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.mmap_region.do_mmap.vm_mmap_pgoff
      1.52 ± 11%      -0.8        0.67 ±  8%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_bulk.mas_alloc_nodes.mas_preallocate.mmap_region.do_mmap
      1.49 ± 11%      -0.8        0.66 ±  7%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_bulk.mas_alloc_nodes.mas_preallocate.do_mas_align_munmap.__vm_munmap
      4.94 ± 10%      -2.1        2.83 ±  8%  perf-profile.children.cycles-pp.mas_alloc_nodes
      4.61 ± 10%      -2.0        2.57 ±  8%  perf-profile.children.cycles-pp.mas_preallocate
      1.96 ± 10%      -1.8        0.21 ±  7%  perf-profile.children.cycles-pp.memset_erms
      3.09 ± 10%      -1.7        1.36 ±  7%  perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
      0.09 ± 13%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.vm_area_free
      0.07 ± 10%      +0.0        0.09 ± 11%  perf-profile.children.cycles-pp.mas_wr_modify
      0.22 ± 10%      +0.1        0.28 ±  7%  perf-profile.children.cycles-pp.__might_sleep
      0.00            +0.2        0.22 ± 10%  perf-profile.children.cycles-pp.mas_pop_node
      1.89 ± 11%      -1.7        0.21 ±  8%  perf-profile.self.cycles-pp.memset_erms
      0.72 ± 11%      -0.3        0.37 ±  7%  perf-profile.self.cycles-pp.kmem_cache_alloc_bulk
      0.09 ± 13%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.vm_area_free
      0.11 ±  7%      +0.1        0.17 ± 11%  perf-profile.self.cycles-pp.do_mas_munmap
      0.00            +0.2        0.21 ±  9%  perf-profile.self.cycles-pp.mas_pop_node
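The %change column above mixes two kinds of delta. For the will-it-scale and perf-stat rows it is a relative change: the headline figure is (168881 - 162774) / 162774, about 3.75%, reported as +3.8%. For the perf-profile rows the values are already percentages of cycles, and the column is the absolute difference in percentage points: memset_erms goes from 1.96 to 0.21, shown as -1.8, and mas_wr_modify goes from 0.07 to 0.09, shown as +0.0. Recomputing from the rounded printed values can differ slightly from the report; for example, cpi 0.81 to 0.79 gives about -2.5% rather than the reported -2.1%, presumably because the tool works from unrounded values. The helper names pct_change() and point_change() below are hypothetical.

```c
#include <assert.h>

/* Relative change in percent, as in the will-it-scale and perf-stat rows. */
static double pct_change(double base, double patched)
{
	assert(base != 0.0);
	return (patched - base) / base * 100.0;
}

/*
 * Absolute change in percentage points, as in the perf-profile rows,
 * whose values are already percentages of total cycles.
 */
static double point_change(double base_pct, double patched_pct)
{
	return patched_pct - base_pct;
}
```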




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



Attachments:
  config-6.2.0-rc2-00058-g180c117d46be (text/plain, 166849 bytes)
  job-script (text/plain, 7816 bytes)
  job.yaml (text/plain, 5294 bytes)
  reproduce (text/plain, 345 bytes)