Date:   Fri, 11 Nov 2022 16:37:25 +0800
From:   kernel test robot <yujie.liu@...el.com>
To:     Matthew Wilcox <willy@...radead.org>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
        <ying.huang@...el.com>, <feng.tang@...el.com>,
        <zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: [linus:master] [shmem] 4601e2fc8b: will-it-scale.per_thread_ops
 13.7% improvement

Greetings,

FYI, we noticed a 13.7% improvement in will-it-scale.per_thread_ops due to commit:

commit: 4601e2fc8b57840660ce1a1ee98aea873fa15eee ("shmem: convert shmem_file_read_iter() to use shmem_get_folio()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
with the following parameters:

	nr_task: 100%
	mode: thread
	test: pread2
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it with 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
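A minimal Python sketch of the thread mode described above, assuming (this report does not say so) that the pread2 testcase has every thread repeatedly pread() the same 4 KiB at offset 0 of one shared file. On the test box that file would presumably sit on tmpfs, which is consistent with shmem_file_read_iter() dominating the profile below. BUF_LEN, pread_worker() and run_shared_file_pread() are hypothetical names, not will-it-scale code (will-it-scale itself is written in C), and because of interpreter overhead and the GIL the absolute numbers say nothing about the kernel results in this report.

```python
import os
import tempfile
import threading
import time

BUF_LEN = 4096  # hypothetical read size; not stated in this report


def pread_worker(fd, stop, counts, idx):
    """Repeatedly pread() the same BUF_LEN bytes at offset 0 until stopped."""
    n = 0
    while not stop.is_set():
        os.pread(fd, BUF_LEN, 0)
        n += 1
    counts[idx] = n


def run_shared_file_pread(nr_threads, seconds=0.2):
    """Run nr_threads workers against ONE shared file; return per-thread op counts."""
    # The file lands wherever tempfile puts it, which may not be tmpfs.
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * BUF_LEN)
        f.flush()
        stop = threading.Event()
        counts = [0] * nr_threads
        threads = [
            threading.Thread(target=pread_worker, args=(f.fileno(), stop, counts, i))
            for i in range(nr_threads)
        ]
        for t in threads:
            t.start()
        time.sleep(seconds)
        stop.set()
        for t in threads:
            t.join()
        return counts


if __name__ == "__main__":
    # Step the copy count up, as will-it-scale does from 1 through n.
    for n in (1, 2, 4):
        counts = run_shared_file_pread(n, seconds=0.1)
        total = sum(counts)
        print(f"{n} threads: {total} ops, {total // n} per thread")
```

Per-thread ops here is total ops divided by the thread count, mirroring the name of the per_thread_ops metric; nr_task: 100% on this machine would correspond to nr_threads=128.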


Details are below:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp5/pread2/will-it-scale

commit: 
  eff1f906c2 ("shmem: convert shmem_write_begin() to use shmem_get_folio()")
  4601e2fc8b ("shmem: convert shmem_file_read_iter() to use shmem_get_folio()")
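Judging from the rows themselves, the %change column in the table mixes two conventions: a value with a % sign is the relative change from the left (eff1f906c2) column to the right (4601e2fc8b) column, while a value without one, used for some metrics that are already percentages such as perf-profile.* and *-miss-rate%, is a plain difference in percentage points. The helper names below are illustrative; the numbers are copied from the table, and recomputing from the displayed, rounded values can occasionally differ in the last digit.

```python
def pct_change(base, new):
    """Relative change in percent (rows whose %change value carries a % sign)."""
    return (new - base) / base * 100.0


def abs_change(base, new):
    """Difference in percentage points (rows whose %change value has no % sign)."""
    return new - base


# Values copied from the table (base eff1f906c2, then 4601e2fc8b).
print(f"{pct_change(1508791, 1715505):+.1f}%")  # will-it-scale.workload: +13.7%
print(f"{pct_change(11786, 13401):+.1f}%")      # will-it-scale.per_thread_ops: +13.7%
print(f"{abs_change(82.71, 84.95):+.1f}")       # perf-stat.i.node-store-miss-rate%: +2.2
print(f"{abs_change(41.14, 0.00):+.1f}")        # calltrace via shmem_getpage: -41.1
```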

eff1f906c2dcd83c 4601e2fc8b57840660ce1a1ee98 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1508791 ±  3%     +13.7%    1715505 ±  2%  will-it-scale.128.threads
     11786 ±  3%     +13.7%      13401 ±  2%  will-it-scale.per_thread_ops
   1508791 ±  3%     +13.7%    1715505 ±  2%  will-it-scale.workload
      2.92 ± 15%     +43.7%       4.20 ± 16%  turbostat.CPU%c1
     58550 ±  4%     -16.4%      48936 ±  5%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.20 ±  9%     +17.5%       0.23 ±  5%  sched_debug.cfs_rq:/.nr_running.stddev
     58605 ±  5%     -16.5%      48957 ±  5%  sched_debug.cfs_rq:/.spread0.stddev
    191.02 ±  4%     +16.1%     221.72 ±  5%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
      0.23 ±  3%     +11.1%       0.25 ±  4%  sched_debug.cpu.nr_running.stddev
     12.20            -1.1%      12.07        perf-stat.i.cpi
      0.00 ±  9%      -0.0        0.00 ±  5%  perf-stat.i.dTLB-store-miss-rate%
 9.003e+08 ±  2%      +6.4%  9.582e+08        perf-stat.i.dTLB-stores
     82.71            +2.2       84.95        perf-stat.i.node-store-miss-rate%
   5815837           +10.2%    6408731        perf-stat.i.node-store-misses
   1223798 ±  2%      -6.6%    1142824 ±  2%  perf-stat.i.node-stores
     12.19            -1.0%      12.06        perf-stat.overall.cpi
      0.01 ±  3%      -0.0        0.00 ±  5%  perf-stat.overall.dTLB-store-miss-rate%
     82.60            +2.2       84.85        perf-stat.overall.node-store-miss-rate%
   6712074 ±  2%     -12.0%    5904631 ±  2%  perf-stat.overall.path-length
 8.981e+08 ±  2%      +6.4%  9.558e+08        perf-stat.ps.dTLB-stores
   5796378           +10.2%    6387291        perf-stat.ps.node-store-misses
   1220724 ±  2%      -6.6%    1140426 ±  2%  perf-stat.ps.node-stores
     41.14           -41.1        0.00        perf-profile.calltrace.cycles-pp.shmem_getpage.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
     41.10           -41.1        0.00        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter.vfs_read.__x64_sys_pread64
     41.04           -41.0        0.00        perf-profile.calltrace.cycles-pp.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter.vfs_read
     40.18           -40.2        0.00        perf-profile.calltrace.cycles-pp.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter
     39.18           -39.2        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage
      0.00            +0.6        0.59 ±  7%  perf-profile.calltrace.cycles-pp.io_schedule.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter
      0.00           +39.4       39.45        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter
      0.00           +40.5       40.46        perf-profile.calltrace.cycles-pp.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read
      0.00           +41.2       41.24        perf-profile.calltrace.cycles-pp.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64
      0.00           +41.3       41.30        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
     41.14           -41.1        0.00        perf-profile.children.cycles-pp.shmem_getpage
      0.10 ±  4%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.copyout
      0.12 ±  3%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.07            +0.0        0.09        perf-profile.children.cycles-pp.folio_unlock
      0.12 ±  3%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp._copy_to_iter
      0.13 ±  2%      +0.0        0.15 ±  4%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.PageHeadHuge
      0.46            -0.1        0.37 ±  3%  perf-profile.self.cycles-pp.shmem_file_read_iter
      0.82 ±  2%      -0.1        0.74 ±  4%  perf-profile.self.cycles-pp.__filemap_get_folio
      0.12 ±  3%      +0.0        0.14 ±  3%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.07            +0.0        0.09        perf-profile.self.cycles-pp.folio_unlock
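The calltrace rows show how the hot path changed: the shmem_getpage frame disappears and the profile now shows shmem_get_folio_gfp() called directly from shmem_file_read_iter(), while the folio_wait_bit_common()/_raw_spin_lock_irq() path below it keeps roughly the same share of cycles (39.18 before, 39.45 after). The check below, over chains copied from the table with the "perf-profile.calltrace.cycles-pp." prefix removed, confirms that each new chain is the old one minus that frame. The chains appear to be cut to five frames (callee first), so each new chain extends one frame further up the stack. The helper names are illustrative.

```python
def frames(chain):
    """Split a perf-profile call chain (callee first) into its frames."""
    return chain.split(".")


def drop_frame(chain, frame):
    """Frames of `chain` with every occurrence of `frame` removed."""
    return [f for f in frames(chain) if f != frame]


def is_same_path(old, new, removed="shmem_getpage"):
    """True if `new` begins with the frames of `old` minus the `removed` frame."""
    stripped = drop_frame(old, removed)
    return frames(new)[:len(stripped)] == stripped


# (eff1f906c2 chain, 4601e2fc8b chain) pairs copied from the calltrace rows.
PAIRS = [
    ("shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter.vfs_read.__x64_sys_pread64",
     "shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64"),
    ("__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter.vfs_read",
     "__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64"),
    ("folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter",
     "folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read"),
    ("_raw_spin_lock_irq.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage",
     "_raw_spin_lock_irq.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter"),
]

for old, new in PAIRS:
    print(is_same_path(old, new))  # True for all four pairs
```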


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # If you encounter a failure that blocks the test,
        # remove the ~/.lkp and /lkp directories to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

Attachments:
  "config-6.0.0-rc3-00333-g4601e2fc8b57" (text/plain, 164475 bytes)
  "job-script" (text/plain, 8001 bytes)
  "job.yaml" (text/plain, 5386 bytes)
  "reproduce" (text/plain, 346 bytes)