linux-kernel - [linus:master] [rcu] d96c52fe49: fio.write

Message-ID: <202210311603.655d1ba5-yujie.liu@intel.com>
Date:   Mon, 31 Oct 2022 16:37:28 +0800
From:   kernel test robot <yujie.liu@...el.com>
To:     "Paul E. McKenney" <paulmck@...nel.org>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        Brian Foster <bfoster@...hat.com>,
        Dave Chinner <david@...morbit.com>,
        Al Viro <viro@...iv.linux.org.uk>, Ian Kent <raven@...maw.net>,
        <linux-kernel@...r.kernel.org>, <rcu@...r.kernel.org>,
        <ying.huang@...el.com>, <feng.tang@...el.com>,
        <zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: [linus:master] [rcu] d96c52fe49: fio.write_iops 5.8% improvement

Greetings,

FYI, we noticed a 5.8% improvement of fio.write_iops due to commit:

commit: d96c52fe4907c68adc5e61a0bef7aec0933223d5 ("rcu: Add polled expedited grace-period primitives")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fio-basic
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
with the following parameters:

	disk: 2pmem
	fs: xfs
	mount_option: dax
	runtime: 200s
	nr_task: 50%
	time_based: tb
	rw: write
	bs: 4k
	ioengine: mmap
	test_size: 200G
	cpufreq_governor: performance

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio

The commit also has a significant impact on the following test:

+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | fio-basic: fio.read_iops 7.7% improvement                                                     |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory |
| test parameters  | bs=4k                                                                                         |
|                  | cpufreq_governor=performance                                                                  |
|                  | disk=2pmem                                                                                    |
|                  | fs=xfs                                                                                        |
|                  | ioengine=mmap                                                                                 |
|                  | mount_option=dax                                                                              |
|                  | nr_task=50%                                                                                   |
|                  | runtime=200s                                                                                  |
|                  | rw=read                                                                                       |
|                  | test_size=200G                                                                                |
|                  | time_based=tb                                                                                 |
+------------------+-----------------------------------------------------------------------------------------------+


Details are as follows:

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
  4k/gcc-11/performance/2pmem/xfs/mmap/x86_64-rhel-8.3/dax/50%/debian-11.1-x86_64-20220510.cgz/200s/write/lkp-csl-2sp7/200G/fio-basic/tb

commit: 
  e4333cb20f ("rcutorture: Verify that polled GP API sees synchronous grace periods")
  d96c52fe49 ("rcu: Add polled expedited grace-period primitives")

e4333cb20f047d96 d96c52fe4907c68adc5e61a0bef 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     23.87 ±  3%     +32.7       56.58        fio.latency_20us%
     76.01           -32.6       43.37 ±  2%  fio.latency_50us%
 4.463e+08            +5.8%  4.722e+08        fio.time.minor_page_faults
    521.13            +6.9%     557.11        fio.time.user_time
 4.463e+08            +5.8%  4.722e+08        fio.workload
      8717            +5.8%       9221        fio.write_bw_MBps
     23210            -6.8%      21632        fio.write_clat_90%_us
     24960            -5.6%      23552        fio.write_clat_95%_us
     21149            -5.6%      19960        fio.write_clat_mean_us
   2231570            +5.8%    2360791        fio.write_iops
      2.70            +6.8%       2.89        iostat.cpu.user
      0.03            +0.0        0.03 ±  6%  mpstat.cpu.all.soft%
     43.52            +1.0%      43.98        turbostat.RAMWatt
   1925180 ±  2%     +13.8%    2191273 ±  6%  numa-meminfo.node1.MemUsed
     84399 ±  4%     +11.2%      93823 ± 10%  numa-meminfo.node1.SUnreclaim
     64201 ±138%     +78.6%     114665 ± 74%  numa-meminfo.node1.Unevictable
     21100 ±  4%     +11.2%      23455 ± 10%  numa-vmstat.node1.nr_slab_unreclaimable
     16050 ±138%     +78.6%      28665 ± 74%  numa-vmstat.node1.nr_unevictable
     16050 ±138%     +78.6%      28665 ± 74%  numa-vmstat.node1.nr_zone_unevictable
    431413            +5.8%     456351        proc-vmstat.nr_page_table_pages
   2574662            +3.6%    2668198        proc-vmstat.numa_hit
   2485812            +3.8%    2581127        proc-vmstat.numa_local
   2572931            +3.7%    2668242        proc-vmstat.pgalloc_normal
  4.47e+08            +5.8%  4.729e+08        proc-vmstat.pgfault
   2460652            +4.1%    2560513        proc-vmstat.pgfree
    872002            +5.8%     922500        proc-vmstat.thp_fault_fallback
 7.577e+09            +4.0%  7.878e+09        perf-stat.i.branch-instructions
      0.22 ±  2%      +0.0        0.25 ±  3%  perf-stat.i.branch-miss-rate%
  16697023 ±  3%     +19.7%   19981806 ±  3%  perf-stat.i.branch-misses
 1.862e+08            +5.3%  1.961e+08        perf-stat.i.cache-misses
 2.653e+08            +5.1%  2.787e+08        perf-stat.i.cache-references
      4.42            -3.6%       4.26        perf-stat.i.cpi
    731.57 ±  3%      -6.1%     686.63        perf-stat.i.cycles-between-cache-misses
 7.891e+09            +4.1%  8.217e+09        perf-stat.i.dTLB-loads
      0.47            +0.0        0.48        perf-stat.i.dTLB-store-miss-rate%
  12378051            +7.7%   13336505        perf-stat.i.dTLB-store-misses
 2.643e+09            +5.7%  2.793e+09        perf-stat.i.dTLB-stores
     67.44            +3.5       70.99        perf-stat.i.iTLB-load-miss-rate%
  12036801 ±  3%     +18.8%   14298923 ±  2%  perf-stat.i.iTLB-load-misses
 3.046e+10            +3.9%  3.166e+10        perf-stat.i.instructions
      2572 ±  4%     -12.5%       2249        perf-stat.i.instructions-per-iTLB-miss
      0.23            +3.9%       0.24        perf-stat.i.ipc
      1171            +5.6%       1237        perf-stat.i.metric.K/sec
    191.44            +4.3%     199.67        perf-stat.i.metric.M/sec
   2217598            +6.0%    2349848        perf-stat.i.minor-faults
    360845 ±  4%      +5.7%     381564 ±  2%  perf-stat.i.node-loads
  44614403 ±  2%      +7.1%   47802681 ±  2%  perf-stat.i.node-store-misses
   2217612            +6.0%    2349862        perf-stat.i.page-faults
      0.22 ±  2%      +0.0        0.25 ±  3%  perf-stat.overall.branch-miss-rate%
      4.41            -3.6%       4.25        perf-stat.overall.cpi
    720.90            -4.9%     685.85        perf-stat.overall.cycles-between-cache-misses
      0.47            +0.0        0.48        perf-stat.overall.dTLB-store-miss-rate%
     67.43            +3.6       70.99        perf-stat.overall.iTLB-load-miss-rate%
      2533 ±  3%     -12.6%       2215 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
      0.23            +3.7%       0.24        perf-stat.overall.ipc
     13751            -1.9%      13485        perf-stat.overall.path-length
  7.54e+09            +4.0%  7.839e+09        perf-stat.ps.branch-instructions
  16623549 ±  3%     +19.7%   19891545 ±  3%  perf-stat.ps.branch-misses
 1.853e+08            +5.3%  1.951e+08        perf-stat.ps.cache-misses
  2.64e+08            +5.0%  2.773e+08        perf-stat.ps.cache-references
 7.852e+09            +4.1%  8.177e+09        perf-stat.ps.dTLB-loads
  12317699            +7.7%   13270764        perf-stat.ps.dTLB-store-misses
 2.631e+09            +5.7%  2.779e+09        perf-stat.ps.dTLB-stores
  11979927 ±  3%     +18.8%   14229710 ±  2%  perf-stat.ps.iTLB-load-misses
 3.031e+10            +3.9%   3.15e+10        perf-stat.ps.instructions
   2206821            +6.0%    2338290        perf-stat.ps.minor-faults
    359302 ±  4%      +5.7%     379788 ±  2%  perf-stat.ps.node-loads
  44388499 ±  2%      +7.1%   47560032 ±  2%  perf-stat.ps.node-store-misses
   2206835            +6.0%    2338303        perf-stat.ps.page-faults
 6.138e+12            +3.7%  6.367e+12        perf-stat.total.instructions
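
Reading the %change column: judging from the numbers in the table above, metrics whose name ends in "%" (for example fio.latency_20us%) appear to report an absolute difference in percentage points, while all other metrics report a relative change. The report does not state this rule, so the sketch below is an assumption checked only against a few rows; the helper name `change` is hypothetical.

```python
def change(metric: str, base: float, new: float) -> str:
    """Format the change between two commits the way the table appears to.

    Assumption (inferred from the data, not stated by the report):
    metrics whose name ends in '%' show an absolute difference,
    everything else shows a relative change in percent.
    """
    if metric.endswith("%"):
        # Ratio-style metric: difference in percentage points.
        return f"{new - base:+.1f}"
    # Plain metric: relative change against the base commit.
    return f"{(new - base) / base * 100:+.1f}%"


# (metric, value on e4333cb20f, value on d96c52fe49), copied from the table.
rows = [
    ("fio.write_iops", 2231570, 2360791),
    ("fio.write_clat_mean_us", 21149, 19960),
    ("fio.write_clat_90%_us", 23210, 21632),  # '%' inside the name, not at the end
    ("fio.latency_20us%", 23.87, 56.58),
    ("fio.latency_50us%", 76.01, 43.37),
]

for name, base, new in rows:
    print(f"{change(name, base, new):>8}  {name}")
```

Under that assumption, the +32.7 shown for fio.latency_20us% means the share of I/Os in that latency bucket rose from 23.87% to 56.58%, not that it grew by 32.7% in relative terms.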


***************************************************************************************************
lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
  4k/gcc-11/performance/2pmem/xfs/mmap/x86_64-rhel-8.3/dax/50%/debian-11.1-x86_64-20220510.cgz/200s/read/lkp-csl-2sp7/200G/fio-basic/tb

commit: 
  e4333cb20f ("rcutorture: Verify that polled GP API sees synchronous grace periods")
  d96c52fe49 ("rcu: Add polled expedited grace-period primitives")

e4333cb20f047d96 d96c52fe4907c68adc5e61a0bef 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     18.06 ±  4%     +41.5       59.58 ±  3%  fio.latency_20us%
     81.86           -41.5       40.40 ±  5%  fio.latency_50us%
      8656            +7.7%       9320        fio.read_bw_MBps
     23466            -8.4%      21504        fio.read_clat_90%_us
     24789            -8.8%      22613        fio.read_clat_95%_us
     28501            -5.7%      26880        fio.read_clat_99%_us
     21373            -7.2%      19825        fio.read_clat_mean_us
   2216141            +7.7%    2386084        fio.read_iops
 4.432e+08            +7.7%  4.772e+08        fio.time.minor_page_faults
    452.30            +7.5%     486.28        fio.time.user_time
 4.432e+08            +7.7%  4.772e+08        fio.workload
      2.36            +7.3%       2.53        iostat.cpu.user
  14396416 ± 10%     -16.9%   11959978 ±  8%  meminfo.DirectMap2M
      0.65 ±  2%      -0.1        0.51 ± 44%  perf-profile.calltrace.cycles-pp.insert_pfn.__vm_insert_mixed.dax_fault_iter.dax_iomap_pte_fault.__xfs_filemap_fault
      0.63 ±  3%      -0.1        0.50 ± 44%  perf-profile.calltrace.cycles-pp.__get_locked_pte.insert_pfn.__vm_insert_mixed.dax_fault_iter.dax_iomap_pte_fault
      0.60 ±  2%      -0.1        0.46 ± 44%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__get_locked_pte.insert_pfn.__vm_insert_mixed.dax_fault_iter
      0.65 ±  2%      -0.1        0.59 ±  9%  perf-profile.children.cycles-pp.insert_pfn
      0.63 ±  3%      -0.1        0.57 ±  9%  perf-profile.children.cycles-pp.__get_locked_pte
      1.14 ±  2%      -0.1        0.99 ±  9%  perf-profile.self.cycles-pp._raw_spin_lock
    429439            +7.2%     460393        proc-vmstat.nr_page_table_pages
   2550960            +5.6%    2693959        proc-vmstat.numa_hit
   2464007            +5.8%    2606912        proc-vmstat.numa_local
   2551059            +5.6%    2694047        proc-vmstat.pgalloc_normal
 4.439e+08            +7.7%  4.779e+08        proc-vmstat.pgfault
   2447827            +5.4%    2579922        proc-vmstat.pgfree
    865972            +7.7%     932383        proc-vmstat.thp_fault_fallback
      7.80            +3.9%       8.11 ±  3%  perf-stat.i.MPKI
  7.99e+09            +4.9%  8.382e+09        perf-stat.i.branch-instructions
      0.25 ±  5%      +0.0        0.29 ±  9%  perf-stat.i.branch-miss-rate%
  20400526 ±  5%     +15.4%   23535017 ±  2%  perf-stat.i.branch-misses
 1.795e+08            +7.1%  1.922e+08        perf-stat.i.cache-misses
 2.486e+08            +7.2%  2.664e+08        perf-stat.i.cache-references
      4.23            -4.6%       4.04        perf-stat.i.cpi
    749.57            -6.6%     700.00        perf-stat.i.cycles-between-cache-misses
   9606077            +6.2%   10206063        perf-stat.i.dTLB-load-misses
 8.018e+09            +4.7%  8.396e+09        perf-stat.i.dTLB-loads
   2231154            +7.5%    2399136        perf-stat.i.dTLB-store-misses
 2.668e+09            +7.3%  2.862e+09        perf-stat.i.dTLB-stores
     73.29 ±  6%      +4.4       77.67        perf-stat.i.iTLB-load-miss-rate%
  14219792 ±  8%     +19.2%   16946973        perf-stat.i.iTLB-load-misses
 3.183e+10            +4.8%  3.337e+10        perf-stat.i.instructions
      2312 ±  8%     -13.1%       2009 ±  2%  perf-stat.i.instructions-per-iTLB-miss
      0.24            +4.7%       0.25        perf-stat.i.ipc
      1273            +7.4%       1368        perf-stat.i.metric.K/sec
    197.16            +5.2%     207.39        perf-stat.i.metric.M/sec
   2205049            +7.5%    2371531        perf-stat.i.minor-faults
   7323132            +4.0%    7616704        perf-stat.i.node-store-misses
   2205063            +7.5%    2371545        perf-stat.i.page-faults
      7.81            +2.2%       7.98        perf-stat.overall.MPKI
      0.26 ±  5%      +0.0        0.28 ±  2%  perf-stat.overall.branch-miss-rate%
      4.22            -4.7%       4.02        perf-stat.overall.cpi
    748.61            -6.7%     698.69        perf-stat.overall.cycles-between-cache-misses
     73.28 ±  6%      +4.4       77.69        perf-stat.overall.iTLB-load-miss-rate%
      2252 ±  8%     -12.6%       1969        perf-stat.overall.instructions-per-iTLB-miss
      0.24            +4.9%       0.25        perf-stat.overall.ipc
     14444            -2.5%      14084        perf-stat.overall.path-length
 7.951e+09            +4.9%  8.341e+09        perf-stat.ps.branch-instructions
  20309598 ±  5%     +15.4%   23427778 ±  2%  perf-stat.ps.branch-misses
 1.786e+08            +7.1%  1.912e+08        perf-stat.ps.cache-misses
 2.474e+08            +7.2%  2.651e+08        perf-stat.ps.cache-references
   9559588            +6.2%   10156429        perf-stat.ps.dTLB-load-misses
 7.979e+09            +4.7%  8.355e+09        perf-stat.ps.dTLB-loads
   2220324            +7.5%    2387471        perf-stat.ps.dTLB-store-misses
 2.655e+09            +7.3%  2.848e+09        perf-stat.ps.dTLB-stores
  14152666 ±  7%     +19.2%   16863826        perf-stat.ps.iTLB-load-misses
 3.167e+10            +4.8%  3.321e+10        perf-stat.ps.instructions
   2194343            +7.5%    2359991        perf-stat.ps.minor-faults
   7285981            +4.0%    7577980        perf-stat.ps.node-store-misses
   2194357            +7.5%    2360005        perf-stat.ps.page-faults
 6.402e+12            +5.0%  6.721e+12        perf-stat.total.instructions
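
Several rows in the table above look like they are derived from other rows. The sketch below recomputes three of them for the read test as a consistency check. The formulas are assumptions inferred from the numbers, not definitions taken from the report: path-length as total instructions per unit of fio.workload, IPC as the reciprocal of CPI, and bandwidth as IOPS times the 4k block size expressed in MiB/s. The names `runs`, `derived`, and `BLOCK_SIZE` are hypothetical.

```python
# Values copied from the read-test table, keyed by abbreviated commit id.
runs = {
    "e4333cb20f": {
        "total_instructions": 6.402e12,  # perf-stat.total.instructions
        "workload": 4.432e8,             # fio.workload
        "cpi": 4.22,                     # perf-stat.overall.cpi
        "read_iops": 2216141,            # fio.read_iops
    },
    "d96c52fe49": {
        "total_instructions": 6.721e12,
        "workload": 4.772e8,
        "cpi": 4.02,
        "read_iops": 2386084,
    },
}

BLOCK_SIZE = 4096  # bs=4k from the test parameters


def derived(run: dict) -> dict:
    """Recompute derived metrics under the assumed formulas."""
    return {
        # Instructions executed per unit of work done by fio.
        "path_length": run["total_instructions"] / run["workload"],
        # Instructions per cycle is the reciprocal of cycles per instruction.
        "ipc": 1 / run["cpi"],
        # Bytes per second converted to MiB per second.
        "bw_MiBps": run["read_iops"] * BLOCK_SIZE / 2**20,
    }


for commit, run in runs.items():
    d = derived(run)
    print(f"{commit}: path-length ~{d['path_length']:.0f}, "
          f"ipc ~{d['ipc']:.2f}, bandwidth ~{d['bw_MiBps']:.1f} MiB/s")
```

The recomputed values land close to the reported perf-stat.overall.path-length, perf-stat.overall.ipc, and fio.read_bw_MBps rows. Small differences are expected because the inputs in the table are already rounded.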


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

View attachment "config-5.19.0-rc3-00008-gd96c52fe4907" of type "text/plain" (163487 bytes)

View attachment "job-script" of type "text/plain" (8626 bytes)

View attachment "job.yaml" of type "text/plain" (5888 bytes)

View attachment "reproduce" of type "text/plain" (975 bytes)