Message-ID: <202211202030.b9842879-oliver.sang@intel.com>
Date:   Sun, 20 Nov 2022 21:01:20 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Jens Axboe <axboe@...nel.dk>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        Christiano Haesbaert <haesbaert@...sbaert.org>,
        <linux-kernel@...r.kernel.org>, <io-uring@...r.kernel.org>,
        <ying.huang@...el.com>, <feng.tang@...el.com>,
        <zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: [linus:master] [io_uring]  46a525e199:  fio.read_iops 22.4%
 improvement


Greetings,

FYI, we noticed a 22.4% improvement in fio.read_iops due to commit:


commit: 46a525e199e4037516f7e498c18f065b09df32ac ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
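
For context on what changed: the commit subject says io_uring no longer gates running task_work on the TIF_NOTIFY_SIGNAL flag being set. Below is a simplified before/after sketch of that idea; it is approximated from the commit subject rather than copied from the actual diff, and the real helper lives in io_uring's internal headers.

/* Conceptual sketch only -- approximated from the commit subject,
 * not the verbatim kernel change. */

/* before: pending task_work only ran when TIF_NOTIFY_SIGNAL was raised */
static inline bool io_run_task_work_old(void)
{
        if (test_thread_flag(TIF_NOTIFY_SIGNAL)) {
                __set_current_state(TASK_RUNNING);
                clear_notify_signal();
                if (task_work_pending(current))
                        task_work_run();
                return true;
        }
        return false;
}

/* after: run whenever task_work is actually pending, regardless of the flag */
static inline bool io_run_task_work_new(void)
{
        if (task_work_pending(current)) {
                __set_current_state(TASK_RUNNING);
                clear_notify_signal();
                task_work_run();
                return true;
        }
        return false;
}

Running task_work as soon as it is pending lets completions be drained sooner instead of waiting for a signal-style notification, which is consistent with the lower completion latencies and higher IOPS reported below.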

in testcase: fio-basic
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
with following parameters:

	disk: 2pmem
	fs: ext2
	mount_option: dax
	runtime: 200s
	nr_task: 50%
	time_based: tb
	rw: read
	bs: 2M
	ioengine: io_uring
	test_size: 200G
	cpufreq_governor: performance

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
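
The bs/rw/ioengine parameters above describe sequential 2M reads driven through io_uring. As a rough illustration of the kind of I/O the io_uring ioengine issues per block, here is a minimal liburing read loop; the file path, read count, and ring size are illustrative placeholders rather than values taken from the job, and one request is submitted and reaped at a time (mirroring fio's default iodepth of 1, which this job does not override).

/* Minimal liburing read loop, loosely mirroring the job above
 * (rw=read, bs=2M, ioengine=io_uring). Path and read count are
 * placeholders. Build with: gcc -O2 -o uring_read uring_read.c -luring */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BS       (2 * 1024 * 1024)      /* 2M block size, as in the job */
#define NR_READS 1024                   /* read 2G in total; illustrative */

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "testfile";
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    void *buf;
    off_t off = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    if (io_uring_queue_init(8, &ring, 0) < 0) {
        fprintf(stderr, "io_uring_queue_init failed\n");
        return 1;
    }
    if (posix_memalign(&buf, 4096, BS)) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    for (int i = 0; i < NR_READS; i++) {
        /* queue one 2M read at the current offset ... */
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, BS, off);
        io_uring_submit(&ring);

        /* ... and wait for its completion before issuing the next one */
        if (io_uring_wait_cqe(&ring, &cqe) < 0)
            break;
        if (cqe->res < 0)
            fprintf(stderr, "read: %s\n", strerror(-cqe->res));
        int done = cqe->res;
        io_uring_cqe_seen(&ring, cqe);
        if (done <= 0)
            break;                      /* error or EOF */
        off += done;
    }

    io_uring_queue_exit(&ring);
    free(buf);
    close(fd);
    return 0;
}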





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories and rerun from a clean state.

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
  2M/gcc-11/performance/2pmem/ext2/io_uring/x86_64-rhel-8.3/dax/50%/debian-11.1-x86_64-20220510.cgz/200s/read/lkp-csl-2sp7/200G/fio-basic/tb

commit: 
  b000145e99 ("io_uring/rw: defer fsnotify calls to task context")
  46a525e199 ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL")

b000145e99078094 46a525e199e4037516f7e498c18 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     21.62 ± 28%     +13.5       35.14 ± 17%  fio.latency_100ms%
      0.21 ±  7%      -0.0        0.17 ±  6%  fio.latency_20ms%
     11.42 ± 20%      -7.1        4.34 ± 21%  fio.latency_500ms%
      0.68 ± 62%      -0.6        0.06 ± 11%  fio.latency_750ms%
     20312 ±  5%     +22.4%      24856        fio.read_bw_MBps
  2.66e+08 ±  7%     -34.2%  1.751e+08 ±  5%  fio.read_clat_90%_us
 3.305e+08 ±  6%     -28.9%  2.349e+08 ±  9%  fio.read_clat_95%_us
 4.549e+08 ± 14%     -33.7%  3.016e+08 ±  2%  fio.read_clat_99%_us
 1.515e+08 ±  5%     -18.5%  1.235e+08        fio.read_clat_mean_us
  84751148 ± 10%     -38.1%   52462994 ±  8%  fio.read_clat_stddev
     10156 ±  5%     +22.4%      12428        fio.read_iops
     34028 ±  4%     -17.5%      28081 ±  4%  fio.read_slat_mean_us
    207091 ±  7%     +23.0%     254784 ±  5%  fio.read_slat_stddev
   1322240 ±  6%     +39.5%    1844804 ±  2%  fio.time.involuntary_context_switches
    152665 ±  3%    +415.2%     786457        fio.time.minor_page_faults
      8350 ±  2%      +6.6%       8904        fio.time.percent_of_cpu_this_job_got
     16651 ±  2%      +6.8%      17775        fio.time.system_time
     96.17 ±  8%     -14.3%      82.42 ±  5%  fio.time.user_time
   2032503 ±  5%     +22.4%    2487213        fio.workload
    715266 ± 13%     +22.3%     874621 ±  6%  numa-numastat.node1.numa_hit
    715333 ± 13%     +22.2%     874330 ±  6%  numa-vmstat.node1.numa_hit
 1.966e+09 ± 17%     -52.7%  9.303e+08 ± 12%  cpuidle..time
   4301667 ± 16%     -53.0%    2021131 ± 20%  cpuidle..usage
     10.98 ± 15%     -49.4%       5.56 ± 11%  iostat.cpu.idle
     88.19            +6.4%      93.80        iostat.cpu.system
      9.76 ± 17%     -53.3%       4.56 ± 15%  turbostat.CPU%c1
     57.07            +4.1%      59.40        turbostat.RAMWatt
     10.12 ± 17%      -5.5        4.66 ± 13%  mpstat.cpu.all.idle%
      0.25 ± 12%      -0.1        0.12 ±  6%  mpstat.cpu.all.soft%
      0.83 ± 10%      -0.2        0.65 ±  6%  mpstat.cpu.all.usr%
     10.60 ± 15%     -50.9%       5.20 ± 16%  vmstat.cpu.id
    159.40 ±  4%     +27.7%     203.50 ±  2%  vmstat.procs.r
     15208 ±  4%     +15.1%      17506        vmstat.system.cs
     20749            +2.9%      21355        proc-vmstat.nr_kernel_stack
    225429 ±  7%    +282.2%     861493        proc-vmstat.numa_hint_faults
    136434 ±  8%    +456.6%     759435        proc-vmstat.numa_hint_faults_local
   1090679            +2.6%    1119565        proc-vmstat.numa_hit
      6836 ±  3%    +520.3%      42407        proc-vmstat.numa_huge_pte_updates
   1005727            +2.6%    1031985        proc-vmstat.numa_local
    775153 ±  8%     +31.7%    1020704 ±  8%  proc-vmstat.numa_pages_migrated
   3833507 ±  3%    +490.7%   22643711        proc-vmstat.numa_pte_updates
   1090769            +2.5%    1118337        proc-vmstat.pgalloc_normal
    979930           +65.5%    1621767        proc-vmstat.pgfault
    895767 ± 15%      +7.3%     961415        proc-vmstat.pgfree
    775153 ±  8%     +31.7%    1020704 ±  8%  proc-vmstat.pgmigrate_success
      1462 ±  8%     +31.9%       1927 ±  8%  proc-vmstat.thp_migration_success
     98.27            +0.4       98.64        perf-profile.calltrace.cycles-pp.ret_from_fork
     97.37            +0.4       97.73        perf-profile.calltrace.cycles-pp.io_read.io_issue_sqe.io_wq_submit_work.io_worker_handle_work.io_wqe_worker
     98.23            +0.4       98.60        perf-profile.calltrace.cycles-pp.io_worker_handle_work.io_wqe_worker.ret_from_fork
     98.23            +0.4       98.61        perf-profile.calltrace.cycles-pp.io_wqe_worker.ret_from_fork
      1.13 ± 12%      -0.3        0.86 ±  4%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.27 ±  6%      -0.0        0.23 ±  4%  perf-profile.children.cycles-pp.try_to_wake_up
      0.28 ±  6%      -0.0        0.24 ±  3%  perf-profile.children.cycles-pp.__wake_up_common
      0.28 ±  6%      -0.0        0.24 ±  4%  perf-profile.children.cycles-pp.__wake_up_common_lock
      0.26 ±  6%      -0.0        0.22 ±  5%  perf-profile.children.cycles-pp.autoremove_wake_function
      0.08 ±  6%      -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.select_task_rq
      0.00            +0.3        0.33 ±  4%  perf-profile.children.cycles-pp.io_assign_current_work
     98.27            +0.4       98.64        perf-profile.children.cycles-pp.ret_from_fork
     98.23            +0.4       98.60        perf-profile.children.cycles-pp.io_worker_handle_work
     98.23            +0.4       98.61        perf-profile.children.cycles-pp.io_wqe_worker
    131360 ±  8%     +37.7%     180837 ±  3%  sched_debug.cfs_rq:/.MIN_vruntime.avg
      1.49 ±  5%     +26.4%       1.88        sched_debug.cfs_rq:/.h_nr_running.avg
    195967 ± 10%     -22.4%     152129 ± 10%  sched_debug.cfs_rq:/.load.stddev
    144.62 ± 20%     +44.6%     209.08 ±  9%  sched_debug.cfs_rq:/.load_avg.min
    131362 ±  8%     +37.7%     180842 ±  3%  sched_debug.cfs_rq:/.max_vruntime.avg
    201691 ±  7%     +13.2%     228371 ±  3%  sched_debug.cfs_rq:/.min_vruntime.avg
      1.32 ±  5%     +25.7%       1.66        sched_debug.cfs_rq:/.nr_running.avg
      0.54 ±  5%     +14.1%       0.62 ±  3%  sched_debug.cfs_rq:/.nr_running.stddev
      1584 ±  5%     +25.2%       1983        sched_debug.cfs_rq:/.runnable_avg.avg
      2.63 ± 13%     +77.8%       4.68 ± 13%  sched_debug.cfs_rq:/.spread.avg
      5.35 ± 15%     +36.7%       7.31 ±  9%  sched_debug.cfs_rq:/.spread.stddev
   1549299 ±  6%     -31.1%    1067585 ±  7%  sched_debug.cpu.avg_idle.avg
   3658796 ± 10%     -30.2%    2554830 ± 15%  sched_debug.cpu.avg_idle.max
    276671 ± 18%     -41.5%     161849 ± 20%  sched_debug.cpu.avg_idle.min
    718901 ± 13%     -31.3%     494161 ± 11%  sched_debug.cpu.avg_idle.stddev
    872915 ±  9%     -36.4%     555389 ±  3%  sched_debug.cpu.max_idle_balance_cost.avg
   1335161 ± 12%     -42.1%     773347 ± 15%  sched_debug.cpu.max_idle_balance_cost.max
    602677 ±  9%     -17.0%     500000        sched_debug.cpu.max_idle_balance_cost.min
    166368 ± 11%     -64.5%      59141 ± 46%  sched_debug.cpu.max_idle_balance_cost.stddev
      1.49 ±  5%     +26.0%       1.88 ±  2%  sched_debug.cpu.nr_running.avg
     16805 ±  4%     +16.8%      19629        sched_debug.cpu.nr_switches.avg
      9904 ±  6%     +17.8%      11670 ±  2%  sched_debug.cpu.nr_switches.min
      4092 ±  9%     +25.8%       5147 ±  9%  sched_debug.cpu.nr_switches.stddev
     40.89            +1.2%      41.37        perf-stat.i.MPKI
 2.899e+09 ±  5%     +20.7%  3.498e+09        perf-stat.i.branch-instructions
      0.20 ±  4%      -0.0        0.16 ±  4%  perf-stat.i.branch-miss-rate%
 6.679e+08 ±  5%     +22.1%  8.153e+08        perf-stat.i.cache-misses
 7.021e+08 ±  5%     +21.9%  8.556e+08        perf-stat.i.cache-references
     15269 ±  5%     +15.1%      17569        perf-stat.i.context-switches
     14.35 ±  5%     -14.9%      12.21        perf-stat.i.cpi
 2.381e+11            +5.8%  2.517e+11        perf-stat.i.cpu-cycles
    372.98 ±  5%     -16.3%     312.29        perf-stat.i.cycles-between-cache-misses
      0.00 ±  9%      -0.0        0.00 ±  6%  perf-stat.i.dTLB-load-miss-rate%
 2.976e+09 ±  5%     +20.7%   3.59e+09        perf-stat.i.dTLB-loads
      0.00 ±  2%      +0.0        0.01        perf-stat.i.dTLB-store-miss-rate%
    123809 ±  5%     +48.9%     184353        perf-stat.i.dTLB-store-misses
 2.799e+09 ±  5%     +21.3%  3.394e+09        perf-stat.i.dTLB-stores
     74.58 ±  3%      +6.6       81.23        perf-stat.i.iTLB-load-miss-rate%
    383587 ± 13%     -37.2%     240807 ± 14%  perf-stat.i.iTLB-loads
 1.707e+10 ±  5%     +20.9%  2.063e+10        perf-stat.i.instructions
      0.07 ±  5%     +15.1%       0.08        perf-stat.i.ipc
      2.48            +5.8%       2.62        perf-stat.i.metric.GHz
      1290 ±  9%     -38.8%     790.15 ±  2%  perf-stat.i.metric.K/sec
     98.96 ±  5%     +21.4%     120.15        perf-stat.i.metric.M/sec
      4017 ±  2%     +80.4%       7248        perf-stat.i.minor-faults
     46.99 ±  4%     -10.2       36.84 ±  3%  perf-stat.i.node-load-miss-rate%
  36469552 ± 11%     +20.0%   43763092        perf-stat.i.node-loads
     25.92 ±  9%     -22.8        3.08 ± 14%  perf-stat.i.node-store-miss-rate%
  41252120 ± 10%     -89.2%    4473071 ± 20%  perf-stat.i.node-store-misses
 1.465e+08 ±  9%     +38.1%  2.022e+08 ±  2%  perf-stat.i.node-stores
      4017 ±  2%     +80.4%       7248        perf-stat.i.page-faults
      0.16 ±  5%      -0.0        0.14 ±  4%  perf-stat.overall.branch-miss-rate%
     14.08 ±  6%     -13.1%      12.24        perf-stat.overall.cpi
    359.74 ±  6%     -14.0%     309.45        perf-stat.overall.cycles-between-cache-misses
      0.00 ± 10%      -0.0        0.00 ±  6%  perf-stat.overall.dTLB-load-miss-rate%
      0.00 ±  2%      +0.0        0.01        perf-stat.overall.dTLB-store-miss-rate%
     64.21 ±  4%     +11.1       75.32 ±  2%  perf-stat.overall.iTLB-load-miss-rate%
     26608 ±  5%     +14.3%      30425 ±  6%  perf-stat.overall.instructions-per-iTLB-miss
      0.07 ±  5%     +14.7%       0.08        perf-stat.overall.ipc
     44.39 ±  2%      -7.9       36.50 ±  3%  perf-stat.overall.node-load-miss-rate%
     22.12 ±  8%     -20.1        2.06 ± 19%  perf-stat.overall.node-store-miss-rate%
 2.881e+09 ±  5%     +20.9%  3.484e+09        perf-stat.ps.branch-instructions
 6.644e+08 ±  5%     +22.3%  8.128e+08        perf-stat.ps.cache-misses
 6.983e+08 ±  5%     +22.1%  8.529e+08        perf-stat.ps.cache-references
     15070 ±  5%     +15.5%      17412        perf-stat.ps.context-switches
 2.383e+11            +5.6%  2.515e+11        perf-stat.ps.cpu-cycles
 2.957e+09 ±  5%     +20.9%  3.576e+09        perf-stat.ps.dTLB-loads
    123050 ±  5%     +49.6%     184070        perf-stat.ps.dTLB-store-misses
 2.783e+09 ±  5%     +21.5%  3.382e+09        perf-stat.ps.dTLB-stores
    358319 ± 14%     -37.8%     222740 ± 12%  perf-stat.ps.iTLB-loads
 1.697e+10 ±  5%     +21.1%  2.055e+10        perf-stat.ps.instructions
      4008 ±  2%     +79.2%       7183        perf-stat.ps.minor-faults
  36583387 ± 11%     +19.6%   43742068        perf-stat.ps.node-loads
  41398082 ± 11%     -89.7%    4252522 ± 20%  perf-stat.ps.node-store-misses
 1.459e+08 ±  9%     +38.5%  2.021e+08 ±  2%  perf-stat.ps.node-stores
      4008 ±  2%     +79.2%       7183        perf-stat.ps.page-faults
 3.428e+12 ±  5%     +21.2%  4.155e+12        perf-stat.total.instructions




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://01.org/lkp



View attachment "config-6.0.0-rc6-00054-g46a525e199e4" of type "text/plain" (164404 bytes)

View attachment "job-script" of type "text/plain" (8608 bytes)

View attachment "job.yaml" of type "text/plain" (5877 bytes)

View attachment "reproduce" of type "text/plain" (942 bytes)
