Message-ID: <20200708072346.GL3874@shao2-debian>
Date:   Wed, 8 Jul 2020 15:23:46 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Shaokun Zhang <zhangshaokun@...ilicon.com>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Shaokun Zhang <zhangshaokun@...ilicon.com>,
        Will Deacon <will@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Boqun Feng <boqun.feng@...il.com>,
        Yuqi Jin <jinyuqi@...wei.com>, lkp@...ts.01.org
Subject: [fs] 936e92b615: unixbench.score 32.3% improvement

Greetings,

FYI, we noticed a 32.3% improvement in unixbench.score due to commit:


commit: 936e92b615e212d08eb74951324bef25ba564c34 ("[PATCH RESEND] fs: Move @f_count to different cacheline with @f_mode")
url: https://github.com/0day-ci/linux/commits/Shaokun-Zhang/fs-Move-f_count-to-different-cacheline-with-f_mode/20200624-163511
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 5e857ce6eae7ca21b2055cca4885545e29228fe2
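
As a rough illustration of what the commit title describes, the sketch below uses Python's ctypes to build two hypothetical structures and reports which cache line each field falls in. This is not the real `struct file` layout, which this report does not show; the field types, the padding, and the 64-byte line size are all assumptions made for the example.

```python
import ctypes

CACHELINE = 64  # assumed cache line size in bytes


class SharedLine(ctypes.Structure):
    # Hypothetical layout: both fields sit in the same cache line.
    _fields_ = [("f_mode", ctypes.c_uint32),
                ("f_count", ctypes.c_int64)]


class SplitLine(ctypes.Structure):
    # Hypothetical layout: padding pushes f_count onto the next cache line.
    _fields_ = [("f_mode", ctypes.c_uint32),
                ("_pad", ctypes.c_char * 60),
                ("f_count", ctypes.c_int64)]


def line_of(struct, field):
    """Index of the cache line that holds the given field of the struct."""
    return getattr(struct, field).offset // CACHELINE


for s in (SharedLine, SplitLine):
    print(s.__name__, line_of(s, "f_mode"), line_of(s, "f_count"))
```

When a frequently written counter shares a line with a field that other CPUs only read, each write invalidates the readers' cached copy of that line; placing the two fields on different lines avoids that traffic.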

in testcase: unixbench
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

	runtime: 300s
	nr_task: 30%
	test: syscall
	cpufreq_governor: performance
	ucode: 0x5002f01

test-description: UnixBench is the original BYTE UNIX benchmark suite; it aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench
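
For orientation, the call paths in the profile below (dup, close, getuid, umask) suggest the mix of system calls this test exercises. A minimal Python sketch of such a loop, assuming that mix and using /dev/null as the descriptor to duplicate, might look like the following. It is not the benchmark's own C source, and `syscall_mix` is a name invented here.

```python
import os


def syscall_mix(iterations: int) -> int:
    """Run a dup/close/getpid/getuid/umask loop and return the iteration count."""
    fd = os.open(os.devnull, os.O_RDONLY)  # descriptor to duplicate (assumption)
    done = 0
    try:
        for _ in range(iterations):
            os.close(os.dup(fd))    # dup takes a file reference, close drops it
            os.getpid()
            os.getuid()
            old = os.umask(0o022)   # umask returns the previous mask
            os.umask(old)           # restore it so the caller is unaffected
            done += 1
    finally:
        os.close(fd)
    return done


if __name__ == "__main__":
    print(syscall_mix(100000))
```

Each dup/close pair takes and drops a reference on the same open file, which is why a reference counter on that file would be written heavily when many tasks run this loop on a shared descriptor.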





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-7.6/30%/debian-x86_64-20191114.cgz/300s/lkp-csl-2ap3/syscall/unixbench/0x5002f01

commit: 
  5e857ce6ea ("Merge branch 'hch' (maccess patches from Christoph Hellwig)")
  936e92b615 ("fs: Move @f_count to different cacheline with @f_mode")

5e857ce6eae7ca21 936e92b615e212d08eb74951324 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2297 ±  2%     +32.3%       3038        unixbench.score
    171.74           +34.8%     231.55        unixbench.time.user_time
 1.366e+09           +32.6%  1.812e+09        unixbench.workload
     26472 ±  6%   +1270.0%     362665 ±158%  cpuidle.C1.usage
      0.25 ±  2%      +0.1        0.33        mpstat.cpu.all.usr%
      8.32 ± 43%    +129.7%      19.12 ± 63%  sched_debug.cpu.clock.stddev
      8.32 ± 43%    +129.7%      19.12 ± 63%  sched_debug.cpu.clock_task.stddev
      2100 ±  2%     -15.6%       1772 ±  9%  sched_debug.cpu.nr_switches.min
    373.34 ±  3%     +12.4%     419.48 ±  6%  sched_debug.cpu.ttwu_local.stddev
      2740 ± 12%     -72.3%     757.75 ±105%  numa-vmstat.node0.nr_inactive_anon
      3139 ±  8%     -69.9%     946.25 ± 97%  numa-vmstat.node0.nr_shmem
      2740 ± 12%     -72.3%     757.75 ±105%  numa-vmstat.node0.nr_zone_inactive_anon
    373.75 ± 51%    +443.3%       2030 ± 26%  numa-vmstat.node2.nr_inactive_anon
    496.00 ± 19%    +366.1%       2311 ± 29%  numa-vmstat.node2.nr_shmem
    373.75 ± 51%    +443.3%       2030 ± 26%  numa-vmstat.node2.nr_zone_inactive_anon
     13728 ± 13%    +148.1%      34056 ± 46%  numa-vmstat.node3.nr_active_anon
     78558           +11.3%      87431 ±  6%  numa-vmstat.node3.nr_file_pages
      9939 ±  8%     +19.7%      11902 ± 13%  numa-vmstat.node3.nr_shmem
     13728 ± 13%    +148.1%      34056 ± 46%  numa-vmstat.node3.nr_zone_active_anon
     11103 ± 13%     -71.2%       3201 ± 99%  numa-meminfo.node0.Inactive
     10962 ± 12%     -72.3%       3032 ±105%  numa-meminfo.node0.Inactive(anon)
      8551 ± 31%     -29.4%       6034 ± 18%  numa-meminfo.node0.Mapped
     12560 ±  8%     -69.9%       3786 ± 97%  numa-meminfo.node0.Shmem
      1596 ± 51%    +415.6%       8230 ± 24%  numa-meminfo.node2.Inactive
      1496 ± 51%    +442.8%       8122 ± 26%  numa-meminfo.node2.Inactive(anon)
      1984 ± 19%    +366.1%       9248 ± 29%  numa-meminfo.node2.Shmem
     54929 ± 13%    +148.0%     136212 ± 46%  numa-meminfo.node3.Active
     54929 ± 13%    +148.0%     136206 ± 46%  numa-meminfo.node3.Active(anon)
    314216           +11.3%     349697 ±  6%  numa-meminfo.node3.FilePages
    747907 ±  2%     +15.2%     861672 ±  9%  numa-meminfo.node3.MemUsed
     39744 ±  8%     +19.7%      47580 ± 13%  numa-meminfo.node3.Shmem
     13.94 ±  6%     -13.9        0.00        perf-profile.calltrace.cycles-pp.dnotify_flush.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.66 ±  8%  perf-profile.calltrace.cycles-pp.__x64_sys_umask.do_syscall_64.entry_SYSCALL_64_after_hwframe
     31.64 ±  8%      +3.4       35.08 ±  5%  perf-profile.calltrace.cycles-pp.__fget_files.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.82 ±  8%      +5.6       12.41 ± 12%  perf-profile.calltrace.cycles-pp.fput_many.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
     23.54 ± 58%     +12.7       36.27 ±  5%  perf-profile.calltrace.cycles-pp.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
     23.54 ± 58%     +12.7       36.29 ±  5%  perf-profile.calltrace.cycles-pp.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
     13.98 ±  6%     -14.0        0.00        perf-profile.children.cycles-pp.dnotify_flush
     39.81 ±  6%     -10.8       28.96 ±  9%  perf-profile.children.cycles-pp.filp_close
     40.13 ±  6%     -10.7       29.44 ±  9%  perf-profile.children.cycles-pp.__x64_sys_close
      0.15 ± 10%      -0.0        0.13 ±  8%  perf-profile.children.cycles-pp.scheduler_tick
      0.05 ±  8%      +0.0        0.07 ±  6%  perf-profile.children.cycles-pp.__x64_sys_getuid
      0.10 ±  7%      +0.0        0.12 ±  8%  perf-profile.children.cycles-pp.__prepare_exit_to_usermode
      0.44 ±  7%      +0.1        0.56 ±  6%  perf-profile.children.cycles-pp.syscall_return_via_sysret
     31.78 ±  8%      +3.4       35.22 ±  5%  perf-profile.children.cycles-pp.__fget_files
     32.52 ±  8%      +3.7       36.27 ±  5%  perf-profile.children.cycles-pp.ksys_dup
     32.54 ±  8%      +3.8       36.30 ±  5%  perf-profile.children.cycles-pp.__x64_sys_dup
      6.86 ±  7%      +5.6       12.45 ± 12%  perf-profile.children.cycles-pp.fput_many
     13.91 ±  6%     -13.9        0.00        perf-profile.self.cycles-pp.dnotify_flush
     18.05 ±  5%      -1.6       16.41 ±  7%  perf-profile.self.cycles-pp.filp_close
      0.06 ±  6%      +0.0        0.08 ±  8%  perf-profile.self.cycles-pp.__prepare_exit_to_usermode
      0.09 ±  9%      +0.0        0.11 ±  7%  perf-profile.self.cycles-pp.do_syscall_64
      0.16 ±  9%      +0.0        0.20 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.30 ±  8%      +0.1        0.36 ±  7%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.44 ±  7%      +0.1        0.56 ±  6%  perf-profile.self.cycles-pp.syscall_return_via_sysret
     31.61 ±  8%      +3.4       35.00 ±  5%  perf-profile.self.cycles-pp.__fget_files
      6.81 ±  7%      +5.6       12.38 ± 12%  perf-profile.self.cycles-pp.fput_many
     36623 ±  3%     +11.5%      40822 ±  7%  softirqs.CPU100.SCHED
     16499 ± 40%     +27.8%      21088 ± 35%  softirqs.CPU122.RCU
     16758 ± 41%     +30.0%      21781 ± 35%  softirqs.CPU126.RCU
    178.25 ± 11%   +7718.2%      13936 ±168%  softirqs.CPU13.NET_RX
     40883 ±  4%      -6.9%      38055 ±  2%  softirqs.CPU132.SCHED
     16029 ± 41%     +35.9%      21789 ± 33%  softirqs.CPU144.RCU
     16220 ± 43%     +32.4%      21484 ± 35%  softirqs.CPU145.RCU
     16393 ± 39%     +29.9%      21301 ± 32%  softirqs.CPU146.RCU
     16217 ± 39%     +29.8%      21055 ± 35%  softirqs.CPU147.RCU
     37011 ± 12%     +12.4%      41589 ±  5%  softirqs.CPU149.SCHED
     16127 ± 41%     +34.5%      21685 ± 34%  softirqs.CPU150.RCU
     16131 ± 41%     +32.3%      21333 ± 35%  softirqs.CPU151.RCU
     16558 ± 37%     +28.2%      21230 ± 34%  softirqs.CPU152.RCU
     15863 ± 40%     +34.1%      21266 ± 32%  softirqs.CPU153.RCU
     16044 ± 41%     +32.7%      21286 ± 34%  softirqs.CPU154.RCU
     16057 ± 40%     +34.9%      21658 ± 33%  softirqs.CPU155.RCU
     16352 ± 39%     +31.0%      21423 ± 33%  softirqs.CPU156.RCU
     16006 ± 39%     +33.4%      21348 ± 32%  softirqs.CPU158.RCU
     16300 ± 41%     +32.0%      21521 ± 34%  softirqs.CPU161.RCU
     37546 ±  4%     +13.5%      42605 ±  3%  softirqs.CPU161.SCHED
     16411 ± 41%     +33.4%      21894 ± 33%  softirqs.CPU162.RCU
     16329 ± 41%     +32.9%      21704 ± 35%  softirqs.CPU163.RCU
     16517 ± 39%     +29.8%      21441 ± 34%  softirqs.CPU164.RCU
     16227 ± 41%     +32.3%      21471 ± 34%  softirqs.CPU165.RCU
     16347 ± 40%     +31.4%      21481 ± 35%  softirqs.CPU166.RCU
     16360 ± 43%     +32.2%      21631 ± 35%  softirqs.CPU167.RCU
     36986           +11.3%      41148 ±  6%  softirqs.CPU167.SCHED
     16218 ± 44%     +34.7%      21843 ± 33%  softirqs.CPU189.RCU
     16501 ± 39%     +32.0%      21783 ± 33%  softirqs.CPU52.RCU
     17101 ± 41%     +29.4%      22121 ± 35%  softirqs.CPU68.RCU
 1.087e+09           +20.9%  1.314e+09        perf-stat.i.branch-instructions
  19778787           +22.1%   24144895 ± 16%  perf-stat.i.branch-misses
     22.88           -17.7%      18.84 ±  2%  perf-stat.i.cpi
 1.635e+09           +23.6%  2.021e+09        perf-stat.i.dTLB-loads
     20648 ±  2%    +218.4%      65736 ±110%  perf-stat.i.dTLB-store-misses
 1.023e+09           +24.8%  1.276e+09        perf-stat.i.dTLB-stores
     78.10            +1.4       79.54        perf-stat.i.iTLB-load-miss-rate%
  16169669            +8.2%   17493234        perf-stat.i.iTLB-load-misses
 5.364e+09           +21.3%  6.507e+09        perf-stat.i.instructions
    369.33           +11.8%     413.03 ±  5%  perf-stat.i.instructions-per-iTLB-miss
      0.41 ±  2%     +83.3%       0.76 ± 16%  perf-stat.i.metric.K/sec
     19.79           +23.2%      24.39        perf-stat.i.metric.M/sec
   4460149 ±  2%     -45.1%    2447884 ± 14%  perf-stat.i.node-load-misses
    241219 ±  2%     -58.8%      99443 ± 47%  perf-stat.i.node-loads
   1679821 ±  2%      -4.4%    1605611 ±  3%  perf-stat.i.node-store-misses
     25.91           -17.6%      21.36        perf-stat.overall.cpi
     82.51            +1.7       84.17        perf-stat.overall.iTLB-load-miss-rate%
    331.21           +12.2%     371.62        perf-stat.overall.instructions-per-iTLB-miss
      0.04           +21.3%       0.05        perf-stat.overall.ipc
      1566            -8.4%       1435        perf-stat.overall.path-length
 1.089e+09           +21.0%  1.318e+09        perf-stat.ps.branch-instructions
  19801099           +21.7%   24102537 ± 15%  perf-stat.ps.branch-misses
 1.641e+09           +23.6%  2.028e+09        perf-stat.ps.dTLB-loads
     20512 ±  2%    +212.7%      64142 ±109%  perf-stat.ps.dTLB-store-misses
 1.027e+09           +24.8%  1.282e+09        perf-stat.ps.dTLB-stores
  16239916            +8.2%   17567773        perf-stat.ps.iTLB-load-misses
 5.378e+09           +21.4%  6.527e+09        perf-stat.ps.instructions
   4485062 ±  2%     -45.2%    2458026 ± 14%  perf-stat.ps.node-load-misses
    242388 ±  2%     -59.0%      99493 ± 47%  perf-stat.ps.node-loads
   1689890 ±  2%      -4.5%    1614182 ±  3%  perf-stat.ps.node-store-misses
 2.139e+12           +21.5%    2.6e+12        perf-stat.total.instructions
    288.00 ± 13%   +8910.9%      25951 ±168%  interrupts.34:PCI-MSI.524292-edge.eth0-TxRx-3
      2042 ± 57%    +190.2%       5927 ± 26%  interrupts.CPU1.NMI:Non-maskable_interrupts
      2042 ± 57%    +190.2%       5927 ± 26%  interrupts.CPU1.PMI:Performance_monitoring_interrupts
      3.75 ± 34%   +2373.3%      92.75 ±130%  interrupts.CPU100.TLB:TLB_shootdowns
      3510 ± 88%     -85.1%     522.00 ±124%  interrupts.CPU107.NMI:Non-maskable_interrupts
      3510 ± 88%     -85.1%     522.00 ±124%  interrupts.CPU107.PMI:Performance_monitoring_interrupts
      3813 ± 74%     -73.3%       1018 ±150%  interrupts.CPU110.NMI:Non-maskable_interrupts
      3813 ± 74%     -73.3%       1018 ±150%  interrupts.CPU110.PMI:Performance_monitoring_interrupts
      4536 ± 51%     -97.1%     131.50 ±  8%  interrupts.CPU111.NMI:Non-maskable_interrupts
      4536 ± 51%     -97.1%     131.50 ±  8%  interrupts.CPU111.PMI:Performance_monitoring_interrupts
      4476 ± 47%     -97.5%     113.00 ± 19%  interrupts.CPU112.NMI:Non-maskable_interrupts
      4476 ± 47%     -97.5%     113.00 ± 19%  interrupts.CPU112.PMI:Performance_monitoring_interrupts
      3522 ± 36%     +92.7%       6787 ± 16%  interrupts.CPU120.NMI:Non-maskable_interrupts
      3522 ± 36%     +92.7%       6787 ± 16%  interrupts.CPU120.PMI:Performance_monitoring_interrupts
      2888 ± 66%    +117.5%       6283 ± 21%  interrupts.CPU123.NMI:Non-maskable_interrupts
      2888 ± 66%    +117.5%       6283 ± 21%  interrupts.CPU123.PMI:Performance_monitoring_interrupts
      3109 ± 61%    +132.5%       7230 ±  7%  interrupts.CPU124.NMI:Non-maskable_interrupts
      3109 ± 61%    +132.5%       7230 ±  7%  interrupts.CPU124.PMI:Performance_monitoring_interrupts
      1067 ± 19%     -21.6%     836.50        interrupts.CPU125.CAL:Function_call_interrupts
    288.00 ± 13%   +8910.9%      25951 ±168%  interrupts.CPU13.34:PCI-MSI.524292-edge.eth0-TxRx-3
    244.25 ± 96%     -95.3%      11.50 ± 95%  interrupts.CPU13.TLB:TLB_shootdowns
      2056 ±117%    +206.3%       6298 ± 20%  interrupts.CPU130.NMI:Non-maskable_interrupts
      2056 ±117%    +206.3%       6298 ± 20%  interrupts.CPU130.PMI:Performance_monitoring_interrupts
    831.50           +21.4%       1009 ± 13%  interrupts.CPU133.CAL:Function_call_interrupts
      8.00 ± 29%    +634.4%      58.75 ±119%  interrupts.CPU133.RES:Rescheduling_interrupts
      1629 ±159%    +265.3%       5952 ± 29%  interrupts.CPU139.NMI:Non-maskable_interrupts
      1629 ±159%    +265.3%       5952 ± 29%  interrupts.CPU139.PMI:Performance_monitoring_interrupts
      1660 ±159%    +161.0%       4332 ± 61%  interrupts.CPU141.NMI:Non-maskable_interrupts
      1660 ±159%    +161.0%       4332 ± 61%  interrupts.CPU141.PMI:Performance_monitoring_interrupts
    882.75 ±147%    +542.5%       5671 ± 38%  interrupts.CPU143.NMI:Non-maskable_interrupts
    882.75 ±147%    +542.5%       5671 ± 38%  interrupts.CPU143.PMI:Performance_monitoring_interrupts
      2600 ± 29%     +68.8%       4389 ± 47%  interrupts.CPU144.NMI:Non-maskable_interrupts
      2600 ± 29%     +68.8%       4389 ± 47%  interrupts.CPU144.PMI:Performance_monitoring_interrupts
      1494 ± 20%     +91.3%       2859 ± 29%  interrupts.CPU147.NMI:Non-maskable_interrupts
      1494 ± 20%     +91.3%       2859 ± 29%  interrupts.CPU147.PMI:Performance_monitoring_interrupts
      3657 ± 54%     -96.3%     133.75 ±  8%  interrupts.CPU15.NMI:Non-maskable_interrupts
      3657 ± 54%     -96.3%     133.75 ±  8%  interrupts.CPU15.PMI:Performance_monitoring_interrupts
      5165 ± 40%     -97.8%     115.00 ± 26%  interrupts.CPU16.NMI:Non-maskable_interrupts
      5165 ± 40%     -97.8%     115.00 ± 26%  interrupts.CPU16.PMI:Performance_monitoring_interrupts
     34.00 ±125%     -84.6%       5.25 ± 49%  interrupts.CPU186.RES:Rescheduling_interrupts
      1033 ± 24%     -19.0%     836.75        interrupts.CPU190.CAL:Function_call_interrupts
     68.00 ± 28%     +55.5%     105.75 ±  9%  interrupts.CPU26.RES:Rescheduling_interrupts
    882.25 ±  4%      +6.3%     937.75 ±  7%  interrupts.CPU32.CAL:Function_call_interrupts
    139.25 ± 96%     -74.0%      36.25 ± 72%  interrupts.CPU32.TLB:TLB_shootdowns
    848.25 ±130%    +368.9%       3977 ± 56%  interrupts.CPU35.NMI:Non-maskable_interrupts
    848.25 ±130%    +368.9%       3977 ± 56%  interrupts.CPU35.PMI:Performance_monitoring_interrupts
    958.25 ± 11%     -10.6%     856.75        interrupts.CPU36.CAL:Function_call_interrupts
      1903 ± 72%    +127.9%       4337 ± 23%  interrupts.CPU41.NMI:Non-maskable_interrupts
      1903 ± 72%    +127.9%       4337 ± 23%  interrupts.CPU41.PMI:Performance_monitoring_interrupts
      1320 ±158%    +245.4%       4560 ± 32%  interrupts.CPU47.NMI:Non-maskable_interrupts
      1320 ±158%    +245.4%       4560 ± 32%  interrupts.CPU47.PMI:Performance_monitoring_interrupts
    837.50            +5.2%     881.25 ±  4%  interrupts.CPU61.CAL:Function_call_interrupts
      1074 ± 28%     -22.1%     836.50        interrupts.CPU69.CAL:Function_call_interrupts
      1042 ± 12%     -18.7%     847.50 ±  2%  interrupts.CPU86.CAL:Function_call_interrupts
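
The %change column in the table above can be sanity-checked from the two value columns. The sketch below recomputes a few rows; `pct_change` is a helper named here for illustration. The relations ipc = 1/cpi and path-length = instructions / workload are assumptions that happen to reproduce the reported figures, not formulas stated in this report. Rows whose change has no percent sign (for example the perf-profile rows) appear to be absolute differences rather than relative ones.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change from base to patched, in percent."""
    return (patched - base) / base * 100.0


# unixbench.score: 2297 -> 3038
score = pct_change(2297, 3038)

# perf-stat.overall.cpi: 25.91 -> 21.36
cpi = pct_change(25.91, 21.36)

# Assumed relation: ipc is the reciprocal of cpi
ipc = pct_change(1 / 25.91, 1 / 21.36)

# Assumed relation: path-length = perf-stat.total.instructions / unixbench.workload
path_base = 2.139e12 / 1.366e9
path_patched = 2.6e12 / 1.812e9

print(f"score {score:+.1f}%  cpi {cpi:+.1f}%  ipc {ipc:+.1f}%")
print(f"path-length {path_base:.0f} -> {path_patched:.0f}")
```

Because the inputs are the rounded values printed in the table, recomputed figures can differ from the reported ones in the last digit for some rows.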


                                                                                
                                  unixbench.score                               
                                                                                
  3200 +--------------------------------------------------------------------+   
       |                 O             O        O                           |   
  3000 |-+    O O           O   O  O O    O O O                             |   
       | O  O      O   O                                                    |   
       |                      O                                             |   
  2800 |-+                                                                  |   
       |                                                                    |   
  2600 |-+                                                                  |   
       |                                                                    |   
  2400 |-+                                                                  |   
       |      +.+..   .+.+..+.         +..+. .+.  .+. .+..+.+.+..+.+.+.  .+.|   
       |.+.. +      .+        +.+..+. +     +   +.   +                 +.   |   
  2200 |-+  +      +                 +                                      |   
       |                                                                    |   
  2000 +--------------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                  unixbench.workload                            
                                                                                
  1.9e+09 +-----------------------------------------------------------------+   
          |                 O O          O        O                         |   
  1.8e+09 |-+    O O              O O  O   O O O                            |   
          | O O      O   O      O                                           |   
  1.7e+09 |-+                                                               |   
          |                                                                 |   
  1.6e+09 |-+                                                               |   
          |                                                                 |   
  1.5e+09 |-+                                                               |   
          |                                                                 |   
  1.4e+09 |-+    +.+    .+..+.+          +.+. .+.. .+.   .+..+. .+. .+..   .|   
          |.+. ..   :  +       + .+.+.. +    +    +   +.+      +   +    +.+ |   
  1.3e+09 |-+ +     : +         +      +                                    |   
          |          +                                                      |   
  1.2e+09 +-----------------------------------------------------------------+   
                                                                                
                                                                                
[+] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


View attachment "config-5.8.0-rc1-00128-g936e92b615e21" of type "text/plain" (206161 bytes)

View attachment "job-script" of type "text/plain" (7442 bytes)

View attachment "job.yaml" of type "text/plain" (5044 bytes)

View attachment "reproduce" of type "text/plain" (293 bytes)
