Date:   Tue, 15 Nov 2016 01:18:00 +0800
From:   kernel test robot <xiaolong.ye@...el.com>
To:     Jaegeuk Kim <jaegeuk@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>, lkp@...org
Subject: [lkp] [f2fs]  442d0256a5:  fsmark.files_per_sec 46.1% improvement


Greetings,

FYI, we noticed a 46.1% improvement in fsmark.files_per_sec due to commit:


commit 442d0256a5407a1b89d505f0346d92bf14bb1bf5 ("f2fs: remove percpu_count due to performance regression")
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master

in testcase: fsmark
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
with the following parameters:

	iterations: 1x
	nr_threads: 1t
	disk: 1BRD_48G
	fs: f2fs
	filesize: 4M
	test_size: 40G
	sync_method: NoSync
	cpufreq_governor: performance

test-description: fsmark is a file system benchmark that tests synchronous write workloads, for example a mail server workload.
test-url: https://sourceforge.net/projects/fsmark/



Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-6/performance/1BRD_48G/4M/f2fs/1x/x86_64-rhel-7.2/1t/debian-x86_64-2016-08-31.cgz/NoSync/ivb44/40G/fsmark

commit: 
  670be5e771 ("f2fs: make clean inodes when flushing inode page")
  442d0256a5 ("f2fs: remove percpu_count due to performance regression")

670be5e771171195 442d0256a5407a1b89d505f034 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    242.02 ±  1%     +46.1%     353.62 ±  0%  fsmark.files_per_sec
     42.33 ±  1%     -31.6%      28.97 ±  0%  fsmark.time.elapsed_time
     42.33 ±  1%     -31.6%      28.97 ±  0%  fsmark.time.elapsed_time.max
     55.50 ±  0%     +43.7%      79.75 ±  1%  fsmark.time.percent_of_cpu_this_job_got
      2773 ±  1%     -64.0%     997.25 ±  5%  fsmark.time.voluntary_context_switches
     34455 ±  2%     -16.8%      28678 ±  0%  interrupts.CAL:Function_call_interrupts
   7351302 ±  0%     -15.8%    6191740 ±  1%  meminfo.Dirty
     16.50 ± 31%     -89.4%       1.75 ±109%  numa-numastat.node1.other_node
      1873 ± 33%    +385.7%       9099 ± 79%  softirqs.NET_RX
    821551 ±  1%     +45.3%    1193734 ±  0%  vmstat.io.bo
      6944 ±  7%     -11.5%       6145 ±  4%  slabinfo.cred_jar.active_objs
      6944 ±  7%     -11.5%       6145 ±  4%  slabinfo.cred_jar.num_objs
      5230 ±  8%     +18.8%       6211 ±  2%  sched_debug.cpu.nr_switches.max
     11.00 ± 26%    +100.0%      22.00 ±  8%  sched_debug.cpu.nr_uninterruptible.max
      4.01 ± 20%     +25.5%       5.03 ±  4%  sched_debug.cpu.nr_uninterruptible.stddev
     34291 ±132%     -54.7%      15533 ±  3%  latency_stats.avg.max
     46416 ± 83%     -35.7%      29824 ±  0%  latency_stats.max.max
  18117134 ±  2%     -69.2%    5577804 ±  6%  latency_stats.sum.balance_dirty_pages.balance_dirty_pages_ratelimited.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
  18117134 ±  2%     -69.2%    5577804 ±  6%  latency_stats.sum.max
      6.40 ±  2%      +9.1%       6.98 ±  0%  turbostat.%Busy
    166.00 ±  2%     +13.0%     187.50 ±  0%  turbostat.Avg_MHz
     26.09 ±  3%      -9.7%      23.55 ±  7%  turbostat.Pkg%pc2
      6.26 ±  1%     +11.1%       6.96 ±  1%  turbostat.RAMWatt
     26255 ±  0%     -12.2%      23058 ±  1%  cpuidle.C1-IVT.usage
   1083489 ±  6%     -30.9%     748928 ± 38%  cpuidle.C1E-IVT.time
  11379459 ± 25%     -56.7%    4925050 ± 39%  cpuidle.C3-IVT.time
 1.929e+09 ±  1%     -31.2%  1.328e+09 ±  0%  cpuidle.C6-IVT.time
   2280801 ±  2%     -30.9%    1575163 ±  0%  cpuidle.C6-IVT.usage
    790.00 ± 22%     -73.8%     207.00 ± 55%  proc-vmstat.kswapd_low_wmark_hit_quickly
   1837687 ±  0%     -15.8%    1547845 ±  1%  proc-vmstat.nr_dirty
      4095 ± 25%     -56.9%       1765 ± 25%  proc-vmstat.nr_vmscan_immediate_reclaim
   1837844 ±  0%     -15.8%    1548021 ±  1%  proc-vmstat.nr_zone_write_pending
      1043 ± 25%     -62.9%     387.25 ± 57%  proc-vmstat.pageoutrun
    100774 ±  1%     -26.3%      74263 ±  1%  proc-vmstat.pgfault
     67078 ± 33%     +55.0%     103946 ± 24%  numa-meminfo.node0.Active
     37255 ± 28%     +85.2%      68987 ± 19%  numa-meminfo.node0.Active(anon)
     11658 ± 46%    +150.0%      29143 ± 30%  numa-meminfo.node0.AnonHugePages
     35098 ± 27%     +91.7%      67270 ± 21%  numa-meminfo.node0.AnonPages
   3756859 ±  8%     -19.7%    3017524 ±  4%  numa-meminfo.node0.Dirty
     52647 ± 19%     -62.5%      19762 ± 67%  numa-meminfo.node1.Active(anon)
     29839 ± 20%     -84.9%       4510 ± 77%  numa-meminfo.node1.AnonHugePages
     52124 ± 18%     -64.4%      18556 ± 74%  numa-meminfo.node1.AnonPages
  10638108 ± 42%     +32.6%   14110975 ± 38%  numa-meminfo.node1.MemFree
      9308 ± 28%     +85.2%      17241 ± 19%  numa-vmstat.node0.nr_active_anon
      8768 ± 27%     +91.8%      16815 ± 21%  numa-vmstat.node0.nr_anon_pages
    939188 ±  8%     -19.7%     754348 ±  4%  numa-vmstat.node0.nr_dirty
      9308 ± 28%     +85.3%      17247 ± 19%  numa-vmstat.node0.nr_zone_active_anon
    939274 ±  8%     -19.7%     754420 ±  4%  numa-vmstat.node0.nr_zone_write_pending
     13154 ± 19%     -62.5%       4929 ± 67%  numa-vmstat.node1.nr_active_anon
     13026 ± 18%     -64.5%       4629 ± 75%  numa-vmstat.node1.nr_anon_pages
   2660256 ± 42%     +32.6%    3527067 ± 38%  numa-vmstat.node1.nr_free_pages
     13154 ± 19%     -62.5%       4929 ± 67%  numa-vmstat.node1.nr_zone_active_anon
     11.00 ± 25%     -90.9%       1.00 ±173%  numa-vmstat.node1.numa_other
 5.459e+10 ±  5%     -45.7%  2.963e+10 ±  5%  perf-stat.branch-instructions
 3.795e+08 ±  2%     -40.6%  2.255e+08 ±  6%  perf-stat.branch-misses
     63.89 ±  0%      +5.3%      67.28 ±  1%  perf-stat.cache-miss-rate%
 2.509e+09 ±  3%      -7.3%  2.326e+09 ±  5%  perf-stat.cache-misses
 3.927e+09 ±  3%     -12.0%  3.455e+09 ±  3%  perf-stat.cache-references
     71651 ±  2%     -32.3%      48512 ±  3%  perf-stat.context-switches
 4.504e+11 ±  5%     -25.4%  3.361e+11 ±  5%  perf-stat.cpu-cycles
      1461 ±  5%     -25.8%       1085 ±  2%  perf-stat.cpu-migrations
      3.19 ±  2%     -91.1%       0.28 ± 12%  perf-stat.dTLB-load-miss-rate%
 1.748e+09 ±  3%     -94.5%   95740875 ± 10%  perf-stat.dTLB-load-misses
 5.302e+10 ±  0%     -36.3%  3.376e+10 ±  5%  perf-stat.dTLB-loads
 3.923e+10 ±  1%     -27.2%  2.854e+10 ±  0%  perf-stat.dTLB-stores
     70.07 ±  1%      -6.4%      65.61 ±  2%  perf-stat.iTLB-load-miss-rate%
  21266825 ±  1%     -30.5%   14782098 ±  1%  perf-stat.iTLB-load-misses
   9081380 ±  2%     -14.4%    7773636 ±  9%  perf-stat.iTLB-loads
  3.06e+11 ±  5%     -46.2%  1.648e+11 ±  5%  perf-stat.instructions
     14385 ±  4%     -22.5%      11146 ±  4%  perf-stat.instructions-per-iTLB-miss
      0.68 ±  2%     -27.9%       0.49 ±  1%  perf-stat.ipc
     88712 ±  1%     -28.0%      63845 ±  1%  perf-stat.minor-faults
     32.29 ±  1%      +4.8%      33.84 ±  2%  perf-stat.node-store-miss-rate%
     88736 ±  1%     -28.0%      63861 ±  1%  perf-stat.page-faults
      8.76 ± 13%    -100.0%       0.00 ± -1%  perf-profile.calltrace.cycles-pp.__percpu_counter_sum.f2fs_balance_fs.f2fs_write_data_page.f2fs_write_cache_pages.f2fs_write_data_pages
      1.79 ± 10%     -36.2%       1.14 ± 58%  perf-profile.calltrace.cycles-pp.__tick_nohz_idle_enter.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt
      1.59 ± 10%    -100.0%       0.00 ± -1%  perf-profile.calltrace.cycles-pp._find_next_bit.__percpu_counter_sum.f2fs_balance_fs.f2fs_write_data_page.f2fs_write_cache_pages
      1.09 ± 14%    -100.0%       0.00 ± -1%  perf-profile.calltrace.cycles-pp._find_next_bit.find_next_bit.__percpu_counter_sum.f2fs_balance_fs.f2fs_write_data_page
      9.78 ± 12%    -100.0%       0.00 ± -1%  perf-profile.calltrace.cycles-pp.f2fs_balance_fs.f2fs_write_data_page.f2fs_write_cache_pages.f2fs_write_data_pages.do_writepages
     21.65 ± 13%     -41.1%      12.75 ± 60%  perf-profile.calltrace.cycles-pp.f2fs_write_data_page.f2fs_write_cache_pages.f2fs_write_data_pages.do_writepages.__writeback_single_inode
      1.48 ± 14%    -100.0%       0.00 ± -1%  perf-profile.calltrace.cycles-pp.find_next_bit.__percpu_counter_sum.f2fs_balance_fs.f2fs_write_data_page.f2fs_write_cache_pages
      0.89 ±  9%     -40.1%       0.53 ± 57%  perf-profile.calltrace.cycles-pp.irq_enter.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
      2.23 ±  9%     -34.7%       1.46 ± 59%  perf-profile.calltrace.cycles-pp.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
      1.44 ±  9%     -35.8%       0.92 ± 58%  perf-profile.calltrace.cycles-pp.tick_nohz_stop_sched_tick.__tick_nohz_idle_enter.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt
      8.82 ± 12%    -100.0%       0.00 ± -1%  perf-profile.children.cycles-pp.__percpu_counter_sum
      1.86 ±  9%     -35.7%       1.20 ± 58%  perf-profile.children.cycles-pp.__tick_nohz_idle_enter
      2.86 ± 11%     -96.1%       0.11 ± 58%  perf-profile.children.cycles-pp._find_next_bit
      9.83 ± 12%     -99.2%       0.08 ± 57%  perf-profile.children.cycles-pp.f2fs_balance_fs
     21.65 ± 13%     -41.1%      12.75 ± 60%  perf-profile.children.cycles-pp.f2fs_write_data_page
      2.52 ± 12%     -94.2%       0.15 ± 61%  perf-profile.children.cycles-pp.find_next_bit
      0.93 ± 10%     -38.6%       0.57 ± 58%  perf-profile.children.cycles-pp.irq_enter
      1.91 ± 10%     -35.0%       1.24 ± 58%  perf-profile.children.cycles-pp.tick_nohz_irq_exit
      1.51 ±  8%     -35.9%       0.97 ± 58%  perf-profile.children.cycles-pp.tick_nohz_stop_sched_tick
      5.12 ± 14%    -100.0%       0.00 ± -1%  perf-profile.self.cycles-pp.__percpu_counter_sum
      2.86 ± 11%     -96.1%       0.11 ± 58%  perf-profile.self.cycles-pp._find_next_bit
      1.36 ± 11%     -92.8%       0.10 ± 59%  perf-profile.self.cycles-pp.find_next_bit
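
The %change column above can be cross-checked against the two mean columns.
The sketch below assumes %change is the relative difference of the patched
mean against the base mean; the robot's actual formula is not shown in this
report, and it presumably works from unrounded means, so small differences in
the last digit are possible. The row selection and the names ROWS, pct_change
and report are illustrative only.

```python
# Means copied from the comparison table above:
# (base commit 670be5e771, patched commit 442d0256a5).
ROWS = {
    "fsmark.files_per_sec": (242.02, 353.62),
    "fsmark.time.elapsed_time": (42.33, 28.97),
    "fsmark.time.voluntary_context_switches": (2773, 997.25),
    "vmstat.io.bo": (821551, 1193734),
}


def pct_change(base, patched):
    """Relative change of the patched mean against the base mean, in percent.

    Assumption: this is how the %change column is derived.
    """
    return (patched - base) / base * 100.0


def report(rows=ROWS):
    """Return {metric: %change rounded to one decimal} for the given rows."""
    return {name: round(pct_change(b, p), 1) for name, (b, p) in rows.items()}


if __name__ == "__main__":
    for name, change in report().items():
        print(f"{change:+7.1f}%  {name}")
```

For these four rows the recomputed values agree with the table to one decimal
place.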




                               fsmark.files_per_sec

  400 ++--------------------------------------------------------------------+
      O O    O O      O  O O O  O   O  O   O                O               |
  350 ++   O      O O             O      O    O O O  O O O    O O  O O O  O O
  300 ++                                                                    |
      |                                                                     |
  250 *+*..*   *..*.*.*..*.*.*..*.*.*..*.    .*.*.*..                       |
      |    :   :                         *.*.        *                      |
  200 ++   :   :                                                            |
      |     : :                                                             |
  150 ++    : :                                                             |
  100 ++    : :                                                             |
      |     : :                                                             |
   50 ++     :                                                              |
      |      :                                                              |
    0 ++-----*--------------------------------------------------------------+

	[*] bisect-good sample
	[O] bisect-bad  sample
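
As a rough plausibility check, the files_per_sec levels in the plot can be
related to the job parameters. The sketch below assumes that 40G and 4M are
binary multiples, that the single iteration writes test_size / filesize files,
and that almost all of fsmark.time.elapsed_time is spent writing them. None of
this is stated in the report, so treat the result as an estimate only; the
constant and function names are illustrative.

```python
# Job parameters from this report (binary units are an assumption).
TEST_SIZE_BYTES = 40 * 1024**3   # test_size: 40G
FILE_SIZE_BYTES = 4 * 1024**2    # filesize: 4M

# fsmark.time.elapsed_time means from the comparison table, in seconds.
ELAPSED_BASE = 42.33      # 670be5e771
ELAPSED_PATCHED = 28.97   # 442d0256a5


def expected_files():
    """Number of files written, assuming test_size / filesize."""
    return TEST_SIZE_BYTES // FILE_SIZE_BYTES


def estimated_files_per_sec(elapsed_seconds):
    """Crude throughput estimate: files written divided by elapsed time."""
    return expected_files() / elapsed_seconds


if __name__ == "__main__":
    print("files written (assumed):", expected_files())
    print(f"base estimate:    {estimated_files_per_sec(ELAPSED_BASE):.2f} files/s")
    print(f"patched estimate: {estimated_files_per_sec(ELAPSED_PATCHED):.2f} files/s")
```

Under these assumptions the estimates come out within about 1% of the reported
242.02 and 353.62 files/s.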



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong

View attachment "config-4.9.0-rc1-00088-g442d025" of type "text/plain" (153709 bytes)

View attachment "job-script" of type "text/plain" (7166 bytes)

View attachment "job.yaml" of type "text/plain" (4788 bytes)

View attachment "reproduce" of type "text/plain" (330 bytes)
