Message-ID: <20171225024513.GB31543@yexl-desktop>
Date:   Mon, 25 Dec 2017 10:45:13 +0800
From:   kernel test robot <xiaolong.ye@...el.com>
To:     Scott Mayhew <smayhew@...hat.com>
Cc:     Anna Schumaker <Anna.Schumaker@...app.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [lkp-robot] [nfs]  dc4fd9ab01:  fsmark.app_overhead 850.9% regression


Greetings,

FYI, we noticed an 850.9% regression of fsmark.app_overhead due to commit:


commit: dc4fd9ab01ab379ae5af522b3efd4187a7c30a31 ("nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
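
For context, the commit subject describes an early exit: when an inode has no
outstanding commit requests, nfs_commit_inode() should not block waiting for a
commit to finish. The sketch below illustrates that pattern only; it is not
the actual kernel diff, and every helper name and the FLUSH_SYNC stand-in
value are placeholders:

	/*
	 * Schematic sketch of the behavior named in the commit subject,
	 * not the real fs/nfs code: only block in the FLUSH_SYNC case
	 * when commit requests were actually found and sent.
	 */
	struct inode;                                  /* opaque here */
	#define FLUSH_SYNC 1                           /* illustrative value */
	int  scan_commit_list(struct inode *inode);    /* placeholder */
	void send_commit_rpcs(struct inode *inode, int how); /* placeholder */
	int  wait_on_commits(struct inode *inode);     /* placeholder */

	int nfs_commit_inode_sketch(struct inode *inode, int how)
	{
		int nreqs = scan_commit_list(inode);

		if (nreqs > 0)
			send_commit_rpcs(inode, how);

		/* The fix: skip the wait when there was nothing to commit. */
		if (how == FLUSH_SYNC && nreqs > 0)
			return wait_on_commits(inode);

		return 0;
	}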

in testcase: fsmark
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
with following parameters:

	iterations: 1x
	nr_threads: 64t
	disk: 1BRD_48G
	fs: f2fs
	fs2: nfsv4
	filesize: 4M
	test_size: 40G
	sync_method: fsyncBeforeClose
	cpufreq_governor: performance

test-description: fsmark is a file system benchmark that tests synchronous write workloads, such as those of mail servers (the fsyncBeforeClose pattern used here is sketched below).
test-url: https://sourceforge.net/projects/fsmark/
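
As a minimal illustration of the sync_method parameter above, the per-file
pattern looks roughly like this. The 4 MiB size matches the job's filesize
parameter, but the code itself is a sketch, not the benchmark's source, and
the mount path is hypothetical:

	/* Sketch of the fsyncBeforeClose write pattern: write a file,
	 * fsync() it, then close(). Not fs_mark's actual source. */
	#include <fcntl.h>
	#include <string.h>
	#include <unistd.h>

	static int write_one_file(const char *path, size_t size)
	{
		char buf[4096];
		int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

		if (fd < 0)
			return -1;
		memset(buf, 'x', sizeof(buf));
		for (size_t off = 0; off < size; off += sizeof(buf)) {
			if (write(fd, buf, sizeof(buf)) < 0) {
				close(fd);
				return -1;
			}
		}
		if (fsync(fd) < 0) {    /* sync before close, per sync_method */
			close(fd);
			return -1;
		}
		return close(fd);
	}

	int main(void)
	{
		/* 4 MiB per file matches the filesize parameter above. */
		return write_one_file("/mnt/nfs/file-0", 4 << 20) ? 1 : 0;
	}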



Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml  # runs the fsmark job with the parameters above

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-7/performance/1BRD_48G/4M/nfsv4/f2fs/1x/x86_64-rhel-7.2/64t/debian-x86_64-2016-08-31.cgz/fsyncBeforeClose/ivb44/40G/fsmark

commit: 
  ccede75985 ("xprtrdma: Spread reply processing over more CPUs")
  dc4fd9ab01 ("nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests")

ccede7598588ae34 dc4fd9ab01ab379ae5af522b3e 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  13296395 ±  7%    +850.9%  1.264e+08        fsmark.app_overhead
   1230639            -2.7%    1197079        fsmark.time.voluntary_context_switches
      1.08            +0.1        1.21        mpstat.cpu.iowait%
      6.00           +16.7%       7.00        nfsstat.Client.nfs.v4.getattr
    910.00 ±  2%      -9.5%     823.50        proc-vmstat.kswapd_low_wmark_hit_quickly
     23577            -2.2%      23058        vmstat.system.cs
  53057401 ±  2%     +18.5%   62851465        cpuidle.C3.time
  29194791 ± 44%     -54.6%   13240888 ± 33%  cpuidle.POLL.time
    807076 ± 29%     -47.3%     425456 ± 13%  numa-numastat.node0.numa_miss
    812559 ± 29%     -46.9%     431720 ± 12%  numa-numastat.node0.other_node
    807076 ± 29%     -47.3%     425456 ± 13%  numa-numastat.node1.numa_foreign
      6413           +14.7%       7358        numa-meminfo.node0.KernelStack
      2369 ±  8%     +46.6%       3474 ±  7%  numa-meminfo.node0.PageTables
      6744           -12.4%       5908 ±  2%  numa-meminfo.node1.KernelStack
      3512 ±  6%     -31.9%       2392 ± 13%  numa-meminfo.node1.PageTables
    565.50 ± 64%     +79.5%       1015 ±  4%  slabinfo.btrfs_extent_buffer.active_objs
    565.50 ± 64%     +79.5%       1015 ±  4%  slabinfo.btrfs_extent_buffer.num_objs
      1827 ±  4%     +20.9%       2210 ±  5%  slabinfo.fsnotify_mark_connector.active_objs
      1827 ±  4%     +20.9%       2210 ±  5%  slabinfo.fsnotify_mark_connector.num_objs
    809.25 ±  9%     +16.9%     946.00 ±  4%  slabinfo.nfsd4_openowners.active_objs
    809.25 ±  9%     +16.9%     946.00 ±  4%  slabinfo.nfsd4_openowners.num_objs
    194.75 ±  2%      -5.0%     185.00        turbostat.Avg_MHz
      0.77 ±  2%      +0.1        0.91        turbostat.C3%
      0.65 ±  2%     +22.7%       0.80        turbostat.CPU%c3
     62.34 ±  2%      -4.5%      59.52        turbostat.CorWatt
     70.25 ±  3%      -6.8%      65.50        turbostat.PkgTmp
     93.02            -3.4%      89.82        turbostat.PkgWatt
      6411           +14.7%       7357        numa-vmstat.node0.nr_kernel_stack
    590.75 ±  8%     +46.7%     866.75 ±  7%  numa-vmstat.node0.nr_page_table_pages
    120900 ± 13%     +27.2%     153728 ±  6%  numa-vmstat.node0.numa_foreign
     77163 ± 42%     -49.3%      39123 ± 24%  numa-vmstat.node0.numa_miss
     82748 ± 35%     -45.1%      45452 ± 25%  numa-vmstat.node0.numa_other
      6741           -12.4%       5907 ±  2%  numa-vmstat.node1.nr_kernel_stack
    877.00 ±  6%     -31.9%     597.50 ± 13%  numa-vmstat.node1.nr_page_table_pages
     77170 ± 42%     -49.3%      39127 ± 24%  numa-vmstat.node1.numa_foreign
    120916 ± 13%     +27.2%     153763 ±  6%  numa-vmstat.node1.numa_miss
    285504 ±  5%     +11.4%     318085 ±  3%  numa-vmstat.node1.numa_other
      1737 ± 10%     -34.5%       1137 ± 50%  sched_debug.cfs_rq:/.exec_clock.max
    281.24 ±  5%     -33.3%     187.70 ± 48%  sched_debug.cfs_rq:/.exec_clock.stddev
    417.83 ±  4%     -16.9%     347.04 ± 22%  sched_debug.cfs_rq:/.runnable_load_avg.max
    553.42 ± 45%     -37.5%     346.04 ± 22%  sched_debug.cpu.cpu_load[0].max
    129.77 ±  7%     -29.0%      92.14 ± 37%  sched_debug.cpu.cpu_load[0].stddev
    388.83 ± 14%     -24.7%     292.79 ± 26%  sched_debug.cpu.cpu_load[2].max
    347.25 ±  8%     -27.3%     252.50 ± 34%  sched_debug.cpu.cpu_load[3].max
    320.17 ±  2%     -30.7%     222.00 ± 42%  sched_debug.cpu.cpu_load[4].max
     62796 ±  6%     -31.8%      42829 ± 20%  sched_debug.cpu.sched_count.max
      7742 ±  6%     -26.2%       5714 ± 14%  sched_debug.cpu.sched_count.stddev
 1.374e+11 ±  3%     -11.8%  1.212e+11 ±  5%  perf-stat.branch-instructions
      0.93 ±  5%      +0.1        1.07 ±  7%  perf-stat.branch-miss-rate%
  4.55e+09 ±  3%      -6.8%  4.242e+09 ±  3%  perf-stat.cache-misses
 9.994e+09 ±  2%      -5.4%  9.455e+09 ±  3%  perf-stat.cache-references
   3386653            -1.4%    3340360        perf-stat.context-switches
 1.559e+12           -11.8%  1.375e+12 ±  4%  perf-stat.cpu-cycles
     16599           -25.5%      12360        perf-stat.cpu-migrations
 1.712e+11 ±  3%      -8.6%  1.564e+11        perf-stat.dTLB-loads
 7.092e+11 ±  3%     -11.7%  6.263e+11 ±  5%  perf-stat.instructions
      7701 ±  4%     -12.5%       6741 ±  5%  perf-stat.instructions-per-iTLB-miss
     39.52            -0.5       39.06        perf-stat.node-load-miss-rate%
 9.658e+08            -4.0%   9.27e+08        perf-stat.node-load-misses
     22.56            -0.6       21.96        perf-stat.node-store-miss-rate%
 5.902e+08            -3.8%  5.678e+08        perf-stat.node-store-misses
     12.98 ± 48%      -1.6       11.41 ± 32%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle
     13.80 ± 11%      -0.1       13.66 ± 10%  perf-profile.calltrace.cycles-pp.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle
     13.64 ± 10%      -0.1       13.51 ± 11%  perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter.call_cpuidle
     87.54            -0.1       87.48        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle.cpu_startup_entry
     86.74            -0.0       86.71        perf-profile.calltrace.cycles-pp.cpuidle_enter.call_cpuidle.do_idle.cpu_startup_entry.start_secondary
     86.83            -0.0       86.82        perf-profile.calltrace.cycles-pp.call_cpuidle.do_idle.cpu_startup_entry.start_secondary.verify_cpu
     93.11            +0.0       93.12        perf-profile.calltrace.cycles-pp.verify_cpu
      5.83 ±  9%      +0.1        5.89 ± 14%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter
     90.70            +0.1       90.79        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.verify_cpu
     91.28            +0.2       91.44        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.verify_cpu
     91.28            +0.2       91.44        perf-profile.calltrace.cycles-pp.start_secondary.verify_cpu
     57.64 ±  6%      +1.5       59.11 ±  3%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle
     13.04 ± 49%      -1.6       11.41 ± 32%  perf-profile.children.cycles-pp.poll_idle
     88.49            -0.2       88.33        perf-profile.children.cycles-pp.cpuidle_enter
     14.31 ± 10%      -0.1       14.17 ± 10%  perf-profile.children.cycles-pp.__irqentry_text_start
     14.06 ± 10%      -0.1       13.92 ± 10%  perf-profile.children.cycles-pp.smp_apic_timer_interrupt
     88.60            -0.1       88.49        perf-profile.children.cycles-pp.call_cpuidle
     87.57            -0.1       87.51        perf-profile.children.cycles-pp.cpuidle_enter_state
     92.53            -0.1       92.47        perf-profile.children.cycles-pp.do_idle
     93.11            +0.0       93.12        perf-profile.children.cycles-pp.verify_cpu
     93.11            +0.0       93.12        perf-profile.children.cycles-pp.cpu_startup_entry
      6.00 ±  9%      +0.1        6.09 ± 14%  perf-profile.children.cycles-pp.hrtimer_interrupt
     91.28            +0.2       91.44        perf-profile.children.cycles-pp.start_secondary
     57.72 ±  6%      +1.5       59.18 ±  3%  perf-profile.children.cycles-pp.intel_idle
     13.02 ± 49%      -1.6       11.41 ± 32%  perf-profile.self.cycles-pp.poll_idle
     57.72 ±  6%      +1.5       59.18 ±  3%  perf-profile.self.cycles-pp.intel_idle


                                                                                
                                  fsmark.app_overhead                           
                                                                                
  1.6e+08 +-+---------------------------------------------------------------+   
          | O                                                               |   
  1.4e+08 OO+OOOOOO                                                         |   
  1.2e+08 +-+      OOOOOOO O                                                |   
          |                                                                 |   
    1e+08 +-+                                                               |   
          |                                                                 |   
    8e+07 +-+                                                               |   
          |                                                                 |   
    6e+07 +-+                                                               |   
    4e+07 +-+                                                               |   
          |                                                                 |   
    2e+07 +-+  +                                                            |   
          |+ ++ + ++++++++.++++++++++++++++++++++++++++++++.++++++++++++++++|   
        0 +-+---------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                              perf-stat.cpu-migrations                          
                                                                                
  20000 +-+-----------------------------------------------------------------+   
  18000 +-+  +                                                              |   
        | ::+ :          ++++++++.  ++ ++++ ++++++.++  ++ +++++ +    +++++++|   
  16000 +-++  +  ++++++++         ++  +    +         ++  +     + ++.+       |   
  14000 +-+   :  :                                                          |   
        |    O:  :OOOOOOOO                                                  |   
  12000 OOOOO OO:O                                                          |   
  10000 +-+   : :                                                           |   
   8000 +-+   : :                                                           |   
        |      ::                                                           |   
   6000 +-+    ::                                                           |   
   4000 +-+    ::                                                           |   
        |      :                                                            |   
   2000 +-+    :                                                            |   
      0 +-+-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong

View attachment "config-4.14.0-rc5-00068-gdc4fd9a" of type "text/plain" (163335 bytes)

View attachment "job-script" of type "text/plain" (7670 bytes)

View attachment "job.yaml" of type "text/plain" (5275 bytes)

View attachment "reproduce" of type "text/plain" (2124 bytes)
