Date:   Fri, 3 Dec 2021 10:09:52 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Nadav Amit <namit@...are.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...ux.intel.com, fengwei.yin@...el.com
Subject: [hugetlbfs]  a4a118f2ee:  will-it-scale.per_thread_ops -14.9% regression



Greetings,

FYI, we noticed a -14.9% regression of will-it-scale.per_thread_ops due to commit:


commit: a4a118f2eead1d6c49e00765de89878288d4b890 ("hugetlbfs: flush TLBs correctly after huge_pmd_unshare")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 104 threads 2 sockets Skylake with 192G memory
with the following parameters:

	nr_task: 100%
	mode: thread
	test: context_switch1
	cpufreq_governor: performance
	ucode: 0x2006a0a

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
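
For readers unfamiliar with the workload, here is a minimal sketch of the kind of loop a pipe-based context-switch test exercises. It is not the will-it-scale source: the function name ping_pong and the two-pipe layout are assumptions for illustration, chosen because the profile below is dominated by pipe_read/pipe_write and wakeup paths. Each round trip blocks one thread in a read until the other writes, so it forces the scheduler to switch between them.

```python
import os
import threading


def ping_pong(iterations: int) -> int:
    """Bounce one byte between two threads over a pair of pipes.

    Hypothetical illustration only; returns the number of completed
    round trips.
    """
    # Pipe A carries bytes main -> echo thread, pipe B carries them back.
    a_read, a_write = os.pipe()
    b_read, b_write = os.pipe()

    def echo() -> None:
        for _ in range(iterations):
            data = os.read(a_read, 1)   # blocks until main writes
            os.write(b_write, data)     # wakes main up again

    worker = threading.Thread(target=echo)
    worker.start()

    completed = 0
    for _ in range(iterations):
        os.write(a_write, b"x")         # wakes the echo thread
        os.read(b_read, 1)              # blocks until the echo arrives
        completed += 1

    worker.join()
    for fd in (a_read, a_write, b_read, b_write):
        os.close(fd)
    return completed


if __name__ == "__main__":
    print(ping_pong(10000), "round trips completed")
```

The real testcase is written in C and run with one copy per hardware thread (nr_task: 100%), so absolute numbers from this sketch are not comparable with the results reported here.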



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-skl-fpga01/context_switch1/will-it-scale/0x2006a0a

commit: 
  v5.16-rc2
  a4a118f2ee ("hugetlbfs: flush TLBs correctly after huge_pmd_unshare")

       v5.16-rc2 a4a118f2eead1d6c49e00765de8 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  22094930           -14.9%   18801170        will-it-scale.104.threads
    212450           -14.9%     180780        will-it-scale.per_thread_ops
  22094930           -14.9%   18801170        will-it-scale.workload
    104.51            +6.4%     111.15        turbostat.RAMWatt
  21864416           -14.9%   18613340        vmstat.system.cs
      1.61 ± 14%     +42.6%       2.29 ± 11%  perf-stat.i.MPKI
 3.726e+10           -13.5%  3.224e+10        perf-stat.i.branch-instructions
 5.173e+08           -14.1%  4.441e+08        perf-stat.i.branch-misses
      1.71 ± 14%      +8.5       10.23 ±  7%  perf-stat.i.cache-miss-rate%
   4566699 ± 12%    +689.0%   36029296 ±  4%  perf-stat.i.cache-misses
  22042272           -14.9%   18767811        perf-stat.i.context-switches
      1.52           +16.1%       1.76        perf-stat.i.cpi
    170640 ± 18%     -95.0%       8502 ±  4%  perf-stat.i.cycles-between-cache-misses
  44430650           -14.6%   37926361        perf-stat.i.dTLB-load-misses
  5.32e+10           -13.6%  4.594e+10        perf-stat.i.dTLB-loads
      0.00 ±  4%      +0.0        0.00 ± 10%  perf-stat.i.dTLB-store-miss-rate%
  3.23e+10           -13.7%  2.786e+10        perf-stat.i.dTLB-stores
  68025283           -21.9%   53120420 ±  2%  perf-stat.i.iTLB-load-misses
 1.836e+11           -13.5%  1.589e+11        perf-stat.i.instructions
      2820            +9.5%       3089 ±  2%  perf-stat.i.instructions-per-iTLB-miss
      0.66           -13.2%       0.57        perf-stat.i.ipc
      1183           -13.5%       1023        perf-stat.i.metric.M/sec
    274656 ± 40%    +535.1%    1744238 ±  8%  perf-stat.i.node-load-misses
      1.59 ± 13%     +41.3%       2.25 ± 11%  perf-stat.overall.MPKI
      1.59 ± 16%      +8.6       10.18 ±  8%  perf-stat.overall.cache-miss-rate%
      1.51           +15.3%       1.74        perf-stat.overall.cpi
     61473 ± 10%     -87.5%       7707 ±  4%  perf-stat.overall.cycles-between-cache-misses
      0.08            -0.0        0.08        perf-stat.overall.dTLB-load-miss-rate%
      0.00 ±  4%      +0.0        0.00 ± 11%  perf-stat.overall.dTLB-store-miss-rate%
      2700 ±  2%     +10.8%       2992 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
      0.66           -13.3%       0.57        perf-stat.overall.ipc
     32.91 ± 37%     +37.6       70.48 ±  5%  perf-stat.overall.node-load-miss-rate%
   2504472            +1.7%    2546759        perf-stat.overall.path-length
 3.714e+10           -13.5%  3.214e+10        perf-stat.ps.branch-instructions
 5.156e+08           -14.1%  4.427e+08        perf-stat.ps.branch-misses
   4556813 ± 12%    +687.7%   35896229 ±  4%  perf-stat.ps.cache-misses
  21967784           -14.8%   18706255        perf-stat.ps.context-switches
  44284414           -14.6%   37805127        perf-stat.ps.dTLB-load-misses
 5.302e+10           -13.6%   4.58e+10        perf-stat.ps.dTLB-loads
 3.219e+10           -13.7%  2.777e+10        perf-stat.ps.dTLB-stores
  67799006           -21.9%   52946940 ±  2%  perf-stat.ps.iTLB-load-misses
  1.83e+11           -13.5%  1.584e+11        perf-stat.ps.instructions
    274060 ± 40%    +534.0%    1737650 ±  8%  perf-stat.ps.node-load-misses
 5.534e+13           -13.5%  4.788e+13        perf-stat.total.instructions
     29.33            -0.8       28.53        perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write.ksys_write
     28.26            -0.8       27.48        perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
     28.70            -0.8       27.93        perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write
     28.51            -0.8       27.76        perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write
     32.31            -0.5       31.76        perf-profile.calltrace.cycles-pp.pipe_write.new_sync_write.vfs_write.ksys_write.do_syscall_64
     33.10            -0.5       32.56        perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     12.74            -0.5       12.20        perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.pipe_read.new_sync_read
     14.03            -0.5       13.50        perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
     13.95            -0.5       13.42        perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
     34.07            -0.4       33.64        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
      1.04 ±  2%      +0.1        1.16 ±  3%  perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.new_sync_read.vfs_read.ksys_read
      0.68 ±  4%      +0.1        0.81 ±  6%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.82            +0.2        1.04 ±  2%  perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.pipe_read.new_sync_read.vfs_read
      1.00            +0.3        1.32 ±  3%  perf-profile.calltrace.cycles-pp.touch_atime.pipe_read.new_sync_read.vfs_read.ksys_read
     37.78            +0.3       38.13        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     38.34            +0.4       38.74        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
      1.38 ±  3%      -0.9        0.51 ±  5%  perf-profile.children.cycles-pp.__task_pid_nr_ns
      1.53 ±  3%      -0.9        0.67 ±  5%  perf-profile.children.cycles-pp.perf_event_pid_type
      2.36 ±  3%      -0.8        1.55 ±  4%  perf-profile.children.cycles-pp.__perf_event_header__init_id
     29.35            -0.8       28.55        perf-profile.children.cycles-pp.__wake_up_common_lock
     28.28            -0.8       27.50        perf-profile.children.cycles-pp.try_to_wake_up
     28.70            -0.8       27.94        perf-profile.children.cycles-pp.__wake_up_common
     28.52            -0.8       27.77        perf-profile.children.cycles-pp.autoremove_wake_function
     33.12            -0.5       32.58        perf-profile.children.cycles-pp.new_sync_write
     32.35            -0.5       31.80        perf-profile.children.cycles-pp.pipe_write
     12.75            -0.5       12.21        perf-profile.children.cycles-pp.dequeue_task_fair
     13.96            -0.5       13.43        perf-profile.children.cycles-pp.enqueue_task_fair
     14.03            -0.5       13.50        perf-profile.children.cycles-pp.ttwu_do_activate
     34.08            -0.4       33.66        perf-profile.children.cycles-pp.vfs_write
      0.12 ±  5%      -0.0        0.08 ±  3%  perf-profile.children.cycles-pp.fput
      0.12 ±  3%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.child
      0.37 ±  2%      -0.0        0.35 ±  2%  perf-profile.children.cycles-pp.tick_sched_handle
      0.10 ±  5%      +0.0        0.12 ±  5%  perf-profile.children.cycles-pp.__list_add_valid
      0.20 ±  3%      +0.0        0.23 ±  4%  perf-profile.children.cycles-pp.make_kgid
      0.09 ±  6%      +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.clear_buddies
      0.13 ±  5%      +0.0        0.17 ±  5%  perf-profile.children.cycles-pp.local_clock
      0.05 ±  5%      +0.0        0.08 ±  7%  perf-profile.children.cycles-pp.rb_insert_color
      0.11 ±  4%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.check_cfs_rq_runtime
      0.28 ±  3%      +0.0        0.31 ±  3%  perf-profile.children.cycles-pp.map_id_range_down
      0.48 ±  3%      +0.0        0.53 ±  3%  perf-profile.children.cycles-pp.__might_sleep
      0.35 ±  4%      +0.1        0.40 ±  3%  perf-profile.children.cycles-pp.__might_fault
      0.83 ±  2%      +0.1        0.88 ±  2%  perf-profile.children.cycles-pp.set_next_entity
      0.00            +0.1        0.06 ±  6%  perf-profile.children.cycles-pp.default_wake_function
      0.51 ±  3%      +0.1        0.62 ±  3%  perf-profile.children.cycles-pp.pick_next_entity
      0.15 ±  6%      +0.1        0.26 ±  9%  perf-profile.children.cycles-pp.timestamp_truncate
      0.40 ±  7%      +0.1        0.52 ± 10%  perf-profile.children.cycles-pp.file_update_time
      1.07 ±  2%      +0.1        1.19 ±  2%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.00            +0.1        0.12 ± 34%  perf-profile.children.cycles-pp.__mark_inode_dirty
      0.00            +0.1        0.12 ± 32%  perf-profile.children.cycles-pp.generic_update_time
      1.28 ±  3%      +0.2        1.45 ±  4%  perf-profile.children.cycles-pp.security_file_permission
      0.86            +0.2        1.06 ±  2%  perf-profile.children.cycles-pp.atime_needs_update
      2.51            +0.3        2.78 ±  2%  perf-profile.children.cycles-pp.pick_next_task_fair
      1.00            +0.3        1.32 ±  3%  perf-profile.children.cycles-pp.touch_atime
      1.37 ±  3%      -0.9        0.50 ±  5%  perf-profile.self.cycles-pp.__task_pid_nr_ns
      1.23 ±  4%      -0.4        0.86 ±  6%  perf-profile.self.cycles-pp.update_curr
      0.32 ±  3%      -0.0        0.27 ±  3%  perf-profile.self.cycles-pp.schedule
      0.20 ±  6%      -0.0        0.16 ±  7%  perf-profile.self.cycles-pp.current_time
      0.12 ±  2%      -0.0        0.10 ±  6%  perf-profile.self.cycles-pp.child
      0.06 ±  6%      +0.0        0.07 ±  5%  perf-profile.self.cycles-pp.__might_fault
      0.13 ±  3%      +0.0        0.14 ±  3%  perf-profile.self.cycles-pp.__cond_resched
      0.12 ±  3%      +0.0        0.14 ±  4%  perf-profile.self.cycles-pp.put_prev_entity
      0.12 ±  4%      +0.0        0.14 ±  6%  perf-profile.self.cycles-pp.touch_atime
      0.08 ±  6%      +0.0        0.10 ±  3%  perf-profile.self.cycles-pp.clear_buddies
      0.17 ±  4%      +0.0        0.20 ±  6%  perf-profile.self.cycles-pp.ksys_write
      0.06 ±  7%      +0.0        0.09 ±  4%  perf-profile.self.cycles-pp.check_cfs_rq_runtime
      0.05            +0.0        0.08 ±  7%  perf-profile.self.cycles-pp.rb_insert_color
      0.26 ±  4%      +0.0        0.29 ±  3%  perf-profile.self.cycles-pp.map_id_range_down
      0.12 ±  7%      +0.0        0.16 ±  5%  perf-profile.self.cycles-pp.local_clock
      0.17 ±  3%      +0.0        0.21 ±  3%  perf-profile.self.cycles-pp.set_next_entity
      0.41 ±  4%      +0.0        0.45 ±  3%  perf-profile.self.cycles-pp.__might_sleep
      0.00            +0.1        0.06 ±  5%  perf-profile.self.cycles-pp.default_wake_function
      0.37 ±  3%      +0.1        0.46 ± 13%  perf-profile.self.cycles-pp.vfs_write
      0.39 ±  4%      +0.1        0.49 ±  6%  perf-profile.self.cycles-pp.new_sync_read
      0.38 ±  3%      +0.1        0.48 ±  3%  perf-profile.self.cycles-pp.pick_next_entity
      0.14 ±  7%      +0.1        0.25 ± 10%  perf-profile.self.cycles-pp.timestamp_truncate
      0.00            +0.1        0.12 ± 34%  perf-profile.self.cycles-pp.__mark_inode_dirty
      0.86 ±  3%      +0.1        1.00 ±  4%  perf-profile.self.cycles-pp.pipe_write
      0.36 ±  4%      +0.1        0.50 ±  8%  perf-profile.self.cycles-pp.vfs_read
      0.26 ±  5%      +0.1        0.41 ±  5%  perf-profile.self.cycles-pp.atime_needs_update
      0.19 ± 11%      +0.2        0.35 ± 14%  perf-profile.self.cycles-pp.security_file_permission
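
As a reading aid for the table above, the sketch below recomputes the %change column from the two value columns. The helper names pct_change and point_change are made up for this example. It assumes that rows for ordinary counters report a relative change in percent, while rows whose metric is itself a percentage (for example cache-miss-rate% or the perf-profile entries) report an absolute difference in percentage points, which is what the printed values suggest.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change from base to patched, in percent."""
    return (patched - base) / base * 100.0


def point_change(base: float, patched: float) -> float:
    """Absolute difference, used for metrics that are already percentages."""
    return patched - base


if __name__ == "__main__":
    # Values copied from the table: v5.16-rc2 vs. a4a118f2ee.
    rows = {
        "will-it-scale.per_thread_ops": (212450, 180780),
        "perf-stat.i.cache-misses": (4566699, 36029296),
        "perf-stat.i.context-switches": (22042272, 18767811),
    }
    for name, (base, patched) in rows.items():
        print(f"{name}: {pct_change(base, patched):+.1f}%")

    # Percentage-valued metric: reported as a point difference.
    delta = point_change(1.71, 10.23)
    print(f"perf-stat.i.cache-miss-rate%: {delta:+.1f}")
```

Because the table values are rounded means over several runs, recomputed figures can differ from the printed ones in the last digit.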




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.16.0-rc2-00001-ga4a118f2eead" of type "text/plain" (173517 bytes)

View attachment "job-script" of type "text/plain" (7850 bytes)

View attachment "job.yaml" of type "text/plain" (5269 bytes)

View attachment "reproduce" of type "text/plain" (347 bytes)
