Message-ID: <202312052213.d20bec0a-oliver.sang@intel.com>
Date:   Tue, 5 Dec 2023 22:57:37 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>, <ying.huang@...el.com>,
        <feng.tang@...el.com>, <fengwei.yin@...el.com>,
        <oliver.sang@...el.com>
Subject: [peterz-queue:locking/futex] [futex]  e1a4bd5d6d:
 will-it-scale.per_thread_ops -11.2% regression



Hello,

kernel test robot noticed a -11.2% regression of will-it-scale.per_thread_ops on:


commit: e1a4bd5d6d978ba147f823c669373e3596e0bbcc ("futex: Implement FUTEX2_NUMA")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git locking/futex

testcase: will-it-scale
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

	nr_task: 16
	mode: thread
	test: futex1
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202312052213.d20bec0a-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231205/202312052213.d20bec0a-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/futex1/will-it-scale

commit: 
  38d12f1c15 ("mm: Add vmalloc_huge_node()")
  e1a4bd5d6d ("futex: Implement FUTEX2_NUMA")

38d12f1c15069458 e1a4bd5d6d978ba147f823c6693 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      1.29            -0.1        1.16        mpstat.cpu.all.usr%
     16082 ± 47%    +268.8%      59317 ± 46%  numa-meminfo.node3.AnonHugePages
    443502 ± 10%     -21.7%     347354 ± 16%  numa-numastat.node3.numa_hit
    443856 ± 10%     -21.7%     347355 ± 16%  numa-vmstat.node3.numa_hit
      1821 ± 30%     -45.9%     985.13 ± 52%  sched_debug.cfs_rq:/.load_avg.stddev
   9224874 ±  5%     +54.4%   14242474 ±  5%  meminfo.DirectMap2M
    163286 ±  5%     +30.9%     213804 ±  5%  meminfo.DirectMap4k
      0.55 ±  7%     -14.3%       0.47        turbostat.IPC
     72.33            +1.8%      73.67        turbostat.PkgTmp
 1.155e+08           -11.2%  1.026e+08        will-it-scale.16.threads
   7220531           -11.2%    6414312        will-it-scale.per_thread_ops
 1.155e+08           -11.2%  1.026e+08        will-it-scale.workload
 2.035e+10            -8.9%  1.853e+10        perf-stat.i.branch-instructions
      0.31            -0.0        0.30        perf-stat.i.branch-miss-rate%
  62615280           -12.4%   54851709        perf-stat.i.branch-misses
      0.54            +9.3%       0.59        perf-stat.i.cpi
      0.00 ±  5%      +0.0        0.00 ±  2%  perf-stat.i.dTLB-load-miss-rate%
    139076 ±  5%    +104.7%     284748 ±  2%  perf-stat.i.dTLB-load-misses
 2.634e+10            -8.2%  2.418e+10        perf-stat.i.dTLB-loads
 1.927e+10            -8.8%  1.756e+10        perf-stat.i.dTLB-stores
  55538465           -10.4%   49774500 ±  4%  perf-stat.i.iTLB-load-misses
   2514504           -10.7%    2245869        perf-stat.i.iTLB-loads
  1.25e+11            -8.1%  1.149e+11        perf-stat.i.instructions
      1.85            -8.5%       1.69        perf-stat.i.ipc
    294.40            -8.6%     268.98        perf-stat.i.metric.M/sec
      0.31            -0.0        0.30        perf-stat.overall.branch-miss-rate%
      0.54            +9.3%       0.59        perf-stat.overall.cpi
      0.00 ±  5%      +0.0        0.00 ±  2%  perf-stat.overall.dTLB-load-miss-rate%
      0.00 ±  6%      +0.0        0.00 ±  5%  perf-stat.overall.dTLB-store-miss-rate%
      1.85            -8.5%       1.69        perf-stat.overall.ipc
    325727            +3.2%     336234        perf-stat.overall.path-length
 2.028e+10            -8.9%  1.847e+10        perf-stat.ps.branch-instructions
  62436489           -12.4%   54701854        perf-stat.ps.branch-misses
    138701 ±  5%    +104.7%     283927 ±  2%  perf-stat.ps.dTLB-load-misses
 2.625e+10            -8.2%  2.409e+10        perf-stat.ps.dTLB-loads
  1.92e+10            -8.8%   1.75e+10        perf-stat.ps.dTLB-stores
  55348676           -10.4%   49598644 ±  4%  perf-stat.ps.iTLB-load-misses
   2506036           -10.7%    2238080        perf-stat.ps.iTLB-loads
 1.246e+11            -8.1%  1.145e+11        perf-stat.ps.instructions
 3.763e+13            -8.3%  3.451e+13        perf-stat.total.instructions
     14.56 ±  2%      -1.5       13.06        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
     27.62 ±  2%      -1.4       26.24        perf-profile.calltrace.cycles-pp.get_user_pages_fast.get_futex_key.futex_wake.do_futex.__x64_sys_futex
     25.52 ±  2%      -1.0       24.52        perf-profile.calltrace.cycles-pp.internal_get_user_pages_fast.get_user_pages_fast.get_futex_key.futex_wake.do_futex
     11.08 ±  2%      -0.6       10.48 ±  2%  perf-profile.calltrace.cycles-pp.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast.get_user_pages_fast
      3.74 ±  2%      -0.5        3.26 ±  3%  perf-profile.calltrace.cycles-pp.try_grab_folio.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast
      1.04 ±  3%      -0.3        0.77 ±  3%  perf-profile.calltrace.cycles-pp.is_valid_gup_args.get_user_pages_fast.get_futex_key.futex_wake.do_futex
      2.05 ±  4%      -0.2        1.90 ±  2%  perf-profile.calltrace.cycles-pp.testcase
      1.64 ±  3%      -0.1        1.51        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      1.33 ±  3%      -0.1        1.21        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
      0.98 ±  3%      -0.1        0.87 ±  2%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
      1.02 ±  3%      -0.1        0.91 ±  2%  perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      0.69 ±  2%      -0.1        0.63 ±  3%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      3.88 ±  5%      +0.6        4.44 ±  5%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
      4.71 ±  6%      +0.6        5.31 ±  5%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
     47.98 ±  2%      +2.7       50.66        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     43.62 ±  2%      +3.1       46.74        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      2.64 ±  3%      +3.2        5.87        perf-profile.calltrace.cycles-pp.futex_hash.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
     41.53 ±  2%      +3.3       44.86        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     40.19 ±  2%      +3.5       43.64        perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     27.78 ±  2%      -1.4       26.37        perf-profile.children.cycles-pp.get_user_pages_fast
     25.77 ±  2%      -1.0       24.73        perf-profile.children.cycles-pp.internal_get_user_pages_fast
      9.17 ±  2%      -0.9        8.28        perf-profile.children.cycles-pp.entry_SYSCALL_64
     11.42 ±  2%      -0.7       10.77 ±  2%  perf-profile.children.cycles-pp.gup_pte_range
      5.61 ±  3%      -0.6        5.06        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      4.30 ±  2%      -0.5        3.85        perf-profile.children.cycles-pp.try_grab_folio
      1.11 ±  3%      -0.3        0.80 ±  3%  perf-profile.children.cycles-pp.is_valid_gup_args
      2.09 ±  4%      -0.2        1.91        perf-profile.children.cycles-pp.testcase
      2.05 ±  3%      -0.2        1.88        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      1.42 ±  3%      -0.1        1.29        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      1.12 ±  3%      -0.1        1.00        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.02 ±  3%      -0.1        0.91 ±  2%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.69 ±  2%      -0.1        0.63 ±  3%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.18 ±  9%      -0.0        0.13 ±  7%  perf-profile.children.cycles-pp.syscall@plt
      0.39 ±  5%      -0.0        0.35 ±  3%  perf-profile.children.cycles-pp.folio_fast_pin_allowed
      0.08 ± 12%      +0.0        0.12 ± 12%  perf-profile.children.cycles-pp.rcu_sched_clock_irq
      0.06 ± 17%      +0.0        0.10 ± 14%  perf-profile.children.cycles-pp.rcu_pending
      0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.update_rq_clock
      0.00            +0.1        0.06 ± 17%  perf-profile.children.cycles-pp.check_cpu_stall
      0.04 ± 45%      +0.1        0.12 ±  6%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp.hrtimer_forward
      1.02 ±  7%      +0.3        1.29 ±  6%  perf-profile.children.cycles-pp.ktime_get
     48.20 ±  2%      +2.7       50.86        perf-profile.children.cycles-pp.do_syscall_64
     43.65 ±  2%      +3.1       46.74        perf-profile.children.cycles-pp.__x64_sys_futex
      2.65 ±  3%      +3.2        5.88        perf-profile.children.cycles-pp.futex_hash
     41.68 ±  2%      +3.3       44.99        perf-profile.children.cycles-pp.do_futex
     40.38 ±  2%      +3.4       43.81        perf-profile.children.cycles-pp.futex_wake
      7.80 ±  3%      -0.8        6.98        perf-profile.self.cycles-pp.syscall
      5.48 ±  3%      -0.5        4.94        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      4.24 ±  2%      -0.5        3.71        perf-profile.self.cycles-pp.futex_wake
      4.28 ±  2%      -0.5        3.80        perf-profile.self.cycles-pp.try_grab_folio
      2.60 ±  2%      -0.3        2.28 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.01 ±  2%      -0.3        0.70 ±  2%  perf-profile.self.cycles-pp.is_valid_gup_args
      3.79 ±  2%      -0.3        3.50        perf-profile.self.cycles-pp.entry_SYSCALL_64
      1.96 ±  3%      -0.2        1.74 ±  3%  perf-profile.self.cycles-pp.internal_get_user_pages_fast
      1.83 ±  4%      -0.2        1.63        perf-profile.self.cycles-pp.__x64_sys_futex
      1.79 ±  4%      -0.2        1.64        perf-profile.self.cycles-pp.testcase
      1.44 ±  2%      -0.1        1.29 ±  2%  perf-profile.self.cycles-pp.do_futex
      1.40 ±  3%      -0.1        1.25        perf-profile.self.cycles-pp.do_syscall_64
      1.42 ±  3%      -0.1        1.29        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      1.12 ±  3%      -0.1        1.00        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.96 ±  3%      -0.1        0.86 ±  2%  perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      0.96 ±  3%      -0.1        0.87 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.27 ±  6%      -0.0        0.24 ±  3%  perf-profile.self.cycles-pp.folio_fast_pin_allowed
      0.00            +0.1        0.06 ± 17%  perf-profile.self.cycles-pp.check_cpu_stall
      0.00            +0.1        0.08 ± 12%  perf-profile.self.cycles-pp.sched_clock_cpu
      0.00            +0.1        0.08 ± 11%  perf-profile.self.cycles-pp.hrtimer_forward
      0.97 ±  8%      +0.3        1.24 ±  6%  perf-profile.self.cycles-pp.ktime_get
      5.72 ±  3%      +2.1        7.83        perf-profile.self.cycles-pp.get_futex_key
      2.51 ±  3%      +3.2        5.73        perf-profile.self.cycles-pp.futex_hash
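
A note on reading the comparison table above: for the counter rows the
%change column appears to be the relative change of the second commit's mean
against the first commit's mean, while for the rows that are already
percentages (the perf-profile and miss-rate rows) it appears to be the plain
difference in percentage points. The following minimal sketch checks that
reading against a few rows copied from the table. The helper names
pct_change and point_change are hypothetical and are not part of lkp-tests.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference between two values that are already percentages."""
    return new - base


# (metric, mean on 38d12f1c15, mean on e1a4bd5d6d), copied from the table.
counter_rows = [
    ("will-it-scale.per_thread_ops", 7220531, 6414312),
    ("perf-stat.i.dTLB-load-misses", 139076, 284748),
]
profile_rows = [
    ("perf-profile.self.cycles-pp.futex_hash", 2.51, 5.73),
    ("perf-profile.self.cycles-pp.get_futex_key", 5.72, 7.83),
]

for name, base, new in counter_rows:
    print(f"{pct_change(base, new):+7.1f}%  {name}")

for name, base, new in profile_rows:
    print(f"{point_change(base, new):+7.1f}   {name}")

# CPI is the reciprocal of IPC, so the two perf-stat rows move in opposite
# directions: an IPC of 1.85 corresponds to a CPI of about 0.54, and an IPC
# of 1.69 to a CPI of about 0.59.
for ipc in (1.85, 1.69):
    print(f"ipc={ipc:.2f} -> cpi={1.0 / ipc:.2f}")
```

Under this reading, the growth in futex_hash and get_futex_key self time
(about 3.2 and 2.1 percentage points) is the largest positive shift in the
profile, and the small negative deltas elsewhere would then be a side effect
of those two functions taking a larger share of the cycles.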




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
