Message-ID: <20211229065640.GB6108@xsang-OptiPlex-9020>
Date:   Wed, 29 Dec 2021 14:56:41 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     "Chang S. Bae" <chang.seok.bae@...el.com>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>,
        kernel test robot <oliver.sang@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...ux.intel.com, fengwei.yin@...el.com
Subject: [signal]  6c3118c321:  will-it-scale.per_thread_ops 13.2% improvement


(The previous report, "[x86/signal]  3aac3ebea0:  will-it-scale.per_thread_ops
-11.9% regression" [1], is the regression that this commit 6c3118c321 targets.
For that report we found that we had only tested tglx's diff shown at
https://lore.kernel.org/lkml/87bl1s357p.ffs@tglx/ and had not tested this patch,
so we are still sending out this report FYI.

[1] https://lore.kernel.org/lkml/20211207012128.GA16074@xsang-OptiPlex-9020/)

Greetings,

FYI, we noticed a 13.2% improvement in will-it-scale.per_thread_ops due to commit:


commit: 6c3118c32129b4197999a8928ba776bcabd0f5c4 ("signal: Skip the altstack update when not needed")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 144 threads 4 sockets Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with following parameters:

	nr_task: 100%
	mode: thread
	test: signal1
	cpufreq_governor: performance
	ucode: 0x16

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-hsw-4ex1/signal1/will-it-scale/0x16

commit: 
  cabdc3a847 ("sched,x86: Don't use cluster topology for x86 hybrid CPUs")
  6c3118c321 ("signal: Skip the altstack update when not needed")

cabdc3a8475b918e 6c3118c32129b4197999a8928ba 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    252712           +13.2%     286175        will-it-scale.144.threads
      1754           +13.2%       1987        will-it-scale.per_thread_ops
    252712           +13.2%     286175        will-it-scale.workload
     45399 ± 49%     +65.3%      75024 ±  5%  numa-numastat.node1.other_node
      1461 ± 31%     -43.9%     820.50 ± 25%  numa-meminfo.node1.Active
      1461 ± 31%     -43.9%     820.50 ± 25%  numa-meminfo.node1.Active(anon)
   2010729 ± 44%     -99.8%       3939 ± 96%  numa-meminfo.node2.FilePages
   2006836 ± 44%    -100.0%     135.17 ± 88%  numa-meminfo.node2.Unevictable
    361.83 ± 30%     -43.3%     205.00 ± 26%  numa-vmstat.node1.nr_active_anon
    361.83 ± 30%     -43.3%     205.00 ± 26%  numa-vmstat.node1.nr_zone_active_anon
     67182 ± 32%     +43.4%      96361 ±  4%  numa-vmstat.node1.numa_other
    502682 ± 44%     -99.8%     984.83 ± 96%  numa-vmstat.node2.nr_file_pages
    501709 ± 44%    -100.0%      33.33 ± 89%  numa-vmstat.node2.nr_unevictable
    501709 ± 44%    -100.0%      33.33 ± 89%  numa-vmstat.node2.nr_zone_unevictable
  30244982            -3.0%   29346668        perf-stat.i.cache-references
      1689            +1.2%       1709        perf-stat.i.context-switches
  6.01e+08 ±  2%      +9.4%  6.572e+08        perf-stat.i.dTLB-stores
      0.33            -3.3%       0.32        perf-stat.overall.MPKI
 1.104e+08           -11.6%   97613125        perf-stat.overall.path-length
  30288439            -2.7%   29479951        perf-stat.ps.cache-references
      1680            +1.3%       1701        perf-stat.ps.context-switches
 5.995e+08 ±  2%      +9.4%  6.557e+08        perf-stat.ps.dTLB-stores
  10120822 ±  3%  -9.2e+05     9195947 ±  2%  syscalls.sys_getpid.noise.2%
   9318091 ±  4%  -1.1e+06     8255028 ±  3%  syscalls.sys_getpid.noise.25%
  10007277 ±  3%  -9.4e+05     9071322 ±  2%  syscalls.sys_getpid.noise.5%
  10229292 ±  3%  -7.6e+05     9466206 ±  3%  syscalls.sys_gettid.noise.2%
   9734831 ±  3%    -8e+05     8938625 ±  3%  syscalls.sys_gettid.noise.25%
  10103340 ±  3%  -7.8e+05     9322792 ±  3%  syscalls.sys_gettid.noise.5%
 1.597e+09 ±  9%  -7.3e+08   8.641e+08 ± 22%  syscalls.sys_rt_sigprocmask.noise.2%
 4.123e+08 ± 41%  -3.6e+08    52527108 ± 85%  syscalls.sys_rt_sigprocmask.noise.25%
 1.552e+09 ±  9%  -7.8e+08   7.753e+08 ± 27%  syscalls.sys_rt_sigprocmask.noise.5%
    349534           -10.1%     314361 ±  4%  syscalls.sys_tgkill.max
 1.551e+09 ±  6%  -6.2e+08   9.279e+08 ± 20%  syscalls.sys_tgkill.noise.2%
 3.251e+08 ± 35%  -2.6e+08    66880605 ± 79%  syscalls.sys_tgkill.noise.25%
 1.503e+09 ±  7%  -6.6e+08   8.453e+08 ± 24%  syscalls.sys_tgkill.noise.5%
     12.27           -12.3        0.00        perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
     12.24           -12.2        0.00        perf-profile.calltrace.cycles-pp.restore_altstack.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
     12.23           -12.2        0.00        perf-profile.calltrace.cycles-pp.do_sigaltstack.restore_altstack.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
     12.14           -12.1        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.do_sigaltstack.restore_altstack.__x64_sys_rt_sigreturn.do_syscall_64
     12.11           -12.1        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.do_sigaltstack.restore_altstack.__x64_sys_rt_sigreturn
     72.76            -4.8       67.94        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
     72.78            -4.8       67.96        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.raise
     72.97            -4.8       68.18        perf-profile.calltrace.cycles-pp.raise
     15.95            +1.1       17.02        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__lock_task_sighand.do_send_sig_info.do_send_specific
     15.99            +1.1       17.06        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__lock_task_sighand.do_send_sig_info.do_send_specific.do_tkill
     15.99            +1.1       17.06        perf-profile.calltrace.cycles-pp.__lock_task_sighand.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill
     16.22            +1.1       17.33        perf-profile.calltrace.cycles-pp.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64
     16.30            +1.1       17.40        perf-profile.calltrace.cycles-pp.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
     16.28            +1.1       17.39        perf-profile.calltrace.cycles-pp.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.30            +1.1       17.40        perf-profile.calltrace.cycles-pp.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
     14.11            +1.9       16.01        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart
     14.14            +1.9       16.04        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart.exit_to_user_mode_prepare
     14.27            +1.9       16.20        perf-profile.calltrace.cycles-pp.signal_setup_done.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
     14.26            +1.9       16.19        perf-profile.calltrace.cycles-pp.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode
     14.29            +1.9       16.22        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
     14.34            +1.9       16.28        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
     14.53            +1.9       16.46        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.get_signal.arch_do_signal_or_restart.exit_to_user_mode_prepare
     14.34            +1.9       16.28        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.handler
     14.34            +1.9       16.28        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
     14.56            +1.9       16.50        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.get_signal.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode
     14.40            +1.9       16.35        perf-profile.calltrace.cycles-pp.handler
     14.88            +2.0       16.84        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
     14.74            +2.0       16.72        perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
     15.26            +2.0       17.30        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
     12.25            +2.8       15.03        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64
     12.28            +2.8       15.07        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
     12.39            +2.8       15.22        perf-profile.calltrace.cycles-pp.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.__restore_rt
     12.42            +2.8       15.25        perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.__restore_rt
     12.46            +2.8       15.30        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__restore_rt
     12.46            +2.8       15.30        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__restore_rt
     12.47            +2.8       15.31        perf-profile.calltrace.cycles-pp.__restore_rt
     29.10            +3.9       33.02        perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     28.35            +4.2       32.52        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask
     28.42            +4.2       32.61        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64
     28.69            +4.2       32.92        perf-profile.calltrace.cycles-pp.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
     28.68            +4.2       32.90        perf-profile.calltrace.cycles-pp.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe
     28.72            +4.2       32.96        perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
     12.24           -12.2        0.00        perf-profile.children.cycles-pp.restore_altstack
     12.24           -12.2        0.00        perf-profile.children.cycles-pp.do_sigaltstack
     24.70            -9.4       15.30        perf-profile.children.cycles-pp.__x64_sys_rt_sigreturn
     72.99            -4.8       68.20        perf-profile.children.cycles-pp.raise
     81.54            -1.3       80.22        perf-profile.children.cycles-pp._raw_spin_lock_irq
     97.30            -0.3       97.04        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     99.66            -0.1       99.61        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     99.64            -0.1       99.59        perf-profile.children.cycles-pp.do_syscall_64
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.restore_sigcontext
      0.06            +0.0        0.07        perf-profile.children.cycles-pp.__setup_rt_frame
      0.11            +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.__send_signal
      0.09 ±  5%      +0.0        0.11 ±  3%  perf-profile.children.cycles-pp.__rb_reserve_next
      0.11 ±  4%      +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.__entry_text_start
      0.14 ±  3%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.recalc_sigpending
      0.13 ±  2%      +0.0        0.15 ±  3%  perf-profile.children.cycles-pp.__set_task_blocked
      0.13 ±  3%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.ring_buffer_lock_reserve
      0.15            +0.0        0.17 ±  4%  perf-profile.children.cycles-pp.ftrace_syscall_exit
      0.18 ±  2%      +0.0        0.20 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.17 ±  2%      +0.0        0.20 ±  3%  perf-profile.children.cycles-pp.trace_buffer_lock_reserve
      0.20 ±  3%      +0.0        0.24 ±  2%  perf-profile.children.cycles-pp.syscall_trace_enter
      0.18 ±  3%      +0.0        0.21 ±  3%  perf-profile.children.cycles-pp.ftrace_syscall_enter
     16.00            +1.1       17.07        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     15.99            +1.1       17.06        perf-profile.children.cycles-pp.__lock_task_sighand
     16.23            +1.1       17.33        perf-profile.children.cycles-pp.do_send_sig_info
     16.30            +1.1       17.40        perf-profile.children.cycles-pp.__x64_sys_tgkill
     16.28            +1.1       17.39        perf-profile.children.cycles-pp.do_send_specific
     16.30            +1.1       17.40        perf-profile.children.cycles-pp.do_tkill
     14.27            +1.9       16.20        perf-profile.children.cycles-pp.signal_setup_done
     14.40            +1.9       16.35        perf-profile.children.cycles-pp.handler
     14.75            +2.0       16.73        perf-profile.children.cycles-pp.get_signal
     12.47            +2.8       15.31        perf-profile.children.cycles-pp.__restore_rt
     29.16            +3.9       33.06        perf-profile.children.cycles-pp.exit_to_user_mode_prepare
     29.10            +3.9       33.02        perf-profile.children.cycles-pp.arch_do_signal_or_restart
     29.62            +4.0       33.58        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
     28.69            +4.2       32.92        perf-profile.children.cycles-pp.sigprocmask
     28.72            +4.2       32.96        perf-profile.children.cycles-pp.__x64_sys_rt_sigprocmask
     55.35            +9.0       64.33        perf-profile.children.cycles-pp.__set_current_blocked
     97.30            -0.3       97.04        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.10 ±  4%      +0.0        0.11        perf-profile.self.cycles-pp.recalc_sigpending
      0.28            +0.0        0.32        perf-profile.self.cycles-pp.syscall_exit_to_user_mode




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.16.0-rc4-00002-g6c3118c32129" of type "text/plain" (173551 bytes)

View attachment "job-script" of type "text/plain" (7990 bytes)

View attachment "job.yaml" of type "text/plain" (5368 bytes)

View attachment "reproduce" of type "text/plain" (339 bytes)
