Message-ID: <202301301612.67c70c9-yujie.liu@intel.com>
Date:   Mon, 30 Jan 2023 17:41:46 +0800
From:   kernel test robot <yujie.liu@...el.com>
To:     Kishon Vijay Abraham I <kvijayab@....com>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>, <x86@...nel.org>,
        Borislav Petkov <bp@...en8.de>, Leo Duran <leo.duran@....com>,
        Zhang Rui <rui.zhang@...el.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        <linux-pm@...r.kernel.org>, <ying.huang@...el.com>,
        <feng.tang@...el.com>, <zhengjun.xing@...ux.intel.com>,
        <fengwei.yin@...el.com>
Subject: [tip:x86/boot] [x86/acpi/boot] e2869bd7af:
 stress-ng.uprobe.ops_per_sec 29.4% improvement

Greetings,

FYI, we noticed a 29.4% improvement in stress-ng.uprobe.ops_per_sec due to commit:

commit: e2869bd7af608c343988429ceb1c2fe99644a01f ("x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/boot

in testcase: stress-ng
on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
with following parameters:

	nr_threads: 100%
	testtime: 60s
	class: cpu
	test: uprobe
	cpufreq_governor: performance


Details are as below:

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  cpu/gcc-11/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/uprobe/stress-ng/60s

commit: 
  5353fff29e ("scripts/head-object-list: Remove x86 from the list")
  e2869bd7af ("x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC")

5353fff29e42d0ef e2869bd7af608c343988429ceb1 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2628            -1.1%       2598        stress-ng.time.system_time
    217951           +29.4%     281951        stress-ng.uprobe.ops
      3562           +29.4%       4611        stress-ng.uprobe.ops_per_sec
      0.33 ±  6%      +0.0        0.37 ±  2%  mpstat.cpu.all.usr%
     12814           -14.9%      10907        meminfo.KernelStack
     77368           -72.7%      21099        meminfo.Percpu
    188037           -19.0%     152278        meminfo.VmallocUsed
     69774 ±  9%    +961.8%     740875 ±149%  numa-meminfo.node0.FilePages
     66328 ± 10%   +1012.1%     737664 ±149%  numa-meminfo.node0.Unevictable
      6155 ±  7%     -17.8%       5058 ± 11%  numa-meminfo.node1.KernelStack
     17443 ±  9%    +961.8%     185218 ±149%  numa-vmstat.node0.nr_file_pages
     16581 ± 10%   +1012.2%     184416 ±149%  numa-vmstat.node0.nr_unevictable
     16581 ± 10%   +1012.2%     184416 ±149%  numa-vmstat.node0.nr_zone_unevictable
      6154 ±  7%     -17.8%       5058 ± 11%  numa-vmstat.node1.nr_kernel_stack
     12821           -14.9%      10911        proc-vmstat.nr_kernel_stack
     23295            -3.3%      22529        proc-vmstat.nr_slab_reclaimable
     26823            -6.5%      25084        proc-vmstat.nr_slab_unreclaimable
    296402            -1.9%     290725        proc-vmstat.numa_hit
     52723            -1.3%      52059        proc-vmstat.numa_other
    271549 ±  2%      -2.9%     263567        proc-vmstat.pgfault
      0.45 ±  3%     -15.9%       0.38 ± 12%  sched_debug.cfs_rq:/.h_nr_running.stddev
     20849 ± 13%     -22.6%      16144 ±  6%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.43 ±  3%     -17.4%       0.35 ± 19%  sched_debug.cfs_rq:/.nr_running.stddev
     20867 ± 13%     -22.6%      16151 ±  6%  sched_debug.cfs_rq:/.spread0.stddev
    287.77 ±  2%     -20.9%     227.68 ± 10%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
      1554 ±  4%     -17.3%       1285 ± 19%  sched_debug.cpu.curr->pid.stddev
      0.45 ±  2%     -14.6%       0.38 ± 12%  sched_debug.cpu.nr_running.stddev
  12671977            +8.9%   13802380        perf-stat.i.branch-misses
   2731612 ± 11%     +16.3%    3176876 ±  4%  perf-stat.i.cache-misses
  23841254 ±  4%     +16.4%   27741896        perf-stat.i.cache-references
     71372 ±  6%     -22.2%      55528 ±  5%  perf-stat.i.cycles-between-cache-misses
    340368 ±  7%     +23.4%     419908 ± 10%  perf-stat.i.dTLB-store-misses
 4.889e+08 ±  3%     +13.0%  5.527e+08        perf-stat.i.dTLB-stores
     26279 ±  6%     +15.0%      30234 ±  3%  perf-stat.i.iTLB-loads
    240619 ±  3%      -7.8%     221882 ±  5%  perf-stat.i.instructions-per-iTLB-miss
    521.14 ±  3%     +18.0%     615.09 ±  2%  perf-stat.i.metric.K/sec
    885932 ± 17%     +33.7%    1184229 ±  4%  perf-stat.i.node-load-misses
    997827 ± 17%     +33.1%    1328416 ±  4%  perf-stat.i.node-loads
    474784 ± 10%     +32.0%     626649 ±  5%  perf-stat.i.node-store-misses
    651001 ± 11%     +28.9%     839098 ±  4%  perf-stat.i.node-stores
      0.35 ±  6%     +14.1%       0.40 ±  2%  perf-stat.overall.MPKI
     94.34            -1.0       93.35        perf-stat.overall.iTLB-load-miss-rate%
  12473442            +8.8%   13570687        perf-stat.ps.branch-misses
   2690267 ± 11%     +16.2%    3125339 ±  4%  perf-stat.ps.cache-misses
  23486253 ±  4%     +16.3%   27310650        perf-stat.ps.cache-references
    335215 ±  7%     +23.3%     413395 ± 10%  perf-stat.ps.dTLB-store-misses
 4.817e+08 ±  3%     +13.0%  5.442e+08        perf-stat.ps.dTLB-stores
     25873 ±  6%     +15.0%      29754 ±  3%  perf-stat.ps.iTLB-loads
    873032 ± 17%     +33.5%    1165539 ±  4%  perf-stat.ps.node-load-misses
    983400 ± 17%     +33.0%    1307445 ±  4%  perf-stat.ps.node-loads
    467618 ± 10%     +31.9%     616759 ±  5%  perf-stat.ps.node-store-misses
    641166 ± 11%     +28.8%     825704 ±  4%  perf-stat.ps.node-stores
      1.44            -0.2        1.23        perf-profile.calltrace.cycles-pp.trace_find_next_entry_inc.tracing_read_pipe.vfs_read.ksys_read.do_syscall_64
      1.41            -0.2        1.22        perf-profile.calltrace.cycles-pp.__find_next_entry.trace_find_next_entry_inc.tracing_read_pipe.vfs_read.ksys_read
      0.94            +0.1        1.02 ±  2%  perf-profile.calltrace.cycles-pp.ring_buffer_empty_cpu.__find_next_entry.trace_find_next_entry_inc.tracing_read_pipe.vfs_read
      0.00            +0.6        0.55 ±  6%  perf-profile.calltrace.cycles-pp.tracing_wait_pipe.tracing_read_pipe.vfs_read.ksys_read.do_syscall_64
      0.00            +0.6        0.57        perf-profile.calltrace.cycles-pp.trace_print_context.print_trace_fmt.tracing_read_pipe.vfs_read.ksys_read
      0.00            +0.6        0.59 ±  2%  perf-profile.calltrace.cycles-pp.print_trace_fmt.tracing_read_pipe.vfs_read.ksys_read.do_syscall_64
      0.30 ±  2%      -0.2        0.08        perf-profile.children.cycles-pp._find_next_bit
      1.44            -0.2        1.23        perf-profile.children.cycles-pp.trace_find_next_entry_inc
      1.44            -0.2        1.23        perf-profile.children.cycles-pp.__find_next_entry
      0.07 ±  5%      +0.0        0.10 ±  5%  perf-profile.children.cycles-pp.memcpy_erms
      0.08 ±  5%      +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.ring_buffer_empty
      0.10 ±  5%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.trace_print_lat_fmt
      0.11 ±  7%      +0.0        0.14 ±  6%  perf-profile.children.cycles-pp.number
      0.04 ± 57%      +0.0        0.07        perf-profile.children.cycles-pp.trace_event_buffer_reserve
      0.02 ±100%      +0.0        0.06        perf-profile.children.cycles-pp.trace_event_buffer_lock_reserve
      0.15 ±  5%      +0.0        0.18 ±  6%  perf-profile.children.cycles-pp.print_uprobe_event
      0.02 ±100%      +0.0        0.06 ±  6%  perf-profile.children.cycles-pp.ring_buffer_peek
      0.04 ± 58%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.peek_next_entry
      0.09 ±  7%      +0.0        0.13 ± 12%  perf-profile.children.cycles-pp.finish_wait
      0.01 ±173%      +0.0        0.06 ±  7%  perf-profile.children.cycles-pp.__select
      0.15 ±  4%      +0.0        0.20 ±  2%  perf-profile.children.cycles-pp.format_decode
      0.10 ±  4%      +0.0        0.14 ±  7%  perf-profile.children.cycles-pp.__uprobe_trace_func
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.ring_buffer_lock_reserve
      0.12 ±  5%      +0.1        0.17 ±  8%  perf-profile.children.cycles-pp.prepare_to_wait
      0.00            +0.1        0.05 ±  8%  perf-profile.children.cycles-pp.rb_buffer_peek
      0.00            +0.1        0.05 ±  8%  perf-profile.children.cycles-pp.trace_event_buffer_commit
      0.14 ±  3%      +0.1        0.20 ±  8%  perf-profile.children.cycles-pp.handler_chain
      0.14 ±  3%      +0.1        0.20 ±  6%  perf-profile.children.cycles-pp.uprobe_dispatcher
      0.16 ±  3%      +0.1        0.22 ±  6%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.16 ±  3%      +0.1        0.22 ±  5%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      0.15            +0.1        0.21 ±  7%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      0.15 ±  2%      +0.1        0.22 ±  6%  perf-profile.children.cycles-pp.asm_exc_int3
      0.14 ±  3%      +0.1        0.21 ±  7%  perf-profile.children.cycles-pp.uprobe_notify_resume
      0.18 ±  8%      +0.1        0.26 ±  3%  perf-profile.children.cycles-pp.trace_empty
      0.14 ±  3%      +0.1        0.21 ±  6%  perf-profile.children.cycles-pp.rb_set_head_page
      0.16 ±  3%      +0.1        0.23 ±  5%  perf-profile.children.cycles-pp.__getpid
      0.22 ±  6%      +0.1        0.31 ±  8%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.29 ±  3%      +0.1        0.41 ±  6%  perf-profile.children.cycles-pp.ring_buffer_wait
      0.30 ±  3%      +0.1        0.42 ±  4%  perf-profile.children.cycles-pp.rb_per_cpu_empty
      0.44 ±  2%      +0.1        0.57 ±  2%  perf-profile.children.cycles-pp.trace_print_context
      0.48 ±  3%      +0.1        0.61        perf-profile.children.cycles-pp.vsnprintf
      0.46 ±  3%      +0.1        0.59 ±  2%  perf-profile.children.cycles-pp.print_trace_fmt
      1.13            +0.1        1.26        perf-profile.children.cycles-pp.ring_buffer_empty_cpu
      0.48 ±  3%      +0.1        0.62        perf-profile.children.cycles-pp.seq_buf_vprintf
      0.33 ±  5%      +0.1        0.47 ±  7%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.51 ±  2%      +0.1        0.66        perf-profile.children.cycles-pp.trace_seq_printf
      0.40 ±  3%      +0.2        0.55 ±  6%  perf-profile.children.cycles-pp.tracing_wait_pipe
      0.51 ±  3%      +0.2        0.68        perf-profile.children.cycles-pp._raw_spin_lock
      0.29 ±  2%      -0.2        0.08 ±  5%  perf-profile.self.cycles-pp._find_next_bit
      0.39 ±  3%      -0.1        0.28        perf-profile.self.cycles-pp.ring_buffer_empty_cpu
      0.16 ±  2%      -0.1        0.07 ± 11%  perf-profile.self.cycles-pp.__find_next_entry
      0.07 ±  5%      +0.0        0.10 ±  5%  perf-profile.self.cycles-pp.memcpy_erms
      0.12 ±  3%      +0.0        0.14 ±  3%  perf-profile.self.cycles-pp.vsnprintf
      0.09 ±  8%      +0.0        0.12 ±  3%  perf-profile.self.cycles-pp.number
      0.14 ±  3%      +0.0        0.17 ±  2%  perf-profile.self.cycles-pp.format_decode
      0.17 ±  4%      +0.0        0.22 ±  3%  perf-profile.self.cycles-pp.rb_per_cpu_empty
      0.13 ±  3%      +0.1        0.20 ±  8%  perf-profile.self.cycles-pp.rb_set_head_page
      0.39 ±  3%      +0.1        0.51        perf-profile.self.cycles-pp._raw_spin_lock
      0.33 ±  6%      +0.1        0.47 ±  7%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
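
The %change column for count-style metrics can be re-derived from the two value columns as (new - base) / base. Below is a minimal Python sketch that checks a few rows; the helper name pct_change and the choice of rows are illustrative only and are not part of the lkp-tests tooling. Rows whose metric is itself a percentage (for example perf-stat.overall.iTLB-load-miss-rate% or the perf-profile entries) appear to report the absolute difference between the two columns instead, so they are left out here.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


# Values copied from the comparison table above:
# (parent commit 5353fff29e, tested commit e2869bd7af).
rows = {
    "stress-ng.uprobe.ops": (217951, 281951),
    "stress-ng.uprobe.ops_per_sec": (3562, 4611),
    "meminfo.Percpu": (77368, 21099),
}

for name, (base, new) in rows.items():
    # Format with one decimal place and an explicit sign, like the report.
    print(f"{name:32s} {pct_change(base, new):+.1f}%")
```

With the values above this prints +29.4%, +29.4% and -72.7%, matching the %change column of the report.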


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

View attachment "config-6.2.0-rc3-00003-ge2869bd7af60" of type "text/plain" (166944 bytes)

View attachment "job-script" of type "text/plain" (8055 bytes)

View attachment "job.yaml" of type "text/plain" (5554 bytes)

View attachment "reproduce" of type "text/plain" (339 bytes)
