Message-ID: <20220113055122.GC18396@xsang-OptiPlex-9020>
Date:   Thu, 13 Jan 2022 13:51:22 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     Theodore Ts'o <tytso@....edu>, Ard Biesheuvel <ardb@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...ux.intel.com, fengwei.yin@...el.com
Subject: [random]  2ee25b6968:  stress-ng.getrandom.ops_per_sec 47.7%
 improvement



Greetings,

FYI, we noticed a 47.7% improvement in stress-ng.getrandom.ops_per_sec due to commit:


commit: 2ee25b6968b1b3c66ffa408de23d023c1bce81cf ("random: avoid superfluous call to RDRAND in CRNG extraction")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
with following parameters:

	nr_threads: 100%
	testtime: 60s
	class: cpu
	test: getrandom
	cpufreq_governor: performance
	ucode: 0x5003102
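
For readers unfamiliar with the workload, the following is only a rough, single-threaded sketch of the kind of loop being measured: repeated getrandom(2) calls counted over a fixed interval. It is not the stress-ng implementation. The function name, the 8192-byte request size, and the short duration are illustrative assumptions, and the fallback to os.urandom exists only so the sketch runs on systems without os.getrandom.

```python
import os
import time


def getrandom_ops_per_sec(duration_s: float = 0.2, nbytes: int = 8192) -> float:
    """Call getrandom(2) in a tight loop and return completed calls per second.

    Hypothetical helper for illustration; nbytes and duration_s are arbitrary.
    """
    # os.getrandom wraps the Linux getrandom(2) syscall. On platforms where it
    # is missing, fall back to os.urandom so the sketch still runs (assumption).
    read = getattr(os, "getrandom", os.urandom)

    ops = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        read(nbytes)  # one "op" is one request for random bytes
        ops += 1
    elapsed = time.perf_counter() - start
    return ops / elapsed


if __name__ == "__main__":
    print(f"{getrandom_ops_per_sec():.0f} ops/sec (single thread)")
```

The report's figures come from one such worker per hardware thread (nr_threads: 100%) running for 60s, so the absolute number printed here is not comparable to the ones below.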


In addition, the commit also has a significant impact on the following test:

+------------------+---------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.getrandom.ops_per_sec 33.0% improvement                    |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory |
| test parameters  | class=cpu                                                                       |
|                  | cpufreq_governor=performance                                                    |
|                  | nr_threads=100%                                                                 |
|                  | test=getrandom                                                                  |
|                  | testtime=60s                                                                    |
|                  | ucode=0x42e                                                                     |
+------------------+---------------------------------------------------------------------------------+




Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
  cpu/gcc-9/performance/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/getrandom/stress-ng/60s/0x5003102

commit: 
  96562f2868 ("random: early initialization of ChaCha constants")
  2ee25b6968 ("random: avoid superfluous call to RDRAND in CRNG extraction")

96562f286884e2db 2ee25b6968b1b3c66ffa408de23 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1582098 ±  2%     +47.7%    2337043        stress-ng.getrandom.ops
     26367 ±  2%     +47.7%      38950        stress-ng.getrandom.ops_per_sec
     36932            +2.3%      37791        stress-ng.time.involuntary_context_switches
 3.365e+08 ±  3%     +24.5%  4.191e+08 ±  7%  cpuidle..time
    402.02            -1.8%     394.71        pmeter.Average_Active_Power
     59.00            -4.0%      56.67        turbostat.PkgTmp
    211108            -1.0%     209101        vmstat.system.in
      4.66            +1.9        6.53 ± 10%  mpstat.cpu.all.idle%
      0.50            +0.0        0.54 ±  2%  mpstat.cpu.all.irq%
      0.14 ±  3%      -0.0        0.12 ± 14%  mpstat.cpu.all.usr%
    191308 ± 11%     -55.2%      85772 ±  3%  numa-numastat.node0.local_node
    249395 ±  2%     -30.7%     172877        numa-numastat.node0.numa_hit
     58087 ± 39%     +50.0%      87104        numa-numastat.node0.other_node
    173851 ± 12%     +58.9%     276208        numa-numastat.node1.local_node
    202838 ±  3%     +36.2%     276198        numa-numastat.node1.numa_hit
      7408            +5.8%       7840        proc-vmstat.nr_active_anon
      2668 ±  3%      -5.2%       2528        proc-vmstat.nr_page_table_pages
     11807            +3.7%      12242        proc-vmstat.nr_shmem
      7408            +5.8%       7840        proc-vmstat.nr_zone_active_anon
     21247            +3.5%      21981        proc-vmstat.pgactivate
     75695 ±  6%     -26.3%      55749 ± 22%  numa-meminfo.node0.KReclaimable
     75695 ±  6%     -26.3%      55749 ± 22%  numa-meminfo.node0.SReclaimable
    182572 ±  5%     -14.6%     155921        numa-meminfo.node0.Slab
     26308 ± 17%     +74.9%      46018 ± 25%  numa-meminfo.node1.KReclaimable
   1145005 ±  7%     +67.8%    1921879 ± 45%  numa-meminfo.node1.MemUsed
     26308 ± 17%     +74.9%      46018 ± 25%  numa-meminfo.node1.SReclaimable
    111636 ±  7%     +24.4%     138864        numa-meminfo.node1.Slab
     18923 ±  6%     -26.4%      13936 ± 22%  numa-vmstat.node0.nr_slab_reclaimable
      3899 ±  4%      +8.3%       4221 ±  5%  numa-vmstat.node0.numa_interleave
   1525275 ±  7%     -14.6%    1302321 ± 12%  numa-vmstat.node0.numa_local
     59275 ± 41%     +53.7%      91128        numa-vmstat.node0.numa_other
      6576 ± 17%     +74.9%      11500 ± 25%  numa-vmstat.node1.nr_slab_reclaimable
      4170 ±  4%      -7.4%       3860 ±  6%  numa-vmstat.node1.numa_interleave
    722855 ± 17%     +30.3%     941837 ± 17%  numa-vmstat.node1.numa_local
     67164 ± 35%     -47.5%      35243        numa-vmstat.node1.numa_other
     98.00            -0.1       97.91        perf-profile.calltrace.cycles-pp._extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
      0.69 ±  7%      +0.7        1.36        perf-profile.calltrace.cycles-pp.chacha_block_generic._extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.17 ±141%      +1.0        1.12        perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic._extract_crng.urandom_read_nowarn.do_syscall_64
     98.76            -0.1       98.69        perf-profile.children.cycles-pp._extract_crng
      0.08 ±  6%      +0.0        0.11 ±  4%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.__might_fault
      0.12 ±  6%      +0.1        0.18        perf-profile.children.cycles-pp._copy_to_user
      0.48 ±  6%      +0.7        1.14        perf-profile.children.cycles-pp.chacha_permute
      0.69 ±  7%      +0.7        1.37        perf-profile.children.cycles-pp.chacha_block_generic
      0.89 ± 10%      -0.8        0.10 ±  8%  perf-profile.self.cycles-pp._extract_crng
      0.08 ±  6%      +0.0        0.11        perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.26 ±  5%      +0.1        0.40 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.48 ±  6%      +0.7        1.14        perf-profile.self.cycles-pp.chacha_permute
      4.65 ±  4%     +24.6%       5.79 ±  2%  perf-stat.i.MPKI
  5.33e+09            +1.4%  5.402e+09        perf-stat.i.branch-instructions
      0.95 ±  9%      -0.2        0.70 ± 10%  perf-stat.i.cache-miss-rate%
 1.189e+08 ±  3%     +35.5%  1.611e+08 ±  3%  perf-stat.i.cache-references
      9.83            -8.1%       9.03        perf-stat.i.cpi
    557463            -3.3%     539225        perf-stat.i.cycles-between-cache-misses
 5.913e+09            +5.5%  6.241e+09        perf-stat.i.dTLB-loads
 6.913e+08           +37.7%  9.518e+08        perf-stat.i.dTLB-stores
 2.548e+10            +8.4%  2.762e+10        perf-stat.i.instructions
      0.13            +6.5%       0.13        perf-stat.i.ipc
     12.24            -1.7%      12.03        perf-stat.i.major-faults
    125.53            +5.8%     132.84        perf-stat.i.metric.M/sec
     80.14            +2.6       82.73        perf-stat.i.node-load-miss-rate%
     55782           -10.1%      50128        perf-stat.i.node-loads
     57.92 ±  6%     +23.3       81.21        perf-stat.i.node-store-miss-rate%
     49315 ±  6%     +19.2%      58807 ±  3%  perf-stat.i.node-store-misses
     39223 ±  4%     -44.3%      21849 ±  2%  perf-stat.i.node-stores
      4.67 ±  4%     +24.9%       5.83 ±  3%  perf-stat.overall.MPKI
      0.57            -0.2        0.42 ±  5%  perf-stat.overall.cache-miss-rate%
     10.05            -8.0%       9.24        perf-stat.overall.cpi
      0.00 ±  2%      -0.0        0.00 ± 13%  perf-stat.overall.dTLB-store-miss-rate%
      0.10            +8.8%       0.11        perf-stat.overall.ipc
     70.08            +2.0       72.05        perf-stat.overall.node-load-miss-rate%
     55.46 ±  5%     +17.3       72.72        perf-stat.overall.node-store-miss-rate%
 5.246e+09            +1.4%  5.318e+09        perf-stat.ps.branch-instructions
 1.171e+08 ±  3%     +35.4%  1.586e+08 ±  3%  perf-stat.ps.cache-references
 5.821e+09            +5.5%  6.143e+09        perf-stat.ps.dTLB-loads
 6.808e+08           +37.7%  9.371e+08        perf-stat.ps.dTLB-stores
 2.508e+10            +8.4%  2.719e+10        perf-stat.ps.instructions
     57596           -11.8%      50795        perf-stat.ps.node-loads
     48478 ±  7%     +19.4%      57879 ±  3%  perf-stat.ps.node-store-misses
     38847 ±  4%     -44.1%      21704 ±  2%  perf-stat.ps.node-stores
 1.581e+12            +9.6%  1.733e+12        perf-stat.total.instructions
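
As a sanity check on the table above, the %change column can be recomputed from the two means. This is a minimal sketch, assuming that rows whose change carries a "%" sign are relative changes against the parent commit, and that rows without one (metrics that are themselves percentages, such as mpstat.cpu.all.idle%) are plain differences in percentage points. The helper names are hypothetical; the input numbers are copied from the table.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change from base to new, expressed in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, used for metrics already expressed in percent."""
    return new - base


# (parent commit 96562f2868, tested commit 2ee25b6968), values from the table.
ops = (1582098, 2337043)          # stress-ng.getrandom.ops
ops_per_sec = (26367, 38950)      # stress-ng.getrandom.ops_per_sec
idle_pct = (4.66, 6.53)           # mpstat.cpu.all.idle%

print(f"ops:         {percent_change(*ops):+.1f}%")
print(f"ops_per_sec: {percent_change(*ops_per_sec):+.1f}%")
print(f"idle%:       {point_change(*idle_pct):+.1f} points")
```

Both throughput rows round to the +47.7% shown in the report, and the idle% row rounds to the +1.9 shown there.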


***************************************************************************************************
lkp-ivb-2ep1: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory
=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
  cpu/gcc-9/performance/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/getrandom/stress-ng/60s/0x42e

commit: 
  96562f2868 ("random: early initialization of ChaCha constants")
  2ee25b6968 ("random: avoid superfluous call to RDRAND in CRNG extraction")

96562f286884e2db 2ee25b6968b1b3c66ffa408de23 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1749360           +33.0%    2327260        stress-ng.getrandom.ops
     29155           +33.0%      38787        stress-ng.getrandom.ops_per_sec
     79229 ±  4%      +6.8%      84630 ±  3%  meminfo.AnonHugePages
      9206 ±  3%      -5.5%       8699 ±  2%  proc-vmstat.pgactivate
    112056 ±  3%      +8.1%     121114 ±  3%  softirqs.SCHED
     35.12            +2.0%      35.82        boot-time.boot
      1458            +2.1%       1489        boot-time.idle
  1.81e+08 ± 12%     +27.3%  2.304e+08 ± 13%  cpuidle..time
    401544 ± 14%     +29.3%     519074 ± 12%  cpuidle..usage
    129917 ±  4%      -7.8%     119774 ±  5%  numa-numastat.node0.local_node
    126626 ±  3%      +8.4%     137254 ±  5%  numa-numastat.node1.local_node
      1632 ±  2%      +3.9%       1695        vmstat.system.cs
    104682            -1.0%     103645        vmstat.system.in
      4.99 ± 13%      +1.8        6.82 ±  9%  mpstat.cpu.all.idle%
      0.85 ±  3%      +0.1        0.94 ±  4%  mpstat.cpu.all.irq%
      0.01 ±  7%      +0.0        0.02 ±  8%  mpstat.cpu.all.soft%
     11520 ± 96%     -86.7%       1534 ± 87%  numa-meminfo.node1.AnonHugePages
     52249 ± 86%     -72.2%      14547 ± 25%  numa-meminfo.node1.AnonPages
     71509 ± 64%     -53.2%      33434 ± 13%  numa-meminfo.node1.AnonPages.max
      2428 ±  2%     +12.8%       2740 ±  7%  numa-vmstat.node0.numa_interleave
     13085 ± 86%     -72.3%       3628 ± 25%  numa-vmstat.node1.nr_anon_pages
      2831 ±  2%     -11.4%       2509 ±  8%  numa-vmstat.node1.numa_interleave
    137674 ± 70%     +87.2%     257735 ± 13%  turbostat.C1E
    200554 ± 13%     +22.4%     245492 ± 13%  turbostat.C6
      4.55 ±  7%      +1.4        6.00 ±  8%  turbostat.C6%
      2.49 ± 22%     +40.9%       3.51 ±  6%  turbostat.CPU%c6
      1.05 ±  3%     +61.6%       1.70 ± 24%  turbostat.Pkg%pc2
      0.03 ± 77%    +475.0%       0.15 ± 54%  turbostat.Pkg%pc6
     71.00            -3.8%      68.33        turbostat.PkgTmp
      2.18 ±  6%     +40.0%       3.05 ± 15%  perf-stat.i.MPKI
      0.34 ± 10%      +0.2        0.51 ± 15%  perf-stat.i.branch-miss-rate%
  62441682 ±  3%     +20.3%   75094122 ±  2%  perf-stat.i.cache-references
      1387            +3.2%       1432 ±  2%  perf-stat.i.context-switches
      4.27            -4.0%       4.10        perf-stat.i.cpi
 1.366e+11            -1.2%  1.349e+11        perf-stat.i.cpu-cycles
 7.358e+09            +2.4%  7.535e+09        perf-stat.i.dTLB-loads
   1293946 ± 16%     +32.9%    1719408 ±  8%  perf-stat.i.dTLB-store-misses
 7.417e+08           +27.5%   9.46e+08        perf-stat.i.dTLB-stores
    278891           +12.7%     314323 ±  4%  perf-stat.i.iTLB-load-misses
 3.144e+10            +4.1%  3.274e+10        perf-stat.i.instructions
    184251           -14.1%     158318 ±  2%  perf-stat.i.instructions-per-iTLB-miss
     10.79            -2.7%      10.50        perf-stat.i.major-faults
      2.85            -1.2%       2.81        perf-stat.i.metric.GHz
     63.24 ±  6%     +33.4%      84.34 ±  7%  perf-stat.i.metric.K/sec
    310.37            +2.5%     318.16        perf-stat.i.metric.M/sec
      1.99 ±  3%     +15.5%       2.29 ±  2%  perf-stat.overall.MPKI
      4.34            -5.1%       4.12        perf-stat.overall.cpi
    112722            -7.4%     104351 ±  5%  perf-stat.overall.instructions-per-iTLB-miss
      0.23            +5.4%       0.24        perf-stat.overall.ipc
  61452519 ±  3%     +20.3%   73919165 ±  2%  perf-stat.ps.cache-references
      1365            +3.3%       1410 ±  2%  perf-stat.ps.context-switches
 1.344e+11            -1.2%  1.328e+11        perf-stat.ps.cpu-cycles
 7.241e+09            +2.4%  7.416e+09        perf-stat.ps.dTLB-loads
   1273486 ± 16%     +32.9%    1692449 ±  8%  perf-stat.ps.dTLB-store-misses
 7.299e+08           +27.6%  9.312e+08        perf-stat.ps.dTLB-stores
    274494           +12.8%     309549 ±  4%  perf-stat.ps.iTLB-load-misses
 3.094e+10            +4.1%  3.223e+10        perf-stat.ps.instructions
     10.63            -2.3%      10.38        perf-stat.ps.major-faults
 1.966e+12            +5.8%   2.08e+12        perf-stat.total.instructions
     97.38            -0.3       97.12        perf-profile.calltrace.cycles-pp._extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
     94.02            -0.1       93.93        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave._extract_crng.urandom_read_nowarn.do_syscall_64
     99.82            -0.0       99.80        perf-profile.calltrace.cycles-pp.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
     99.86            -0.0       99.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getrandom
     99.86            -0.0       99.84        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
      1.31            +0.7        2.03 ±  6%  perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic._extract_crng.urandom_read_nowarn.do_syscall_64
      1.74            +0.9        2.60        perf-profile.calltrace.cycles-pp.chacha_block_generic._extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
     98.15            -0.3       97.89        perf-profile.children.cycles-pp._extract_crng
     95.50            -0.1       95.42        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     99.84            -0.0       99.82        perf-profile.children.cycles-pp.urandom_read_nowarn
     99.92            -0.0       99.90        perf-profile.children.cycles-pp.do_syscall_64
     99.92            -0.0       99.90        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12            +0.0        0.13        perf-profile.children.cycles-pp.tick_sched_handle
      0.12            +0.0        0.13        perf-profile.children.cycles-pp.update_process_times
      0.13            +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.tick_sched_timer
      0.12 ±  6%      +0.0        0.14 ±  5%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.31 ±  2%      +0.0        0.33 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.09 ±  9%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.__check_object_size
      0.40 ±  2%      +0.0        0.43        perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.check_stack_object
      0.12 ± 13%      +0.1        0.23 ±  2%  perf-profile.children.cycles-pp.__might_resched
      0.18 ±  9%      +0.1        0.31        perf-profile.children.cycles-pp.__might_fault
      0.35 ±  5%      +0.2        0.50        perf-profile.children.cycles-pp._copy_to_user
      1.32            +0.7        2.05 ±  6%  perf-profile.children.cycles-pp.chacha_permute
      1.76            +0.9        2.63        perf-profile.children.cycles-pp.chacha_block_generic
      1.25 ±  4%      -1.1        0.12 ±  4%  perf-profile.self.cycles-pp._extract_crng
     95.50            -0.1       95.42        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.06 ±  7%      +0.0        0.08 ±  6%  perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      0.05 ±  8%      +0.0        0.07        perf-profile.self.cycles-pp.urandom_read_nowarn
      0.11 ±  4%      +0.0        0.13 ±  3%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.05            +0.0        0.07 ± 12%  perf-profile.self.cycles-pp.__check_object_size
      0.00            +0.1        0.05 ±  8%  perf-profile.self.cycles-pp.__might_fault
      0.12 ± 13%      +0.1        0.22 ±  2%  perf-profile.self.cycles-pp.__might_resched
      0.39 ±  3%      +0.1        0.50 ±  6%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.44            +0.1        0.58 ± 18%  perf-profile.self.cycles-pp.chacha_block_generic
      1.32            +0.7        2.05 ±  6%  perf-profile.self.cycles-pp.chacha_permute
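
For anyone post-processing these comparison tables, the rows can be split into fields with a regular expression. The following is a sketch only, written against the row layout visible in this report (base mean, optional "± N%" stddev, signed change with optional "%", new mean, optional stddev, metric name). The pattern, the Row type, and the field names are assumptions made for illustration, not an LKP-provided format specification.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    base: float
    base_stddev_pct: Optional[int]
    change: float
    change_is_percent: bool
    new: float
    new_stddev_pct: Optional[int]
    metric: str


# Layout assumed from the rows in this report.
ROW_RE = re.compile(
    r"^\s*(?P<base>[\d.e+]+)"
    r"(?:\s*±\s*(?P<base_sd>\d+)%)?"
    r"\s+(?P<change>[+-][\d.]+)(?P<pct>%)?"
    r"\s+(?P<new>[\d.e+]+)"
    r"(?:\s*±\s*(?P<new_sd>\d+)%)?"
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line: str) -> Optional[Row]:
    """Parse one comparison row; return None for headers and separators."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    return Row(
        base=float(m["base"]),
        base_stddev_pct=int(m["base_sd"]) if m["base_sd"] else None,
        change=float(m["change"]),
        change_is_percent=m["pct"] is not None,
        new=float(m["new"]),
        new_stddev_pct=int(m["new_sd"]) if m["new_sd"] else None,
        metric=m["metric"],
    )


if __name__ == "__main__":
    line = "     29155           +33.0%      38787        stress-ng.getrandom.ops_per_sec"
    print(parse_row(line))
```

Keeping change_is_percent as a separate flag preserves the distinction between relative changes and percentage-point differences, which the table encodes only through the presence or absence of the "%" sign.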





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.16.0-rc8-00065-g2ee25b6968b1" of type "text/plain" (173210 bytes)

View attachment "job-script" of type "text/plain" (8150 bytes)

View attachment "job.yaml" of type "text/plain" (5473 bytes)

View attachment "reproduce" of type "text/plain" (342 bytes)
