Message-ID: <20220222084441.GA18914@xsang-OptiPlex-9020>
Date:   Tue, 22 Feb 2022 16:44:41 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     Theodore Ts'o <tytso@....edu>,
        Dominik Brodowski <linux@...inikbrodowski.net>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Jann Horn <jannh@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...ux.intel.com, fengwei.yin@...el.com
Subject: [random]  f73c522c4c:  stress-ng.getrandom.ops_per_sec 8450.8% improvement



Greetings,

FYI, we noticed an 8450.8% improvement in stress-ng.getrandom.ops_per_sec due to commit:


commit: f73c522c4c2094d1c434083ae362bbd4a2ed7348 ("random: use simpler fast key erasure flow on per-cpu keys")
url: https://github.com/0day-ci/linux/commits/Yusuf-Khan/pga-dfl-pci-Make-sure-DMA-related-error-check-is-not-done-twice/20220222-123031
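The commit title names the technique. As a rough illustration of "fast key erasure" (a user-space sketch, not the kernel's code; `struct fke_rng` and `fke_generate` are hypothetical names), a generator keeps only a 32-byte ChaCha20 key: it expands that key into one 64-byte block, immediately overwrites the key with the first 32 bytes, and hands out the remaining 32. If each CPU owns its own such key, no shared lock is needed on the hot path, which is consistent with the old profile below, where more than 95% of cycles went to native_queued_spin_lock_slowpath under extract_crng. The zero counter and nonce are assumptions of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Little-endian load/store helpers. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do {                 \
        a += b; d ^= a; d = ROTL32(d, 16);  \
        c += d; b ^= c; b = ROTL32(b, 12);  \
        a += b; d ^= a; d = ROTL32(d, 8);   \
        c += d; b ^= c; b = ROTL32(b, 7);   \
    } while (0)

/* ChaCha20 block function as specified in RFC 8439, section 2.3. */
static void chacha20_block(const uint8_t key[32], uint32_t counter,
                           const uint8_t nonce[12], uint8_t out[64])
{
    uint32_t in[16], x[16];

    in[0] = 0x61707865; in[1] = 0x3320646e;   /* "expand 32-byte k" */
    in[2] = 0x79622d32; in[3] = 0x6b206574;
    for (int i = 0; i < 8; i++)
        in[4 + i] = load32_le(key + 4 * i);
    in[12] = counter;
    for (int i = 0; i < 3; i++)
        in[13 + i] = load32_le(nonce + 4 * i);

    memcpy(x, in, sizeof x);
    for (int i = 0; i < 10; i++) {             /* 10 double rounds = 20 rounds */
        QR(x[0], x[4], x[8],  x[12]); QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]); QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]); QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]); QR(x[3], x[4], x[9],  x[14]);
    }
    for (int i = 0; i < 16; i++)
        store32_le(out + 4 * i, x[i] + in[i]);
}

/* Hypothetical per-CPU generator: its entire state is one 32-byte key. */
struct fke_rng {
    uint8_t key[32];
};

/* Fast key erasure: expand the current key into one 64-byte block,
 * overwrite the key with the first 32 bytes, and return the other 32.
 * Counter and nonce are fixed at zero (an assumption of this sketch). */
static void fke_generate(struct fke_rng *rng, uint8_t out[32])
{
    static const uint8_t zero_nonce[12] = {0};
    uint8_t block[64];

    chacha20_block(rng->key, 0, zero_nonce, block);
    memcpy(rng->key, block, 32);       /* the old key is gone after this */
    memcpy(out, block + 32, 32);
    memset(block, 0, sizeof block);    /* may be elided by the compiler;
                                          real code needs explicit zeroing */
}
```

Because the key is replaced before any output leaves the function, a later compromise of the generator state does not reveal bytes that were already handed out.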

in testcase: stress-ng
on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory
with following parameters:

	nr_threads: 100%
	testtime: 60s
	class: cpu
	test: getrandom
	cpufreq_governor: performance
	ucode: 0x42e






Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
  cpu/gcc-9/performance/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/getrandom/stress-ng/60s/0x42e
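The parameter line above pairs slash-separated field names with slash-separated values on the next line (for example, testtime is 60s and tbox_group is lkp-ivb-2ep1). A small, hypothetical helper that recovers one value by name could look like this; `lkp_param` and `split_fields` are illustrative names, not part of lkp-tests.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 32

/* Split `s` in place on '/', ':' and ' ' and store up to `max` tokens. */
static int split_fields(char *s, char *fields[], int max)
{
    int n = 0;
    for (char *tok = strtok(s, "/: "); tok != NULL && n < max;
         tok = strtok(NULL, "/: "))
        fields[n++] = tok;
    return n;
}

/* Hypothetical helper: look up `key` in the parameter header and copy the
 * matching value into `out`. Returns 0 on success, -1 if the key is
 * missing or the two lines have different field counts. */
static int lkp_param(const char *header, const char *values,
                     const char *key, char *out, size_t outlen)
{
    char hbuf[512], vbuf[512];
    char *hf[MAX_FIELDS], *vf[MAX_FIELDS];

    snprintf(hbuf, sizeof hbuf, "%s", header);   /* work on copies */
    snprintf(vbuf, sizeof vbuf, "%s", values);
    int nh = split_fields(hbuf, hf, MAX_FIELDS);
    int nv = split_fields(vbuf, vf, MAX_FIELDS);
    if (nh != nv)
        return -1;
    for (int i = 0; i < nh; i++) {
        if (strcmp(hf[i], key) == 0) {
            snprintf(out, outlen, "%s", vf[i]);
            return 0;
        }
    }
    return -1;
}
```

Splitting on '/' works here only because none of the values (such as x86_64-rhel-8.3 or debian-10.4-x86_64-20200603.cgz) contains a slash.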

commit: 
  a086a3a1cb ("random: absorb fast pool into input pool after fast load")
  f73c522c4c ("random: use simpler fast key erasure flow on per-cpu keys")

a086a3a1cbfe32bb f73c522c4c2094d1c434083ae36 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    760691         +8450.6%   65043882        stress-ng.getrandom.ops
     12678         +8450.8%    1084065        stress-ng.getrandom.ops_per_sec
     29172           +19.8%      34936 ±  3%  stress-ng.time.involuntary_context_switches
      2852            -1.3%       2817        stress-ng.time.system_time
      0.43 ±  6%   +6609.4%      28.63        stress-ng.time.user_time
    100146 ±  3%     +44.5%     144715 ±  3%  softirqs.RCU
      1635           +10.9%       1813 ±  2%  vmstat.system.cs
      0.76 ±  2%      +0.2        0.99        mpstat.cpu.all.irq%
      0.27 ±  4%      +0.9        1.21        mpstat.cpu.all.usr%
    161.88            +9.4%     177.12        turbostat.CorWatt
    190.93            +7.8%     205.84        turbostat.PkgWatt
     21.27 ±  3%     -10.3%      19.09 ±  3%  turbostat.RAMWatt
      1.18 ±  9%     -80.9%       0.23 ± 14%  perf-stat.i.MPKI
 6.717e+09           +47.1%  9.884e+09        perf-stat.i.branch-instructions
  10570014 ±  6%     +17.7%   12440119 ±  5%  perf-stat.i.branch-misses
     34.23           -26.5        7.78 ±  3%  perf-stat.i.cache-miss-rate%
   9666129           -92.4%     731662 ±  6%  perf-stat.i.cache-misses
  28512914           -82.0%    5132520 ±  3%  perf-stat.i.cache-references
      1361 ±  2%     +14.9%       1563 ±  2%  perf-stat.i.context-switches
      4.64           -84.7%       0.71        perf-stat.i.cpi
     14249         +5410.5%     785198 ±  3%  perf-stat.i.cycles-between-cache-misses
      0.03 ±  9%      -0.0        0.02 ±  2%  perf-stat.i.dTLB-load-miss-rate%
   1658772 ± 12%    +170.3%    4483462 ±  3%  perf-stat.i.dTLB-load-misses
 7.003e+09          +367.8%  3.276e+10        perf-stat.i.dTLB-loads
      0.10 ±  4%      -0.1        0.02 ±  2%  perf-stat.i.dTLB-store-miss-rate%
    372166 ±  3%    +654.1%    2806515        perf-stat.i.dTLB-store-misses
 3.729e+08         +5647.3%  2.143e+10        perf-stat.i.dTLB-stores
     92.41            +3.9       96.26        perf-stat.i.iTLB-load-miss-rate%
    203482 ± 11%    +688.4%    1604335        perf-stat.i.iTLB-load-misses
     19298 ±  5%    +226.7%      63041 ± 19%  perf-stat.i.iTLB-loads
  2.89e+10          +589.7%  1.993e+11        perf-stat.i.instructions
    280121 ± 16%     -55.5%     124599        perf-stat.i.instructions-per-iTLB-miss
      0.23          +532.6%       1.43        perf-stat.i.ipc
    927.95           -82.8%     159.69        perf-stat.i.metric.K/sec
    293.60          +354.6%       1334        perf-stat.i.metric.M/sec
     43.93            +1.3       45.22        perf-stat.i.node-load-miss-rate%
   5038505           -97.8%     109393 ±  3%  perf-stat.i.node-load-misses
   6287060           -97.5%     156525        perf-stat.i.node-loads
     39.30            -4.5       34.77 ±  2%  perf-stat.i.node-store-miss-rate%
   2061218           -94.0%     123404 ±  7%  perf-stat.i.node-store-misses
   3170200           -92.6%     236074 ±  5%  perf-stat.i.node-stores
      0.99           -97.4%       0.03 ±  3%  perf-stat.overall.MPKI
      0.16 ±  6%      -0.0        0.13 ±  5%  perf-stat.overall.branch-miss-rate%
     33.90           -19.7       14.24 ±  3%  perf-stat.overall.cache-miss-rate%
      4.73           -85.5%       0.69        perf-stat.overall.cpi
     14157         +1225.6%     187672 ±  6%  perf-stat.overall.cycles-between-cache-misses
      0.02 ± 12%      -0.0        0.01 ±  3%  perf-stat.overall.dTLB-load-miss-rate%
      0.10 ±  4%      -0.1        0.01        perf-stat.overall.dTLB-store-miss-rate%
     91.26            +5.0       96.22        perf-stat.overall.iTLB-load-miss-rate%
      0.21          +590.1%       1.46        perf-stat.overall.ipc
     44.49            -3.4       41.11 ±  2%  perf-stat.overall.node-load-miss-rate%
     39.40            -5.1       34.33 ±  6%  perf-stat.overall.node-store-miss-rate%
 6.611e+09           +47.1%  9.727e+09        perf-stat.ps.branch-instructions
  10407491 ±  6%     +17.6%   12240249 ±  5%  perf-stat.ps.branch-misses
   9512419           -92.4%     719946 ±  6%  perf-stat.ps.cache-misses
  28059987           -82.0%    5052170 ±  3%  perf-stat.ps.cache-references
      1339 ±  2%     +14.8%       1538 ±  2%  perf-stat.ps.context-switches
   1632412 ± 12%    +170.3%    4412272 ±  3%  perf-stat.ps.dTLB-load-misses
 6.891e+09          +367.8%  3.224e+10        perf-stat.ps.dTLB-loads
    366290 ±  3%    +654.0%    2762000        perf-stat.ps.dTLB-store-misses
  3.67e+08         +5646.9%  2.109e+10        perf-stat.ps.dTLB-stores
    200266 ± 11%    +688.4%    1578805        perf-stat.ps.iTLB-load-misses
     18995 ±  5%    +226.8%      62085 ± 19%  perf-stat.ps.iTLB-loads
 2.844e+10          +589.7%  1.962e+11        perf-stat.ps.instructions
   4958355           -97.8%     107622 ±  3%  perf-stat.ps.node-load-misses
   6187037           -97.5%     154096        perf-stat.ps.node-loads
   2028401           -94.0%     121256 ±  7%  perf-stat.ps.node-store-misses
   3119670           -92.6%     232039 ±  5%  perf-stat.ps.node-stores
 1.798e+12          +590.4%  1.242e+13        perf-stat.total.instructions
     97.64           -97.6        0.00        perf-profile.calltrace.cycles-pp.extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
     96.12           -96.1        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
     95.92           -95.9        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.extract_crng.urandom_read_nowarn.do_syscall_64
     99.86            -1.5       98.33        perf-profile.calltrace.cycles-pp.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
     99.88            -0.8       99.04        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
     99.88            -0.8       99.09        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getrandom
     99.89            -0.1       99.80        perf-profile.calltrace.cycles-pp.getrandom
      0.00            +0.6        0.55 ±  2%  perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user
      0.00            +0.6        0.56 ±  3%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
      0.00            +0.6        0.64        perf-profile.calltrace.cycles-pp.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.urandom_read_nowarn
      0.00            +0.7        0.71        perf-profile.calltrace.cycles-pp.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
      0.00            +0.8        0.76        perf-profile.calltrace.cycles-pp.crng_make_state.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.2        1.16        perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_to_user.get_random_bytes_user.urandom_read_nowarn
      0.00            +1.3        1.25        perf-profile.calltrace.cycles-pp.check_stack_object.__check_object_size.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
      0.00            +1.9        1.90        perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_to_user.get_random_bytes_user.urandom_read_nowarn
      0.00            +3.1        3.06        perf-profile.calltrace.cycles-pp.__check_object_size.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.5        3.54        perf-profile.calltrace.cycles-pp.__might_fault._copy_to_user.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
      0.00            +6.3        6.27        perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_to_user.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
      0.00           +11.6       11.56        perf-profile.calltrace.cycles-pp._copy_to_user.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00           +71.5       71.52        perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
      0.00           +80.3       80.33        perf-profile.calltrace.cycles-pp.chacha_block_generic.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00           +97.9       97.86        perf-profile.calltrace.cycles-pp.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
     98.40           -98.4        0.00        perf-profile.children.cycles-pp.extract_crng
     97.63           -97.6        0.00        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     97.41           -97.4        0.00        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     99.86            -1.5       98.33        perf-profile.children.cycles-pp.urandom_read_nowarn
     99.93            -0.8       99.09        perf-profile.children.cycles-pp.do_syscall_64
     99.93            -0.8       99.14        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     99.90            -0.0       99.86        perf-profile.children.cycles-pp.getrandom
      0.06 ± 11%      +0.0        0.09 ±  9%  perf-profile.children.cycles-pp.task_tick_fair
      0.08 ±  5%      +0.0        0.12 ±  6%  perf-profile.children.cycles-pp.scheduler_tick
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.11 ±  3%      +0.1        0.17 ±  7%  perf-profile.children.cycles-pp.tick_sched_handle
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.copy_user_generic_unrolled
      0.10 ±  4%      +0.1        0.17 ±  8%  perf-profile.children.cycles-pp.update_process_times
      0.12 ±  4%      +0.1        0.18 ±  7%  perf-profile.children.cycles-pp.tick_sched_timer
      0.00            +0.1        0.07 ±  5%  perf-profile.children.cycles-pp.__x64_sys_getrandom
      0.16 ±  4%      +0.1        0.23 ±  5%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.31 ±  3%      +0.1        0.39 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.23 ±  3%      +0.1        0.32 ±  4%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.23 ±  3%      +0.1        0.33 ±  4%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.26 ±  4%      +0.1        0.36 ±  4%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.00            +0.3        0.35        perf-profile.children.cycles-pp.__entry_text_start
      0.00            +0.4        0.36        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.00            +0.6        0.57 ±  3%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.00            +0.7        0.71        perf-profile.children.cycles-pp.crng_fast_key_erasure
      0.00            +0.8        0.76        perf-profile.children.cycles-pp.crng_make_state
      0.00            +1.2        1.21        perf-profile.children.cycles-pp.__might_sleep
      0.00            +1.5        1.55        perf-profile.children.cycles-pp.check_stack_object
      0.18 ±  4%      +1.8        1.96        perf-profile.children.cycles-pp.__might_resched
      0.00            +3.4        3.45        perf-profile.children.cycles-pp.__check_object_size
      0.21 ±  3%      +3.7        3.92        perf-profile.children.cycles-pp.__might_fault
      0.05 ±  8%      +6.4        6.46        perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.28 ±  2%     +11.7       12.00        perf-profile.children.cycles-pp._copy_to_user
      0.82 ±  2%     +71.4       72.20        perf-profile.children.cycles-pp.chacha_permute
      1.37 ±  2%     +79.8       81.17        perf-profile.children.cycles-pp.chacha_block_generic
      0.00           +98.3       98.32        perf-profile.children.cycles-pp.get_random_bytes_user
     97.41           -97.4        0.00        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.00            +0.1        0.06 ±  9%  perf-profile.self.cycles-pp.copy_user_generic_unrolled
      0.00            +0.1        0.06        perf-profile.self.cycles-pp.__x64_sys_getrandom
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.crng_fast_key_erasure
      0.00            +0.1        0.07 ± 11%  perf-profile.self.cycles-pp.do_syscall_64
      0.00            +0.1        0.08 ±  6%  perf-profile.self.cycles-pp.getrandom
      0.00            +0.3        0.30        perf-profile.self.cycles-pp.__entry_text_start
      0.00            +0.4        0.36        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00            +0.5        0.53 ±  3%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.00            +0.9        0.88        perf-profile.self.cycles-pp.__might_fault
      0.00            +1.0        1.02 ±  2%  perf-profile.self.cycles-pp.__might_sleep
      0.00            +1.3        1.28        perf-profile.self.cycles-pp.check_stack_object
      0.18 ±  4%      +1.7        1.91        perf-profile.self.cycles-pp.__might_resched
      0.00            +1.8        1.75        perf-profile.self.cycles-pp._copy_to_user
      0.00            +2.0        2.02        perf-profile.self.cycles-pp.__check_object_size
      0.00            +2.0        2.04        perf-profile.self.cycles-pp.get_random_bytes_user
      0.05            +6.1        6.11        perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.55 ±  2%      +8.4        8.96        perf-profile.self.cycles-pp.chacha_block_generic
      0.82 ±  3%     +70.9       71.76        perf-profile.self.cycles-pp.chacha_permute
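Judging by the numbers themselves, the %change column appears to use two conventions: rows of counts or rates show a relative change in percent, while rows whose values are already percentages (names ending in %, and the perf-profile.* rows) show an absolute difference in percentage points. The figures are also consistent with ops_per_sec being total ops divided by testtime (60 s). The helper names below are hypothetical.

```c
/* Relative change in percent: appears to be used for count and rate rows. */
static double pct_change(double base, double head)
{
    return (head - base) / base * 100.0;
}

/* Absolute difference in percentage points: appears to be used for rows
 * whose values are already percentages (e.g. cache-miss-rate%, perf-profile.*). */
static double pp_change(double base, double head)
{
    return head - base;
}

/* Throughput: total ops over the run divided by its duration in seconds. */
static double ops_per_sec(double total_ops, double seconds)
{
    return total_ops / seconds;
}
```

For example, (1084065 - 12678) / 12678 * 100 is about 8450.76, reported as +8450.8%; 65043882 / 60 is about 1084064.7, reported as 1084065 ops/sec; and 7.78 - 34.23 = -26.45 points, reported as -26.5 for perf-stat.i.cache-miss-rate%.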




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.17.0-rc4-00015-gf73c522c4c20" of type "text/plain" (174732 bytes)

View attachment "job-script" of type "text/plain" (7958 bytes)

View attachment "job.yaml" of type "text/plain" (5422 bytes)

View attachment "reproduce" of type "text/plain" (342 bytes)
