Date:   Wed, 19 May 2021 18:17:35 +0000
From:   Nadav Amit <namit@...are.com>
To:     kernel test robot <oliver.sang@...el.com>
CC:     Ingo Molnar <mingo@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        "lkp@...ts.01.org" <lkp@...ts.01.org>,
        kernel test robot <lkp@...el.com>,
        "Huang, Ying" <ying.huang@...el.com>,
        "feng.tang@...el.com" <feng.tang@...el.com>,
        "zhengjun.xing@...el.com" <zhengjun.xing@...el.com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [smp]  a32a4d8a81:  netperf.Throughput_tps -2.1% regression

[ +PeterZ for reference ]


> On May 19, 2021, at 7:27 AM, kernel test robot <oliver.sang@...el.com> wrote:
> 
> 
> 
> Greeting,
> 
> FYI, we noticed a -2.1% regression of netperf.Throughput_tps due to commit:
> 
> 
> commit: a32a4d8a815c4eb6dc64b8962dc13a9dfae70868 ("smp: Run functions concurrently in smp_call_function_many_cond()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> 
> in testcase: netperf
> on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
> 
> 	ip: ipv4
> 	runtime: 300s
> 	nr_threads: 1
> 	cluster: cs-localhost
> 	test: UDP_RR
> 	cpufreq_governor: performance
> 	ucode: 0x5003006
> 
> 

[snip]

> commit:
>  v5.12-rc2
>  a32a4d8a81 ("smp: Run functions concurrently in smp_call_function_many_cond()")
> 
>       v5.12-rc2 a32a4d8a815c4eb6dc64b8962dc
> ---------------- ---------------------------
>         %stddev     %change         %stddev
>             \          |                \
>    116903            -2.1%     114404        netperf.Throughput_total_tps
>    116903            -2.1%     114404        netperf.Throughput_tps
>  35066769            -2.1%   34317990        netperf.time.voluntary_context_switches
>  35071059            -2.1%   34321258        netperf.workload
>     67295            +1.5%      68333        proc-vmstat.nr_anon_pages
>    463520            -2.1%     453603        vmstat.system.cs
>    535.28 ±  6%      -8.3%     490.97 ± 10%  sched_debug.cfs_rq:/.util_est_enqueued.max
>      0.02 ±  8%     -10.8%       0.02 ±  4%  sched_debug.cpu.nr_running.avg
>  76309820 ±  4%    +320.0%  3.205e+08 ±158%  cpuidle.C1.time
>  23409116 ±  3%     +31.0%   30676822 ± 20%  cpuidle.C1.usage
>  46720133 ±  2%     -12.9%   40709940 ±  2%  cpuidle.POLL.usage
>      5282 ±110%    +317.0%      22029 ± 58%  numa-vmstat.node3.nr_anon_pages
>     11998 ± 55%    +138.7%      28637 ± 45%  numa-vmstat.node3.nr_inactive_anon
>     11998 ± 55%    +138.7%      28637 ± 45%  numa-vmstat.node3.nr_zone_inactive_anon
>      8397 ±136%    +588.7%      57827 ± 75%  numa-meminfo.node3.AnonHugePages
>     21162 ±110%    +316.7%      88189 ± 58%  numa-meminfo.node3.AnonPages
>     48780 ± 54%    +136.8%     115533 ± 45%  numa-meminfo.node3.Inactive
>     48780 ± 54%    +136.8%     115533 ± 45%  numa-meminfo.node3.Inactive(anon)
>    467040            -2.1%     457094        perf-stat.i.context-switches
>      0.01 ±138%      +0.0        0.03 ± 73%  perf-stat.i.dTLB-store-miss-rate%
> 9.415e+08            -2.4%  9.188e+08 ±  2%  perf-stat.i.dTLB-stores
>      0.01 ±137%      +0.0        0.03 ± 73%  perf-stat.overall.dTLB-store-miss-rate%
>    465472            -2.1%     455557        perf-stat.ps.context-switches
> 9.385e+08            -2.4%  9.158e+08 ±  2%  perf-stat.ps.dTLB-stores
>      1.21 ± 14%      +0.2        1.41 ±  5%  perf-profile.calltrace.cycles-pp.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
>      2.05 ± 10%      +0.3        2.33 ±  4%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>      0.06 ±  7%      +0.0        0.08 ± 14%  perf-profile.children.cycles-pp.__calc_delta
>      0.08 ± 19%      +0.0        0.10 ±  9%  perf-profile.children.cycles-pp._copy_to_user
>      0.09 ± 22%      +0.0        0.12 ±  8%  perf-profile.children.cycles-pp._copy_from_user
>      0.12 ± 20%      +0.0        0.17 ± 13%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>      0.14 ± 11%      +0.1        0.19 ±  9%  perf-profile.children.cycles-pp.skb_release_data
>      1.21 ± 14%      +0.2        1.41 ±  5%  perf-profile.children.cycles-pp.__ip_append_data
>      2.07 ± 11%      +0.3        2.33 ±  4%  perf-profile.children.cycles-pp.schedule_idle
>      0.06 ±  7%      +0.0        0.08 ± 11%  perf-profile.self.cycles-pp.__calc_delta
>      0.19 ±  8%      +0.0        0.24 ±  6%  perf-profile.self.cycles-pp.__softirqentry_text_start
>      0.24 ±  8%      +0.1        0.29 ±  4%  perf-profile.self.cycles-pp.__skb_recv_udp
>      0.14 ± 11%      +0.1        0.19 ±  9%  perf-profile.self.cycles-pp.skb_release_data
>      0.02 ±142%      +0.1        0.08 ± 17%  perf-profile.self.cycles-pp.sock_alloc_send_pskb
>      0.11 ± 17%      +0.1        0.19 ± 13%  perf-profile.self.cycles-pp.__ip_append_data
>      0.12 ± 34%      +0.1        0.26 ± 22%  perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
>      0.87 ± 13%      +0.2        1.05 ±  6%  perf-profile.self.cycles-pp._raw_spin_lock
>      1287 ± 42%     +75.3%       2256 ± 14%  interrupts.CPU111.CAL:Function_call_interrupts
>      1326 ± 43%     +71.0%       2267 ± 13%  interrupts.CPU119.CAL:Function_call_interrupts
>      1300 ± 45%     +75.9%       2287 ± 37%  interrupts.CPU120.CAL:Function_call_interrupts
>      1299 ± 45%     +60.1%       2081 ± 28%  interrupts.CPU128.CAL:Function_call_interrupts
>      1305 ± 45%     +61.7%       2110 ± 29%  interrupts.CPU131.CAL:Function_call_interrupts
>      1299 ± 45%     +61.8%       2102 ± 28%  interrupts.CPU139.CAL:Function_call_interrupts
>     66.67 ±133%     -97.2%       1.83 ±155%  interrupts.CPU14.TLB:TLB_shootdowns
>      1299 ± 45%    +107.8%       2700 ± 33%  interrupts.CPU142.CAL:Function_call_interrupts
>    301.83 ±128%     -95.6%      13.17 ±140%  interrupts.CPU149.RES:Rescheduling_interrupts
>    389.17 ± 89%     -73.5%     103.17 ± 35%  interrupts.CPU164.NMI:Non-maskable_interrupts
>    389.17 ± 89%     -73.5%     103.17 ± 35%  interrupts.CPU164.PMI:Performance_monitoring_interrupts
>      1299 ± 45%     +60.2%       2081 ± 28%  interrupts.CPU35.CAL:Function_call_interrupts
>      1244 ± 50%     +66.8%       2076 ± 27%  interrupts.CPU45.CAL:Function_call_interrupts
>      1300 ± 44%     +59.5%       2075 ± 28%  interrupts.CPU46.CAL:Function_call_interrupts
>      1.50 ± 63%   +1422.2%      22.83 ±167%  interrupts.CPU47.RES:Rescheduling_interrupts
>    467.33 ± 85%     -64.6%     165.67 ± 74%  interrupts.CPU58.NMI:Non-maskable_interrupts
>    467.33 ± 85%     -64.6%     165.67 ± 74%  interrupts.CPU58.PMI:Performance_monitoring_interrupts
>    306.67 ± 75%     -59.9%     122.83 ± 16%  interrupts.CPU68.NMI:Non-maskable_interrupts
>    306.67 ± 75%     -59.9%     122.83 ± 16%  interrupts.CPU68.PMI:Performance_monitoring_interrupts
>      1131 ± 27%     +61.2%       1822 ± 35%  interrupts.CPU85.CAL:Function_call_interrupts
>      1180 ± 31%     +79.6%       2119 ± 24%  interrupts.CPU86.CAL:Function_call_interrupts
> 

Could this be the regression that was already resolved by commit
641acbf6fd6 ("smp: Micro-optimize smp_call_function_many_cond()"),
or does this report mean that the performance regression is also
present in the -rc?
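
For reference, the headline -2.1% follows directly from the two
netperf.Throughput_tps values in the quoted table. A minimal sketch in
Python; pct_change is a hypothetical helper name used only for
illustration and is not part of the LKP tooling:

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# Values copied from the quoted report (netperf.Throughput_tps).
base_tps = 116903  # v5.12-rc2
new_tps = 114404   # a32a4d8a81

print(f"{pct_change(base_tps, new_tps):+.1f}%")  # prints -2.1%
```

The same calculation applied to netperf.time.voluntary_context_switches
(35066769 vs. 34317990) also gives about -2.1%, which is consistent with
the context-switch rate tracking the transaction rate in this UDP_RR run.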

