Message-ID: <b2bfbf7f-b77b-a66e-b256-1cd4cdf260c7@arm.com>
Date:   Mon, 22 Jun 2020 10:06:31 +0100
From:   Lukasz Luba <lukasz.luba@....com>
To:     Qais Yousef <qais.yousef@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Patrick Bellasi <patrick.bellasi@...bug.net>,
        Chris Redpath <chrid.redpath@....com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] sched: Optionally skip uclamp logic in fast path

Hi Qais,

On 6/18/20 8:55 PM, Qais Yousef wrote:
> This series attempts to address the report that uclamp logic could be expensive
> sometimes and shows a regression in netperf UDP_STREAM under certain
> conditions.
> 
> The first patch is a fix for how struct uclamp_rq is initialized which is
> required by the 2nd patch which contains the real 'fix'.
> 
> Worth noting that the root cause of the overhead is believed to be system
> specific, or related to certain potential code/data layout issues, leading
> to worse I/D cache performance.
> 
> Different systems exhibited different behaviors, and the regression did
> disappear in certain kernel versions while attempting to reproduce it.
> 
> More info can be found here:
> 
> https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/
> 
> Having the static key seemed the best thing to do to ensure the effect of
> uclamp is minimized for kernels that compile it in but don't have a userspace
> that uses it. This will allow distros to ship uclamp-capable kernels by
> default without having to compromise on performance for the systems that
> could be affected.
> 
> Thanks
> 
> --
> Qais Yousef
> 
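
For anyone following along, below is a minimal sketch of the two ideas
summarised above: the uclamp_rq initialization fix and the static-key
gate in the fast path. The identifiers (sched_uclamp_used,
init_uclamp_rq, uclamp_rq_inc, uclamp_enable) and the structure layout
are my illustrative assumptions, not the actual patches:

#include <linux/jump_label.h>

/* Off by default; flipped the first time userspace sets a clamp. */
DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);

/* Patch 1 (sketch): start every rq's uclamp state in a sane
 * "no clamping" state, so the fast path can rely on it once the
 * key is enabled. */
static void init_uclamp_rq(struct rq *rq)
{
	enum uclamp_id clamp_id;

	for_each_clamp_id(clamp_id) {
		rq->uclamp[clamp_id] = (struct uclamp_rq) {
			.value = uclamp_none(clamp_id),
		};
	}
	rq->uclamp_flags = 0;
}

/* Patch 2 (sketch): gate the uclamp accounting in the
 * enqueue/dequeue fast path behind the static key. */
static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
{
	if (!static_branch_unlikely(&sched_uclamp_used))
		return;

	/* ... per-bucket uclamp accounting ... */
}

/* Called from the sched_setattr()/cgroup path the first time a
 * clamp value is actually requested. */
static void uclamp_enable(void)
{
	static_branch_enable(&sched_uclamp_used);
}

The point of the static key is that, while it is off, the branch is
patched out to a NOP, so kernels with CONFIG_UCLAMP_TASK compiled in
but no uclamp user in userspace pay (almost) nothing in the fast path,
and also keep the uclamp data out of the hot I/D caches.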

I have given it a try on my machine
(HP server: 2 sockets, 24 CPUs, x86-64, 4 NUMA nodes, AMD Opteron 6174,
L2 512KB/CPU, L3 6MB/node, RAM 40GB/node),
running kernel v5.7-rc7 with the openSUSE 15.1 distro config.
NUMA control for pinning tasks has not been used.

The results are better than the last time I checked this uclamp
issue [1]. Here are the results for a kernel built from the openSUSE
15.1 distro config + uclamp tasks + task groups (similar to the 3rd
kernel from the tests in [1]).
They are really good in terms of netperf-udp performance and in how
they deal with statistical noise due to context switches and/or tasks
jumping around CPUs.

The netperf-udp results (100 runs for each UDP size):
                         ./v5.7-rc7-base    ./v5.7-rc7-uclamp-tsk-grp-fix
Hmean     send-64         62.36 (   0.00%)       66.27 *   6.27%*
Hmean     send-128       124.24 (   0.00%)      132.03 *   6.27%*
Hmean     send-256       244.81 (   0.00%)      261.21 *   6.70%*
Hmean     send-1024      922.18 (   0.00%)      985.84 *   6.90%*
Hmean     send-2048      1716.61 (   0.00%)     1811.30 *   5.52%*
Hmean     send-3312      2564.73 (   0.00%)     2690.62 *   4.91%*
Hmean     send-4096      2967.01 (   0.00%)     3128.71 *   5.45%*
Hmean     send-8192      4834.31 (   0.00%)     5028.15 *   4.01%*
Hmean     send-16384     7569.17 (   0.00%)     7734.05 *   2.18%*
Hmean     recv-64         62.35 (   0.00%)       66.27 *   6.28%*
Hmean     recv-128       124.24 (   0.00%)      132.03 *   6.27%*
Hmean     recv-256       244.79 (   0.00%)      261.20 *   6.70%*
Hmean     recv-1024      922.10 (   0.00%)      985.82 *   6.91%*
Hmean     recv-2048      1716.61 (   0.00%)     1811.29 *   5.52%*
Hmean     recv-3312      2564.46 (   0.00%)     2690.60 *   4.92%*
Hmean     recv-4096      2967.00 (   0.00%)     3128.71 *   5.45%*
Hmean     recv-8192      4834.06 (   0.00%)     5028.05 *   4.01%*
Hmean     recv-16384     7568.70 (   0.00%)     7733.69 *   2.18%*

Statistics showing performance when there is context-switch noise
and/or tasks are jumping around CPUs. This is from the netperf-udp
benchmark, running only the 64B test case once with tracing enabled,
repeated 100 times in a bash loop.
The traced functions' performance is combined and presented in
statistical form (pandas DataFrame describe() output).
It can be compared with the baseline kernel results, or with the
results for the similar kernel (but w/o this fix) presented at [1].
For completeness I also put them below.

Kernel with uclamp tasks + task groups + this fix:
activate_task
              Hit    Time_us  Avg_us  s^2_us
count     101.00     101.00  101.00  101.00
mean   26,269.33  19,397.98    1.15    0.51
std   123,464.10  90,121.64    0.36    0.19
min       101.00     161.49    0.37    0.03
50%       408.00     479.45    1.26    0.50
75%       720.00     704.05    1.40    0.60
90%     1,795.00   1,071.86    1.57    0.72
95%     3,688.00   1,776.87    1.61    0.79
99%   733,737.00 518,448.60    1.73    1.03
max   756,631.00 537,865.40    1.76    1.06
deactivate_task
                Hit    Time_us  Avg_us  s^2_us
count       101.00     101.00  101.00  101.00
mean    111,714.44  55,791.32    0.80    0.27
std     307,358.56 153,230.31    0.26    0.14
min          88.00      91.73    0.31    0.00
50%         464.00     381.30    0.90    0.29
75%       1,118.00     622.70    1.01    0.36
90%     517,991.00 255,669.50    1.10    0.44
95%     997,663.00 484,013.20    1.12    0.47
99%   1,189,980.00 578,987.30    1.14    0.51
max   1,422,640.00 686,828.60    1.16    0.60


Baseline kernel traced-function performance
(no uclamp, no fixes, just built from the distro config):

activate_task
                Hit    Time_us  Avg_us  s^2_us
count       138.00     138.00  138.00  138.00
mean     20,387.44  14,587.33    1.15    0.53
std     114,980.19  81,427.51    0.42    0.23
min         110.00     181.68    0.32    0.00
50%         411.00     461.55    1.32    0.54
75%         881.75     760.08    1.47    0.66
90%       2,885.60   1,302.03    1.61    0.80
95%      55,318.05  41,273.41    1.66    0.92
99%     501,660.04 358,939.04    1.77    1.09
max   1,131,457.00 798,097.30    1.80    1.42
deactivate_task
                Hit    Time_us  Avg_us  s^2_us
count       138.00     138.00  138.00  138.00
mean     81,828.83  39,991.61    0.81    0.28
std     260,130.01 126,386.89    0.28    0.14
min          97.00      92.35    0.26    0.00
50%         424.00     340.35    0.94    0.30
75%       1,062.25     684.98    1.05    0.37
90%     330,657.50 168,320.94    1.11    0.46
95%     748,920.70 359,498.23    1.15    0.51
99%   1,094,614.76 528,459.50    1.21    0.56
max   1,630,473.00 789,476.50    1.25    0.60

Old kernel (w/o fix, uclamp tasks + task groups) which had this uclamp issue:

activate_task
                Hit      Time_us  Avg_us  s^2_us
count       273.00       273.00  273.00  273.00
mean     15,958.34    16,471.84    1.58    0.67
std     105,096.88   108,322.03    0.43    0.32
min           3.00         4.96    0.41    0.00
50%         245.00       400.23    1.70    0.64
75%         384.00       565.53    1.85    0.78
90%       1,602.00     1,069.08    1.95    0.95
95%       3,403.00     1,573.74    2.01    1.13
99%     589,484.56   604,992.57    2.11    1.75
max   1,035,866.00 1,096,975.00    2.40    3.08
deactivate_task
                Hit      Time_us  Avg_us  s^2_us
count       273.00       273.00  273.00  273.00
mean     94,607.02    63,433.12    1.02    0.34
std     325,130.91   216,844.92    0.28    0.16
min           2.00         2.79    0.29    0.00
50%         244.00       291.49    1.11    0.36
75%         496.00       448.72    1.19    0.43
90%     120,304.60    82,964.94    1.25    0.55
95%     945,480.60   626,793.58    1.33    0.60
99%   1,485,959.96 1,010,615.72    1.40    0.68
max   2,120,682.00 1,403,280.00    1.80    1.11


Based on these statistics, the kernel with this fix shows a better
distribution at almost all marked percentiles and better netperf-udp
performance results.

I can also run tests today for a kernel with just uclamp tasks, if
needed.

Regards,
Lukasz

[1] https://lore.kernel.org/lkml/d9c951da-87eb-ab20-9434-f15b34096d66@arm.com/
