Date:   Fri, 17 Feb 2023 01:27:48 +0530
From:   shrikanth hegde <sshegde@...ux.vnet.ibm.com>
To:     Benjamin Segall <bsegall@...gle.com>
Cc:     mingo@...hat.com, peterz@...radead.org, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, tglx@...utronix.de,
        srikar@...ux.vnet.ibm.com, arjan@...ux.intel.com,
        svaidy@...ux.ibm.com, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Interleave cfs bandwidth timers for
 improved single thread performance at low utilization



On 2/16/23 3:02 AM, Benjamin Segall wrote:
> shrikanth hegde <sshegde@...ux.vnet.ibm.com> writes:
> 
>>>>
>>>>              6.2.rc5                           with patch
>>>>         1CG    power   2CG    power   | 1CG  power     2CG        power
>>>> 1Core   218     44     315      46    | 219    45    277(+12%)    47(-2%)
>>>>         219     43     315      45    | 219    44    244(+22%)    48(-6%)
>>>> 	                              |
>>>> 2Core   108     48     158      52    | 109    50    114(+26%)    59(-13%)
>>>>         109     49     157      52    | 109    49    136(+13%)    56(-7%)
>>>>                                       |
>>>> 4Core    60     59      89      65    |  62    58     72(+19%)    68(-5%)
>>>>          61     61      90      65    |  62    60     68(+24%)    73(-12%)
>>>>                                       |
>>>> 8Core    33     77      48      83    |  33    77     37(+23%)    91(-10%)
>>>>          33     77      48      84    |  33    77     38(+21%)    90(-7%)
>>>>
>>>> There is no benefit at higher utilization of 50% or more. There is no
>>>> degradation either.
>>>>
>>>> This is RFC PATCH V2, where the code has been moved from hrtimer to
>>>> sched. This patch sets the initial value as a multiple of period/10.
>>>> Here timers can still align if the time the cgroup started falls within
>>>> the same period/10 interval. On a real-life workload, time gives sufficient
>>>> randomness. There could be better interleaving by being more
>>>> deterministic. For example, when there are 2 cgroups, they should
>>>> have initial values of 0/50ms or 10/60ms and so on. When there are 3 cgroups,
>>>> 0/3/6ms or 1/4/7ms etc. That is more complicated as it has to account
>>>> for cgroup addition/deletion and accuracy w.r.t. period/quota.
>>>> If that approach is better here, then I will come up with that patch.
>>>
>>> This does seem vaguely reasonable, though the power argument of
>>> consolidating wakeups and such is something that we intentionally do in
>>> other situations.
>>>
>> Thank you, Benjamin, for taking a look and spending time reviewing this.
>>> How reasonable do you think it is to just say (and what do the
>>> equivalent numbers look like on your particular benchmark) "put some
>>> variance on your period config if you want variance"?
>> Run to run variance is expected with this patch, as the patch depends
>> on the time up to the last period/10 as the basis for interleaving.
>> That is what I could infer from this comment about variance; please correct me if not.
> 
> My question is what the numbers look like if you instead prepare the
> cgroups with periods that are something like 97 ms and 103ms instead of
> both 100ms (keeping the quota as the same proportion as the original).

oh ok. If the cgroups are prepared with slightly different period values, then
the timers do interleave. That is expected: the difference would be small at
the beginning, grow to a maximum at some point, and then the timers would
align again later. Like below:

        timer   |    /\
        delta   |   /  \
                |  /    \
                | /      \
                |/________\____

                     time -->
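
As a rough userspace illustration (not kernel code, and only assuming both
period timers start at t=0), with periods of 97ms and 103ms the distance from
each 103ms expiry to the nearest 97ms expiry drifts by 6ms per period, peaks
near half a period, and then shrinks back; the two timers realign exactly only
at lcm(97, 103) = 9991ms:

#include <stdio.h>

int main(void)
{
	const long p1 = 97, p2 = 103;	/* periods in ms */
	long k;

	for (k = 0; k * p2 <= 2000; k++) {
		long t = k * p2;	/* k-th expiry of the 103ms timer   */
		long off = t % p1;	/* phase relative to the 97ms timer */
		long delta = off < p1 - off ? off : p1 - off;

		printf("t=%5ld ms  nearest-expiry delta=%2ld ms\n", t, delta);
	}
	return 0;
}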

Did a set of experiments with these three sets of period values. In all the
cases, each cgroup is allocated 25% of the runtime. The system has 8 cores with
SMT=8 (64 CPUs). The 100ms/100ms values are not the same as before, since this
was run on a different machine as the previous one was not available. Hence the
100/100 numbers are included here as well.
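
For reference, a minimal sketch of how the two cgroups could be set up via the
cgroup v2 cpu.max interface ("<quota_us> <period_us>"). The cgroup paths are
hypothetical, and the quota of period/4 (25% of one CPU) is only an
illustrative reading of "25% of the runtime":

#include <stdio.h>

static int set_cpu_max(const char *path, long quota_us, long period_us)
{
	FILE *f = fopen(path, "w");	/* needs root and a mounted cgroup v2 */

	if (!f)
		return -1;
	fprintf(f, "%ld %ld\n", quota_us, period_us);
	return fclose(f);
}

int main(void)
{
	/* hypothetical cgroup paths; 97ms and 103ms periods, quota kept at 25% */
	set_cpu_max("/sys/fs/cgroup/cg1/cpu.max", 97000 / 4, 97000);
	set_cpu_max("/sys/fs/cgroup/cg2/cpu.max", 103000 / 4, 103000);
	return 0;
}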

                       6.2.rc6                    6.2.rc6 with patch
Period    1CG    power  2CG    power   |  1CG     power   2CG        power
97/103    27.8     78   32.9     98    |  27.5    75      33.4        102
97/103    27.3     78   33      101    |  27.9    71      32.8         97

100/100   27.5     82   40.2     93    |  27.5    80      34.2        105
100/100   28       86   40.1     94    |  27.7    78      30.1        110

75/125    27.3     89   32.7    102    |  27.3    84      33          106
75/125    27.1     87   33      105    |  27.1    90      33.1        100

A few observations:
1. We get improved performance when the periods are slightly different from
   100ms.
2. If the timers already have slight variance, there is no difference with the
   patch.
3. Power numbers vary a bit more when the timers have variance. This may be
   because the idle entry/exit points aren't aligning.
4. The best interleaving is still not possible if the timers merely have
   variance; that can happen only with deterministic interleaving. The patch
   can hope to achieve that, but not always. (A rough sketch of deterministic
   offsets follows below.)
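
For illustration only, here is a small userspace sketch of what deterministic
initial offsets could look like. This is not the posted patch (which derives
the initial value from the start time modulo period/10); the helper name and
the even spread across one period are assumptions, and it ignores the cgroup
addition/removal problem mentioned above:

#include <stdio.h>

/* spread the initial bandwidth-timer offsets of N cgroups evenly across one
 * period, e.g. 0/50ms for two cgroups with a 100ms period, 0/25/50/75ms for
 * four; a real implementation would also have to rebalance these offsets on
 * cgroup addition/removal and respect quota accuracy */
static long initial_offset_ms(int idx, int nr_cgroups, long period_ms)
{
	/* the idx-th cgroup starts its period timer idx/N of a period late */
	return (idx * period_ms) / nr_cgroups;
}

int main(void)
{
	const long period_ms = 100;
	const int nr_cgroups = 4;
	int i;

	for (i = 0; i < nr_cgroups; i++)
		printf("cgroup %d: initial timer offset %ld ms\n",
		       i, initial_offset_ms(i, nr_cgroups, period_ms));
	return 0;
}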
