lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 9 Mar 2023 19:51:43 +0530
From:   Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
To:     Benjamin Segall <bsegall@...gle.com>, mingo@...hat.com,
        peterz@...radead.org, vincent.guittot@...aro.org
Cc:     dietmar.eggemann@....com, tglx@...utronix.de,
        srikar@...ux.vnet.ibm.com, arjan@...ux.intel.com,
        svaidy@...ux.ibm.com, linux-kernel@...r.kernel.org,
        shrikanth hegde <sshegde@...ux.vnet.ibm.com>
Subject: Re: [PATCH V3] Interleave cfs bandwidth timers for improved single
 thread performance at low utilization



>> CPU cfs bandwidth controller uses hrtimer. Currently there is no initial
>> value set. Hence all period timers would align at expiry.
>> This happens when there are multiple CPU cgroup's.
>>
>> There is a performance gain that can be achieved here if the timers are
>> interleaved when the utilization of each CPU cgroup is low and total
>> utilization of all the CPU cgroup's is less than 50%. If the timers are
>> interleaved, then the unthrottled cgroup can run freely without many
>> context switches and can also benefit from SMT Folding. This effect will
>> be further amplified in SPLPAR environment.
>>
>> This commit adds a random offset after initializing each hrtimer. This
>> would result in interleaving the timers at expiry, which helps in achieving
>> the said performance gain.
>>
>> This was tested on powerpc platform with 8 core SMT=8. Socket power was
>> measured when the workload. Benchmarked the stress-ng with power
>> information. Throughput oriented benchmarks show significant gain up to
>> 25% while power consumption increases up to 15%.
>>
>> Workload: stress-ng --cpu=32 --cpu-ops=50000.
>> 1CG - 1 cgroup is running.
>> 2CG - 2 cgroups are running together.
>> Time taken to complete stress-ng in seconds and power is in watts.
>> each cgroup is throttled at 25% with 100ms as the period value.
>>            6.2-rc6                     |   with patch
>> 8 core   1CG    power   2CG     power  |  1CG    power  2 CG    power
>>         27.5    80.6    40      90     |  27.3    82    32.3    104
>>         27.5    81      40.2    91     |  27.5    81    38.7     96
>>         27.7    80      40.1    89     |  27.6    80    29.7    106
>>         27.7    80.1    40.3    94     |  27.6    80    31.5    105
>>
>> Latency might be affected by this change. That could happen if the CPU was
>> in a deep idle state which is possible if we interleave the timers. Used
>> schbench for measuring the latency. Each cgroup is throttled at 25% with
>> period value is set to 100ms. Numbers are when both the cgroups are
>> running simultaneously. Latency values don't degrade much. Some
>> improvement is seen in tail latencies.
>>
>> 		6.2-rc6        with patch
>> Groups: 16
>> 50.0th:          39.5            42.5
>> 75.0th:         924.0           922.0
>> 90.0th:         972.0           968.0
>> 95.0th:        1005.5           994.0
>> 99.0th:        4166.0          2287.0
>> 99.5th:        7314.0          7448.0
>> 99.9th:       15024.0         13600.0
>>
>> Groups: 32
>> 50.0th:         819.0           463.0
>> 75.0th:        1596.0           918.0
>> 90.0th:        5992.0          1281.5
>> 95.0th:       13184.0          2765.0
>> 99.0th:       21792.0         14240.0
>> 99.5th:       25696.0         18920.0
>> 99.9th:       33280.0         35776.0
>>
>> Groups: 64
>> 50.0th:        4806.0          3440.0
>> 75.0th:       31136.0         33664.0
>> 90.0th:       54144.0         58752.0
>> 95.0th:       66176.0         67200.0
>> 99.0th:       84736.0         91520.0
>> 99.5th:       97408.0        114048.0
>> 99.9th:      136448.0        140032.0
>>
>> Signed-off-by: Shrikanth Hegde<sshegde@...ux.vnet.ibm.com>
>> Suggested-by: Peter Zijlstra <peterz@...radead.org>
>> Suggested-by: Thomas Gleixner <tglx@...utronix.de>
> 
> Reviewed-by: Ben Segall <bsegall@...gle.com>

Ben, Thank you very much for the review.

Peter, Ingo, Vincent, 
Could you please provide your inputs? Any other benchmarks should be run? 


Sorry about the subject line. I should have included sched/fair. I realized it after 
mail was sent. I sent another mail with only subject line changed. 
They are same except subject line.
https://lore.kernel.org/lkml/20230223185918.1500132-1-sshegde@linux.vnet.ibm.com/

Should I resend the patch with better change log and a cover letter detailing 
test results instead?

> 
>>
>> Initial RFC PATCH, discussions and details on the problem:
>> Link1: https://lore.kernel.org/lkml/5ae3cb09-8c9a-11e8-75a7-cc774d9bc283@linux.vnet.ibm.com/
>> Link2: https://lore.kernel.org/lkml/9c57c92c-3e0c-b8c5-4be9-8f4df344a347@linux.vnet.ibm.com/
>>
>> ---
>>  kernel/sched/fair.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ff4dbbae3b10..2a4a0969e04f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5923,6 +5923,10 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>>  	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
>>  	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
>>  	cfs_b->period_timer.function = sched_cfs_period_timer;
>> +
>> +	/* Add a random offset so that timers interleave */
>> +	hrtimer_set_expires(&cfs_b->period_timer,
>> +			    get_random_u32_below(cfs_b->period));
>>  	hrtimer_init(&cfs_b->slack_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>>  	cfs_b->slack_timer.function = sched_cfs_slack_timer;
>>  	cfs_b->slack_started = false;
>> --
>> 2.31.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ