linux-kernel - [PATCH v5 1/5] sched/fair: Ignore SIS_UTIL when has idle core

Date:   Fri,  9 Sep 2022 13:53:00 +0800
From:   Abel Wu <wuyun.abel@...edance.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Josh Don <joshdon@...gle.com>, Chen Yu <yu.c.chen@...el.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        linux-kernel@...r.kernel.org, Abel Wu <wuyun.abel@...edance.com>
Subject: [PATCH v5 1/5] sched/fair: Ignore SIS_UTIL when has idle core

When SIS_UTIL is enabled, the SIS domain scan is skipped if the
LLC is overloaded, even if the has_idle_core hint is true. Since idle
load balancing is only triggered at tick boundaries, the idle cores
can stay cold for a whole tick period, wasting time while some of
the other cpus might be overloaded.

Give it a chance to scan for idle cores when the hint suggests the
effort is worthwhile.
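
To make the effect of the one-line change concrete, here is a minimal
user-space sketch of the scan-limit decision, assuming a simplified
select_idle_cpu() that ignores SIS_PROP and the actual cpu selection.
The function sis_scan_budget(), its parameters and the SCAN_* constants
are hypothetical; nr_idle_scan stands in for sd_share->nr_idle_scan,
where 0 means the LLC looks overloaded.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SCAN_SKIPPED  (-1)     /* hypothetical: no scan at all */
#define SCAN_FULL_LLC INT_MAX  /* hypothetical: walk the whole LLC */

/*
 * Hypothetical model of the SIS_UTIL check in select_idle_cpu().
 * Returns SCAN_SKIPPED, SCAN_FULL_LLC, or the maximum number of cpus
 * the non-idle-core scan may visit. 'patched' selects whether the
 * "&& !has_idle_core" condition from this patch is applied.
 */
int sis_scan_budget(bool sis_util, bool has_idle_core,
		    int nr_idle_scan, bool patched)
{
	assert(nr_idle_scan >= 0);

	/* The patch stops SIS_UTIL from applying when the hint is set. */
	bool apply_sis_util = sis_util && !(patched && has_idle_core);

	/* nr = nr_idle_scan + 1 == 1 means "overloaded LLC, give up". */
	if (apply_sis_util && nr_idle_scan == 0)
		return SCAN_SKIPPED;

	/* In the modeled loop the idle-core path does not consume nr. */
	if (has_idle_core)
		return SCAN_FULL_LLC;

	/* Starting from nr_idle_scan + 1, "!--nr" allows nr_idle_scan cpus. */
	return apply_sis_util ? nr_idle_scan : SCAN_FULL_LLC;
}
```

With patched set to false, an idle-core hint on an overloaded LLC
still yields SCAN_SKIPPED; with patched set to true the same inputs
yield SCAN_FULL_LLC, which is the behaviour change this patch is
after.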

Benchmark
=========

Tests are done on a dual-socket (2 x 24C/48T) machine, model Intel
Xeon(R) Platinum 8260, in two SNC configurations:

	SNC on:  4 NUMA nodes each of which has 12C/24T
	SNC off: 2 NUMA nodes each of which has 24C/48T

All of the benchmarks are done inside a normal cpu cgroup in a clean
environment with cpu turbo disabled.

Based on tip sched/core 0fba527e959d (v5.19.0).

Results
=======

hackbench-process-pipes
                         vanilla		patched
(SNC on)
Amean     1        0.4480 (   0.00%)      0.4470 (   0.22%)
Amean     4        0.6137 (   0.00%)      0.5947 (   3.10%)
Amean     7        0.7530 (   0.00%)      0.7450 (   1.06%)
Amean     12       1.1230 (   0.00%)      1.1053 (   1.57%)
Amean     21       2.0567 (   0.00%)      1.9420 (   5.58%)
Amean     30       3.0847 (   0.00%)      2.9267 *   5.12%*
Amean     48       5.9043 (   0.00%)      4.7027 *  20.35%*
Amean     79       9.3477 (   0.00%)      7.7097 *  17.52%*
Amean     110     11.0647 (   0.00%)     10.0680 *   9.01%*
Amean     141     13.3297 (   0.00%)     12.5450 *   5.89%*
Amean     172     15.2210 (   0.00%)     15.0297 (   1.26%)
Amean     203     17.8510 (   0.00%)     16.8827 *   5.42%*
Amean     234     19.9263 (   0.00%)     19.1183 (   4.05%)
Amean     265     21.9117 (   0.00%)     20.9893 *   4.21%*
Amean     296     23.7683 (   0.00%)     23.3920 (   1.58%)
(SNC off)
Amean     1        0.2963 (   0.00%)      0.2717 (   8.32%)
Amean     4        0.6093 (   0.00%)      0.6257 (  -2.68%)
Amean     7        0.7837 (   0.00%)      0.7740 (   1.23%)
Amean     12       1.2703 (   0.00%)      1.2410 (   2.31%)
Amean     21       2.6260 (   0.00%)      2.6410 (  -0.57%)
Amean     30       4.3483 (   0.00%)      3.7620 (  13.48%)
Amean     48       7.9753 (   0.00%)      6.7757 (  15.04%)
Amean     79       9.6540 (   0.00%)      8.8827 *   7.99%*
Amean     110     11.2597 (   0.00%)     11.0583 (   1.79%)
Amean     141     13.8077 (   0.00%)     13.3387 (   3.40%)
Amean     172     16.3513 (   0.00%)     15.9583 *   2.40%*
Amean     203     19.0880 (   0.00%)     17.8757 *   6.35%*
Amean     234     21.7660 (   0.00%)     20.0543 *   7.86%*
Amean     265     23.0447 (   0.00%)     22.6643 *   1.65%*
Amean     296     25.4660 (   0.00%)     25.6677 (  -0.79%)
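
The percentage column can be sanity-checked by hand. The sketch below
assumes Amean is the arithmetic mean over repeated runs and that, for a
lower-is-better time metric like hackbench, the column is
(vanilla - patched) / vanilla, so a positive value means the patched
kernel is faster. The helpers amean() and gain_lower_is_better() are
hypothetical names; recomputing from the rounded table values may
differ from the printed percentage in the last digit.

```c
#include <assert.h>

/* Arithmetic mean (Amean) of n samples. */
double amean(const double *v, int n)
{
	double sum = 0.0;

	assert(n > 0);
	for (int i = 0; i < n; i++)
		sum += v[i];
	return sum / n;
}

/*
 * Relative change in percent for a lower-is-better metric
 * (assumed formula): positive means the patched time is lower.
 */
double gain_lower_is_better(double vanilla, double patched)
{
	assert(vanilla > 0.0);
	return (vanilla - patched) / vanilla * 100.0;
}
```

For example, gain_lower_is_better(5.9043, 4.7027) gives about 20.35,
matching the 48-group SNC-on row.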

The more heavily loaded the system is, the more benefit can be seen,
since idle cores are more actively kicked into work, e.g. at 21~48
groups. But once even more load is applied (79+ groups), less free
cpu capacity is left to exploit, so the improvement comes down to
~5%.

On the other hand, when the load is relatively low (<12 groups), not
much benefit can be seen. In that case it is not hard to find an idle
cpu anyway, so the only gain is picking an idle core rather than just
an idle cpu, and the cost of full scans negates much of that gain.

The downside of a full scan is that its cost grows with the size of
the LLC, but the test results still look positive. One possible reason
is the low SIS success rate (~3.5%), so the cost does not negate the
benefit.
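
The cost argument can be illustrated with a simplified model of a full
idle-core scan. This is a sketch under stated assumptions, not the
kernel's select_idle_core(): find_idle_core() is a hypothetical helper,
the LLC is modeled as SMT2 with cpus 2k and 2k+1 as siblings, and the
walk starts at the core of cpu target + 1 and wraps around, in the
spirit of for_each_cpu_wrap(). When no core is fully idle, every core
in the LLC is visited, so the worst-case cost grows linearly with the
LLC size.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a full idle-core scan over one LLC.
 * Assumes SMT2 with cpus 2k and 2k+1 forming core k.
 * Returns the first cpu of a fully idle core, or -1, and reports
 * how many cores were examined (the scan cost) via cores_visited.
 */
int find_idle_core(const bool *idle, int nr_cpus, int target,
		   int *cores_visited)
{
	assert(nr_cpus > 0 && nr_cpus % 2 == 0);
	assert(target >= 0 && target < nr_cpus);

	int nr_cores = nr_cpus / 2;
	/* Start at the core of cpu target + 1, wrapping past the end. */
	int start = ((target + 1) % nr_cpus) / 2;

	*cores_visited = 0;
	for (int i = 0; i < nr_cores; i++) {
		int cpu = 2 * ((start + i) % nr_cores);

		(*cores_visited)++;
		/* A core counts as idle only if both SMT siblings are idle. */
		if (idle[cpu] && idle[cpu + 1])
			return cpu;
	}
	/* No fully idle core: the whole LLC has been walked. */
	return -1;
}
```

With an 8-cpu LLC and no idle core, cores_visited ends up at 4, i.e.
all cores; doubling the LLC doubles that worst case.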

tbench4 Throughput
                         vanilla		patched
(SNC on)
Hmean     1        284.44 (   0.00%)      287.90 *   1.22%*
Hmean     2        564.10 (   0.00%)      575.52 *   2.02%*
Hmean     4       1120.93 (   0.00%)     1137.94 *   1.52%*
Hmean     8       2248.94 (   0.00%)     2250.42 *   0.07%*
Hmean     16      4360.10 (   0.00%)     4363.41 (   0.08%)
Hmean     32      7300.52 (   0.00%)     7338.06 *   0.51%*
Hmean     64      8912.37 (   0.00%)     8914.66 (   0.03%)
Hmean     128    19874.16 (   0.00%)    19978.59 *   0.53%*
Hmean     256    19759.42 (   0.00%)    20057.49 *   1.51%*
Hmean     384    19502.40 (   0.00%)    19846.74 *   1.77%*
(SNC off)
Hmean     1        300.70 (   0.00%)      309.43 *   2.90%*
Hmean     2        597.53 (   0.00%)      613.92 *   2.74%*
Hmean     4       1188.34 (   0.00%)     1227.84 *   3.32%*
Hmean     8       2336.22 (   0.00%)     2379.04 *   1.83%*
Hmean     16      4459.17 (   0.00%)     4634.66 *   3.94%*
Hmean     32      7606.69 (   0.00%)     7592.12 *  -0.19%*
Hmean     64      9009.48 (   0.00%)     9241.11 *   2.57%*
Hmean     128    19456.88 (   0.00%)    17870.37 *  -8.15%*
Hmean     256    19771.10 (   0.00%)    19370.92 *  -2.02%*
Hmean     384    20118.74 (   0.00%)    19413.92 *  -3.50%*
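
The throughput tables report Hmean, presumably the harmonic mean over
repeated runs, and for a higher-is-better metric the percentage reads
as (patched - vanilla) / vanilla. A minimal sketch with hypothetical
helper names hmean() and gain_higher_is_better(), again allowing for
last-digit differences when recomputing from rounded table values:

```c
#include <assert.h>

/* Harmonic mean (Hmean) of n positive samples. */
double hmean(const double *v, int n)
{
	double inv_sum = 0.0;

	assert(n > 0);
	for (int i = 0; i < n; i++) {
		assert(v[i] > 0.0);
		inv_sum += 1.0 / v[i];
	}
	return n / inv_sum;
}

/*
 * Relative change in percent for a higher-is-better metric
 * (assumed formula): positive means the patched throughput is higher.
 */
double gain_higher_is_better(double vanilla, double patched)
{
	assert(vanilla > 0.0);
	return (patched - vanilla) / vanilla * 100.0;
}
```

For instance, gain_higher_is_better(19456.88, 17870.37) comes out at
about -8.15, the SNC-off 128 row shown above.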

netperf-udp
                         vanilla		patched
(SNC on)
Hmean     send-64         209.06 (   0.00%)      211.69 *   1.26%*
Hmean     send-128        416.70 (   0.00%)      417.00 (   0.07%)
Hmean     send-256        819.65 (   0.00%)      827.61 *   0.97%*
Hmean     send-1024      3163.12 (   0.00%)     3191.16 *   0.89%*
Hmean     send-2048      5958.21 (   0.00%)     6045.20 *   1.46%*
Hmean     send-3312      9168.81 (   0.00%)     9282.21 *   1.24%*
Hmean     send-4096     11039.27 (   0.00%)    11130.55 (   0.83%)
Hmean     send-8192     17804.42 (   0.00%)    17816.25 (   0.07%)
Hmean     send-16384    28529.57 (   0.00%)    28812.09 (   0.99%)
Hmean     recv-64         209.06 (   0.00%)      211.69 *   1.26%*
Hmean     recv-128        416.70 (   0.00%)      417.00 (   0.07%)
Hmean     recv-256        819.65 (   0.00%)      827.61 *   0.97%*
Hmean     recv-1024      3163.12 (   0.00%)     3191.16 *   0.89%*
Hmean     recv-2048      5958.21 (   0.00%)     6045.18 *   1.46%*
Hmean     recv-3312      9168.81 (   0.00%)     9282.21 *   1.24%*
Hmean     recv-4096     11039.27 (   0.00%)    11130.55 (   0.83%)
Hmean     recv-8192     17804.32 (   0.00%)    17816.23 (   0.07%)
Hmean     recv-16384    28529.38 (   0.00%)    28812.04 (   0.99%)
(SNC off)
Hmean     send-64         211.39 (   0.00%)      213.24 (   0.87%)
Hmean     send-128        415.25 (   0.00%)      426.45 *   2.70%*
Hmean     send-256        814.75 (   0.00%)      835.33 *   2.53%*
Hmean     send-1024      3171.61 (   0.00%)     3173.84 (   0.07%)
Hmean     send-2048      6015.92 (   0.00%)     6046.41 (   0.51%)
Hmean     send-3312      9210.17 (   0.00%)     9309.65 (   1.08%)
Hmean     send-4096     11084.55 (   0.00%)    11250.86 *   1.50%*
Hmean     send-8192     17769.83 (   0.00%)    18101.50 *   1.87%*
Hmean     send-16384    27718.62 (   0.00%)    28152.58 *   1.57%*
Hmean     recv-64         211.39 (   0.00%)      213.24 (   0.87%)
Hmean     recv-128        415.25 (   0.00%)      426.45 *   2.70%*
Hmean     recv-256        814.75 (   0.00%)      835.32 *   2.53%*
Hmean     recv-1024      3171.61 (   0.00%)     3173.84 (   0.07%)
Hmean     recv-2048      6015.92 (   0.00%)     6046.41 (   0.51%)
Hmean     recv-3312      9210.17 (   0.00%)     9309.65 (   1.08%)
Hmean     recv-4096     11084.55 (   0.00%)    11250.86 *   1.50%*
Hmean     recv-8192     17769.76 (   0.00%)    18101.32 *   1.87%*
Hmean     recv-16384    27718.62 (   0.00%)    28152.46 *   1.57%*

netperf-tcp
                         vanilla		patched
(SNC on)
Hmean     64        1192.41 (   0.00%)     1253.72 *   5.14%*
Hmean     128       2354.50 (   0.00%)     2375.97 (   0.91%)
Hmean     256       4371.10 (   0.00%)     4412.90 (   0.96%)
Hmean     1024     13813.84 (   0.00%)    13987.31 (   1.26%)
Hmean     2048     21518.91 (   0.00%)    21677.74 (   0.74%)
Hmean     3312     25585.77 (   0.00%)    25943.95 *   1.40%*
Hmean     4096     27402.77 (   0.00%)    27700.88 *   1.09%*
Hmean     8192     31766.67 (   0.00%)    32187.68 *   1.33%*
Hmean     16384    36227.30 (   0.00%)    36542.97 (   0.87%)
(SNC off)
Hmean     64        1182.09 (   0.00%)     1219.15 *   3.14%*
Hmean     128       2316.35 (   0.00%)     2361.89 *   1.97%*
Hmean     256       4231.05 (   0.00%)     4314.53 *   1.97%*
Hmean     1024     13461.44 (   0.00%)    13543.85 (   0.61%)
Hmean     2048     21016.51 (   0.00%)    21270.62 *   1.21%*
Hmean     3312     24834.03 (   0.00%)    24960.05 (   0.51%)
Hmean     4096     26700.53 (   0.00%)    26959.99 (   0.97%)
Hmean     8192     31094.10 (   0.00%)    30989.89 (  -0.34%)
Hmean     16384    34953.23 (   0.00%)    35310.35 (   1.02%)

Both netperf and tbench4 have a high SIS success rate, ~100% and ~50%
respectively, so the effort spent on a full scan for idle cores brings
little benefit compared to its cost. This is similar to the <12 groups
case in hackbench described above.

Conclusion
==========

Doing a full scan for idle cores is generally good for making better
use of the cpu resources, yet there is still room for improvement
under certain circumstances.

Signed-off-by: Abel Wu <wuyun.abel@...edance.com>
Tested-by: Chen Yu <yu.c.chen@...el.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index efceb670e755..5af9bf246274 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6437,7 +6437,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 		time = cpu_clock(this);
 	}
 
-	if (sched_feat(SIS_UTIL)) {
+	if (sched_feat(SIS_UTIL) && !has_idle_core) {
 		sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
 		if (sd_share) {
 			/* because !--nr is the condition to stop scan */
-- 
2.37.3