[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YFh60XwkqZ4G0y2b@hirez.programming.kicks-ass.net>
Date: Mon, 22 Mar 2021 12:09:05 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Barry Song <song.bao.hua@...ilicon.com>
Cc: vincent.guittot@...aro.org, mingo@...hat.com,
juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
valentin.schneider@....com, aubrey.li@...ux.intel.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
xuwei5@...wei.com, prime.zeng@...ilicon.com, guodong.xu@...aro.org,
yangyicong@...wei.com, liguozhu@...ilicon.com,
linuxarm@...neuler.org
Subject: Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
On Sun, Mar 21, 2021 at 11:14:32AM +1300, Barry Song wrote:
> update_idle_core() is only done for the case of sched_smt_present.
> but test_idle_cores() is done for all machines even those without
> smt.
> this could contribute to up 8%+ hackbench performance loss on a
> machine like kunpeng 920 which has no smt. this patch removes the
> redundant test_idle_cores() for non-smt machines.
>
> we run the below hackbench with different -g parameter from 2 to
> 14, for each different g, we run the command 10 times and get the
> average time:
> $ numactl -N 0 hackbench -p -T -l 20000 -g $1
>
> hackbench will report the time which is needed to complete a certain
> number of messages transmissions between a certain number of tasks,
> for example:
> $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> Running in threaded mode with 10 groups using 40 file descriptors each
> (== 400 tasks)
> Each sender will pass 20000 messages of 100 bytes
>
> The below is the result of hackbench w/ and w/o this patch:
> g= 2 4 6 8 10 12 14
> w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
The patch looks obvious, but the Changelog and Subject needed a lot of
help.
I've changed it like so:
---
Subject: sched/fair: Optimize test_idle_cores() for !SMT
From: Barry Song <song.bao.hua@...ilicon.com>
Date: Sun, 21 Mar 2021 11:14:32 +1300
From: Barry Song <song.bao.hua@...ilicon.com>
update_idle_core() is only done for the case of sched_smt_present.
but test_idle_cores() is done for all machines even those without
SMT.
This can contribute to up 8%+ hackbench performance loss on a
machine like kunpeng 920 which has no SMT. This patch removes the
redundant test_idle_cores() for !SMT machines.
Hackbench is ran with -g {2..14}, for each g it is ran 10 times to get
an average.
$ numactl -N 0 hackbench -p -T -l 20000 -g $1
The below is the result of hackbench w/ and w/o this patch:
g= 2 4 6 8 10 12 14
w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
+4.1% +8.3% +7.3% +6.3%
Signed-off-by: Barry Song <song.bao.hua@...ilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Acked-by: Mel Gorman <mgorman@...e.de>
Link: https://lkml.kernel.org/r/20210320221432.924-1-song.bao.hua@hisilicon.com
---
kernel/sched/fair.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int c
{
struct sched_domain_shared *sds;
- sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
- if (sds)
- return READ_ONCE(sds->has_idle_cores);
+ if (static_branch_likely(&sched_smt_present)) {
+ sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+ if (sds)
+ return READ_ONCE(sds->has_idle_cores);
+ }
return def;
}
Powered by blists - more mailing lists