Message-ID: <20160518114721.GF3193@twins.programming.kicks-ass.net>
Date:	Wed, 18 May 2016 13:47:21 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	"Gaurav Jindal (Gaurav Jindal)" <Gaurav.Jindal@...eadtrum.com>
Cc:	"mingo@...hat.com" <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Sanjeev Yadav (Sanjeev Kumar Yadav)" <Sanjeev.Yadav@...eadtrum.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Patch]cpuidle: Save current cpu as local variable instead of
 calling smp_processor_id() in loop
Much better; but the below does not in fact apply.
> ---
> 
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 1214f0a..82698e5 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -185,6 +185,8 @@ exit_idle:
>   */
>  static void cpu_idle_loop(void)
>  {
> +	int cpu_id;
> 
> +	cpu_id = smp_processor_id();
This hunk assumes there's an empty line between the '{' and 'while
(1)'; that line isn't actually there.
>  	while (1) {
>  		/*
>  		 * If the arch has a polling bit, we maintain an invariant:
> @@ -202,7 +204,7 @@ static void cpu_idle_loop(void)
>  			check_pgt_cache();
>  			rmb();
> 
> -			if (cpu_is_offline(smp_processor_id()))
> +			if (cpu_is_offline(cpu_id))
>                                 arch_cpu_idle_dead();
And this hunk fails to apply because the recent cpu hotplug work added
another call here.
> 
>  			local_irq_disable();
I've taken your patch and modified it; see below.
Another optimization you could look at is removing that rmb(); I don't
actually think it's needed, but you'd have to find out why it was
added, check whether it was in fact needed back then, and then check
whether it is still needed now.
It's a bit of a trek through git history, but lfence / dsb ld are
expensive instructions.
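As a rough illustration of the cost argument (a userspace sketch, not
kernel code: atomic_thread_fence() stands in for rmb() and, being a
full barrier here, will overstate it), timing a hot loop with and
without the fence shows it is far from free:

/*
 * Userspace sketch only -- not the kernel idle loop.  Compares a hot
 * loop with and without a memory fence; atomic_thread_fence() with
 * memory_order_seq_cst is a stand-in for rmb() (a stronger barrier,
 * so it overstates the cost), just to show the fence is not free.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

static atomic_int flag;

static long long time_loop(int with_fence, long iters)
{
	struct timespec t0, t1;
	volatile long sink = 0;
	long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		sink += atomic_load_explicit(&flag, memory_order_relaxed);
		if (with_fence)
			atomic_thread_fence(memory_order_seq_cst);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	(void)sink;
	return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
	long iters = 50 * 1000 * 1000;

	printf("no fence: %lld ns\n", time_loop(0, iters));
	printf("fence:    %lld ns\n", time_loop(1, iters));
	return 0;
}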
---
Subject: sched,idle: Optimize idle loop
From: "Gaurav Jindal (Gaurav Jindal)" <Gaurav.Jindal@...eadtrum.com>
Date: Thu, 12 May 2016 10:13:33 +0000
Currently, cpu_idle_loop() fetches the current CPU with
smp_processor_id() on every iteration of the loop.
Since the idle thread is per-CPU, the current CPU is constant, so the
load can be hoisted out of the loop, saving execution cycles/time on
every pass.
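For illustration only (a userspace sketch; sched_getcpu() stands in
for smp_processor_id() and this is not the kernel loop itself), the
change is the classic loop-invariant hoist:

/*
 * Userspace sketch, not the kernel's cpu_idle_loop().  For a per-CPU
 * (pinned) thread the current CPU is loop-invariant, so it can be
 * read once, before the loop, instead of on every pass.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static void per_iteration_lookup(long iters)
{
	long i;

	for (i = 0; i < iters; i++) {
		if (sched_getcpu() < 0)		/* looked up every pass */
			return;
	}
}

static void hoisted_lookup(long iters)
{
	int cpu = sched_getcpu();		/* read once, before the loop */
	long i;

	for (i = 0; i < iters; i++) {
		if (cpu < 0)			/* plain register compare */
			return;
	}
}

int main(void)
{
	per_iteration_lookup(1000);
	hoisted_lookup(1000);
	printf("current CPU: %d\n", sched_getcpu());
	return 0;
}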
x86-64:
Before patch (execution in loop):
	148:    0f ae e8                lfence
	14b:    65 8b 04 25 00 00 00 00 mov %gs:0x0,%eax
	152:    00
	153:    89 c0                   mov %eax,%eax
	155:    49 0f a3 04 24          bt %rax,(%r12)
After patch (execution in loop):
	150:    0f ae e8                lfence
	153:    4d 0f a3 34 24          bt %r14,(%r12)
ARM64:
Before patch (execution in loop):
	168:    d5033d9f        dsb     ld
	16c:    b9405661        ldr     w1,[x19,#84]
	170:    1100fc20        add     w0,w1,#0x3f
	174:    6b1f003f        cmp     w1,wzr
	178:    1a81b000        csel    w0,w0,w1,lt
	17c:    130c7000        asr     w0,w0,#6
	180:    937d7c00        sbfiz   x0,x0,#3,#32
	184:    f8606aa0        ldr     x0,[x21,x0]
	188:    9ac12401        lsr     x1,x0,x1
	18c:    36000e61        tbz     w1,#0,358
After patch (execution in loop):
	1a8:    d50339df        dsb     ld
	1ac:    f8776ac0        ldr     x0,[x22,x23]
	1b0:    ea18001f        tst     x0,x24
	1b4:    54000ea0        b.eq    388
Further observation on ARM64 over a 4 second window shows that
cpu_idle_loop is called 8672 times. Hoisting the load out of the loop
saves the instructions executed on every pass, and hence time as well.
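Rough arithmetic from the listings above (an estimate from instruction
counts, not a measurement): the ARM64 sequence shrinks from 10
instructions to 4, so about 6 instructions * 8672 calls, i.e. roughly
52,000 fewer instructions executed over those 4 seconds.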
Cc: "mingo@...hat.com" <mingo@...hat.com>
Signed-off-by: Gaurav Jindal <gaurav.jindal@...eadtrum.com>
Reviewed-by: Sanjeev Yadav <sanjeev.yadav@...eadtrum.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Link: http://lkml.kernel.org/r/20160512101330.GA488@gauravjindalubtnb.del.spreadtrum.com
---
 kernel/sched/idle.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -201,6 +201,8 @@ static void cpuidle_idle_call(void)
  */
 static void cpu_idle_loop(void)
 {
+	int cpu = smp_processor_id();
+
 	while (1) {
 		/*
 		 * If the arch has a polling bit, we maintain an invariant:
@@ -219,7 +221,7 @@ static void cpu_idle_loop(void)
 			check_pgt_cache();
 			rmb();
 
-			if (cpu_is_offline(smp_processor_id())) {
+			if (cpu_is_offline(cpu)) {
 				cpuhp_report_idle_dead();
 				arch_cpu_idle_dead();
 			}