Message-ID: <20160511093248.GA29725@gauravjindalubtnb.del.spreadtrum.com>
Date: Wed, 11 May 2016 09:32:57 +0000
From: "Gaurav Jindal (Gaurav Jindal)" <Gaurav.Jindal@...eadtrum.com>
To: "peterz@...radead.org" <peterz@...radead.org>,
"mingo@...hat.com" <mingo@...hat.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Sanjeev Yadav (Sanjeev Kumar Yadav)" <Sanjeev.Yadav@...eadtrum.com>
Subject: [Patch]cpuidle: Save current cpu as local once instead of calling
smp_processor_id() in loop
Hi
Currently, cpu_idle_loop() calls smp_processor_id() to fetch the current CPU
on every iteration of its loop, that is, every time the idle thread runs.
The idle thread is per-CPU, so its CPU is constant and cannot change at
runtime. Moving the smp_processor_id() call before the loop therefore
saves execution cycles in the loop.
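As a rough userspace illustration of the idea (not kernel code), the sketch below hoists a loop-invariant ID fetch out of a loop. All names prefixed with fake_ are hypothetical stand-ins, and the volatile read only mimics a per-CPU read that the compiler cannot hoist on its own.

```c
/* Hypothetical stand-in for the per-CPU ID: volatile, so every read is a real load. */
static volatile int fake_this_cpu = 2;

/* Counts how many times the ID is fetched. */
static unsigned long id_reads;

/* Hypothetical stand-in for smp_processor_id(). */
static int fake_smp_processor_id(void)
{
	id_reads++;
	return fake_this_cpu;
}

/* Hypothetical offline mask: bit n set means CPU n is offline. */
static unsigned long fake_offline_mask;

/* Hypothetical stand-in for cpu_is_offline(). */
static int fake_cpu_is_offline(int cpu)
{
	return (int)((fake_offline_mask >> cpu) & 1UL);
}

/* Before: the ID is fetched again on every iteration. */
static unsigned long loop_fetch_each_time(unsigned long iterations)
{
	unsigned long offline_hits = 0;
	unsigned long i;

	for (i = 0; i < iterations; i++) {
		if (fake_cpu_is_offline(fake_smp_processor_id()))
			offline_hits++;
	}
	return offline_hits;
}

/* After: the ID is fetched once and kept in a local for the whole loop. */
static unsigned long loop_fetch_once(unsigned long iterations)
{
	int cpu_id = fake_smp_processor_id();
	unsigned long offline_hits = 0;
	unsigned long i;

	for (i = 0; i < iterations; i++) {
		if (fake_cpu_is_offline(cpu_id))
			offline_hits++;
	}
	return offline_hits;
}
```

Both functions return the same result for the same mask. The only difference is how often the ID is read: once per iteration in the first, once in total in the second. The hoisting is only valid because the value cannot change while the loop runs, which is the property relied on for the per-CPU idle thread.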
Patch:
----------------------------------------------------------------------
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 1214f0a..82698e5 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -185,6 +185,8 @@ exit_idle:
*/
static void cpu_idle_loop(void)
{
+ int cpu_id;
+ cpu_id = smp_processor_id();
while(1) {
/*
* If the arch has a polling bit, we maintain an invariant:
@@ -202,7 +204,7 @@ static void cpu_idle_loop(void)
check_pgt_cache();
rmb();
- if (cpu_is_offline(smp_processor_id()))
+ if (cpu_is_offline(cpu_id))
arch_cpu_idle_dead();
local_irq_disable();
--------------------------------------------------------------------
With the patch applied, I compared the generated assembly on x86 and ARM64.
The instructions related to smp_processor_id() are no longer executed
inside the loop.
For x86:
Before patch(execution in loop):
148: 0f ae e8 lfence
14b: 65 8b 04 25 00 00 00 mov %gs:0x0,%eax
152: 00
153: 89 c0 mov %eax,%eax
155: 49 0f a3 04 24 bt %rax,(%r12)
After patch(execution in loop):
150: 0f ae e8 lfence
153: 4d 0f a3 34 24 bt %r14,(%r12)
For ARM64:
Before patch(execution in loop):
168: d5033d9f dsb ld
16c: b9405661 ldr w1, [x19,#84]
170: 1100fc20 add w0, w1, #0x3f
174: 6b1f003f cmp w1, wzr
178: 1a81b000 csel w0, w0, w1, lt
17c: 13067c00 asr w0, w0, #6
180: 937d7c00 sbfiz x0, x0, #3, #32
184: f8606aa0 ldr x0, [x21,x0]
188: 9ac12401 lsr x1, x0, x1
18c: 36000e61 tbz w1, #0, 358
After patch(execution in loop):
1a8: d5033d9f dsb ld
1ac: f8776ac0 ldr x0, [x22,x23]
1b0: ea18001f tst x0, x24
1b4: 54000ea0 b.eq 388
A further 4-second observation on ARM64 shows that cpu_idle_loop is
hit 8672 times. Fetching the CPU ID once before the loop, instead of on
each of those hits, saves instructions and therefore time.
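For a rough sense of scale, the arithmetic can be sketched as below. The instruction counts are simply read off the listings in this mail (the line holding only "00" is the tail of the mov encoding, not a separate instruction). The sketch assumes each of the 8672 hits executes the offline check exactly once, which the measurement does not confirm, and instruction counts are not cycle counts.

```c
/* Values taken from this mail; the per-hit assumption is labeled above. */
enum {
	X86_BEFORE     = 4,    /* lfence, mov, mov, bt */
	X86_AFTER      = 2,    /* lfence, bt */
	ARM64_BEFORE   = 10,   /* dsb ... tbz */
	ARM64_AFTER    = 4,    /* dsb, ldr, tst, b.eq */
	LOOP_HITS      = 8672, /* ARM64 observation */
	WINDOW_SECONDS = 4
};

/* Instructions removed from one pass through the check. */
static long saved_per_iteration(long before, long after)
{
	return before - after;
}

/* Instructions removed over the whole observation window. */
static long saved_in_window(long before, long after, long hits)
{
	return saved_per_iteration(before, after) * hits;
}

/* Average loop hits per second in the observation window. */
static long hits_per_second(long hits, long seconds)
{
	return hits / seconds;
}
```

Under these assumptions, ARM64 drops 6 instructions per hit, or 52032 instructions over the 4-second window at about 2168 hits per second. The x86 listing drops 2 instructions per hit, but no hit count was measured there.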
Signed-off-by: gaurav jindal <gaurav.jindal@...eadtrum.com>
Reviewed-by: sanjeev yadav <sanjeev.yadav@...eadtrum.com>