Message-Id: <145EA9B5-A40F-417E-93A9-DFABA54EA638@gmail.com>
Date: Sat, 3 Aug 2013 15:37:46 +0800
From: ethan <ethan.kernel@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, johlstei@...eaurora.org,
Yinghai Lu <yinghai@...nel.org>, Jin Feng <joe.jin@...cle.com>
Subject: Re: [PATCH V3]hrtimer: Fix a performance regression by disable reprogramming in remove_hrtimer
Peter and tglx,
Here are the results of some more crude hacking and testing, FYI.
With the default kernel 2.6.32-279.19.1.el6.x86_64 in CentOS 6.3 running on my ASUS 4-core Intel i5 server, I got close to the best performance from the
tool http://people.redhat.com/mingo/cfs-scheduler/tools/pipe-test-1m.c
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.704s
user 0m0.047s
sys 0m4.815s
[root@...alhost ~]# time ./pipe-test-1m
real 0m8.000s
user 0m0.071s
sys 0m5.035s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.386s
user 0m0.086s
sys 0m4.591s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.919s
user 0m0.064s
sys 0m4.912s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.949s
user 0m0.083s
sys 0m4.917s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.913s
user 0m0.070s
sys 0m4.903s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.953s
user 0m0.092s
sys 0m4.881s
[root@...alhost ~]# time ./pipe-test-1m
real 0m8.059s
user 0m0.108s
sys 0m5.037s
[root@...alhost ~]#
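For readers who do not have the tool at hand, the kind of workload measured here can be sketched as a two-process ping-pong over a pair of pipes. This is only an assumption about the general shape of such a test, not the code of pipe-test-1m.c; the function name pipe_pingpong() and its loop-count parameter are made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the real pipe-test-1m.c: parent and child
 * bounce one int over two pipes `loops` times. Every round trip blocks
 * one task and wakes the other, so the wakeup path dominates the cost.
 * Returns the number of completed round trips, or -1 on setup failure.
 */
long pipe_pingpong(long loops)
{
	int to_child[2], to_parent[2];
	int token = 0;
	long done;
	pid_t pid;

	if (pipe(to_child) < 0 || pipe(to_parent) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* child: echo each token back until the parent closes its end */
		close(to_child[1]);
		close(to_parent[0]);
		while (read(to_child[0], &token, sizeof(token)) == sizeof(token)) {
			if (write(to_parent[1], &token, sizeof(token)) != sizeof(token))
				break;
		}
		_exit(0);
	}

	/* parent: send a token, then sleep until the child answers */
	close(to_child[0]);
	close(to_parent[1]);
	for (done = 0; done < loops; done++) {
		token = (int)done;
		if (write(to_child[1], &token, sizeof(token)) != sizeof(token))
			break;
		if (read(to_parent[0], &token, sizeof(token)) != sizeof(token))
			break;
	}

	close(to_child[1]);	/* EOF lets the child leave its loop */
	close(to_parent[0]);
	waitpid(pid, NULL, 0);
	return done;
}
```

Calling pipe_pingpong(1000000) from main() and timing the binary with time(1) gives a workload of the same flavour. When the two tasks sit on different CPUs, each wakeup may need a cross-CPU kick, which is why this kind of test is sensitive to the wakeup path.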
Then I compiled and booted 3.11.0-rc3 with the default configuration and reran the same test. Performance was much worse:
[root@...alhost ~]# uname -a
Linux localhost 3.11.0-rc3 #4 SMP Wed Jul 31 16:10:56 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
real 0m10.730s
user 0m0.245s
sys 0m6.596s
[root@...alhost ~]# time ./pipe-test-1m
real 0m10.661s
user 0m0.218s
sys 0m6.520s
[root@...alhost ~]# time ./pipe-test-1m
real 0m10.699s
user 0m0.233s
sys 0m6.534s
[root@...alhost ~]# time ./pipe-test-1m
real 0m10.616s
user 0m0.191s
sys 0m6.505s
[root@...alhost ~]# time ./pipe-test-1m
real 0m10.546s
user 0m0.214s
sys 0m6.441s
[root@...alhost ~]# time ./pipe-test-1m
real 0m10.631s
user 0m0.204s
sys 0m6.509s
The first crude hack is to disable the reprogramming in __remove_hrtimer() in the 3.11-rc3 code and rerun the test.
The result is much better:
[root@...alhost ~]# time ./pipe-test-1m
real 0m9.447s
user 0m0.227s
sys 0m5.900s
[root@...alhost ~]# time ./pipe-test-1m
real 0m9.507s
user 0m0.226s
sys 0m5.922s
[root@...alhost ~]# time ./pipe-test-1m
real 0m9.495s
user 0m0.228s
sys 0m5.916s
[root@...alhost ~]# time ./pipe-test-1m
real 0m9.470s
user 0m0.229s
sys 0m5.938s
[root@...alhost ~]# time ./pipe-test-1m
real 0m9.484s
user 0m0.269s
sys 0m5.875s
[root@...alhost ~]# time ./pipe-test-1m
real 0m9.328s
user 0m0.242s
sys 0m5.767s
While the test was running I monitored the wakeups with powertop and got:
Top causes for wakeups:
98.5% ( inf) <kernel IPI> : Rescheduling interrupts
0.5% ( inf) swapper/3 : hrtimer_start_range_ns (tick_sched_timer)
0.3% ( inf) swapper/2 : hrtimer_start_range_ns (tick_sched_timer)
0.2% ( inf) swapper/1 : hrtimer_start_range_ns (tick_sched_timer)
0.2% ( inf) swapper/0 : hrtimer_start_range_ns (tick_sched_timer)
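The powertop figure can be cross-checked against the "RES:" row of /proc/interrupts, which holds the per-CPU rescheduling interrupt counters on x86. The helper below is a minimal sketch; the name sum_resched_irqs() is made up for illustration, and it assumes the row fits in the line buffer.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on the "RES:" row of a table in
 * /proc/interrupts format. Returns 0 on success and stores the sum in
 * *total, or returns -1 if no "RES:" row was found.
 */
int sum_resched_irqs(FILE *fp, unsigned long long *total)
{
	char line[4096];	/* assumption: the RES row fits in here */

	while (fgets(line, sizeof(line), fp)) {
		char *p = line;

		while (isspace((unsigned char)*p))
			p++;
		if (strncmp(p, "RES:", 4) != 0)
			continue;
		p += 4;

		*total = 0;
		for (;;) {
			char *end;

			while (isspace((unsigned char)*p))
				p++;
			/* counters stop where the text description starts */
			if (!isdigit((unsigned char)*p))
				break;
			*total += strtoull(p, &end, 10);
			p = end;
		}
		return 0;
	}
	return -1;
}
```

Opening /proc/interrupts with fopen() and calling this once before and once after a pipe-test-1m run gives the number of rescheduling interrupts taken during the run as the difference of the two sums.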
So I did a second crude hack: I commented out the sending of the rescheduling IPI in the following function and reran the test.
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 4137890..c27f04f 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -137,7 +137,7 @@ static inline void play_dead(void)
 static inline void smp_send_reschedule(int cpu)
 {
-	smp_ops.smp_send_reschedule(cpu);
+	/* smp_ops.smp_send_reschedule(cpu); */
 }
With this I got performance as good as with the 2.6.32 kernel, and scheduling still seems to be OK.
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.661s
user 0m0.179s
sys 0m4.880s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.473s
user 0m0.189s
sys 0m4.782s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.658s
user 0m0.195s
sys 0m4.899s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.644s
user 0m0.194s
sys 0m4.941s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.694s
user 0m0.189s
sys 0m4.925s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.694s
user 0m0.197s
sys 0m4.915s
[root@...alhost ~]# time ./pipe-test-1m
real 0m7.597s
user 0m0.190s
sys 0m4.886s
The two processes, pipe-test-1m and its child, still seem to be balanced well across cpu0 to cpu3:
#top
f J
14888 root 20 0 68 0 R 73.2 0.0 0:03.22 2 pip1m
14887 root 20 0 284 224 S 63.4 0.0 0:03.23 0 pip1m
So the crude hacks and tests above basically show that the No.1 expensive thing is the rescheduling IPI, and
the No.2 expensive thing is the extra hrtimer reprogramming/tick in the Linux 3.11-rc3 code.
We need to send as few rescheduling IPIs and do as little reprogramming as possible to get better performance.
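To make the comparison easier to read, the "real" times of the four sets of runs quoted in this mail can be averaged. The snippet below is only a rough sketch of that arithmetic; the array and function names are made up for illustration, and a plain mean says nothing about run-to-run variance.

```c
#include <stddef.h>
#include <stdio.h>

/* "real" times in seconds, copied from the runs quoted in this mail */
const double t_2_6_32[]    = { 7.704, 8.000, 7.386, 7.919, 7.949, 7.913, 7.953, 8.059 };
const double t_3_11_rc3[]  = { 10.730, 10.661, 10.699, 10.616, 10.546, 10.631 };
const double t_no_reprog[] = { 9.447, 9.507, 9.495, 9.470, 9.484, 9.328 };
const double t_no_ipi[]    = { 7.661, 7.473, 7.658, 7.644, 7.694, 7.694, 7.597 };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* arithmetic mean of n samples; returns 0 for an empty set */
double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return n ? sum / (double)n : 0.0;
}

/* print each mean and its difference from the 2.6.32 baseline */
void print_summary(void)
{
	const double base = mean(t_2_6_32, COUNT(t_2_6_32));
	const struct { const char *name; double m; } rows[] = {
		{ "2.6.32-279.19.1.el6",    base },
		{ "3.11.0-rc3",             mean(t_3_11_rc3, COUNT(t_3_11_rc3)) },
		{ "3.11-rc3, no reprogram", mean(t_no_reprog, COUNT(t_no_reprog)) },
		{ "3.11-rc3, no resched IPI", mean(t_no_ipi, COUNT(t_no_ipi)) },
	};
	size_t i;

	for (i = 0; i < COUNT(rows); i++)
		printf("%-26s %7.3fs  %+6.1f%%\n", rows[i].name, rows[i].m,
		       (rows[i].m - base) / base * 100.0);
}
```

Calling print_summary() from main() prints one line per kernel variant, with the mean wall-clock time and the percentage difference relative to 2.6.32.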
Do the hacks and the test make sense? And are the results reasonable?
Thanks,
Ethan
On 2013-7-30, at 7:59 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Tue, Jul 30, 2013 at 07:44:03PM +0800, Ethan Zhao wrote:
>> Got it.
>> what tglx and you mean
>>
>>
>> So the expensive thing maybe not inside the schedule(), but could
>> outside the scheduler(), the more bigger forever loop.
>>
>> This is one part of what I am facing.
>
> Right, so it would be good if you could further diagnose the problem so
> we can come up with a solution that cures the problem while retaining
> the current 'desired' properties.
>
> The patch you pinpointed caused a regression in that it would wake from
> NOHZ mode far too often. Could it be that the now longer idle sections
> cause your CPU to go into deeper idle modes and you're suffering from
> idle-exit latencies?