Message-ID: <000001d066fb$384fc240$a8ef46c0$@memblaze.com>
Date: Wed, 25 Mar 2015 20:57:15 +0800
From: "Wenbo Wang" <wenbo.wang@...blaze.com>
To: <linux-kernel@...r.kernel.org>, <tglx@...utronix.de>
Cc: "'Chong Yuan'" <chong.yuan@...blaze.com>
Subject: Unexpected long wakeup latency on an idle core [3.10 centos 7]
Hi,
I am testing wakeup latency on a 40-core server running CentOS 7. I used
cpuset to isolate cores 10-19, which are almost idle apart from some kernel
threads. The latest kernel I have is 3.10, so I am not sure whether this
issue still exists in newer kernels.
Problem
=======
Below is the result of testing wakeup latency on idle cores 10-13. The
latency is not consistent: in most cases it is about 3 us, but sometimes it
is huge (1000+ us).
[root@...alhost rt-tests]# ./cyclictest -a 10-13 -t 4 -p 99
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 1.00 1.02 0.91 2/633 36355
T: 0 (36098) P:99 I:1000 C:1588612 Min: 2 Act: 2 Avg: 2 Max: 11
T: 1 (36099) P:99 I:1500 C:1059075 Min: 3 Act: 4 Avg: 3 Max: 1089 <- huge latency
T: 2 (36100) P:99 I:2000 C: 794306 Min: 3 Act: 4 Avg: 3 Max: 1485
T: 3 (36101) P:99 I:2500 C: 635445 Min: 3 Act: 3 Avg: 3 Max: 11
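For readers unfamiliar with what the Max column measures, here is a minimal sketch of a wakeup-latency loop in the same spirit: sleep until an absolute deadline, then record how late the wakeup was. The function names are illustrative and are not taken from cyclictest. The sketch leaves out what the real tool also does (SCHED_FIFO priority, CPU affinity, memory locking, /dev/cpu_dma_latency), so its numbers are not comparable to the ones above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute timespec by ns nanoseconds, normalising tv_nsec. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Signed difference a - b in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a,
                                const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
           + (a->tv_nsec - b->tv_nsec);
}

/*
 * Sleep until an absolute deadline `loops` times, `interval_ns` apart.
 * Returns the worst observed lateness in nanoseconds, or -1 on error.
 */
static int64_t max_wakeup_latency_ns(long interval_ns, int loops)
{
    struct timespec next, now;
    int64_t max = 0;

    assert(interval_ns > 0 && loops > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < loops; i++) {
        timespec_add_ns(&next, interval_ns);
        /* clock_nanosleep() returns an error number, not -1/errno. */
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        int64_t late = timespec_diff_ns(&now, &next);
        if (late > max)
            max = late;
    }
    return max;
}
```

Calling max_wakeup_latency_ns(1000000, 1000) would sample 1000 wakeups at a 1 ms interval, similar to the I:1000 thread above.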
Kernel Configuration
=================
Dynamic tick is enabled.
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
Analysis
=======
The long latency is caused by the execution of @run_timer_softirq. ftrace
shows that sometimes it takes 500+ us to complete.
13) 0.902 us | run_timer_softirq();
10) ! 461.167 us | run_timer_softirq();
10) + 48.572 us | run_timer_softirq();
12) ! 566.923 us | run_timer_softirq();
12) 7.564 us | run_timer_softirq();
10) ! 130.498 us | run_timer_softirq();
Because the dynamic tick feature is enabled, the idle thread stops the tick
on an idle core. This makes the gap between the local CPU's
@tvec_bases->timer_jiffies and the global @jiffies very large. As a result,
@__run_timers (invoked by run_timer_softirq) takes a long time to catch up.
The following result shows that the jiffies gap is sometimes over 200000.
[root@...alhost tracing]# stap -e 'probe kernel.function("__run_timers")
{printf("cpu=%d, %d\n", cpu(), @var("jiffies_64@...nel/timer.c") -
$base->timer_jiffies)}' | grep "cpu=1[0123]" | grep -v "cpu=10, 0"
cpu=13, 259616
cpu=12, 299999
cpu=11, 254670
cpu=12, 170722
cpu=12, 23780
cpu=11, 215427
cpu=12, 21767
cpu=12, 141
cpu=13, 299998
cpu=12, 83584
cpu=11, 84569
cpu=11, 59424
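To put the gaps in perspective: with CONFIG_HZ=1000, a gap of 299999 jiffies corresponds to roughly 300 seconds without a tick on that core. The following user-space toy model (not kernel code; struct toy_base and the function names are made up for illustration) mimics a catch-up loop that advances the base one jiffy per iteration and counts how many iterations a given gap costs.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 1000 /* matches CONFIG_HZ=1000 in the report */

/* Wrap-safe "a >= b" for jiffies-style counters, like time_after_eq(). */
static int jiffies_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Toy stand-in for the per-CPU timer base; only the relevant field. */
struct toy_base {
    unsigned long timer_jiffies;
};

/*
 * Model of the catch-up loop: one iteration per elapsed jiffy.
 * Returns the number of iterations needed to reach `now`.
 */
static unsigned long toy_run_timers(struct toy_base *base, unsigned long now)
{
    unsigned long iterations = 0;

    assert(base != NULL);
    while (jiffies_after_eq(now, base->timer_jiffies)) {
        /* The real code would cascade and expire timers for this slot. */
        ++base->timer_jiffies;
        ++iterations;
    }
    return iterations;
}

/* Convert a jiffies gap to whole seconds without a tick. */
static unsigned long gap_to_seconds(unsigned long gap)
{
    return gap / HZ;
}
```

In this model a gap of 299999 costs 300000 loop iterations, which is why the time spent catching up grows with the time the core was idle.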
I tried running an infinite loop on core 10 to keep the timer tick alive, and
the wakeup latency on that core was much better.
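The message does not say how the loop was run. One way to reproduce the experiment, as a sketch under that assumption, is to pin a thread to the core with sched_setaffinity() and busy-loop there. The helper names are illustrative, and the loop is bounded by a duration so it can be stopped; pass a large value for a long-running test.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <time.h>

/* Pin the calling thread to one CPU; returns 0 on success, -1 on error. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Busy-loop on the current CPU for about `seconds`, keeping it out of idle.
 * Returns the number of loop iterations executed.
 */
static unsigned long long spin_for(double seconds)
{
    struct timespec start, now;
    volatile unsigned long long spins = 0;

    assert(seconds >= 0.0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        spins++;
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while ((now.tv_sec - start.tv_sec)
             + (now.tv_nsec - start.tv_nsec) / 1e9 < seconds);
    return spins;
}
```

For the setup described here, that would be pin_to_cpu(10) followed by spin_for() with the test duration, while cyclictest runs on the same core.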
Solution
=======
The following two methods may help:
1. Speed up the handling of __run_timers. Currently it increases the local
jiffies one by one, which is very slow.
2. Increase local jiffies in the idle thread.
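As a rough illustration of what either method is after, the toy model below (user-space only, not a kernel patch) jumps the local counter straight to the current time when nothing is queued, and falls back to stepping one jiffy at a time otherwise. The `pending` field and all names are hypothetical; how the real code would know that skipping is safe is left open here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy timer base; both fields are illustrative. */
struct ff_base {
    unsigned long timer_jiffies; /* next jiffy not yet processed */
    unsigned long pending;       /* hypothetical count of queued timers */
};

/*
 * Catch up to `now`. With no timers queued, skip the whole gap in one
 * step; otherwise walk it jiffy by jiffy. Returns the loop iterations used.
 */
static unsigned long ff_catch_up(struct ff_base *base, unsigned long now)
{
    unsigned long iterations = 0;

    assert(base != NULL);
    if ((long)(now - base->timer_jiffies) < 0)
        return 0; /* already up to date */

    if (base->pending == 0) {
        /* Same end state as the step-by-step loop, at constant cost. */
        base->timer_jiffies = now + 1;
        return 0;
    }

    while ((long)(now - base->timer_jiffies) >= 0) {
        /* A real implementation would expire timers for this slot. */
        ++base->timer_jiffies;
        ++iterations;
    }
    return iterations;
}
```

With a gap of 299999 jiffies, the empty base reaches the same timer_jiffies value in zero iterations, where the stepping path needs 300000.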
Thanks,
-Wenbo