Message-ID: <52A800DD.3070201@symas.com>
Date: Tue, 10 Dec 2013 22:06:21 -0800
From: Howard Chu <hyc@...as.com>
To: Li Zefan <lizefan@...wei.com>
CC: Linux Kernel Mailing List <Linux-Kernel@...r.Kernel.ORG>
Subject: Re: sched: RT throttling activated, 3.12.3
Howard Chu wrote:
> Howard Chu wrote:
>> Li Zefan wrote:
>>> On 2013/12/11 10:59, Howard Chu wrote:
>>>> I just upgraded a system from a 3.5 kernel to 3.12.3 and attempted to run some new benchmarks on it. I see my test program ramps up in CPU usage for a few seconds and then it gradually tails off. There's nothing obvious in the user code to trigger this behavior, so I check dmesg, and see this:
>>>>
>>>> [ 55.037057] JFS: nTxBlock = 8192, nTxLock = 65536
>>>> [163591.807470] perf samples too long (2758 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
>>>> [164061.362762] perf samples too long (5204 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
>>>> [167969.339513] [sched_delayed] sched: RT throttling activated
>>>> [182741.484637] perf samples too long (294588 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
>>>> [182741.484726] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 36.665 msecs
>>>> [182822.633084] perf samples too long (292359 > 20000), lowering kernel.perf_event_max_sample_rate to 6250
>>>> [182905.606119] perf samples too long (290291 > 40000), lowering kernel.perf_event_max_sample_rate to 3250
>>>> [199384.293514] perf samples too long (288142 > 76923), lowering kernel.perf_event_max_sample_rate to 1750
>>>> [208507.301027] perf samples too long (285964 > 142857), lowering kernel.perf_event_max_sample_rate to 1000
>>>> [208528.976208] perf samples too long (283799 > 250000), lowering kernel.perf_event_max_sample_rate to 500
>>>>
>>>> Why is the kernel throttling my server?
>>>>
>>>
>>> Because that is the default setting of the kernel.
>>
>> Apparently a "new" default that didn't exist in 3.5? The code in question is
>> not a realtime process. This behavior also wasn't seen in 3.10 or any older
>> kernels.
>
> I just downgraded to 3.10.23 to double-check - everything is running normally
> there, although a few percent slower than I expected. (Last time I tried 3.10
> it was 3.10.11.)
>
For comparison, here's a "normally" behaving benchmark run:
http://highlandsun.com/hyc/linux3.10/
The result is a fairly steady 15,000 ops/sec, and CPU usage is around 190%
(this is a quad-core machine).
On the 3.12.3 kernel:
http://highlandsun.com/hyc/linux3.12/
The CPU usage is initially around 180% but quickly plummets to about 7% and
stays there. This is a pretty major regression for a "default" kernel setting.
And given that the target process isn't running with realtime scheduling
priority, this can only be considered a bug. (Btw, setting both
sched_rt_period_us and sched_rt_runtime_us to -1 has no effect on this behavior.)
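For anyone who wants to check the same two knobs on their own machine, here is a minimal sketch that reads them and reports the resulting realtime budget. It assumes the usual /proc/sys/kernel location for the sysctls; the helper names rt_budget and read_sysctl are made up for this illustration. The documented kernel defaults are a period of 1000000 us and a runtime of 950000 us, and a runtime of -1 disables the limit.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the scheduler sysctls on a Linux system.
SYSCTL_DIR = Path("/proc/sys/kernel")


def rt_budget(period_us: int, runtime_us: int) -> Optional[float]:
    """Return the fraction of each period that realtime tasks may run.

    Returns None when runtime_us is negative, which means RT throttling
    is disabled.
    """
    if runtime_us < 0:
        return None
    if period_us <= 0:
        raise ValueError("period must be positive when throttling is enabled")
    return runtime_us / period_us


def read_sysctl(name: str) -> int:
    """Read one integer sysctl value from SYSCTL_DIR (hypothetical helper)."""
    return int((SYSCTL_DIR / name).read_text().strip())


if __name__ == "__main__":
    try:
        period = read_sysctl("sched_rt_period_us")
        runtime = read_sysctl("sched_rt_runtime_us")
    except (OSError, ValueError) as exc:
        # Not on Linux, or the files are unreadable: report and stop.
        print(f"could not read RT sysctls: {exc}")
    else:
        budget = rt_budget(period, runtime)
        if budget is None:
            print("RT throttling disabled (runtime is negative)")
        else:
            print(f"RT tasks may use {budget:.0%} of each {period} us period")
```

This only reports the configured limit; it does not explain why a process without realtime priority would be affected, which is the open question in this thread.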
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/