Message-ID: <50B93D3D.3030106@iskon.hr>
Date: Sat, 01 Dec 2012 00:11:57 +0100
From: Zlatko Calusic <zlatko.calusic@...on.hr>
To: Tejun Heo <tj@...nel.org>
CC: linux-kernel@...r.kernel.org
Subject: Re: High context switch rate, ksoftirqd's chewing cpu
On 30.11.2012 23:52, Tejun Heo wrote:
> Hello, Zlatko.
>
> Sorry about the delay. Your message was in my spam folder. The
> attachment seems to have confused the filter.
>
> On Sat, Nov 17, 2012 at 02:01:29PM +0100, Zlatko Calusic wrote:
>> This week I spent some hours tracking down a regression in the 3.7
>> kernel that was producing a high context switch rate on one of my
>> machines. I carefully bisected between 3.6 and 3.7-rc1 and eventually
>> found this commit to be the culprit:
>>
>> commit e7c2f967445dd2041f0f8e3179cca22bb8bb7f79
>> Author: Tejun Heo <tj@...nel.org>
>> Date: Tue Aug 21 13:18:24 2012 -0700
>>
>> workqueue: use mod_delayed_work() instead of __cancel + queue
> ...
>>
>> Then I carefully reverted it chunk by chunk to find out which exact
>> change is responsible for the regression. You can find it attached
>> as wq.patch (to preserve whitespace). It's a very simple modification,
>> yet it produces wildly different behavior, and only on one of my
>> machines. Weird. I'm also attaching a ctxt/s graph that shows the
>> impact nicely. I'll gladly provide any additional info that could
>> help you resolve this.
>>
>> Please Cc: on reply (not subscribed to lkml).
>>
>> Regards,
>> --
>> Zlatko
>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 4b4dbdf..4b8b606 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -319,10 +319,8 @@ EXPORT_SYMBOL(__blk_run_queue);
>> */
>> void blk_run_queue_async(struct request_queue *q)
>> {
>> - if (likely(!blk_queue_stopped(q))) {
>> - __cancel_delayed_work(&q->delay_work);
>> - queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
>> - }
>> + if (likely(!blk_queue_stopped(q)))
>> + mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
>> }
>> EXPORT_SYMBOL(blk_run_queue_async);
>
> That's interesting. Is there anything else noticeably different besides
> the ctxsw counts? e.g. CPU usage, IO throughput / latency, etc...
> Also, can you please post the kernel boot log from the machine? I
> assume that the issue is readily reproducible? Are you up for trying
> some debug patches?
>
> Thanks.
>
Hey Tejun! Thanks for replying.
It's an older C2D (Core 2 Duo) machine; I've attached the kernel boot
log. The funny thing is that I don't observe any problems on my other
half a dozen machines, only on this one. And it's reproducible every
time. I don't see any other anomalies besides the two I already
mentioned: a high context switch rate and the ksoftirqd daemons eating
more CPU, probably as a consequence.
I'll gladly try your patch and send my observations tomorrow, as I've
just started an md resync on the machine, which will take a couple of hours.
Regards,
--
Zlatko
Attachment: "dmesg.txt" of type "text/plain" (23703 bytes)