Message-ID: <20110225145708.GB2994@redhat.com>
Date: Fri, 25 Feb 2011 09:57:08 -0500
From: Vivek Goyal <vgoyal@...hat.com>
To: Tejun Heo <tj@...nel.org>
Cc: Dominik Klein <dk@...telegence.net>,
linux kernel mailing list <linux-kernel@...r.kernel.org>,
libvir-list@...hat.com
Subject: Re: Is it a workqueue related issue in 2.6.37 (Was: Re: [libvirt]
blkio cgroup [solved])
On Fri, Feb 25, 2011 at 02:18:50PM +0100, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 25, 2011 at 12:46:16PM +0100, Dominik Klein wrote:
> > With 2.6.37 (also tried .1 and .2) it does not work but end up like I
> > documented. With 2.6.38-rc1, it does work. With deadline scheduler, it
> > also works in 2.6.37.
>
> Okay, here's the problematic part.
>
> <idle>-0 [013] 1640.975562: workqueue_queue_work: work struct=ffff88080f14f270 function=blk_throtl_work workqueue=ffff88102c8fc700 req_cpu=13 cpu=13
> <idle>-0 [013] 1640.975564: workqueue_activate_work: work struct ffff88080f14f270
> <...>-477 [013] 1640.975574: workqueue_execute_start: work struct ffff88080f14f270: function blk_throtl_work
> <idle>-0 [013] 1641.087450: workqueue_queue_work: work struct=ffff88080f14f270 function=blk_throtl_work workqueue=ffff88102c8fc700 req_cpu=13 cpu=13
>
> The workqueue is per-cpu, so we only need to follow cpu=13 cases.
> @1640, blk_throtl_work() is queued, activated and starts executing but
> never finishes. The same work item is never executed more than once
> at the same time on the same CPU, so when the next work item is queued, it
> doesn't get activated until the previous execution is complete.
>
> The next thing to do would be finding out why blk_throtl_work() isn't
> finishing. sysrq-t or /proc/PID/stack should show us where it's
> stalled.

Hi Tejun,

blk_throtl_work() calls generic_make_request() to dispatch some bios, and I
guess blk_throtl_work() has been put to sleep because there are no request
descriptors available. CFQ is frozen, so no request descriptors get freed,
and hence blk_throtl_work() never finishes.

The following caught my eye:

ksoftirqd/0-3 [000] 1640.983585: 8,16 m N cfq4810 slice expired t=0
ksoftirqd/0-3 [000] 1640.983588: 8,16 m N cfq4810 sl_used=2 disp=6 charge=2 iops=0 sect=2080
ksoftirqd/0-3 [000] 1640.983589: 8,16 m N cfq4810 del_from_rr
ksoftirqd/0-3 [000] 1640.983591: 8,16 m N cfq schedule dispatch
sshd-3125 [004] 1640.983597: workqueue_queue_work: work struct=ffff88102c3a3110 function=flush_to_ldisc workqueue=ffff88182c834a00 req_cpu=4 cpu=4
sshd-3125 [004] 1640.983598: workqueue_activate_work: work struct ffff88102c3a3110

CFQ tries to schedule a work item, but there is no associated
"workqueue_queue_work" trace, so it looks like that work never got queued.
CFQ calls the following:

cfq_log(cfqd, "schedule dispatch");
kblockd_schedule_work(cfqd->queue, &cfqd->unplug_work);

We do see the "schedule dispatch" message, and kblockd_schedule_work() calls
queue_work(). So what happened here? This is strange. I will add one more
trace after kblockd_schedule_work() to confirm that the function returned.
Thanks
Vivek