Message-ID: <20110225151113.GD2994@redhat.com>
Date:	Fri, 25 Feb 2011 10:11:13 -0500
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Dominik Klein <dk@...telegence.net>,
	linux kernel mailing list <linux-kernel@...r.kernel.org>,
	libvir-list@...hat.com
Subject: Re: Is it a workqueue related issue in 2.6.37 (Was: Re: [libvirt]
 blkio cgroup [solved])

On Fri, Feb 25, 2011 at 04:03:29PM +0100, Tejun Heo wrote:
> Hello,
> 
> On Fri, Feb 25, 2011 at 09:57:08AM -0500, Vivek Goyal wrote:
> > blk_throtl_work() calls generic_make_request() to dispatch some bios, and I
> > guess blk_throtl_work() has been put to sleep because there are no request
> > descriptors available. CFQ is frozen, so no request descriptors get freed,
> > and hence blk_throtl_work() never finishes.
> > 
> > The following caught my eye.
> > 
> >      ksoftirqd/0-3     [000]  1640.983585:   8,16   m   N cfq4810 slice expired t=0
> >      ksoftirqd/0-3     [000]  1640.983588:   8,16   m   N cfq4810 sl_used=2 disp=6 charge=2 iops=0 sect=2080
> >      ksoftirqd/0-3     [000]  1640.983589:   8,16   m   N cfq4810 del_from_rr
> >      ksoftirqd/0-3     [000]  1640.983591:   8,16   m   N cfq schedule dispatch
> >             sshd-3125  [004]  1640.983597: workqueue_queue_work: work struct=ffff88102c3a3110 function=flush_to_ldisc workqueue=ffff88182c834a00 req_cpu=4 cpu=4
> >             sshd-3125  [004]  1640.983598: workqueue_activate_work: work struct ffff88102c3a3110
> > 
> > CFQ tries to schedule a work item, but there is no associated
> > "workqueue_queue_work" trace. So it looks like that work never got queued.
> > 
> > CFQ calls the following.
> > 
> > cfq_log(cfqd, "schedule dispatch");
> > kblockd_schedule_work(cfqd->queue, &cfqd->unplug_work);
> > 
> > We do see the "schedule dispatch" message, and kblockd_schedule_work() calls
> > queue_work(). So what happened here? This is strange. I will put one
> > more trace after kblockd_schedule_work() to confirm that the function returned.
> 
> It could be that the unplug work was already queued and in pending
> state.  The second queueing request will be ignored then.  So, I think
> the problem is that blk_throtl_work() occupies kblockd but requires
> another work item (unplug_work) to make forward progress.  In such
> cases, forward progress cannot be guaranteed.  Either
> blk_throtl_work() or cfq unplug work should use a separate workqueue.

Ok, that would make sense. So blk_throtl_work() cannot finish because CFQ
is not making progress and no request descriptors are being freed, and
unplug_work is not being run because blk_throtl_work() has not finished.
So that's a cyclic dependency, and I should use a separate workqueue for
queueing throttle-related work. I will write a patch.
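
To illustrate the cyclic dependency, here is a minimal userspace sketch. It is
a toy model, not kernel code and not the patch: every name in it (toy_wq,
toy_queue_work(), toy_run_one(), toy_simulate()) is hypothetical. It assumes
one worker per queue that stays stuck in an item until the item can finish,
and it keeps an item "pending" until it completes, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, none are kernel APIs. */
enum work_id { THROTL_WORK, UNPLUG_WORK, NR_WORK };

struct toy_wq {
	enum work_id items[8];	/* FIFO of queued work items */
	int head, tail;
};

static bool pending[NR_WORK];	/* models the "already pending" state */
static int free_descriptors;	/* request descriptors currently available */

/* Returns false if the item is already pending; the request is ignored. */
static bool toy_queue_work(struct toy_wq *wq, enum work_id id)
{
	if (pending[id])
		return false;
	assert(wq->tail < 8);
	pending[id] = true;
	wq->items[wq->tail++] = id;
	return true;
}

/*
 * One tick: the queue's single worker runs the item at the head.
 * Returns true if the item completed, false if the queue is empty or
 * the worker is still blocked inside the item.
 */
static bool toy_run_one(struct toy_wq *wq)
{
	enum work_id id;

	if (wq->head == wq->tail)
		return false;
	id = wq->items[wq->head];
	if (id == THROTL_WORK) {
		if (free_descriptors == 0)
			return false;	/* sleeps waiting for a descriptor */
		free_descriptors--;
	} else {
		free_descriptors++;	/* unplug lets a request complete */
	}
	pending[id] = false;
	wq->head++;
	return true;
}

/* Returns true if the throttle work finishes within max_ticks. */
bool toy_simulate(int nr_queues, int max_ticks)
{
	struct toy_wq wq[2] = { { { 0 }, 0, 0 }, { { 0 }, 0, 0 } };
	int t, q;

	assert(nr_queues == 1 || nr_queues == 2);
	pending[THROTL_WORK] = pending[UNPLUG_WORK] = false;
	free_descriptors = 0;

	/* Throttle work goes first; unplug work shares wq[0] if nr_queues == 1. */
	toy_queue_work(&wq[0], THROTL_WORK);
	toy_queue_work(&wq[nr_queues - 1], UNPLUG_WORK);
	/* A second queueing request for a pending item is ignored. */
	assert(!toy_queue_work(&wq[nr_queues - 1], UNPLUG_WORK));

	for (t = 0; t < max_ticks; t++) {
		for (q = 0; q < nr_queues; q++)
			toy_run_one(&wq[q]);
		if (!pending[THROTL_WORK])
			return true;
	}
	return false;
}
```

With a single shared queue, toy_simulate(1, 100) never sees the throttle work
finish, because the unplug item sits behind it. With two queues,
toy_simulate(2, 100) completes: the unplug item runs on its own worker and
frees a descriptor.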

Thanks
Vivek