lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1252398006.7746.3.camel@twins>
Date:	Tue, 08 Sep 2009 10:20:06 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>,
	linux-mm <linux-mm@...ck.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Oleg Nesterov <onestero@...hat.com>,
	lkml <linux-kernel@...r.kernel.org>
Subject: Re: [rfc] lru_add_drain_all() vs isolation

On Tue, 2009-09-08 at 08:56 +0900, KOSAKI Motohiro wrote:
> Hi Peter,
> 
> > On Mon, 2009-09-07 at 10:17 +0200, Mike Galbraith wrote:
> > 
> > > [  774.651779] SysRq : Show Blocked State
> > > [  774.655770]   task                        PC stack   pid father
> > > [  774.655770] evolution.bin D ffff8800bc1575f0     0  7349   6459 0x00000000
> > > [  774.676008]  ffff8800bc3c9d68 0000000000000086 ffff8800015d9340 ffff8800bb91b780
> > > [  774.676008]  000000000000dd28 ffff8800bc3c9fd8 0000000000013340 0000000000013340
> > > [  774.676008]  00000000000000fd ffff8800015d9340 ffff8800bc1575f0 ffff8800bc157888
> > > [  774.676008] Call Trace:
> > > [  774.676008]  [<ffffffff812c4a11>] schedule_timeout+0x2d/0x20c
> > > [  774.676008]  [<ffffffff812c4891>] wait_for_common+0xde/0x155
> > > [  774.676008]  [<ffffffff8103f1cd>] ? default_wake_function+0x0/0x14
> > > [  774.676008]  [<ffffffff810c0e63>] ? lru_add_drain_per_cpu+0x0/0x10
> > > [  774.676008]  [<ffffffff810c0e63>] ? lru_add_drain_per_cpu+0x0/0x10
> > > [  774.676008]  [<ffffffff812c49ab>] wait_for_completion+0x1d/0x1f
> > > [  774.676008]  [<ffffffff8105fdf5>] flush_work+0x7f/0x93
> > > [  774.676008]  [<ffffffff8105f870>] ? wq_barrier_func+0x0/0x14
> > > [  774.676008]  [<ffffffff81060109>] schedule_on_each_cpu+0xb4/0xed
> > > [  774.676008]  [<ffffffff810c0c78>] lru_add_drain_all+0x15/0x17
> > > [  774.676008]  [<ffffffff810d1dbd>] sys_mlock+0x2e/0xde
> > > [  774.676008]  [<ffffffff8100bc1b>] system_call_fastpath+0x16/0x1b
> > 
> > FWIW, something like the below (prone to explode since its utterly
> > untested) should (mostly) fix that one case. Something similar needs to
> > be done for pretty much all machine wide workqueue thingies, possibly
> > also flush_workqueue().
> 
> Can you please explain reproduce way and problem detail?
> 
> AFAIK, mlock() call lru_add_drain_all() _before_ grab semaphoe. Then,
> it doesn't cause any deadlock.

Suppose you have 2 cpus, cpu1 is busy doing a SCHED_FIFO-99 while(1),
cpu0 does mlock()->lru_add_drain_all(), which does
schedule_on_each_cpu(), which then waits for all cpus to complete the
work. Except that cpu1, which is busy with the RT task, will never run
keventd until the RT load goes away.

This is not so much an actual deadlock as a serious starvation case.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ