linux-kernel - Re: [PATCH] fix-flush_workqueue-vs-cpu


Message-ID: <20070107125603.GA74@tv-sign.ru>
Date:	Sun, 7 Jan 2007 15:56:03 +0300
From:	Oleg Nesterov <oleg@...sign.ru>
To:	Srivatsa Vaddagiri <vatsa@...ibm.com>
Cc:	Andrew Morton <akpm@...l.org>, David Howells <dhowells@...hat.com>,
	Christoph Hellwig <hch@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...l.org>,
	linux-kernel@...r.kernel.org, Gautham shenoy <ego@...ibm.com>
Subject: Re: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update

On 01/07, Srivatsa Vaddagiri wrote:
>
> On Sat, Jan 06, 2007 at 08:34:16PM +0300, Oleg Nesterov wrote:
> > I suspect this can't help either.
> > 
> > The problem is that flush_workqueue() may be called while cpu hotplug event
> > in progress and CPU_DEAD waits for kthread_stop(), so we have the same dead
> > lock if work->func() does flush_workqueue(). This means that Andrew's change
> > to use preempt_disable() is good and anyway needed.
> 
> Well ..a lock_cpu_hotplug() in run_workqueue() and support for recursive
> calls to lock_cpu_hotplug() by the same thread will avoid the problem
> you mention.

Srivatsa, I'm completely new to cpu-hotplug, so please correct me if I'm
wrong (in fact I _hope_ I am wrong), but as I see it, the hotplug/workqueue
interaction is broken by design: it can't be fixed by changing the locking
alone.

Once again. CPU dies, CPU_DEAD calls kthread_stop() and sleeps until
cwq->thread exits. To do so, this thread must at least complete the
currently running work->func().

work->func() calls flush_workqueue(WQ), which does lock_cpu_hotplug() or
_whatever_. Now the question: does it block?

if YES:
	This is what the stable tree does - deadlock.

if NOT:
	This is what we have with Andrew's s/mutex_lock/preempt_disable/
	patch - race or deadlock, we have a choice.

	Suppose that WQ has pending works on that dead CPU. Note that
	at this point this CPU is not present in cpu_online_map.
	This means that (without other changes) we have lost:

		- flush_workqueue(WQ) can't return until CPU_DEAD transfers
		  these works to some other CPU in cpu_online_map.

		- CPU_DEAD can't do take_over_work() until flush_workqueue()
		  returns.
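
To make the circular wait in both cases explicit, here is a small userspace
model, not kernel code. It only records who waits for whom and checks whether
following those edges leads back to the starting point. The enum labels, the
table names and deadlocked() are invented for this sketch. The "if YES" table
assumes the lock that flush_workqueue() blocks on is held by the hotplug path
for the whole CPU_DEAD step.

```c
#include <stdbool.h>

/* Actors from the scenario in the mail. These labels are invented for
 * this model; they are not kernel symbols. */
enum actor {
	CPU_DEAD_HANDLER,	/* hotplug callback sleeping in kthread_stop() */
	CWQ_THREAD,		/* cwq->thread, still inside work->func() */
	PENDING_WORKS,		/* works of WQ still queued on the dead CPU */
	NR_ACTORS,
	NOBODY = -1		/* marks an actor that is not blocked */
};

/* In each table, entry [a] names the actor that a is blocked on. */

/* "if YES": flush_workqueue() blocks on a lock that (by assumption)
 * the hotplug path holds across CPU_DEAD. */
const int blocking_flush[NR_ACTORS] = {
	[CPU_DEAD_HANDLER] = CWQ_THREAD,	/* kthread_stop() */
	[CWQ_THREAD]       = CPU_DEAD_HANDLER,	/* waits for the lock owner */
	[PENDING_WORKS]    = NOBODY,
};

/* "if NOT": the flush does not block on a lock, but it still waits for
 * works that only take_over_work() can move to an online CPU. */
const int nonblocking_flush[NR_ACTORS] = {
	[CPU_DEAD_HANDLER] = CWQ_THREAD,	/* kthread_stop() */
	[CWQ_THREAD]       = PENDING_WORKS,	/* flush_workqueue(WQ) */
	[PENDING_WORKS]    = CPU_DEAD_HANDLER,	/* take_over_work() */
};

/* Control case: work->func() does not flush, so the thread can exit. */
const int no_flush[NR_ACTORS] = {
	[CPU_DEAD_HANDLER] = CWQ_THREAD,
	[CWQ_THREAD]       = NOBODY,
	[PENDING_WORKS]    = NOBODY,
};

/* Follow the wait-for edges from 'start' for at most n hops.
 * Returns true only if the chain leads back to 'start', i.e. 'start'
 * is itself part of a wait cycle. */
bool deadlocked(const int waits_for[], int n, int start)
{
	int cur = start;

	for (int hop = 0; hop < n; hop++) {
		cur = waits_for[cur];
		if (cur == NOBODY)
			return false;	/* chain ends, somebody can run */
		if (cur == start)
			return true;	/* back where we began */
	}
	return false;
}
```

Calling deadlocked(nonblocking_flush, NR_ACTORS, CPU_DEAD_HANDLER) walks
handler -> thread -> pending works -> handler, which is the cycle described
in the two points above. The blocking table gives the shorter two-step cycle.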

Andrew, Ingo, this also means that the freezer can't solve this particular
problem either (if I am right).

Thoughts?

Oleg.

