linux-kernel - Re: INFO: possible circular locking dependency at cleanup_workqueue

Message-ID: <20090519120010.GA14782@redhat.com>
Date:	Tue, 19 May 2009 14:00:10 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Johannes Berg <johannes@...solutions.net>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Zdenek Kabelac <zdenek.kabelac@...il.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: INFO: possible circular locking dependency at
	cleanup_workqueue_thread

On 05/19, Johannes Berg wrote:
>
> On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
>
> > > Maybe it shouldn't do that from the CPU_POST_DEAD
> > > notifier?
> >
> > Well, in any case we should understand why we have the problem, before
> > changing the code. And CPU_POST_DEAD is not special, why should we treat
> > it specially and skip lock_map_acquire(wq->lockdep_map) ?
>
> I'm not familiar enough with the code -- but what are we really trying
> to do in CPU_POST_DEAD? It seems to me that at that time things must
> already be off the CPU, so ...?

Yes, this CPU is dead, so we must call cleanup_workqueue_thread() to
kill cwq->thread.

> On the other hand that calls
> flush_cpu_workqueue() so it seems it would actually wait for the work to
> be executed on some other CPU, within the CPU_POST_DEAD notification?

Yes, because we can't just kill cwq->thread: there may be pending
work_structs, so we have to flush first.

Why can't we move these works to another CPU? We can, but this doesn't
really help, because in any case we must at least wait for
cwq->current_work to complete.

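Why do we use CPU_POST_DEAD, and not (say) CPU_DEAD, to flush/kill?
Because work->func() can sleep in get_online_cpus(), so we can't flush
until we drop cpu_hotplug.lock. The CPU_DEAD notifiers run while that
lock is still held; CPU_POST_DEAD is sent only after it is released.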
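
To make the ordering concrete, here is a rough userspace sketch using
pthreads. It is only a toy model, not kernel code: hotplug_lock stands
in for cpu_hotplug.lock, model_work_func() for a work->func() that calls
get_online_cpus(), and model_flush() for flush_cpu_workqueue(). All
model_* names are hypothetical and exist only for this illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Assumption: a plain mutex models cpu_hotplug.lock. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_finished = PTHREAD_COND_INITIALIZER;
static bool work_done;

/* Models work->func(): it sleeps in get_online_cpus() while the
 * hotplug lock is held by the "cpu_down" path. */
static void *model_work_func(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&hotplug_lock);	/* get_online_cpus() */
	pthread_mutex_unlock(&hotplug_lock);	/* put_online_cpus() */

	pthread_mutex_lock(&state_lock);
	work_done = true;
	pthread_cond_signal(&work_finished);
	pthread_mutex_unlock(&state_lock);
	return NULL;
}

static bool model_work_is_done(void)
{
	bool done;

	pthread_mutex_lock(&state_lock);
	done = work_done;
	pthread_mutex_unlock(&state_lock);
	return done;
}

/* Models the flush: wait until the pending work has completed. */
static void model_flush(void)
{
	pthread_mutex_lock(&state_lock);
	while (!work_done)
		pthread_cond_wait(&work_finished, &state_lock);
	pthread_mutex_unlock(&state_lock);
}

/* Returns 0 if the work could not finish under the lock but did
 * finish once the flush ran after the lock was dropped. */
int model_cpu_down(void)
{
	pthread_t worker;
	bool done_under_lock;

	pthread_mutex_lock(&state_lock);
	work_done = false;
	pthread_mutex_unlock(&state_lock);

	pthread_mutex_lock(&hotplug_lock);	/* hotplug begins */
	if (pthread_create(&worker, NULL, model_work_func, NULL) != 0) {
		pthread_mutex_unlock(&hotplug_lock);
		return -1;
	}
	/* "CPU_DEAD" point: the work is stuck on hotplug_lock, so calling
	 * model_flush() here would never return. */
	done_under_lock = model_work_is_done();
	pthread_mutex_unlock(&hotplug_lock);	/* hotplug done */

	/* "CPU_POST_DEAD" point: the lock is dropped, flushing is safe. */
	model_flush();
	pthread_join(worker, NULL);

	return (!done_under_lock && model_work_is_done()) ? 0 : 1;
}
```

Calling model_cpu_down() should return 0: the work cannot make progress
while hotplug_lock is held, and completes once the flush is done after
the unlock. Moving the model_flush() call above the unlock reproduces
the deadlock in this model.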

Oleg.