Message-ID: <20070109121738.GC9563@osiris.boeblingen.de.ibm.com>
Date: Tue, 9 Jan 2007 13:17:38 +0100
From: Heiko Carstens <heiko.carstens@...ibm.com>
To: Benjamin Gilbert <bgilbert@...cmu.edu>
Cc: linux-kernel@...r.kernel.org, vatsa@...ibm.com,
Ingo Molnar <mingo@...e.hu>, Gautham shenoy <ego@...ibm.com>,
Andrew Morton <akpm@...l.org>
Subject: Re: Failure to release lock after CPU hot-unplug canceled
On Mon, Jan 08, 2007 at 12:07:19PM -0500, Benjamin Gilbert wrote:
> If a module returns NOTIFY_BAD to a CPU_DOWN_PREPARE callback, subsequent
> attempts to take a CPU down cause the write into sysfs to wedge.
>
> This is reproducible in 2.6.20-rc4, but was originally found in 2.6.18.5.
>
> Steps to reproduce:
>
> 1. Load the test module included below
> 2. Run the following shell commands as root:
>
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> The second echo command hangs in uninterruptible sleep during the write()
> call, and the following appears in dmesg:
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.20-rc4-686 #1
> -------------------------------------------------------
> bash/1699 is trying to acquire lock:
> (cpu_add_remove_lock){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
>
> but task is already holding lock:
> (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
>
> which lock already depends on the new lock.
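(The test module itself was not quoted above. Purely as an illustration, a
minimal notifier that vetoes hot-unplug this way could look like the sketch
below -- names made up, untested:)

	#include <linux/module.h>
	#include <linux/init.h>
	#include <linux/cpu.h>
	#include <linux/notifier.h>

	/* Refuse every CPU hot-unplug attempt. */
	static int veto_cpu_callback(struct notifier_block *nfb,
				     unsigned long action, void *hcpu)
	{
		if (action == CPU_DOWN_PREPARE)
			return NOTIFY_BAD;
		return NOTIFY_OK;
	}

	static struct notifier_block veto_cpu_nb = {
		.notifier_call = veto_cpu_callback,
	};

	static int __init veto_init(void)
	{
		return register_cpu_notifier(&veto_cpu_nb);
	}

	static void __exit veto_exit(void)
	{
		unregister_cpu_notifier(&veto_cpu_nb);
	}

	module_init(veto_init);
	module_exit(veto_exit);
	MODULE_LICENSE("GPL");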
A call like this

	raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED, (void *)(long)cpu);

is missing from _cpu_down() in kernel/cpu.c for the case where
CPU_DOWN_PREPARE returns NOTIFY_BAD. That is why the second write blocks:
the workqueue_mutex taken on CPU_DOWN_PREPARE is simply left locked after
the vetoed first attempt.
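Roughly, the error path would need to look like this (just a sketch, not
the actual 2.6.20-rc4 code; the surrounding error handling and the "out"
label are assumed):

	err = raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
				      (void *)(long)cpu);
	if (err == NOTIFY_BAD) {
		/* let the entries that already prepared undo their work */
		raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED,
					(void *)(long)cpu);
		err = -EINVAL;
		goto out;
	}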
However... this reveals an even more fundamental problem.
The workqueue code grabs workqueue_mutex on CPU_[UP|DOWN]_PREPARE and
releases it again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If one entry in the
call chain returns NOTIFY_BAD, the remaining entries won't be called for
that event anymore. But DOWN_FAILED/UP_CANCELED will still be delivered to
every entry. So we might even end up with a mutex_unlock(&workqueue_mutex)
even though the matching mutex_lock(&workqueue_mutex) was never called...
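Schematically (heavily simplified from the 2.6.20-era notifier in
kernel/workqueue.c; most cases and the per-cpu work handling are trimmed):

	static int workqueue_cpu_callback(struct notifier_block *nfb,
					  unsigned long action, void *hcpu)
	{
		switch (action) {
		case CPU_UP_PREPARE:
		case CPU_DOWN_PREPARE:
			/* never runs if an earlier chain entry already
			 * returned NOTIFY_BAD for this event */
			mutex_lock(&workqueue_mutex);
			break;
		case CPU_UP_CANCELED:
		case CPU_DOWN_FAILED:
			/* ...but this is delivered to every entry, so we
			 * may unlock a mutex we never locked */
			mutex_unlock(&workqueue_mutex);
			break;
		}
		return NOTIFY_OK;
	}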
Maybe this will be addressed by somebody else since cpu hotplug locking
is being worked on (again).