linux-kernel - Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

Message-Id: <20121003141311.09fb3ffc.akpm@linux-foundation.org>
Date:	Wed, 3 Oct 2012 14:13:11 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	Jiri Kosina <jkosina@...e.cz>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Pekka Enberg <penberg@...nel.org>,
	"Paul E. McKenney" <paul.mckenney@...aro.org>,
	Josh Triplett <josh@...htriplett.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] CPU hotplug, debug: Detect imbalance between
 get_online_cpus() and put_online_cpus()

On Wed, 03 Oct 2012 18:23:09 +0530
"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com> wrote:

> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safeguarded by the cpu_hotplug.lock.
> 
> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization. For example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
> ---
> 
>  kernel/cpu.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..00d29bc 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>  	if (cpu_hotplug.active_writer == current)
>  		return;
>  	mutex_lock(&cpu_hotplug.lock);
> +	BUG_ON(cpu_hotplug.refcount == 0);
>  	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>  		wake_up_process(cpu_hotplug.active_writer);
>  	mutex_unlock(&cpu_hotplug.lock);
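The hazard described in the quoted changelog can be made concrete with a small userspace sketch in C. This is a single-threaded model, not kernel code: `struct hotplug_model`, `model_get()`, `model_put_unchecked()`, `model_put_bug_on()`, `writer_may_proceed()` and `demo_extra_put()` are hypothetical names, and `assert()` only approximates `BUG_ON()` (unlike `BUG_ON()`, it is compiled out when `NDEBUG` is defined). The model assumes a writer is admitted as soon as the refcount reads zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the reader refcount (not kernel code). */
struct hotplug_model {
	int refcount;		/* what the locking protocol believes */
	int readers_inside;	/* ground truth: readers still in their section */
};

static void model_get(struct hotplug_model *m)
{
	m->refcount++;
}

/* Pre-patch behaviour: an unmatched put silently drives the count down. */
static void model_put_unchecked(struct hotplug_model *m)
{
	m->refcount--;
}

/* With the proposed patch: assert() stands in for BUG_ON(). */
static void model_put_bug_on(struct hotplug_model *m)
{
	assert(m->refcount != 0);
	m->refcount--;
}

/* Assumption: a writer may start as soon as the refcount reads zero. */
static bool writer_may_proceed(const struct hotplug_model *m)
{
	return m->refcount == 0;
}

/*
 * Readers A and B enter; B leaves but calls put twice by mistake.
 * Returns true if a writer would be admitted while A is still inside.
 */
static bool demo_extra_put(void)
{
	struct hotplug_model m = { 0, 0 };

	model_get(&m);			/* reader A enters */
	m.readers_inside++;
	model_get(&m);			/* reader B enters */
	m.readers_inside++;

	model_put_unchecked(&m);	/* reader B leaves */
	m.readers_inside--;
	model_put_unchecked(&m);	/* buggy extra put from B */

	return writer_may_proceed(&m) && m.readers_inside > 0;
}
```

In this model `demo_extra_put()` returns true: the extra put only brings the count to zero early, so the writer is let in while reader A is still inside. A check on entry to put, like the one in the patch, fires later, when reader A's own put finds the counter already at zero.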

I think calling BUG() here is a bit harsh.  We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.

But a cpu-hotplug locking imbalance is a really really really minor
problem!  So how about we emit a warning then try to fix things up? 
This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.


--- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
 	if (cpu_hotplug.active_writer == current)
 		return;
 	mutex_lock(&cpu_hotplug.lock);
-	BUG_ON(cpu_hotplug.refcount == 0);
-	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
-		wake_up_process(cpu_hotplug.active_writer);
+	if (!--cpu_hotplug.refcount) {
+		if (WARN_ON(cpu_hotplug.refcount == -1))
+			cpu_hotplug.refcount++;	/* try to fix things up */
+		if (unlikely(cpu_hotplug.active_writer))
+			wake_up_process(cpu_hotplug.active_writer);
+	}
 	mutex_unlock(&cpu_hotplug.lock);
 
 }
_
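The warn-and-fix-up policy can be sketched the same way. Again this is a hypothetical userspace model, not the kernel implementation: `model_warn_on()` imitates the fact that `WARN_ON(cond)` evaluates to the value of `cond`, and the `wakeups` counter stands in for `wake_up_process()`. In this sketch the imbalance check runs before the decrement. The ordering matters: once `!--refcount` is true the counter is exactly zero, so a `refcount == -1` test placed inside that branch cannot fire, whereas checking for zero on entry catches the unmatched call before the counter goes negative.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical single-threaded model (not kernel code). */
struct hotplug_model {
	int refcount;		/* active readers */
	bool writer_waiting;	/* stands in for cpu_hotplug.active_writer != NULL */
	int warnings;		/* imbalances reported so far */
	int wakeups;		/* times the waiting writer was woken */
};

/* Like WARN_ON(): report the problem and evaluate to the condition. */
static bool model_warn_on(struct hotplug_model *m, bool cond, const char *what)
{
	if (cond) {
		fprintf(stderr, "WARNING: %s\n", what);
		m->warnings++;
	}
	return cond;
}

static void model_get(struct hotplug_model *m)
{
	m->refcount++;
}

/* Warn, repair the count, and carry on instead of stopping the machine. */
static void model_put(struct hotplug_model *m)
{
	if (model_warn_on(m, m->refcount == 0, "unbalanced put"))
		m->refcount++;		/* try to fix things up */

	if (--m->refcount == 0 && m->writer_waiting)
		m->wakeups++;		/* stands in for wake_up_process() */
}
```

With one `model_get()` followed by two `model_put()` calls, the second put emits a single warning, the count ends at 0 rather than -1, and the waiting writer is woken on both puts.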