lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <fbfb6488-8d65-85eb-e764-d18c9589a0cc@linux.vnet.ibm.com>
Date:   Wed, 7 Jun 2017 16:13:15 -0500
From:   John Allen <jallen@...ux.vnet.ibm.com>
To:     tglx@...utronix.de, linux-kernel@...r.kernel.org
Cc:     Nathan Fontenot <nfont@...ux.vnet.ibm.com>,
        Michael Bringmann <mwb@...ux.vnet.ibm.com>,
        John Allen <jallen@...ux.vnet.ibm.com>
Subject: [RFC] cpu/hotplug: Modify lock status before making cpu hotplug
 callbacks

A deadlock has been observed during a cpu hot add on powerpc machines.

The situation is as follows:
First, in _cpu_up, we take the cpu_hotplug lock and towards the end of _cpu_up,
we make the cpu hotplug callbacks, one of which is arch_update_cpu_topology.
For most other architectures, making this call while the parent thread has the
cpu_hotplug lock seems harmless as most implementations of
arch_update_cpu_topology appear to be quite simple. However, the powerpc
implementation is significantly more complex and requires us to make a call to
stop_machine which in turn attempts to take the cpu_hotplug lock in
get_online_cpus. We then deadlock as the parent thread is waiting on the child
to complete and the child is waiting on the parent to release the lock.

This solution attempts to resolve the issue by incrementing the cpu_hotplug
refcount and releasing the cpu_hotplug mutex in _cpu_up so that the subsequent
call to get_online_cpus can take the mutex while other invocations of _cpu_up
are still prevented from executing as the refcount is non-zero. From my testing,
this seems to resolve the deadlock, but I'm not sure if there is a reason that
get_online_cpus *should* be locked out at this point.

Previous RFC for an alternate solution from Michael Bringmann with additional
details about the bug can be seen here:
https://patchwork.ozlabs.org/patch/771293/

Signed-off-by: John Allen <jallen@...ux.vnet.ibm.com>
---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 9ae6fbe..fc19437 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -776,6 +776,10 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 	cpu_hotplug_begin();

 	cpuhp_tasks_frozen = tasks_frozen;
+	
+	cpu_hotplug.active_writer = NULL;
+	atomic_inc(&cpu_hotplug.refcount);
+	mutex_unlock(&cpu_hotplug.lock);

 	prev_state = st->state;
 	st->target = target;
@@ -810,6 +814,9 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 		cpuhp_kick_ap_work(cpu);
 	}

+	put_online_cpus();
+
+	return ret;
 out:
 	cpu_hotplug_done();
 	return ret;
@@ -939,7 +946,14 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
 	 * responsible for bringing it up to the target state.
 	 */
 	target = min((int)target, CPUHP_BRINGUP_CPU);
+
+	cpu_hotplug.active_writer = NULL;
+	atomic_inc(&cpu_hotplug.refcount);
+	mutex_unlock(&cpu_hotplug.lock);
 	ret = cpuhp_up_callbacks(cpu, st, target);
+	put_online_cpus();
+
+	return ret;
 out:
 	cpu_hotplug_done();
 	return ret;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ