Message-ID: <20140122055239.GA29418@iris.ozlabs.ibm.com>
Date:	Wed, 22 Jan 2014 16:52:39 +1100
From:	Paul Mackerras <paulus@...ba.org>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock

This arises out of a report from a tester that offlining a CPU never
finished on a system they were testing.  This was on a POWER8 running
a 3.10.x kernel, but the issue is still present in mainline AFAICS.

What I found when I looked at the system was this:

* There was a ppc64_cpu process stuck inside cpu_hotplug_begin(),
  called from _cpu_down(), from cpu_down().  This process was holding
  the cpu_add_remove_lock mutex, since cpu_down() calls
  cpu_maps_update_begin() before calling _cpu_down().  It was stuck
  there because cpu_hotplug.refcount == 1, and cpu_hotplug_begin()
  waits for the refcount to drop to zero.

* There was an mdadm process trying to acquire the cpu_add_remove_lock
  mutex inside register_cpu_notifier(), called from
  raid5_alloc_percpu() in drivers/md/raid5.c.  That process had
  previously called get_online_cpus(), which is why
  cpu_hotplug.refcount was 1.

Result: deadlock.
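
To make the interleaving concrete, here is a minimal user-space sketch
that replays the two call paths described above.  It is a toy model, not
kernel code: the struct, the model_* helpers and the task ids are all
made up for illustration, and the mutex is reduced to an "owner" field
so that the replay reports who would block instead of actually hanging.

```c
#include <stdbool.h>

/* Hypothetical task ids, for illustration only. */
enum { TASK_NONE = 0, TASK_MDADM = 1, TASK_PPC64_CPU = 2 };

/* Toy stand-ins for the kernel state named in the report. */
struct hotplug_model {
	int refcount;		/* models cpu_hotplug.refcount */
	int add_remove_owner;	/* models cpu_add_remove_lock; TASK_NONE = free */
};

/* Models get_online_cpus(): the reader side bumps the refcount. */
static void model_get_online_cpus(struct hotplug_model *m)
{
	m->refcount++;
}

/*
 * Models taking cpu_add_remove_lock (cpu_maps_update_begin()).
 * Returns false where the real mutex_lock() would sleep.
 */
static bool model_try_lock_add_remove(struct hotplug_model *m, int task)
{
	if (m->add_remove_owner != TASK_NONE)
		return false;
	m->add_remove_owner = task;
	return true;
}

/* Models cpu_hotplug_begin(): the writer proceeds only with no readers. */
static bool model_hotplug_begin_can_proceed(const struct hotplug_model *m)
{
	return m->refcount == 0;
}

/* Replays the reported interleaving; true means both tasks are blocked. */
bool reported_interleaving_deadlocks(void)
{
	struct hotplug_model m = { 0, TASK_NONE };

	/* mdadm: get_online_cpus() in raid5_alloc_percpu() */
	model_get_online_cpus(&m);

	/* ppc64_cpu: cpu_down() -> cpu_maps_update_begin() */
	bool down_has_mutex = model_try_lock_add_remove(&m, TASK_PPC64_CPU);

	/* ppc64_cpu: _cpu_down() -> cpu_hotplug_begin() waits on refcount */
	bool down_blocked = !model_hotplug_begin_can_proceed(&m);

	/* mdadm: register_cpu_notifier() wants cpu_add_remove_lock */
	bool mdadm_blocked = !model_try_lock_add_remove(&m, TASK_MDADM);

	return down_has_mutex && down_blocked && mdadm_blocked;
}
```

In this model the function returns true: ppc64_cpu holds the mutex and
waits for the refcount, while mdadm holds the refcount and waits for the
mutex, so neither can release what the other needs.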

Thus it seems that the following code is not safe:

	get_online_cpus();
	register_cpu_notifier(&...);
	put_online_cpus();

There are a few different places that do that sort of thing; besides
drivers/md/raid5.c, there are instances in arch/x86/kernel/cpu,
arch/x86/oprofile, drivers/cpufreq/acpi-cpufreq.c,
drivers/oprofile/nmi_timer_int.c and kernel/trace/ring_buffer.c.

My question is this: is it reasonable to call register_cpu_notifier()
inside a get_online_cpus()/put_online_cpus() section?  If so, the
deadlock needs to be fixed; if not, the callers need to be fixed, and
the restriction should be documented.

Regards,
Paul.