linux-kernel - Re: Deadlock between cpu_hotplug_begin and cpu_add_remove


Message-ID: <52DF81B0.7020700@linux.vnet.ibm.com>
Date:	Wed, 22 Jan 2014 14:00:40 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	Paul Mackerras <paulus@...ba.org>
CC:	linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...nel.org>,
	Oleg Nesterov <oleg@...hat.com>, Tejun Heo <tj@...nel.org>,
	Michel Lespinasse <walken@...gle.com>, ego@...ux.vnet.ibm.com,
	"rusty@...tcorp.com.au" <rusty@...tcorp.com.au>,
	Thomas Gleixner <tglx@...utronix.de>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Subject: Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock

Hi Paul,

On 01/22/2014 11:22 AM, Paul Mackerras wrote:
> This arises out of a report from a tester that offlining a CPU never
> finished on a system they were testing.  This was on a POWER8 running
> a 3.10.x kernel, but the issue is still present in mainline AFAICS.
> 
> What I found when I looked at the system was this:
> 
> * There was a ppc64_cpu process stuck inside cpu_hotplug_begin(),
>   called from _cpu_down(), from cpu_down().  This process was holding
>   the cpu_add_remove_lock mutex, since cpu_down() calls
>   cpu_maps_update_begin() before calling _cpu_down().  It was stuck
>   there because cpu_hotplug.refcount == 1.
> 
> * There was a mdadm process trying to acquire the cpu_add_remove_lock
>   mutex inside register_cpu_notifier(), called from
>   raid5_alloc_percpu() in drivers/md/raid5.c.  That process had
>   previously called get_online_cpus, which is why cpu_hotplug.refcount
>   was 1.
> 
> Result: deadlock.
> 
> Thus it seems that the following code is not safe:
> 
> 	get_online_cpus();
> 	register_cpu_notifier(&...);
> 	put_online_cpus();
>

Yes, this is a known problem, and I had proposed an elaborate solution
some time ago: https://lkml.org/lkml/2012/3/1/39
But that won't work for all cases, so that solution is a no-go.
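
For reference, the interleaving you describe can be modelled in user space.
This is only a minimal sketch: the struct and every model_* name below are
hypothetical stand-ins invented for illustration, not the kernel's actual
data structures or functions. It just records which side holds what and
checks whether both sides end up waiting on each other.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's real state. */
struct hotplug_model {
	int refcount;         /* stands in for cpu_hotplug.refcount */
	bool add_remove_held; /* stands in for cpu_add_remove_lock being held */
};

/* Reader side: get_online_cpus() bumps the refcount. */
static void model_get_online_cpus(struct hotplug_model *m)
{
	m->refcount++;
}

static void model_put_online_cpus(struct hotplug_model *m)
{
	assert(m->refcount > 0);
	m->refcount--;
}

/* cpu_down() takes cpu_add_remove_lock via cpu_maps_update_begin(). */
static void model_cpu_maps_update_begin(struct hotplug_model *m)
{
	assert(!m->add_remove_held);
	m->add_remove_held = true;
}

/* cpu_hotplug_begin() cannot proceed while readers remain. */
static bool model_hotplug_begin_blocked(const struct hotplug_model *m)
{
	return m->refcount != 0;
}

/* register_cpu_notifier() needs cpu_add_remove_lock. */
static bool model_register_blocked(const struct hotplug_model *m)
{
	return m->add_remove_held;
}

/* The reported interleaving: true if both sides are stuck. */
bool model_report_interleaving_deadlocks(void)
{
	struct hotplug_model m = { 0, false };

	model_get_online_cpus(&m);       /* mdadm: get_online_cpus() */
	model_cpu_maps_update_begin(&m); /* ppc64_cpu: cpu_down() */

	/* ppc64_cpu waits for refcount == 0, mdadm waits for the mutex. */
	return model_hotplug_begin_blocked(&m) && model_register_blocked(&m);
}

/* Same calls without overlap: the reader finishes before cpu_down(). */
bool model_sequential_order_deadlocks(void)
{
	struct hotplug_model m = { 0, false };
	bool reader_stuck, writer_stuck;

	model_get_online_cpus(&m);
	reader_stuck = model_register_blocked(&m); /* mutex is free */
	model_put_online_cpus(&m);

	model_cpu_maps_update_begin(&m);
	writer_stuck = model_hotplug_begin_blocked(&m); /* refcount is 0 */

	return reader_stuck || writer_stuck;
}
```

In this model the reported ordering leaves both sides blocked, while the
non-overlapping ordering does not, which matches the observation that the
hang depends on timing.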

If we forget the CPU_POST_DEAD stage for a moment, we can just replace the
calls to cpu_maps_update_begin/done() with get/put_online_cpus() in both
register_cpu_notifier() and unregister_cpu_notifier(). After all,
the callback registration code needs to synchronize only with the actual
hotplug operations, and not the update of cpu-maps. So they don't really
need to acquire the cpu_add_remove_lock.

However, CPU_POST_DEAD notifications run with the hotplug lock dropped.
So we can't simply replace cpu_add_remove_lock with the hotplug lock in the
registration routines, because notifier invocations and notifier registration
need to be synchronized.
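
To spell out the window, here is a small user-space sketch of which lock
covers which notification phase. The enum and function names are hypothetical,
made up for illustration only. It assumes the ordering described above: the
hotplug lock is released before CPU_POST_DEAD notifiers run, while
cpu_add_remove_lock is held across the whole cpu_down() call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical phases of a CPU offline, for illustration only. */
enum down_phase {
	PHASE_DOWN_PREPARE,
	PHASE_DEAD,
	PHASE_POST_DEAD,
};

/* Which lock a registration routine would take in each variant. */
enum reg_lock {
	REG_USES_ADD_REMOVE_LOCK, /* current behaviour */
	REG_USES_HOTPLUG_LOCK,    /* the simple replacement discussed above */
};

/* The hotplug lock is dropped before CPU_POST_DEAD notifiers run. */
static bool model_hotplug_lock_held(enum down_phase phase)
{
	switch (phase) {
	case PHASE_DOWN_PREPARE:
	case PHASE_DEAD:
		return true;
	case PHASE_POST_DEAD:
		return false;
	}
	assert(!"unknown phase");
	return false;
}

/* cpu_add_remove_lock is held for the whole of cpu_down(). */
static bool model_add_remove_lock_held(enum down_phase phase)
{
	(void)phase;
	return true;
}

/* True if a registration attempt is kept out while notifiers of this phase run. */
bool model_registration_excluded(enum down_phase phase, enum reg_lock lock)
{
	if (lock == REG_USES_ADD_REMOVE_LOCK)
		return model_add_remove_lock_held(phase);
	return model_hotplug_lock_held(phase);
}
```

With REG_USES_HOTPLUG_LOCK, the model reports that registration is not
excluded during PHASE_POST_DEAD, which is exactly the case the simple
replacement fails to cover.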

Hmm...

> There are a few different places that do that sort of thing; besides
> drivers/md/raid5.c, there are instances in arch/x86/kernel/cpu,
> arch/x86/oprofile, drivers/cpufreq/acpi-cpufreq.c,
> drivers/oprofile/nmi_timer_int.c and kernel/trace/ring_buffer.c.
> 
> My question is this: is it reasonable to call register_cpu_notifier
> inside a get/put_online_cpus block?

Ideally, we would want that to work, because there is no other race-free
way of registering a notifier.

>  If so, the deadlock needs to be
> fixed; if not, the callers need to be fixed, and the restriction
> should be documented.

Fixing the callers is a last resort. I'm thinking of ways to fix the
deadlock itself, and allow the callers to call register_cpu_notifier
within a get/put_online_cpus() block...

Regards,
Srivatsa S. Bhat
