linux-kernel - Re: WARNING: possible circular locking dependency detected

Message-ID: <20170828150617.wp6hh7flfjjjsu4m@hirez.programming.kicks-ass.net>
Date:   Mon, 28 Aug 2017 17:06:17 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: WARNING: possible circular locking dependency detected

On Mon, Aug 28, 2017 at 04:58:08PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 25, 2017 at 12:03:04PM +0200, Borislav Petkov wrote:
> > Hey,
> > 
> > tglx says I have something for ya :-)
> > 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.13.0-rc6+ #1 Not tainted
> > ------------------------------------------------------
> > watchdog/3/27 is trying to acquire lock:
> >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8100c489>] release_ds_buffers+0x29/0xd0
> > 
> > but now in release context of a crosslock acquired at the following:
> >  ((complete)&self->parked){+.+.}, at: [<ffffffff810895f6>] kthread_park+0x46/0x60
> 
> 
> So I'm thinking this one is an actual deadlock.
> 
> So, as far as I can tell this ends up being:
> 
> CPU0                    CPU1
> 
> (smpboot_register_percpu_thread_cpumask)
> 
> get_online_cpus()
> __smpboot_create_thread()
>   kthread_park();
>     wait_for_completion(&X)
> 
> 
>                         (smpboot_thread_fn)
> 
>                         ->park() := watchdog_disable()
>                           watchdog_nmi_disable()
>                             perf_event_release_kernel();
>                               put_event()
>                                 _free_event()
>                                   ->destroy() := hw_perf_event_destroy()
>                                     x86_release_hardware()
>                                       release_ds_buffers()
>                                         get_online_cpus()
> 
> 
>                         kthread_parkme()
>                           complete(&X)
> 
> 
> 
> So CPU0 holds cpu_hotplug_lock while in wait_for_completion() and CPU1
> needs to acquire it before it can complete().
> 
> So if, in between, CPU2 does down_write(), things will get stuck.
> 
> What's worse, there's also:
> 
>   cpus_write_lock()
>     ...
>       takedown_cpu()
>         smpboot_park_threads()
>           smpboot_park_thread()
>             kthread_park()
>               ->park() := watchdog_disable()
>                 watchdog_nmi_disable()
>                   perf_event_release_kernel();
>                     put_event()
>                       _free_event()
>                         ->destroy() := hw_perf_event_destroy()
>                           x86_release_hardware()
>                             release_ds_buffers()
>                               get_online_cpus()
> 
> which as far as I can tell, spells instant deadlock..

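Aah, but that latter will never happen.. because each CPU's watchdog
event holds a reference on &pmc_refcount and we can't unplug _all_ CPUs,
so the takedown path never drops the last reference and never reaches
release_ds_buffers().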
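
A minimal userspace sketch of that "only the last put runs the release
path" pattern, assuming x86_release_hardware() reaches
release_ds_buffers() only when the final reference on pmc_refcount is
dropped. The names reserve_hw(), release_hw(), pmc_refs and
release_calls are hypothetical stand-ins, not kernel APIs, and the mutex
the kernel takes around the release is left out:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names are kernel APIs. */
static atomic_int pmc_refs;	/* models pmc_refcount */
static int release_calls;	/* counts runs of the release path */

/* Models taking a reference: each perf event holds one. */
static void reserve_hw(void)
{
	atomic_fetch_add(&pmc_refs, 1);
}

/*
 * Models dropping a reference. Only the caller that takes the count
 * from 1 to 0 runs the release path, which is where the hotplug lock
 * would be taken. Returns true if this call ran it.
 */
static bool release_hw(void)
{
	if (atomic_fetch_sub(&pmc_refs, 1) != 1)
		return false;	/* other references remain */
	release_calls++;	/* last reference: release path runs */
	return true;
}
```

With one reference per online CPU, unplugging a single CPU drops the
count by one but leaves it non-zero, so the release path is skipped.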

So the first one can only ever happen at boot, where we park() the very
first watchdog thread. It is a potential deadlock, but it won't trigger
because nobody is around to do down_write() just yet.

argh!
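
The boot-time scenario can be replayed in a small single-threaded model.
This is only a sketch: struct toy_rwsem, toy_read_trylock() and
park_makes_progress() are hypothetical names, and the model assumes that
a queued writer blocks new readers, which is what makes the CPU2
down_write() case fatal:

```c
#include <stdbool.h>

/* Toy model of a writer-preferring rwsem; all names are hypothetical. */
struct toy_rwsem {
	int readers;		/* readers currently holding the lock */
	bool writer_waiting;	/* a down_write() is queued behind them */
};

/* A new reader gets in only if no writer is queued (assumption). */
static bool toy_read_trylock(struct toy_rwsem *s)
{
	if (s->writer_waiting)
		return false;
	s->readers++;
	return true;
}

/*
 * Replays the scenario step by step. CPU0 holds the lock for read and
 * waits for a completion; CPU1 must take the lock for read before it
 * can complete(). Returns true if CPU1 gets through.
 */
static bool park_makes_progress(bool cpu2_does_down_write)
{
	struct toy_rwsem hotplug = { 0, false };

	toy_read_trylock(&hotplug);	/* CPU0: get_online_cpus() */
	/* CPU0: kthread_park() -> wait_for_completion(&X) */

	if (cpu2_does_down_write)
		hotplug.writer_waiting = true;	/* CPU2 queues behind CPU0 */

	/* CPU1: release_ds_buffers() -> get_online_cpus() */
	if (!toy_read_trylock(&hotplug))
		return false;	/* CPU1 stuck, complete(&X) never runs */
	return true;		/* CPU1 goes on to complete(&X) */
}
```

park_makes_progress(false) models boot, where no writer exists yet and
CPU1 gets its read lock; park_makes_progress(true) models the case where
a writer slips in between and nobody can make progress.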