Message-ID: <20191210225645.GW2889@paulmck-ThinkPad-P72>
Date:   Tue, 10 Dec 2019 14:56:45 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Tejun Heo <tj@...nel.org>, jiangshanlai@...il.com,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Workqueues splat due to ending up on wrong CPU

On Tue, Dec 10, 2019 at 10:08:39AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 09, 2019 at 10:59:08AM -0800, Paul E. McKenney wrote:
> > And it survived!  ;-)
> > 
> > Peter, could I please have your Signed-off-by?  Or take my Tested-by if
> > you would prefer to send it up some other way.
> 
> How's this?

Very good, thank you!  I have queued it on -rcu, but please let me
know if you would rather that it go in via some other path.

							Thanx, Paul

> ---
> Subject: cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
> From: Peter Zijlstra <peterz@...radead.org>
> Date: Tue Dec 10 09:34:54 CET 2019
> 
> Paul reported a very sporadic, rcutorture induced, workqueue failure.
> When the planets align, the workqueue rescuer's self-migrate fails and
> then triggers a WARN for running a work on the wrong CPU.
> 
> Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
> could be ignored! When stopper->enabled is false, stop_machine will
> complete the work immediately, without actually doing the work. Worse,
> it will not WARN about this (we really should fix this).
> 
> It turns out there is a small window where a freshly onlined CPU is
> marked 'online' but doesn't yet have the stopper task running:
> 
> 	BP				AP
> 
> 	bringup_cpu()
> 	  __cpu_up(cpu, idle)	 -->	start_secondary()
> 					...
> 					cpu_startup_entry()
> 	  bringup_wait_for_ap()
> 	    wait_for_ap_thread() <--	  cpuhp_online_idle()
> 					  while (1)
> 					    do_idle()
> 
> 					... available to run kthreads ...
> 
> 	    stop_machine_unpark()
> 	      stopper->enabled = true;
> 
> Close this by moving the stop_machine_unpark() into
> cpuhp_online_idle(), such that the stopper thread is ready before we
> start the idle loop and schedule.
> 
> Reported-by: "Paul E. McKenney" <paulmck@...nel.org>
> Debugged-by: Tejun Heo <tj@...nel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -525,8 +525,7 @@ static int bringup_wait_for_ap(unsigned
>  	if (WARN_ON_ONCE((!cpu_online(cpu))))
>  		return -ECANCELED;
>  
> -	/* Unpark the stopper thread and the hotplug thread of the target cpu */
> -	stop_machine_unpark(cpu);
> +	/* Unpark the hotplug thread of the target cpu */
>  	kthread_unpark(st->thread);
>  
>  	/*
> @@ -1089,8 +1088,8 @@ void notify_cpu_starting(unsigned int cp
>  
>  /*
>   * Called from the idle task. Wake up the controlling task which brings the
> - * stopper and the hotplug thread of the upcoming CPU up and then delegates
> - * the rest of the online bringup to the hotplug thread.
> + * hotplug thread of the upcoming CPU up and then delegates the rest of the
> + * online bringup to the hotplug thread.
>   */
>  void cpuhp_online_idle(enum cpuhp_state state)
>  {
> @@ -1100,6 +1099,12 @@ void cpuhp_online_idle(enum cpuhp_state
>  	if (state != CPUHP_AP_ONLINE_IDLE)
>  		return;
>  
> +	/*
> +	 * Unpark the stopper thread before we start the idle loop (and start
> +	 * scheduling); this ensures the stopper task is always available.
> +	 */
> +	stop_machine_unpark(smp_processor_id());
> +
>  	st->state = CPUHP_AP_ONLINE_IDLE;
>  	complete_ap_thread(st, true);
>  }
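
For readers following along, the ordering problem described in the commit
message can be illustrated with a small single-threaded user-space model.
This is only a sketch, not kernel code: every model_* name is hypothetical,
and the assumption that a disabled stopper reports -ENOENT without running
the callback is a simplification of what the commit message describes
(the work is completed without being done, and the caller does not notice).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2

/* Hypothetical, simplified stand-in for the per-CPU stopper state. */
struct model_stopper {
	bool enabled;	/* set once the model's "unpark" step has run */
	int ran;	/* number of callbacks that actually executed */
};

static struct model_stopper model_stoppers[MODEL_NR_CPUS];

/* Model of the unpark step: from here on the stopper accepts work. */
static void model_stop_machine_unpark(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_stoppers[cpu].enabled = true;
}

/*
 * Model of queueing stopper work. If the stopper is not enabled, the
 * request is completed without running fn (assumed to return -ENOENT).
 */
static int model_stop_one_cpu(int cpu, void (*fn)(void *), void *arg)
{
	struct model_stopper *s;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	s = &model_stoppers[cpu];
	if (!s->enabled)
		return -ENOENT;
	fn(arg);
	s->ran++;
	return 0;
}

/* Stand-in for the migration callback: records that it really ran. */
static void model_migrate(void *arg)
{
	*(bool *)arg = true;
}

/*
 * Model one CPU bringup. A kthread requests migration as soon as the
 * CPU can schedule. Returns true if that migration actually happened.
 */
static bool model_bringup(int cpu, bool unpark_before_idle)
{
	bool migrated = false;

	model_stoppers[cpu] = (struct model_stopper){ 0 };

	if (unpark_before_idle)
		model_stop_machine_unpark(cpu);	/* patched ordering */

	/* CPU is online and scheduling; the return value is ignored. */
	(void)model_stop_one_cpu(cpu, model_migrate, &migrated);

	if (!unpark_before_idle)
		model_stop_machine_unpark(cpu);	/* old ordering: too late */

	return migrated;
}
```

In this model, model_bringup(1, false) corresponds to the old ordering and
returns false, since the migration request is silently dropped, while
model_bringup(1, true) corresponds to unparking the stopper before the idle
loop starts and returns true.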
