lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121007171109.GA23613@shutemov.name>
Date:	Sun, 7 Oct 2012 20:11:09 +0300
From:	"Kirill A. Shutemov" <kirill@...temov.name>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org, Dipankar Sarma <dipankar@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Steffen Klassert <steffen.klassert@...unet.com>,
	linux-crypto@...r.kernel.org
Subject: Re: Deadlock on poweroff

On Sun, Oct 07, 2012 at 10:35:01PM +0530, Srivatsa S. Bhat wrote:
> On 10/07/2012 10:20 PM, Kirill A. Shutemov wrote:
> > On Sun, Oct 07, 2012 at 09:03:11AM -0700, Paul E. McKenney wrote:
> >> On Sun, Oct 07, 2012 at 05:47:11AM +0300, Kirill A. Shutemov wrote:
> >>> Hi Paul and all,
> >>>
> >>> With commit 755609a9087fa983f567dc5452b2fa7b089b591f I've got deadlock on
> >>> poweroff.
> >>>
> >>> It guess it happens because of race for cpu_hotplug.lock:
> >>>
> >>> 	CPU A					CPU B
> >>> disable_nonboot_cpus()
> >>> _cpu_down()
> >>> cpu_hotplug_begin()
> >>>  mutex_lock(&cpu_hotplug.lock);
> >>> __cpu_notify()
> >>> padata_cpu_callback()
> >>> __padata_remove_cpu()
> >>> padata_replace()
> >>> synchronize_rcu()
> >>> 					rcu_gp_kthread()
> >>> 					get_online_cpus();
> >>> 					mutex_lock(&cpu_hotplug.lock);
> >>>
> >>> Have you seen the issue before?
> >>
> >> This is a new one for me.  Does the following (very lightly tested)
> >> patch help?
> > 
> > Works for me. Thanks.
> > 
> 
> Could you share the patch please? Looks like it didn't hit the mailing
> lists.. 

Sure. Here's original  mail from Paul:

Date: Sun, 7 Oct 2012 09:03:11 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: "Kirill A. Shutemov" <kirill@...temov.name>
Cc: linux-kernel@...r.kernel.org, Dipankar Sarma <dipankar@...ibm.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Steffen Klassert <steffen.klassert@...unet.com>,
        ".linux-crypto"@vger.kernel.org
Subject: Re: Deadlock on poweroff
Message-ID: <20121007160311.GE2485@...ux.vnet.ibm.com>
Reply-To: paulmck@...ux.vnet.ibm.com
References: <20121007024711.GA21403@...temov.name>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121007024711.GA21403@...temov.name>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 12100716-7408-0000-0000-000009194B17
Status: RO
Content-Length: 6055
Lines: 173

On Sun, Oct 07, 2012 at 05:47:11AM +0300, Kirill A. Shutemov wrote:
> Hi Paul and all,
> 
> With commit 755609a9087fa983f567dc5452b2fa7b089b591f I've got deadlock on
> poweroff.
> 
> It guess it happens because of race for cpu_hotplug.lock:
> 
> 	CPU A					CPU B
> disable_nonboot_cpus()
> _cpu_down()
> cpu_hotplug_begin()
>  mutex_lock(&cpu_hotplug.lock);
> __cpu_notify()
> padata_cpu_callback()
> __padata_remove_cpu()
> padata_replace()
> synchronize_rcu()
> 					rcu_gp_kthread()
> 					get_online_cpus();
> 					mutex_lock(&cpu_hotplug.lock);
> 
> Have you seen the issue before?

This is a new one for me.  Does the following (very lightly tested)
patch help?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Grace-period initialization excludes only RCU notifier

Kirill noted the following deadlock cycle on shutdown involving padata:

> With commit 755609a9087fa983f567dc5452b2fa7b089b591f I've got deadlock on
> poweroff.
>
> It guess it happens because of race for cpu_hotplug.lock:
>
>       CPU A                                   CPU B
> disable_nonboot_cpus()
> _cpu_down()
> cpu_hotplug_begin()
>  mutex_lock(&cpu_hotplug.lock);
> __cpu_notify()
> padata_cpu_callback()
> __padata_remove_cpu()
> padata_replace()
> synchronize_rcu()
>                                       rcu_gp_kthread()
>                                       get_online_cpus();
>                                       mutex_lock(&cpu_hotplug.lock);

It would of course be good to eliminate grace-period delays from
CPU-hotplug notifiers, but that is a separate issue.  Deadlock is
not an appropriate diagnostic for excessive CPU-hotplug latency.

Fortunately, grace-period initialization does not actually need to
exclude all of the CPU-hotplug operation, but rather only RCU's own
CPU_UP_PREPARE and CPU_DEAD CPU-hotplug notifiers.  This commit therefore
introduces a new per-rcu_state onoff_mutex that provides the required
concurrency control in place of the get_online_cpus() that was previously
in rcu_gp_init().

Reported-by: "Kirill A. Shutemov" <kirill@...temov.name>
Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index fb63d7b..5eece12 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -74,6 +74,7 @@ static struct lock_class_key rcu_fqs_class[RCU_NUM_LVLS];
 	.orphan_nxttail = &sname##_state.orphan_nxtlist, \
 	.orphan_donetail = &sname##_state.orphan_donelist, \
 	.barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \
+	.onoff_mutex = __MUTEX_INITIALIZER(sname##_state.onoff_mutex), \
 	.name = #sname, \
 }
 
@@ -1229,7 +1230,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	raw_spin_unlock_irq(&rnp->lock);
 
 	/* Exclude any concurrent CPU-hotplug operations. */
-	get_online_cpus();
+	mutex_lock(&rsp->onoff_mutex);
 
 	/*
 	 * Set the quiescent-state-needed bits in all the rcu_node
@@ -1266,7 +1267,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 		cond_resched();
 	}
 
-	put_online_cpus();
+	mutex_unlock(&rsp->onoff_mutex);
 	return 1;
 }
 
@@ -1754,6 +1755,7 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 	/* Remove the dead CPU from the bitmasks in the rcu_node hierarchy. */
 
 	/* Exclude any attempts to start a new grace period. */
+	mutex_lock(&rsp->onoff_mutex);
 	raw_spin_lock_irqsave(&rsp->onofflock, flags);
 
 	/* Orphan the dead CPU's callbacks, and adopt them if appropriate. */
@@ -1798,6 +1800,7 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 	init_callback_list(rdp);
 	/* Disallow further callbacks on this CPU. */
 	rdp->nxttail[RCU_NEXT_TAIL] = NULL;
+	mutex_unlock(&rsp->onoff_mutex);
 }
 
 #else /* #ifdef CONFIG_HOTPLUG_CPU */
@@ -2708,6 +2711,9 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
+	/* Exclude new grace periods. */
+	mutex_lock(&rsp->onoff_mutex);
+
 	/* Set up local state, ensuring consistent view of global state. */
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	rdp->beenonline = 1;	 /* We have now been online. */
@@ -2722,14 +2728,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
 	rcu_prepare_for_idle_init(cpu);
 	raw_spin_unlock(&rnp->lock);		/* irqs remain disabled. */
 
-	/*
-	 * A new grace period might start here.  If so, we won't be part
-	 * of it, but that is OK, as we are currently in a quiescent state.
-	 */
-
-	/* Exclude any attempts to start a new GP on large systems. */
-	raw_spin_lock(&rsp->onofflock);		/* irqs already disabled. */
-
 	/* Add CPU to rcu_node bitmasks. */
 	rnp = rdp->mynode;
 	mask = rdp->grpmask;
@@ -2753,8 +2751,9 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
 		raw_spin_unlock(&rnp->lock); /* irqs already disabled. */
 		rnp = rnp->parent;
 	} while (rnp != NULL && !(rnp->qsmaskinit & mask));
+	local_irq_restore(flags);
 
-	raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
+	mutex_unlock(&rsp->onoff_mutex);
 }
 
 static void __cpuinit rcu_prepare_cpu(int cpu)
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 5faf05d..a240f03 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -394,11 +394,17 @@ struct rcu_state {
 	struct rcu_head **orphan_donetail;	/* Tail of above. */
 	long qlen_lazy;				/* Number of lazy callbacks. */
 	long qlen;				/* Total number of callbacks. */
+	/* End of fields guarded by onofflock. */
+
+	struct mutex onoff_mutex;		/* Coordinate hotplug & GPs. */
+
 	struct mutex barrier_mutex;		/* Guards barrier fields. */
 	atomic_t barrier_cpu_count;		/* # CPUs waiting on. */
 	struct completion barrier_completion;	/* Wake at barrier end. */
 	unsigned long n_barrier_done;		/* ++ at start and end of */
 						/*  _rcu_barrier(). */
+	/* End of fields guarded by barrier_mutex. */
+
 	unsigned long jiffies_force_qs;		/* Time at which to invoke */
 						/*  force_quiescent_state(). */
 	unsigned long n_force_qs;		/* Number of calls to */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ