Message-Id: <20180517140345.GI3803@linux.vnet.ibm.com>
Date:   Thu, 17 May 2018 07:03:45 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Mike Galbraith <efault@....de>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
        Michal Hocko <mhocko@...e.com>
Subject: Re: cpu stopper threads and load balancing leads to deadlock

On Tue, May 15, 2018 at 06:30:26AM +0200, Mike Galbraith wrote:
> On Thu, 2018-05-03 at 18:45 +0200, Peter Zijlstra wrote:
> > On Thu, May 03, 2018 at 09:12:31AM -0700, Paul E. McKenney wrote:
> > > On Thu, May 03, 2018 at 04:44:50PM +0200, Peter Zijlstra wrote:
> > > > On Thu, May 03, 2018 at 04:16:55PM +0200, Mike Galbraith wrote:
> > > > > On Thu, 2018-05-03 at 15:56 +0200, Peter Zijlstra wrote:
> > > > > > On Thu, May 03, 2018 at 03:32:39PM +0200, Mike Galbraith wrote:
> > > > > > 
> > > > > > > Dang.  With $subject fix applied as well..
> > > > > > 
> > > > > > That's a NO then... :-(
> > > > > 
> > > > > Could say who cares about oddball offline wakeup stat. <cringe>
> > > > 
> > > > Yeah, nobody.. but I don't want to have to change the wakeup code to
> > > > deal with this if at all possible. That'd just add conditions that are
> > > > 'always' false, except in this exceedingly rare circumstance.
> > > > 
> > > > So ideally we manage to tell RCU that it needs to pay attention while
> > > > we're doing this here thing, which is what I thought RCU_NONIDLE() was
> > > > about.
> > > 
> > > > One straightforward approach would be to provide an arch-specific
> > > Kconfig option that tells notify_cpu_starting() not to bother invoking
> > > rcu_cpu_starting().  Then x86 selects this Kconfig option and invokes
> > > rcu_cpu_starting() itself early enough to avoid splats.
> > > 
> > > See the (untested, probably does not even build) patch below.
> > > 
> > > I have no idea where to insert either the "select" or the call to
> > > rcu_cpu_starting(), so I left those out.  I know that putting the
> > > call too early will cause trouble, but I have no idea what constitutes
> > > "too early".  :-/
> > 
> > Something like so perhaps? Mike, can you play around with that? Could
> > burn your granny and eat your cookies.
> 
> Did this get queued anywhere?

I have not queued it, but given Peter's Signed-off-by and your Tested-by
I would be happy to do so.

							Thanx, Paul

> > diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> > index 7468de429087..07360523c3ce 100644
> > --- a/arch/x86/kernel/cpu/mtrr/main.c
> > +++ b/arch/x86/kernel/cpu/mtrr/main.c
> > @@ -793,6 +793,9 @@ void mtrr_ap_init(void)
> >  
> >  	if (!use_intel() || mtrr_aps_delayed_init)
> >  		return;
> > +
> > +	rcu_cpu_starting(smp_processor_id());
> > +
> >  	/*
> >  	 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
> >  	 * changed, but this routine will be called in cpu boot time,
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 2a734692a581..4dab46950fdb 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3775,6 +3775,8 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  	return 0;
> >  }
> >  
> > +static DEFINE_PER_CPU(int, rcu_cpu_started);
> > +
> >  /*
> >   * Mark the specified CPU as being online so that subsequent grace periods
> >   * (both expedited and normal) will wait on it.  Note that this means that
> > @@ -3796,6 +3798,11 @@ void rcu_cpu_starting(unsigned int cpu)
> >  	struct rcu_node *rnp;
> >  	struct rcu_state *rsp;
> >  
> > +	if (per_cpu(rcu_cpu_started, cpu))
> > +		return;
> > +
> > +	per_cpu(rcu_cpu_started, cpu) = 1;
> > +
> >  	for_each_rcu_flavor(rsp) {
> >  		rdp = per_cpu_ptr(rsp->rda, cpu);
> >  		rnp = rdp->mynode;
> > @@ -3852,6 +3859,8 @@ void rcu_report_dead(unsigned int cpu)
> >  	preempt_enable();
> >  	for_each_rcu_flavor(rsp)
> >  		rcu_cleanup_dying_idle_cpu(cpu, rsp);
> > +
> > +	per_cpu(rcu_cpu_started, cpu) = 0;
> >  }
> >  
> >  /* Migrate the dead CPU's callbacks to the current CPU. */
> 
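
For readers following the quoted patch, the guard it adds can be modelled
in plain userspace C. This is only a sketch under stated assumptions, not
kernel code: the sketch_* names, the fixed SKETCH_NR_CPUS count, and the
online_marks counter are hypothetical stand-ins, and a simple array
replaces the real per-CPU variable. It illustrates the intended effect:
an early call (as from mtrr_ap_init()) does the work, a later call for the
same CPU (as from notify_cpu_starting()) returns immediately, and the flag
is cleared when the CPU is reported dead so the next online cycle works
again.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for this model */

/* Stands in for the patch's per-CPU rcu_cpu_started flag. */
static bool cpu_started[SKETCH_NR_CPUS];
/* Hypothetical counter: how often the "mark online" body really ran. */
static int online_marks[SKETCH_NR_CPUS];

/* Model of rcu_cpu_starting(): only the first call per online cycle works. */
static void sketch_cpu_starting(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);

	if (cpu_started[cpu])
		return;			/* already marked online, nothing to do */

	cpu_started[cpu] = true;
	online_marks[cpu]++;		/* placeholder for the per-flavor loop */
}

/* Model of the reset added to rcu_report_dead(). */
static void sketch_report_dead(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	cpu_started[cpu] = false;
}

/* Early call, redundant later call, offline, then online again. */
static int sketch_hotplug_cycle(unsigned int cpu)
{
	sketch_cpu_starting(cpu);	/* early, e.g. from arch code */
	sketch_cpu_starting(cpu);	/* later generic call, now a no-op */
	sketch_report_dead(cpu);	/* CPU goes offline, flag cleared */
	sketch_cpu_starting(cpu);	/* next online cycle does the work again */
	return online_marks[cpu];
}
```

Calling sketch_hotplug_cycle() on a fresh CPU runs the marking body twice,
once per online cycle, even though sketch_cpu_starting() is called three
times.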
