linux-kernel - Re: [tip: locking/core] lockdep: Fix lockdep recursion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201013193025.GA2424@paulmck-ThinkPad-P72>
Date:   Tue, 13 Oct 2020 12:30:25 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Boqun Feng <boqun.feng@...il.com>, Qian Cai <cai@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...nel.org>, x86 <x86@...nel.org>,
        linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
        Linux Next Mailing List <linux-next@...r.kernel.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>
Subject: Re: [tip: locking/core] lockdep: Fix lockdep recursion

On Tue, Oct 13, 2020 at 09:26:50AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 13, 2020 at 01:25:44PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 13, 2020 at 12:44:50PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 13, 2020 at 12:34:06PM +0200, Peter Zijlstra wrote:
> > > > On Mon, Oct 12, 2020 at 02:28:12PM -0700, Paul E. McKenney wrote:
> > > > > It is certainly an accident waiting to happen.  Would something like
> > > > > the following make sense?
> > > > 
> > > > Sadly no.
> > > > 
> > > > > ------------------------------------------------------------------------
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index bfd38f2..52a63bc 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -4067,6 +4067,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > > >  
> > > > >  	rnp = rdp->mynode;
> > > > >  	mask = rdp->grpmask;
> > > > > +	lockdep_off();
> > > > >  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > >  	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
> > > > >  	newcpu = !(rnp->expmaskinitnext & mask);
> > > > > @@ -4086,6 +4087,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > > >  	} else {
> > > > >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > >  	}
> > > > > +	lockdep_on();
> > > > >  	smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
> > > > >  }
> > > > 
> > > > This will just shut it up, but will not fix the actual problem of that
> > > > spin-lock ending up in trace_lock_acquire() which relies on RCU which
> > > > isn't looking.
> > > > 
> > > > What we need here is to supress tracing not lockdep. Let me consider.
> > > 
> > > We appear to have a similar problem with rcu_report_dead(), it's
> > > raw_spin_unlock()s can end up in trace_lock_release() while we just
> > > killed RCU.
> > 
> > So we can deal with the explicit trace_*() calls like the below, but I
> > really don't like it much. It also doesn't help with function tracing.
> > This is really early/late in the hotplug cycle and should be considered
> > entry, we shouldn't be tracing anything here.
> > 
> > Paul, would it be possible to use a scheme similar to IRQ/NMI for
> > hotplug? That seems to mostly rely on atomic ops, not locks.
> 
> The rest of the rcu_node tree and the various grace-period/hotplug races
> makes that question non-trivial.  I will look into it, but I have no
> reason for optimism.
> 
> But there is only one way to find out...  ;-)

The aforementioned races get really ugly really fast.  So I do not
believe that a lockless approach is a strategy to win here.

But why not use something sort of like a sequence counter, but adapted
for local on-CPU use?  This should quiet the diagnostics for the full
time that RCU needs its locks.  Untested patch below.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1d42909..5b06886 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1152,13 +1152,15 @@ bool rcu_lockdep_current_cpu_online(void)
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	bool ret = false;
+	unsigned long seq;
 
 	if (in_nmi() || !rcu_scheduler_fully_active)
 		return true;
 	preempt_disable_notrace();
 	rdp = this_cpu_ptr(&rcu_data);
 	rnp = rdp->mynode;
-	if (rdp->grpmask & rcu_rnp_online_cpus(rnp))
+	seq = READ_ONCE(rnp->ofl_seq) & ~0x1;
+	if (rdp->grpmask & rcu_rnp_online_cpus(rnp) || seq != READ_ONCE(rnp->ofl_seq))
 		ret = true;
 	preempt_enable_notrace();
 	return ret;
@@ -4065,6 +4067,8 @@ void rcu_cpu_starting(unsigned int cpu)
 
 	rnp = rdp->mynode;
 	mask = rdp->grpmask;
+	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+	WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
 	newcpu = !(rnp->expmaskinitnext & mask);
@@ -4084,6 +4088,8 @@ void rcu_cpu_starting(unsigned int cpu)
 	} else {
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
+	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+	WARN_ON_ONCE(rnp->ofl_seq & 0x1);
 	smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
 }
 
@@ -4111,6 +4117,8 @@ void rcu_report_dead(unsigned int cpu)
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
+	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+	WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
 	raw_spin_lock(&rcu_state.ofl_lock);
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4123,6 +4131,8 @@ void rcu_report_dead(unsigned int cpu)
 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	raw_spin_unlock(&rcu_state.ofl_lock);
+	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+	WARN_ON_ONCE(rnp->ofl_seq & 0x1);
 
 	rdp->cpu_started = false;
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 805c9eb..7d802b6 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -57,6 +57,7 @@ struct rcu_node {
 				/*  beginning of each grace period. */
 	unsigned long qsmaskinitnext;
 				/* Online CPUs for next grace period. */
+	unsigned long ofl_seq;	/* CPU-hotplug operation sequence count. */
 	unsigned long expmask;	/* CPUs or groups that need to check in */
 				/*  to allow the current expedited GP */
 				/*  to complete. */