linux-kernel - Re: One potential issue with concurrent execution of RCU callbacks...

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20201210005056.GM2657@paulmck-ThinkPad-P72>
Date:   Wed, 9 Dec 2020 16:50:56 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Boqun Feng <boqun.feng@...il.com>
Cc:     Frederic Weisbecker <frederic@...nel.org>, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: One potential issue with concurrent execution of RCU callbacks...

On Wed, Dec 09, 2020 at 10:14:49AM +0800, Boqun Feng wrote:
> Hi Frederic,
> 
> On Tue, Dec 08, 2020 at 11:04:38PM +0100, Frederic Weisbecker wrote:
> > On Tue, Dec 08, 2020 at 10:24:09AM -0800, Paul E. McKenney wrote:
> > > > It reduces the code scope running with BH disabled.
> > > > Also narrowing down helps to understand what it actually protects.
> > > 
> > > I thought that you would call out unnecessarily delaying other softirq
> > > handlers.  ;-)
> > > 
> > > But if such delays are a problem (and they might well be), then to
> > > avoid them on non-rcu_nocb CPUs would instead/also require changing the
> > > early-exit checks to check for other pending softirqs to the existing
> > > checks involving time, need_resched, and idle.  At which point, entering and
> > > exiting BH-disabled again doesn't help, other than your point about the
> > > difference in BH-disabled scopes on rcu_nocb and non-rcu_nocb CPUs.
> > 
> > Wise observation!
> > 
> > > 
> > > Would it make sense to exit rcu_do_batch() if more than some amount
> > > of time had elapsed and there was some non-RCU softirq pending?
> > > 
> > > My guess is that the current tlimit checks in rcu_do_batch() make this
> > > unnecessary.
> > 
> > Right and nobody has complained about it so far.
> > 
> > But I should add a comment explaining the reason for the BH-disabled
> > section in my series.
> > 
> 
> Some background for the original question: I'm revisiting the wait
> context checking feature of lockdep (which can detect bugs like
> acquiring a spinlock_t lock inside a raw_spinlock_t), I've post my first
> version:
> 
> 	https://lore.kernel.org/lkml/20201208103112.2838119-1-boqun.feng@gmail.com/	
> 
> , and will surely copy you in the next version ;-)
> 
> The reason I asked for the RCU callback context requirement is that we
> have the virtual lock (rcu_callback_map) that marks a RCU callback
> context, so if RCU callback contexts have special restrictions on the
> locking usage inside, we can use the wait context checking to do the
> check (like what I did in the patch #3 of the above series).
> 
> My current summary is that since in certain configs (use_softirq is true
> and nocb is disabled) RCU callbacks are executed in a softirq context,
> so the least requirement for any RCU callbacks is they need to obey the
> rules in softirq contexts. And yes, I'm aware that in some configs, RCU
> callbacks are not executed in a softirq context (sometimes, even the BH
> is not disabled), but we need to make all the callback work in the
> "worst" (or strictest) case (callbacks executing in softirq contexts).
> Currently, the effect of using wait context for rcu_callback_map in my
> patchset is that lockdep will complain if a RCU callback use a mutex or
> other sleepable locks, but using spinlock_t (even in PREEMPT_RT) won't
> cause lockdep to complain. Am I getting this correct?

It matches what I know.  And yes, in PREEMPT_RT, softirq is preemptible,
allowing spinlock_t to be used, but there are restrictions that
lopckdep enforces.  I am not going to claim to remember the exact set
of restrictions.

							Thanx, Paul