linux-kernel - Re: One potential issue with concurrent execution of RCU callbacks...

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20201208182409.GT2657@paulmck-ThinkPad-P72>
Date:   Tue, 8 Dec 2020 10:24:09 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     boqun.feng@...il.com, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: One potential issue with concurrent execution of RCU callbacks...

On Tue, Dec 08, 2020 at 06:52:30PM +0100, Frederic Weisbecker wrote:
> On Tue, Dec 08, 2020 at 09:19:27AM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 08, 2020 at 04:54:57PM +0100, Frederic Weisbecker wrote:
> > > On Tue, Dec 08, 2020 at 06:58:10AM -0800, Paul E. McKenney wrote:
> > > > Hello, Frederic,
> > > > 
> > > > Boqun just asked if RCU callbacks ran in BH-disabled context to avoid
> > > > concurrent execution of the same callback.  Of course, this raises the
> > > > question of whether a self-posting callback can have two instances of
> > > > itself running concurrently while a CPU is in the process of transitioning
> > > > between softirq and rcuo invocation of callbacks.
> > > > 
> > > > I believe that the answer is "no" because BH-disabled context is
> > > > an implicit RCU read-side critical section.  Therefore, the initial
> > > > invocation of the RCU callback must complete in order for a new grace
> > > > period to complete, and a new grace period must complete before the
> > > > second invocation of that same callback to start.
> > > > 
> > > > Does that make sense, or am I missing something?
> > > 
> > > Sounds like a good explanation. But then why are we actually calling
> > > the entire rcu_do_batch() under BH-disabled context? Was it to quieten
> > > lockdep against rcu_callback_map ?
> > 
> > Inertia and lack of complaints about it.  ;-)
> > 
> > Plus when called from softirq, neither local_bh_disable() nor
> > rcu_read_lock() is necessary, and so represents pointless overhead.
> > 
> > > Wouldn't rcu_read_lock() around callbacks invocation be enough? Or is
> > > there another reason for the BH-disabled context that I'm missing?
> > 
> > There are likely to be callback functions that use spin_lock() instead
> > of spin_lock_bh() because they know that they are invoked in BH-disabled
> > context.
> 
> Ah right. So perhaps we can keep local_bh_disable() instead.
> 
> > But what does this change help?
> 
> It reduces the code scope running with BH disabled.
> Also narrowing down helps to understand what it actually protects.

I thought that you would call out unnecessarily delaying other softirq
handlers.  ;-)

But if such delays are a problem (and they might well be), then to
avoid them on non-rcu_nocb CPUs would instead/also require changing the
early-exit checks to check for other pending softirqs to the existing
checks involving time, need_resched, and idle.  At which point, entering and
exiting BH-disabled again doesn't help, other than your point about the
difference in BH-disabled scopes on rcu_nocb and non-rcu_nocb CPUs.

Would it make sense to exit rcu_do_batch() if more than some amount
of time had elapsed and there was some non-RCU softirq pending?

My guess is that the current tlimit checks in rcu_do_batch() make this
unnecessary.

Thoughts?

							Thanx, Paul