Message-ID: <20201209021449.GK3025@boqun-archlinux>
Date:   Wed, 9 Dec 2020 10:14:49 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     "Paul E. McKenney" <paulmck@...nel.org>, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: One potential issue with concurrent execution of RCU callbacks...

Hi Frederic,

On Tue, Dec 08, 2020 at 11:04:38PM +0100, Frederic Weisbecker wrote:
> On Tue, Dec 08, 2020 at 10:24:09AM -0800, Paul E. McKenney wrote:
> > > It reduces the code scope running with BH disabled.
> > > Also narrowing down helps to understand what it actually protects.
> > 
> > I thought that you would call out unnecessarily delaying other softirq
> > handlers.  ;-)
> > 
> > But if such delays are a problem (and they might well be), then to
> > avoid them on non-rcu_nocb CPUs would instead/also require adding
> > checks for other pending softirqs to the existing early-exit checks
> > involving time, need_resched, and idle.  At which point, entering and
> > exiting BH-disabled again doesn't help, other than your point about the
> > difference in BH-disabled scopes on rcu_nocb and non-rcu_nocb CPUs.
> 
> Wise observation!
> 
> > 
> > Would it make sense to exit rcu_do_batch() if more than some amount
> > of time had elapsed and there was some non-RCU softirq pending?
> > 
> > My guess is that the current tlimit checks in rcu_do_batch() make this
> > unnecessary.
> 
> Right and nobody has complained about it so far.
> 
> But I should add a comment explaining the reason for the BH-disabled
> section in my series.
> 

Some background for the original question: I'm revisiting the wait
context checking feature of lockdep (which can detect bugs like
acquiring a spinlock_t lock inside a raw_spinlock_t critical section).
I've posted my first version:

	https://lore.kernel.org/lkml/20201208103112.2838119-1-boqun.feng@gmail.com/

and I will surely copy you on the next version ;-)
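
For illustration, here is a minimal made-up instance of the bug class
this checking catches (the lock names are hypothetical):

	static DEFINE_RAW_SPINLOCK(outer_rlock);
	static DEFINE_SPINLOCK(inner_lock);

	raw_spin_lock(&outer_rlock);
	spin_lock(&inner_lock);		/* spinlock_t is a sleeping lock on
					 * PREEMPT_RT, so taking it inside a
					 * raw_spinlock_t critical section
					 * violates the wait context rules */
	spin_unlock(&inner_lock);
	raw_spin_unlock(&outer_rlock);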

The reason I asked about the RCU callback context requirement is that we
have a virtual lock (rcu_callback_map) that marks an RCU callback
context, so if RCU callback contexts place special restrictions on the
locking usage inside them, we can use the wait context checking to
enforce those restrictions (like what I did in patch #3 of the above
series).
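
For reference, the marking is roughly the following (simplified from
the callback-invocation loop in rcu_do_batch() in kernel/rcu/tree.c):

	rcu_lock_acquire(&rcu_callback_map);	/* "acquire" the virtual lock */
	f = rhp->func;
	WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
	f(rhp);					/* invoke the callback */
	rcu_lock_release(&rcu_callback_map);	/* "release" the virtual lock */

so every lock acquired inside the callback is seen by lockdep as nested
inside rcu_callback_map, and giving rcu_callback_map an appropriate wait
context makes sleepable locks inside it a reportable error.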

My current summary is that since in certain configs (use_softirq is true
and nocb is disabled) RCU callbacks are executed in softirq context, the
minimum requirement for any RCU callback is that it obey the rules of
softirq contexts. And yes, I'm aware that in some configs RCU callbacks
are not executed in softirq context (sometimes even BH is not disabled),
but we need to make all the callbacks work in the "worst" (or strictest)
case, i.e. callbacks executing in softirq contexts. Currently, the
effect of using wait context for rcu_callback_map in my patchset is that
lockdep will complain if an RCU callback uses a mutex or other sleepable
locks, but using spinlock_t (even on PREEMPT_RT) won't cause lockdep to
complain. Am I getting this right?
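
To make that concrete, here is a made-up callback (my_mutex and my_cb()
are hypothetical) that I'd expect the check to flag:

	static DEFINE_MUTEX(my_mutex);

	/* Queued via call_rcu(), so it may run in softirq context. */
	static void my_cb(struct rcu_head *rhp)
	{
		mutex_lock(&my_mutex);	/* sleeping lock in an RCU callback:
					 * lockdep complains */
		mutex_unlock(&my_mutex);
	}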

Regards,
Boqun

> Thanks.
> 
> > 
> > Thoughts?
> > 
> > 							Thanx, Paul
