linux-kernel - Re: [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87v9496v06.mognet@arm.com>
Date:   Fri, 13 Aug 2021 19:48:41 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     paulmck@...nel.org
Cc:     linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        rcu@...r.kernel.org, linux-rt-users@...r.kernel.org,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Steven Rostedt <rostedt@...dmis.org>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Frederic Weisbecker <frederic@...nel.org>,
        Josh Triplett <josh@...htriplett.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Vincenzo Frascino <vincenzo.frascino@....com>,
        Steven Price <steven.price@....com>,
        Ard Biesheuvel <ardb@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Mike Galbraith <efault@....de>
Subject: Re: [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT

On 12/08/21 17:20, Paul E. McKenney wrote:
> On Wed, Aug 11, 2021 at 09:13:53PM +0100, Valentin Schneider wrote:
>> rcu_core() preemptability considerations
>> ========================================
>> 
>> As pointed out by Paul [2], keeping rcu_check_quiescent_state() preemptible
>> (which is the case under CONFIG_PREEMPT_RT) requires some consideration.
>> 
>> note_gp_changes() itself runs with irqs off, and enters
>> __note_gp_changes() with rnp->lock held (raw_spinlock), thus is safe vs
>> preemption.
>> 
>> rdp->core_needs_qs *could* change after being read by the RCU core
>> kthread if it then gets preempted. Consider, with
>> CONFIG_RCU_STRICT_GRACE_PERIOD:
>> 
>>   rcuc/x                                   task_foo
>> 
>>     rcu_check_quiescent_state()
>>     `\
>>       rdp->core_needs_qs == true
>> 			     <PREEMPT>
>> 					   rcu_read_unlock()
>> 					   `\
>> 					     rcu_preempt_deferred_qs_irqrestore()
>> 					     `\
>> 					       rcu_report_qs_rdp()
>> 					       `\
>> 						 rdp->core_needs_qs := false;
>> 
>> This would let rcuc/x's rcu_check_quiescent_state() proceed further down to
>> rcu_report_qs_rdp(), but if task_foo's earlier rcu_report_qs_rdp()
>> invocation would have cleared the rdp grpmask from the rnp mask, so
>> rcuc/x's invocation would simply bail.
>> 
>> Since rcu_report_qs_rdp() can be safely invoked, even if rdp->core_needs_qs
>> changed, it appears safe to keep rcu_check_quiescent_state() preemptible.
>
> Another concern...
>
> During the preemption of rcu_check_quiescent_state() someone might report
> a quiescent state on behalf of CPU x (perhaps due to its having recently
> been idle) and then the RCU grace-period kthread might start running on
> CPU x, where it might initialize a new grace period in rcu_gp_init().
> It can then invoke __note_gp_changes(), also on CPU x.
>

(this is me "writing out loud" to make sure I can follow)

I take it the preemption of rcuc/x itself would count as a quiescent state,
if the current grace period started before rcuc/x's rcu_read_lock()
(rcuc/x would end up in ->blkd_tasks but wouldn't affect ->gp_tasks).

Then if we get something to run rcu_report_qs_rnp() with a mask spanning
CPU x - I think the GP kthread doing quiescent state forcing might fit the
bill - we'll let the GP kthread initialize a new GP.

> If preempted as shown above just after checking >core_needs_qs, the
> ->cpu_no_qs.b.norm field will be set by the grace-period kthread, which
> will cause the rcu_check_quiescent_state() function's subsequent check
> of ->cpu_no_qs.b.norm to take an early exit.  So OK here.
>

Right

> On the other hand, if preempted just after the rcu_check_quiescent_state()
> function's check of ->cpu_no_qs.b.norm, the later invocation of
> rcu_report_qs_rdp() should take an early exit due to ->gp_seq mismatch.
> So OK here.
>

If, as described in your scenario above, the GP kthread has preempted
rcuc/x, initialized a new GP and has run __note_gp_changes() (on CPU x),
then wouldn't the rdp->gp_seq and rnp->gp_seq match when rcuc/x gets to run
again?

And because we would've then had a context switch between the GP kthread
and rcuc/x, we would've noted a quiescent state for CPU x, which would let
rcuc/x's rcu_report_qs_rdp() continue - or have I erred on the way there?

> However, this should be added to the commit log.  Might be a big commit
> log, but mass storage is cheap these days.  ;-)
>

No objections here!

> This needs a review of each and every manipulation of ->core_needs_qs
> and ->cpu_no_qs.b.norm.  For example, the preemptions will cause the
> scheduler to invoke RCU's context-switch hooks, which also mess with
> ->cpu_no_qs.b.norm.  I can get to that some time next week (or tomorrow,
> if things go better than expected), but it would be good for you (and
> others) to check as well.
>

I have some notes scribbled down regarding those that I need to go through
again, but that won't be before a week's time - I'll be away next week.

> Frederic should look this over, but I am taking a quick pass in the
> meantime.  Please see below.
>

Thanks!

>> @@ -210,6 +210,8 @@ struct rcu_data {
>>  	struct timer_list nocb_timer;	/* Enforce finite deferral. */
>>  	unsigned long nocb_gp_adv_time;	/* Last call_rcu() CB adv (jiffies). */
>>  
>> +	local_lock_t nocb_local_lock;
>
> Should this go near the beginning of the structure, given that code
> paths taking this lock tend to access ->cpu_no_qs, ->core_needs_qs,
> and so on?
>
> Given that it is used to protect core processing (not just offloaded
> callbacks), might ->core_local_lock be a better name?
>

It would still have to be in a #define CONFIG_RCU_NOCB_CPU region IMO, as
it only exists to protect access of the offloading state - hence the name,
but I'm not particularly attached to it.

> Please keep in mind that you can build kernels that offload callbacks
> but that still use softirq for RCU core processing.  And vice versa,
> that is, kernels that use rcuc kthreads but do not offload callbacks.
>

AFAICT the problem only stands if core processing becomes preemptible,
which is only the case under PREEMPT_RT (at least for now), and that implies
core processing done purely via kthreads.

>> +
>>  	/* The following fields are used by call_rcu, hence own cacheline. */
>>  	raw_spinlock_t nocb_bypass_lock ____cacheline_internodealigned_in_smp;
>>  	struct rcu_cblist nocb_bypass;	/* Lock-contention-bypass CB list. */
>> @@ -445,6 +447,8 @@ static void rcu_nocb_unlock(struct rcu_data *rdp);
>>  static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
>>  				       unsigned long flags);
>>  static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
>> +static void rcu_nocb_local_lock(struct rcu_data *rdp);
>> +static void rcu_nocb_local_unlock(struct rcu_data *rdp);
>>  #ifdef CONFIG_RCU_NOCB_CPU
>>  static void __init rcu_organize_nocb_kthreads(void);
>>  #define rcu_nocb_lock_irqsave(rdp, flags)				\
>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>> index 0ff5e4fb933e..11c4ff00afde 100644
>> --- a/kernel/rcu/tree_plugin.h
>> +++ b/kernel/rcu/tree_plugin.h
>> @@ -21,6 +21,17 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>>  	return lockdep_is_held(&rdp->nocb_lock);
>>  }
>>  
>> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
>> +{
>> +	return lockdep_is_held(
>> +#ifdef CONFIG_PREEMPT_RT
>> +		&rdp->nocb_local_lock.lock
>> +#else
>> +		&rdp->nocb_local_lock
>> +#endif
>
> It would be good if this was abstracted.  Or is this the only place
> in the kernel that needs this #ifdef?  Maybe a lockdep_is_held_rt()
> or lockdep_is_held_local(), as the case may be?
>

It does look like the first/only place that tries to access a local_lock's
dep_map regardless of CONFIG_PREEMPT_RT. Looking at it some more, I'm not
sure if it's really useful: !PREEMPT_RT local_locks disable preemption, so
the preemption check in rcu_rdp_is_offloaded() will short circuit the
lockdep check when !PREEMPT_RT...

One thing I should perhaps point out - the local lock is only really needed
for PREEMPT_RT; for !PREEMPT_RT it will just disable preemption, and it's
only used in places that *already* disabled preemption (when !PREEMPT_RT).
I initially gated the lock under CONFIG_PREEMPT_RT, but changed that as I
found it reduced the ifdeffery.

>> +	);
>> +}
>> +
>>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>  {
>>  	/* Race on early boot between thread creation and assignment */
>> @@ -38,7 +49,10 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>>  {
>>  	return 0;
>>  }
>> -
>> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
>> +{
>> +	return 0;
>
> This is backwards of normal lockdep practice, which defaults to locks
> always held.  And that will be what happens once lockdep has detected
> its first deadlock, correct?  At which point, this function and its
> earlier instance will be in conflict.
>
> Or is there some subtle reason why this conflict would be OK?
>

This follows the !CONFIG_RCU_NOCB definition of rcu_lockdep_is_held_nocb(),
which looked fine to me.

Actually with the way I wrote rcu_local_offload_access_safe(), we don't
even access that function when !CONFIG_RCU_NOCB...

>> +}
>>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>  {
>>  	return false;
>> @@ -46,23 +60,44 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>  
>>  #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
>>  
>> +/*
>> + * Is a local read of the rdp's offloaded state safe and stable?
>> + * See rcu_nocb_local_lock() & family.
>> + */
>> +static inline bool rcu_local_offload_access_safe(struct rcu_data *rdp)
>> +{
>> +	if (!preemptible())
>> +		return true;
>> +
>> +	if (!migratable()) {
>> +		if (!IS_ENABLED(CONFIG_RCU_NOCB))
>
> Do we also need to consult the use_softirq module parameter that controls
> whether or not there are rcuc kthreads?
>
> Actually, if !IS_ENABLED(CONFIG_RCU_NOCB) then rcu_rdp_is_offloaded()
> can simply return false without bothering with the RCU_LOCKDEP_WARN().
> Might be worth getting that out of the way of the RCU_LOCKDEP_WARN()
> condition.  ;-)
>

I _assumed_ the check was there even for !CONFIG_RCU_NOCB to provide a wider
test coverage of the calling conditions.

If that RCU_LOCKDEP_WARN() becomes conditionnal on CONFIG_RCU_NOCB then
yes, we could clean that up.

>> +			return true;
>> +
>> +		return rcu_lockdep_is_held_nocb_local(rdp);
>> +	}
>> +
>> +	return false;
>> +}
>> +