Message-ID: <31d50051-e42c-4ef2-a1ac-e45370c3752e@paulmck-laptop>
Date:   Mon, 20 Nov 2023 21:34:05 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Ankur Arora <ankur.a.arora@...cle.com>
Cc:     linux-kernel@...r.kernel.org, tglx@...utronix.de,
        peterz@...radead.org, torvalds@...ux-foundation.org,
        linux-mm@...ck.org, x86@...nel.org, akpm@...ux-foundation.org,
        luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
        jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        jgross@...e.com, andrew.cooper3@...rix.com, mingo@...nel.org,
        bristot@...nel.org, mathieu.desnoyers@...icios.com,
        geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
        anton.ivanov@...bridgegreys.com, mattst88@...il.com,
        krypton@...ich-teichert.org, rostedt@...dmis.org,
        David.Laight@...lab.com, richard@....at, mjguzik@...il.com
Subject: Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT_RCU=n

On Mon, Nov 20, 2023 at 09:17:57PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 20, 2023 at 07:26:05PM -0800, Ankur Arora wrote:
> > 
> > Paul E. McKenney <paulmck@...nel.org> writes:
> > > On Tue, Nov 07, 2023 at 01:57:34PM -0800, Ankur Arora wrote:
> > >> cond_resched() is used to provide urgent quiescent states for
> > >> read-side critical sections on PREEMPT_RCU=n configurations.
> > >> This was necessary because, lacking a preempt_count, the tick
> > >> handler had no way to know whether we were executing in an RCU
> > >> read-side critical section.
> > >>
> > >> An always-on CONFIG_PREEMPT_COUNT, however, allows the tick to
> > >> reliably report quiescent states.
> > >>
> > >> Accordingly, evaluate preempt_count() based quiescence in
> > >> rcu_flavor_sched_clock_irq().
> > >>
> > >> Suggested-by: Paul E. McKenney <paulmck@...nel.org>
> > >> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
> > >> ---
> > >>  kernel/rcu/tree_plugin.h |  3 ++-
> > >>  kernel/sched/core.c      | 15 +--------------
> > >>  2 files changed, 3 insertions(+), 15 deletions(-)
> > >>
> > >> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > >> index f87191e008ff..618f055f8028 100644
> > >> --- a/kernel/rcu/tree_plugin.h
> > >> +++ b/kernel/rcu/tree_plugin.h
> > >> @@ -963,7 +963,8 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
> > >>   */
> > >>  static void rcu_flavor_sched_clock_irq(int user)
> > >>  {
> > >> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
> > >> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
> > >> +	    !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
> > >
> > > This looks good.
> > >
> > >>  		/*
> > >>  		 * Get here if this CPU took its interrupt from user
> > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > >> index bf5df2b866df..15db5fb7acc7 100644
> > >> --- a/kernel/sched/core.c
> > >> +++ b/kernel/sched/core.c
> > >> @@ -8588,20 +8588,7 @@ int __sched _cond_resched(void)
> > >>  		preempt_schedule_common();
> > >>  		return 1;
> > >>  	}
> > >> -	/*
> > >> -	 * In preemptible kernels, ->rcu_read_lock_nesting tells the tick
> > >> -	 * whether the current CPU is in an RCU read-side critical section,
> > >> -	 * so the tick can report quiescent states even for CPUs looping
> > >> -	 * in kernel context.  In contrast, in non-preemptible kernels,
> > >> -	 * RCU readers leave no in-memory hints, which means that CPU-bound
> > >> -	 * processes executing in kernel context might never report an
> > >> -	 * RCU quiescent state.  Therefore, the following code causes
> > >> -	 * cond_resched() to report a quiescent state, but only when RCU
> > >> -	 * is in urgent need of one.
> > >> -	 */
> > >> -#ifndef CONFIG_PREEMPT_RCU
> > >> -	rcu_all_qs();
> > >> -#endif
> > >
> > > But...
> > >
> > > Suppose we have a long-running loop in the kernel that regularly
> > > enables preemption, but only momentarily.  Then the added
> > > rcu_flavor_sched_clock_irq() check would almost always fail, making
> > > for extremely long grace periods.
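> > >
> > > To make the failure mode concrete, a loop shaped like this (a
> > > hypothetical sketch, not any specific kernel path) would defeat
> > > the check:
> > >
> > > 	for (;;) {
> > > 		preempt_disable();
> > > 		do_chunk_of_work();	/* bulk of the time spent here */
> > > 		preempt_enable();	/* preemptible only momentarily */
> > > 		if (all_work_done())
> > > 			break;
> > > 	}
> > >
> > > The scheduling-clock interrupt almost always arrives while
> > > preempt_count() is nonzero, so the new check in
> > > rcu_flavor_sched_clock_irq() almost never sees a quiescent state.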
> > 
> > So, my thinking was that if RCU wants to end a grace period, it would
> > force a context switch by setting TIF_NEED_RESCHED (and as patch 38 mentions
> > RCU always uses the eager version) causing __schedule() to call
> > rcu_note_context_switch().
> > That's similar to the preempt_schedule_common() case in the
> > _cond_resched() above.
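> >
> > Roughly the path I have in mind (paraphrased, not verbatim kernel
> > code):
> >
> > 	/* RCU core notices an overly long grace period ... */
> > 	resched_cpu(cpu);	/* set TIF_NEED_RESCHED, IPI the CPU */
> >
> > 	/* ... and then, on that CPU: */
> > 	__schedule()
> > 	  -> rcu_note_context_switch()
> > 	    -> rcu_qs();	/* quiescent state finally reported */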
> 
> But that requires IPIing that CPU, correct?
> 
> > But I do see your point: RCU might just want to register a quiescent
> > state, and for this long-running loop rcu_flavor_sched_clock_irq()
> > does seem to fall down.
> > 
> > > Or did I miss a change that causes preempt_enable() to help RCU out?
> > 
> > Something like this?
> > 
> > diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> > index dc5125b9c36b..e50f358f1548 100644
> > --- a/include/linux/preempt.h
> > +++ b/include/linux/preempt.h
> > @@ -222,6 +222,8 @@ do { \
> >         barrier(); \
> >         if (unlikely(preempt_count_dec_and_test())) \
> >                 __preempt_schedule(); \
> > +       if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) \
> > +               rcu_all_qs(); \
> >  } while (0)
> 
> Or maybe something like this to lighten the load a bit:
> 
> #define preempt_enable() \
> do { \
> 	barrier(); \
> 	if (unlikely(preempt_count_dec_and_test())) { \
> 		__preempt_schedule(); \
> 		if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
> 		    !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) \
> 			rcu_all_qs(); \
> 	} \
> } while (0)
> 
> And at that point, we should be able to drop the PREEMPT_MASK, not
> that it makes any difference that I am aware of:
> 
> #define preempt_enable() \
> do { \
> 	barrier(); \
> 	if (unlikely(preempt_count_dec_and_test())) { \
> 		__preempt_schedule(); \
> 		if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
> 		    !(preempt_count() & SOFTIRQ_MASK)) \
> 			rcu_all_qs(); \
> 	} \
> } while (0)
> 
> Except that we can migrate as soon as that preempt_count_dec_and_test()
> returns.  And that rcu_all_qs() disables and re-enables preemption,
> which will result in undesired recursion.  Sigh.
> 
> So maybe something like this:
> 
> #define preempt_enable() \
> do { \
> 	if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
> 	    !(preempt_count() & SOFTIRQ_MASK)) \

Sigh.  This needs to include (PREEMPT_MASK | SOFTIRQ_MASK),
but check for equality to something like (1UL << PREEMPT_SHIFT).
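
That is, something like this (still completely untested):

#define preempt_enable() \
do { \
	if (raw_cpu_read(rcu_data.rcu_urgent_qs) && \
	    (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) == \
	     (1UL << PREEMPT_SHIFT)) \
		rcu_all_qs(); \
	barrier(); \
	if (unlikely(preempt_count_dec_and_test())) \
		__preempt_schedule(); \
} while (0)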

Clearly time to sleep.  :-/

							Thanx, Paul

> 		rcu_all_qs(); \
> 	barrier(); \
> 	if (unlikely(preempt_count_dec_and_test())) { \
> 		__preempt_schedule(); \
> 	} \
> } while (0)
> 
> Then rcu_all_qs() becomes something like this:
> 
> void rcu_all_qs(void)
> {
> 	unsigned long flags;
> 
> 	/* Load rcu_urgent_qs before other flags. */
> 	if (!smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs)))
> 		return;
> 	this_cpu_write(rcu_data.rcu_urgent_qs, false);
> 	if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs))) {
> 		local_irq_save(flags);
> 		rcu_momentary_dyntick_idle();
> 		local_irq_restore(flags);
> 	}
> 	rcu_qs();
> }
> EXPORT_SYMBOL_GPL(rcu_all_qs);
> 
> > Though I do wonder about the likelihood of hitting the case you
> > describe.  Maybe, instead of adding the check to every
> > preempt_enable(), it would be better to force a context switch in
> > rcu_flavor_sched_clock_irq() (as we do in the PREEMPT_RCU=y case).
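> >
> > For reference, the PREEMPT_RCU=y tick path does roughly the following
> > (paraphrasing rcu_flavor_sched_clock_irq(), not verbatim):
> >
> > 	if (rcu_preempt_depth() > 0 &&
> > 	    rcu_preempt_need_deferred_qs(current)) {
> > 		/* In a reader: nudge it toward a context switch. */
> > 		set_tsk_need_resched(current);
> > 		set_preempt_need_resched();
> > 	}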
> 
> Maybe.  But rcu_all_qs() is way lighter weight than a context switch.
> 
> 							Thanx, Paul
