Date:   Fri, 11 Mar 2022 16:46:03 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?

On Fri, Mar 11, 2022 at 07:21:48AM -0800, Paul E. McKenney wrote:
> On Fri, Mar 11, 2022 at 02:07:19PM +0100, Frederic Weisbecker wrote:
> > On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> > > On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > > > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > > > Hello, Frederic,
> > > > > 
> > > > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > > > then am getting roughly one RCU CPU stall warning (or silent hang)
> > > > > per few tens of hours of rcutorture testing on dual-socket systems.
> > > > > The stall warnings feature starvation of RCU grace-period kthread.
> > > > > 
> > > > > Any advice on debugging this?
> > > > 
> > > > Oh, I'm testing that!
> > > 
> > > Even better, thank you!  ;-)
> > 
> > Here is what I'm testing below. Even if it works, though, it's still not
> > the most optimized way of dealing with the UP-on-SMP situation, as we still
> > start an expedited grace period when we could return early. But since we
> > have a cookie to pass to poll_state_synchronize_rcu_expedited()...
> > 
> > Oh, but if we were to check early for a positive rcu_blocking_is_gp() in
> > start_poll_synchronize_rcu_expedited(), we could simply return the current
> > value of rcu_state.expedited_sequence without doing an rcu_exp_gp_seq_snap(),
> > and then pass that to poll_state_synchronize_rcu_expedited(), which should
> > then return immediately.
> > 
> > Now even if we do that, we still need the below in case the CPUs went offline
> > in the middle of start_poll_synchronize_rcu_expedited() (again, assuming this
> > fixes the issue. I'm running the test).
> 
> Color me slow and stupid!!!
> 
> So the reason that this works for CONFIG_PREEMPT_DYNAMIC=y is that
> rcu_blocking_is_gp() was never updated to account for this.
> 
> The first "if" in rcu_blocking_is_gp() needs to become something like
> this:
> 
> 	if (!preemption_boot_enabled())
> 
> Where:
> 
> 	bool preemption_boot_enabled(void)
> 	{
> 	#ifdef CONFIG_PREEMPT_DYNAMIC
> 		return preempt_dynamic_mode == preempt_dynamic_full;
> 	#else
> 		return IS_ENABLED(CONFIG_PREEMPTION);
> 	#endif
> 	}
> 
> Or am I missing something here?

Oh right there is that too!

I think we need to apply this patch:
https://lore.kernel.org/lkml/20211110202448.4054153-3-valentin.schneider@arm.com/
and then your above change. I can cook a series with the below.

> 
> > ---
> > From 3c9f5df000b9659edbcf38cb87136fea1f8ac682 Mon Sep 17 00:00:00 2001
> > From: Frederic Weisbecker <frederic@...nel.org>
> > Date: Fri, 11 Mar 2022 13:30:02 +0100
> > Subject: [PATCH] rcu: Fix rcu exp polling
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> > ---
> >  kernel/rcu/tree_exp.h | 52 ++++++++++++++++++++++++-------------------
> >  1 file changed, 29 insertions(+), 23 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index d5f30085b0cf..69c4dcc9159a 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -794,27 +794,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> >  
> >  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
> >  
> > -/**
> > - * synchronize_rcu_expedited - Brute-force RCU grace period
> > - *
> > - * Wait for an RCU grace period, but expedite it.  The basic idea is to
> > - * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
> > - * the CPU is in an RCU critical section, and if so, it sets a flag that
> > - * causes the outermost rcu_read_unlock() to report the quiescent state
> > - * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
> > - * other hand, if the CPU is not in an RCU read-side critical section,
> > - * the IPI handler reports the quiescent state immediately.
> > - *
> > - * Although this is a great improvement over previous expedited
> > - * implementations, it is still unfriendly to real-time workloads, so is
> > - * thus not recommended for any sort of common-case code.  In fact, if
> > - * you are using synchronize_rcu_expedited() in a loop, please restructure
> > - * your code to batch your updates, and then use a single synchronize_rcu()
> > - * instead.
> > - *
> > - * This has the same semantics as (but is more brutal than) synchronize_rcu().
> > - */
> > -void synchronize_rcu_expedited(void)
> 
> We should have a header comment here.  Given that I missed the need for
> this, for example.  ;-)
> 
> But feel free to send a formal patch without it, and I can add it.
> 
> Otherwise, it looks good.

Ok, preparing this.

Thanks!
