Message-ID: <20220311152148.GF4285@paulmck-ThinkPad-P17-Gen-1>
Date:   Fri, 11 Mar 2022 07:21:48 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?

On Fri, Mar 11, 2022 at 02:07:19PM +0100, Frederic Weisbecker wrote:
> On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> > On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > > Hello, Frederic,
> > > > 
> > > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > > then have been getting roughly one RCU CPU stall warning (or silent hang)
> > > > every few tens of hours of rcutorture testing on dual-socket systems.
> > > > The stall warnings feature starvation of the RCU grace-period kthread.
> > > > 
> > > > Any advice on debugging this?
> > > 
> > > Oh, I'm testing that!
> > 
> > Even better, thank you!  ;-)
> 
> Here is what I'm testing, below. Even if it happens to work, though, it's
> still not the most optimized way of dealing with the UP-on-SMP situation,
> as we still start an expedited grace period when we could return early.
> But since we have a cookie to pass to poll_state_synchronize_rcu_expedited()...
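>
> That is, a caller does something like this (just a sketch, using the
> in-progress polled expedited API from this series):
>
> 	unsigned long s;
>
> 	s = start_poll_synchronize_rcu_expedited();
> 	/* ... do other work while the expedited GP runs ... */
> 	if (poll_state_synchronize_rcu_expedited(s))
> 		pr_info("Expedited grace period completed.\n");
>
> so whatever cookie we hand back must be one that
> poll_state_synchronize_rcu_expedited() will eventually accept as completed.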
> 
> Oh, but if we were to check early for a positive rcu_blocking_is_gp() in
> start_poll_synchronize_rcu_expedited(), we could simply return the current
> value of rcu_state.expedited_sequence without doing an rcu_exp_gp_seq_snap(),
> and then pass that to poll_state_synchronize_rcu_expedited(), which should
> then return immediately.
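>
> IOW, an untested sketch:
>
> 	unsigned long start_poll_synchronize_rcu_expedited(void)
> 	{
> 		unsigned long s;
>
> 		/* If blocking already is a GP, no need for a real exp GP. */
> 		if (rcu_blocking_is_gp())
> 			return READ_ONCE(rcu_state.expedited_sequence);
> 		s = rcu_exp_gp_seq_snap();
> 		/* ... queue sync_rcu_do_polled_gp() as needed ... */
> 		return s;
> 	}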
> 
> Now, even if we do that, we still need the patch below in case CPUs go
> offline in the middle of start_poll_synchronize_rcu_expedited() (again,
> assuming this fixes the issue; I'm still running the test).

Color me slow and stupid!!!

So the reason that this works with CONFIG_PREEMPT_DYNAMIC=y is that
rcu_blocking_is_gp() was never updated to account for dynamic preemption.
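
For reference, rcu_blocking_is_gp() currently starts with this (roughly,
from mainline):

	if (IS_ENABLED(CONFIG_PREEMPTION))
		return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;

So a CONFIG_PREEMPT_DYNAMIC=y kernel never takes the single-CPU early
return after boot, which is why the hang shows up only with
CONFIG_PREEMPT_DYNAMIC=n.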

The first "if" in rcu_blocking_is_gp() needs to become something like
this:

	if (!preemption_boot_enabled())

Where:

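	/* Is full preemption in effect, whether built in or selected at boot? */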
	bool preemption_boot_enabled(void)
	{
	#ifdef CONFIG_PREEMPT_DYNAMIC
		return preempt_dynamic_mode == preempt_dynamic_full;
	#else
		return IS_ENABLED(CONFIG_PREEMPTION);
	#endif
	}
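
(With CONFIG_PREEMPT_DYNAMIC=y, preempt_dynamic_mode records the
preemption model selected by the preempt= boot parameter, hence only
preempt_dynamic_full counts as preemption being enabled here.)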

Or am I missing something here?

> ---
> From 3c9f5df000b9659edbcf38cb87136fea1f8ac682 Mon Sep 17 00:00:00 2001
> From: Frederic Weisbecker <frederic@...nel.org>
> Date: Fri, 11 Mar 2022 13:30:02 +0100
> Subject: [PATCH] rcu: Fix rcu exp polling
> 
> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> ---
>  kernel/rcu/tree_exp.h | 52 ++++++++++++++++++++++++-------------------
>  1 file changed, 29 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index d5f30085b0cf..69c4dcc9159a 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -794,27 +794,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>  
>  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
>  
> -/**
> - * synchronize_rcu_expedited - Brute-force RCU grace period
> - *
> - * Wait for an RCU grace period, but expedite it.  The basic idea is to
> - * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
> - * the CPU is in an RCU critical section, and if so, it sets a flag that
> - * causes the outermost rcu_read_unlock() to report the quiescent state
> - * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
> - * other hand, if the CPU is not in an RCU read-side critical section,
> - * the IPI handler reports the quiescent state immediately.
> - *
> - * Although this is a great improvement over previous expedited
> - * implementations, it is still unfriendly to real-time workloads, so is
> - * thus not recommended for any sort of common-case code.  In fact, if
> - * you are using synchronize_rcu_expedited() in a loop, please restructure
> - * your code to batch your updates, and then use a single synchronize_rcu()
> - * instead.
> - *
> - * This has the same semantics as (but is more brutal than) synchronize_rcu().
> - */
> -void synchronize_rcu_expedited(void)

We should have a header comment here, given that I, for example, missed
the need for one.  ;-)
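
Maybe something like this, just as a sketch:

	/**
	 * __synchronize_rcu_expedited - Brute-force RCU grace period
	 * @polling: True when invoked from the polled grace-period API, in
	 *	which case the rcu_blocking_is_gp() early return must be
	 *	skipped so that the expedited sequence number advances.
	 */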

But feel free to send a formal patch without it, and I can add it.

Otherwise, it looks good.

							Thanx, Paul

> +static void __synchronize_rcu_expedited(bool polling)
>  {
>  	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>  	struct rcu_exp_work rew;
> @@ -827,7 +807,7 @@ void synchronize_rcu_expedited(void)
>  			 "Illegal synchronize_rcu_expedited() in RCU read-side critical section");
>  
>  	/* Is the state is such that the call is a grace period? */
> -	if (rcu_blocking_is_gp())
> +	if (rcu_blocking_is_gp() && !polling)
>  		return;
>  
>  	/* If expedited grace periods are prohibited, fall back to normal. */
> @@ -863,6 +843,32 @@ void synchronize_rcu_expedited(void)
>  
>  	if (likely(!boottime))
>  		destroy_work_on_stack(&rew.rew_work);
> +
> +}
> +
> +/**
> + * synchronize_rcu_expedited - Brute-force RCU grace period
> + *
> + * Wait for an RCU grace period, but expedite it.  The basic idea is to
> + * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
> + * the CPU is in an RCU critical section, and if so, it sets a flag that
> + * causes the outermost rcu_read_unlock() to report the quiescent state
> + * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
> + * other hand, if the CPU is not in an RCU read-side critical section,
> + * the IPI handler reports the quiescent state immediately.
> + *
> + * Although this is a great improvement over previous expedited
> + * implementations, it is still unfriendly to real-time workloads, so is
> + * thus not recommended for any sort of common-case code.  In fact, if
> + * you are using synchronize_rcu_expedited() in a loop, please restructure
> + * your code to batch your updates, and then use a single synchronize_rcu()
> + * instead.
> + *
> + * This has the same semantics as (but is more brutal than) synchronize_rcu().
> + */
> +void synchronize_rcu_expedited(void)
> +{
> +	__synchronize_rcu_expedited(false);
>  }
>  EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
>  
> @@ -903,7 +909,7 @@ static void sync_rcu_do_polled_gp(struct work_struct *wp)
>  	if (s & 0x1)
>  		return;
>  	while (!sync_exp_work_done(s))
> -		synchronize_rcu_expedited();
> +		__synchronize_rcu_expedited(true);
>  	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
>  	s = rnp->exp_seq_poll_rq;
>  	if (!(s & 0x1) && !sync_exp_work_done(s))
> -- 
> 2.25.1
> 
