lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 21 Jan 2021 16:20:12 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     tglx@...utronix.de, frederic@...nel.org,
        linux-kernel@...r.kernel.org, x86@...nel.org, cai@....pw,
        mgorman@...hsingularity.net, joel@...lfernandes.org,
        valentin.schneider@....com
Subject: Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

On Thu, Jan 21, 2021 at 05:56:53PM +0100, Peter Zijlstra wrote:
> On Wed, May 27, 2020 at 07:12:36PM +0200, Peter Zijlstra wrote:
> > Subject: rcu: Allow for smp_call_function() running callbacks from idle
> > 
> > Current RCU hard relies on smp_call_function() callbacks running from
> > interrupt context. A pending optimization is going to break that, it
> > will allow idle CPUs to run the callbacks from the idle loop. This
> > avoids raising the IPI on the requesting CPU and avoids handling an
> > exception on the receiving CPU.
> > 
> > Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
> > provided it is the idle task.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> > ---
> >  kernel/rcu/tree.c   | 25 +++++++++++++++++++------
> >  kernel/sched/idle.c |  4 ++++
> >  2 files changed, 23 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index d8e9dbbefcfa..c716eadc7617 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -418,16 +418,23 @@ void rcu_momentary_dyntick_idle(void)
> >  EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
> >  
> >  /**
> > - * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
> > + * rcu_is_cpu_rrupt_from_idle - see if 'interrupted' from idle
> >   *
> >   * If the current CPU is idle and running at a first-level (not nested)
> > - * interrupt from idle, return true.  The caller must have at least
> > - * disabled preemption.
> > + * interrupt, or directly, from idle, return true.
> > + *
> > + * The caller must have at least disabled IRQs.
> >   */
> >  static int rcu_is_cpu_rrupt_from_idle(void)
> >  {
> > -	/* Called only from within the scheduling-clock interrupt */
> > -	lockdep_assert_in_irq();
> > +	long nesting;
> > +
> > +	/*
> > +	 * Usually called from the tick; but also used from smp_function_call()
> > +	 * for expedited grace periods. This latter can result in running from
> > +	 * the idle task, instead of an actual IPI.
> > +	 */
> > +	lockdep_assert_irqs_disabled();
> >  
> >  	/* Check for counter underflows */
> >  	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > @@ -436,9 +443,15 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> >  			 "RCU dynticks_nmi_nesting counter underflow/zero!");
> >  
> >  	/* Are we at first interrupt nesting level? */
> > -	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > +	nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> > +	if (nesting > 1)
> >  		return false;
> >  
> > +	/*
> > +	 * If we're not in an interrupt, we must be in the idle task!
> > +	 */
> > +	WARN_ON_ONCE(!nesting && !is_idle_task(current));
> > +
> >  	/* Does CPU appear to be idle from an RCU standpoint? */
> >  	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> >  }
> 
> Let me revive this thread after yesterdays IRC conversation.
> 
> As said; it might be _extremely_ unlikely, but somewhat possible for us
> to send the IPI concurrent with hot-unplug, not yet observing
> rcutree_offline_cpu() or thereabout.
> 
> Then have the IPI 'delayed' enough to not happen until smpcfd_dying()
> and getting ran there.
> 
> This would then run the function from the stopper thread instead of the
> idle thread and trigger the warning, even though we're not holding
> rcu_read_lock() (which, IIRC, was the only constraint).
> 
> So would something like the below be acceptable?
> 
> ---
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 368749008ae8..2c8d4c3e341e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -445,7 +445,7 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>  	/*
>  	 * Usually called from the tick; but also used from smp_function_call()
>  	 * for expedited grace periods. This latter can result in running from
> -	 * the idle task, instead of an actual IPI.
> +	 * a (usually the idle) task, instead of an actual IPI.

The story is growing enough hair that we should tell it only once.
So here just where it is called from:

	/*
	 * Usually called from the tick; but also used from smp_function_call()
	 * for expedited grace periods.
	 */

>  	lockdep_assert_irqs_disabled();
>  
> @@ -461,9 +461,14 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>  		return false;
>  
>  	/*
> -	 * If we're not in an interrupt, we must be in the idle task!
> +	 * If we're not in an interrupt, we must be in task context.
> +	 *
> +	 * This will typically be the idle task through:
> +	 *   flush_smp_call_function_from_idle(),
> +	 *
> +	 * but can also be in CPU HotPlug through smpcfd_dying().
>  	 */

Good, but how about like this?

	/*
	 * If we are not in an interrupt handler, we must be in
	 * smp_call_function() handler.
	 *
	 * Normally, smp_call_function() handlers are invoked from
	 * the idle task via flush_smp_call_function_from_idle().
	 * However, they can also be invoked from CPU hotplug
	 * operations via smpcfd_dying().
	 */

> -	WARN_ON_ONCE(!nesting && !is_idle_task(current));
> +	WARN_ON_ONCE(!nesting && !in_task(current));

This is used in time-critical contexts, so why not RCU_LOCKDEP_WARN()?
That should also allow checking more closely.  Would something like the
following work?

	RCU_LOCKDEP_WARN(!nesting && !is_idle_task(current) && (!in_task(current) || !lockdep_cpus_write_held()));

Where lockdep_cpus_write_held is defined in kernel/cpu.c:

void lockdep_cpus_write_held(void)
{
#ifdef CONFIG_PROVE_LOCKING
	if (system_state < SYSTEM_RUNNING)
		return false;
	return lockdep_is_held_type(&cpu_hotplug_lock, 0);
#else
	return false;
#endif
}

Seem reasonable?

							Thanx, Paul

>  	/* Does CPU appear to be idle from an RCU standpoint? */
>  	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;

Powered by blists - more mailing lists