linux-kernel - Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181015193951.GA33528@joelaf.mtv.corp.google.com>
Date:   Mon, 15 Oct 2018 12:39:51 -0700
From:   Joel Fernandes <joel@...lfernandes.org>
To:     "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc:     Nikolay Borisov <nborisov@...e.com>, linux-kernel@...r.kernel.org,
        Jonathan Corbet <corbet@....net>,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        linux-doc@...r.kernel.org,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about
 disabling preemption

On Mon, Oct 15, 2018 at 04:21:12AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 15, 2018 at 09:05:22AM +0300, Nikolay Borisov wrote:
> > On 15.10.2018 05:47, Joel Fernandes wrote:
> > > On Sun, Oct 14, 2018 at 07:33:28PM -0700, Paul E. McKenney wrote:
> > >> On Sun, Oct 14, 2018 at 07:13:49PM -0700, Joel Fernandes wrote:
> > >>> On Sun, Oct 14, 2018 at 07:08:27PM -0700, Joel Fernandes wrote:
> > >>>> On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
> > >>>>> On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> > >>>>>> The Requirements.html document says "Disabling Preemption Does Not Block
> > >>>>>> Grace Periods". However this is no longer true with the RCU
> > >>>>>> consolidation. Lets remove the obsolete (non-)requirement entirely.
> > >>>>>>
> > >>>>>> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> > >>>>>
> > >>>>> Good catch, queued, thank you!
> > >>>>
> > >>>> Thanks! By the way after I sent the patch, I also tried Oleg's experiment to
> > >>>> confirm that this is indeed obsolete.  :)
> > >>>>
> > >>>> One thing interesting came up when I tried synchronize_rcu_expedited()
> > >>>> instead of synchronize_rcu() in Oleg's experiment, I still saw a multiple
> > >>>> millisecond delay between when the rcu read section completely and the
> > >>>> synchronize_rcu_expedited returns:
> > >>>>
> > >>>> For example, with synchronize_rcu_expedited, the 'SPIN done' and the 'SYNC
> > >>>> done' are about 3 millisecond apart:
> > >>>> [   77.599142] SPIN start
> > >>>> [   77.601595] SYNC start
> > >>>> [   82.604950] SPIN done!
> > >>>> [   82.607836] SYNC done!
> > >>>>  I saw anywhere from 2-6 milliseconds.
> > >>>>
> > >>>> The reason I bring this up is according to Requirements.html: In some cases,
> > >>>> the multi-millisecond synchronize_rcu() latencies are unacceptable. In these
> > >>>> cases, synchronize_rcu_expedited() may be used instead,.. so either I messed
> > >>>> something up in the experiment, or I need to update this part of the document ;-)
> > >>
> > >> In normal testing, 2-6 milliseconds is indeed excessive.  Could you please
> > >> point me at Oleg's experiment?  Also, what CONFIG_PREEMPT setting were
> > >> you using?  (My guess is CONFIG_PREEMPT=y.)
> > > 
> > > The CONFIG_PREEMPT config I am using is CONFIG_PREEMPT=y.
> > > 
> > >>> So I realized I'm running in Qemu so it could also be a scheduling delay of
> > >>> the vcpu thread. So apologies about the noise if the experiment works fine
> > >>> for you.
> > >>
> > >> I used rcuperf, which might not be doing the same thing as Oleg's
> > >> experiment.
> > > 
> > > The experiment is mentioned at:
> > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg912055.html
> > > 
> > > If you apply the below diff, it applies cleanly on rcu/dev. And then run:
> > > taskset 2 perl -e 'syscall 157, 666, 5000' &
> > > taskset 1 perl -e 'syscall 157, 777'
> > > 
> > > diff --git a/kernel/sys.c b/kernel/sys.c
> > > index cf5c67533ff1..b654b7566ca3 100644
> > > --- a/kernel/sys.c
> > > +++ b/kernel/sys.c
> > > @@ -2261,6 +2261,9 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
> > >  	return -EINVAL;
> > >  }
> > >  
> > > +#include <linux/delay.h>
> > > +
> > > +
> > >  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> > >  		unsigned long, arg4, unsigned long, arg5)
> > >  {
> > > @@ -2274,6 +2277,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> > >  
> > >  	error = 0;
> > >  	switch (option) {
> > > +	case 666:
> > > +		preempt_disable();
> > > +		pr_crit("SPIN start\n");
> > > +		while (arg2--)
> > > +			mdelay(1);
> 
> OK, this is the problem.  When you spin in the kernel for several
> milliseconds with preemption disabled, the consolidated grace period
> is -required- to wait for this preemption-disabled reader to complete,
> whether expedited or not.
> 
> So, expected behavior.  ;-)

Cool. Thanks for confirming. I ran some tests too and if I reduce the
preempt_disabled section's duration, then the delay for
synchronize_rcu_expedited is much lesser indeed.

> In any case, please don't spin for milliseconds with preemption disabled.
> The real-time guys are unlikely to be happy with you if you do this!

Well just to clarify, I was just running Oleg's test which did this. This
test was mentioned in the original documentation that I deleted. Ofcourse I
would not dare do such a thing in production code :-D. I guess to Oleg's
defense, he did it to very that synchronize_rcu() was not blocked on
preempt-disable sections which was a different test.

> > > +		pr_crit("SPIN done!\n");
> > > +		preempt_enable();
> > > +		break;
> > > +	case 777:
> > > +		pr_crit("SYNC start\n");
> > > +		synchronize_rcu();
> > > +		pr_crit("SYNC done!\n");
> > 
> > But you are using the console printing infrastructure which is rather
> > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > write to the lock-free ring buffer, this will reduce the noise from the
> > heavy console printing infrastructure.
> 
> And this might be a problem as well.

This was not the issue (or atleast not fully the issue) since I saw the same
thing with trace_printk. It was exactly what you said - which is the
excessively long preempt disabled times.

thanks!

 - Joel