lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <PH0PR11MB58800BA38E0691E2FB3AF552DAB59@PH0PR11MB5880.namprd11.prod.outlook.com>
Date:   Thu, 23 Jun 2022 03:23:19 +0000
From:   "Zhang, Qiang1" <qiang1.zhang@...el.com>
To:     Boqun Feng <boqun.feng@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>
CC:     "quic_neeraju@...cinc.com" <quic_neeraju@...cinc.com>,
        "frederic@...nel.org" <frederic@...nel.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] rcu: Add exp QS check in rcu_exp_handler() for
 no-preemptible expedited RCU

On Wed, Jun 22, 2022 at 05:34:09PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2022 at 11:34:15PM +0000, Zhang, Qiang1 wrote:
> > 
> > On Wed, Jun 22, 2022 at 06:35:49PM +0800, Zqiang wrote:
> > > In CONFIG_PREEMPT=n and CONFIG_PREEMPT_COUNT=y kernel, after a exp
> > > grace period begins, if detected current CPU enters idle in
> > > rcu_exp_handler() IPI handler, will immediately report the exp QS of the
> > > current cpu, at this time, maybe not being in an RCU read-side critical
> > > section, but need wait until rcu-softirq or sched-clock irq or sched-switch
> > > occurs on current CPU to check and report exp QS.
> > > 
> > 
> > >I think the idea is OK, however, this "optimization" is based on the
> > >implementation detail that rcu_read_lock() counts preempt_count when
> > >CONFIG_PREEMPT_COUNT=y, right? It's a little bit dangerous because the
> > >preempt_count when CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n is mostly
> > >for debugging purposes IIUC, and in other words, _it could be gone_.
> > >
> > 
> > Yes, for CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n kernel
> > The rcu_read_lock/unlock are replaced by preempt_disbale/enable, and the 
> > preempt-count is exists,  so can report exp QS when not being an  RCU 
> > read-side critical(preempt_count & (PREEMPT_MASK | SOFTIRQ_MASK )return zero).
> > in IPI handler. 
> > 
> > For CONFIG_PREEMPT_COUNT=n and CONFIG_PREEMPT=n kernel, 
> > The rcu_read_lock/unlock is just barrier(). 
> > 
> > 
> > So I add IS_ENABLED(CONFIG_PREEMPT_COUNT) check in code.
> > 
> > Of course, for CONFIG_PREEMPT_COUNT=n  kernel, in RCU softirq, the 
> > preempt-count is also checked
> > 
> > /* Report any deferred quiescent states if preemption enabled. */
> >  if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && (!(preempt_count() & PREEMPT_MASK))) {
> >                  rcu_preempt_deferred_qs(current);
> > 
> > but the RCU softirq may not be triggered in time and reported exp QS, for
> > example a kernel loop exist on NO_HZ_FULL CPU
> > 
> > this change, It is to capture the exp QS state earlier and report it.
> > 
> > 
> > >Also I'm not aware of any but there could be someone assuming that RCU
> > >read-side critical sections can be formed without
> > >rcu_read_{lock,unlock}() in CONFIG_PREEMPT=n kernel. For example, there
> > >might be "creative" code like the following:
> > >
> > >	void do_something_only_in_nonpreempt(void)
> > >	{
> > >		int *p;
> > >
> > >		// This function only gets called in PREEMPT=n kernel,
> > >		// which means everywhere is a RCU read-side critical
> > >		// section, let's save some lines of code.
> > >
> > 		rcu_read_lock();
> > >		p = rcu_dereference_check(gp, !IS_ENABLED(PREEMPT));
> > >		... // of course no schedule() here.
> > >		<access p>
> >                              rcu_read_unlock();
> > >	}
> > >
> > 
> > Usually access to pointers of type rcu needs to be protected.

Yes, _ususally_ they are, but what about the special cases? Because
in PREEMPT=n kernel, almost everywhere is a RCU read-side critical
section, some one might have been "creative" enough to omit these
rcu_read_lock() and rcu_read_unlock().

> 
> Indeed, lockdep would normally complain about this sort of thing.
> But in kernels built with (say) CONFIG_PREEMPT_NONE=y but without
> CONFIG_PREEMPT_COUNT=N, can lockdep really tell the difference?
> 

>Actually with the rcu_dereference_check() above, lockdep cannot detect
>even CONFIG_PREEMPT_COUNT=y, that rcu_dereference_check() basically says
>"I know I'm in a read-side critical section if it's a non-preempt
>kernel, so don't bother to check". ;-)

> > Any thoughts?
> 
> It would be good to have some performance data on this change to expedited
> grace periods.  It is adding code, so it needs some real motivation.

I used rcuscale to test it, and  count the average time of writer-duration

no applied patch(org.txt)
writer avg  time  29690.39     29670.78     29770.65       29423.25

applied patch(new.txt)
writer avg time 28989.99       29003.54      29281.39       28986.58

or Is there a better way to test?

Thanks
Zqiang


>Agreed.
>
>Regards,
>Boqun

> So, does this change make a system-level difference in (say) expedited
> RCU grace-period latency, and if so, under what conditions?
> 
> 						Thanx, Paul
> 
> > >Again, I'm not aware of any existing code that does this but we need to
> > >be sure.
> > >
> > >Regards,
> > >Boqun
> > >
> > > This commit add a exp QS check in rcu_exp_handler(), when not being
> > > in an RCU read-side critical section, report exp QS earlier.
> > > 
> > > Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> > > ---
> > >  kernel/rcu/tree_exp.h | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index be667583a554..34f08267410f 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -828,11 +828,14 @@ static void rcu_exp_handler(void *unused)
> > >  {
> > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > >  	struct rcu_node *rnp = rdp->mynode;
> > > +	bool preempt_bh_disabled =
> > > +				!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
> > >  
> > >  	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> > >  	    __this_cpu_read(rcu_data.cpu_no_qs.b.exp))
> > >  		return;
> > > -	if (rcu_is_cpu_rrupt_from_idle()) {
> > > +	if (rcu_is_cpu_rrupt_from_idle() ||
> > > +			(IS_ENABLED(CONFIG_PREEMPT_COUNT) && !preempt_bh_disabled)) {
> > >  		rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
> > >  		return;
> > >  	}
> > > -- 
> > > 2.25.1
> > > 

View attachment "new.txt" of type "text/plain" (26105 bytes)

View attachment "org.txt" of type "text/plain" (26102 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ