lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220623003409.GD1790663@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 22 Jun 2022 17:34:09 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     "Zhang, Qiang1" <qiang1.zhang@...el.com>
Cc:     Boqun Feng <boqun.feng@...il.com>,
        "quic_neeraju@...cinc.com" <quic_neeraju@...cinc.com>,
        "frederic@...nel.org" <frederic@...nel.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] rcu: Add exp QS check in rcu_exp_handler() for
 no-preemptible expedited RCU

On Wed, Jun 22, 2022 at 11:34:15PM +0000, Zhang, Qiang1 wrote:
> 
> On Wed, Jun 22, 2022 at 06:35:49PM +0800, Zqiang wrote:
> > In CONFIG_PREEMPT=n and CONFIG_PREEMPT_COUNT=y kernel, after a exp
> > grace period begins, if detected current CPU enters idle in
> > rcu_exp_handler() IPI handler, will immediately report the exp QS of the
> > current cpu, at this time, maybe not being in an RCU read-side critical
> > section, but need wait until rcu-softirq or sched-clock irq or sched-switch
> > occurs on current CPU to check and report exp QS.
> > 
> 
> >I think the idea is OK, however, this "optimization" is based on the
> >implementation detail that rcu_read_lock() counts preempt_count when
> >CONFIG_PREEMPT_COUNT=y, right? It's a little bit dangerous because the
> >preempt_count when CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n is mostly
> >for debugging purposes IIUC, and in other words, _it could be gone_.
> >
> 
> Yes, for CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n kernel
> The rcu_read_lock/unlock are replaced by preempt_disbale/enable, and the 
> preempt-count is exists,  so can report exp QS when not being an  RCU 
> read-side critical(preempt_count & (PREEMPT_MASK | SOFTIRQ_MASK )return zero).
> in IPI handler. 
> 
> For CONFIG_PREEMPT_COUNT=n and CONFIG_PREEMPT=n kernel, 
> The rcu_read_lock/unlock is just barrier(). 
> 
> 
> So I add IS_ENABLED(CONFIG_PREEMPT_COUNT) check in code.
> 
> Of course, for CONFIG_PREEMPT_COUNT=n  kernel, in RCU softirq, the 
> preempt-count is also checked
> 
> /* Report any deferred quiescent states if preemption enabled. */
>  if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && (!(preempt_count() & PREEMPT_MASK))) {
>                  rcu_preempt_deferred_qs(current);
> 
> but the RCU softirq may not be triggered in time and reported exp QS, for
> example a kernel loop exist on NO_HZ_FULL CPU
> 
> this change, It is to capture the exp QS state earlier and report it.
> 
> 
> >Also I'm not aware of any but there could be someone assuming that RCU
> >read-side critical sections can be formed without
> >rcu_read_{lock,unlock}() in CONFIG_PREEMPT=n kernel. For example, there
> >might be "creative" code like the following:
> >
> >	void do_something_only_in_nonpreempt(void)
> >	{
> >		int *p;
> >
> >		// This function only gets called in PREEMPT=n kernel,
> >		// which means everywhere is a RCU read-side critical
> >		// section, let's save some lines of code.
> >
> 		rcu_read_lock();
> >		p = rcu_dereference_check(gp, !IS_ENABLED(PREEMPT));
> >		... // of course no schedule() here.
> >		<access p>
>                              rcu_read_unlock();
> >	}
> >
> 
> Usually access to pointers of type rcu needs to be protected.

Indeed, lockdep would normally complain about this sort of thing.
But in kernels built with (say) CONFIG_PREEMPT_NONE=y but without
CONFIG_PREEMPT_COUNT=N, can lockdep really tell the difference?

> Any thoughts?

It would be good to have some performance data on this change to expedited
grace periods.  It is adding code, so it needs some real motivation.
So, does this change make a system-level difference in (say) expedited
RCU grace-period latency, and if so, under what conditions?

						Thanx, Paul

> >Again, I'm not aware of any existing code that does this but we need to
> >be sure.
> >
> >Regards,
> >Boqun
> >
> > This commit add a exp QS check in rcu_exp_handler(), when not being
> > in an RCU read-side critical section, report exp QS earlier.
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> > ---
> >  kernel/rcu/tree_exp.h | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index be667583a554..34f08267410f 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -828,11 +828,14 @@ static void rcu_exp_handler(void *unused)
> >  {
> >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> >  	struct rcu_node *rnp = rdp->mynode;
> > +	bool preempt_bh_disabled =
> > +				!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
> >  
> >  	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> >  	    __this_cpu_read(rcu_data.cpu_no_qs.b.exp))
> >  		return;
> > -	if (rcu_is_cpu_rrupt_from_idle()) {
> > +	if (rcu_is_cpu_rrupt_from_idle() ||
> > +			(IS_ENABLED(CONFIG_PREEMPT_COUNT) && !preempt_bh_disabled)) {
> >  		rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
> >  		return;
> >  	}
> > -- 
> > 2.25.1
> > 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ