[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <PH0PR11MB58800BA38E0691E2FB3AF552DAB59@PH0PR11MB5880.namprd11.prod.outlook.com>
Date: Thu, 23 Jun 2022 03:23:19 +0000
From: "Zhang, Qiang1" <qiang1.zhang@...el.com>
To: Boqun Feng <boqun.feng@...il.com>,
"Paul E. McKenney" <paulmck@...nel.org>
CC: "quic_neeraju@...cinc.com" <quic_neeraju@...cinc.com>,
"frederic@...nel.org" <frederic@...nel.org>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] rcu: Add exp QS check in rcu_exp_handler() for
no-preemptible expedited RCU
On Wed, Jun 22, 2022 at 05:34:09PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2022 at 11:34:15PM +0000, Zhang, Qiang1 wrote:
> >
> > On Wed, Jun 22, 2022 at 06:35:49PM +0800, Zqiang wrote:
> > > In CONFIG_PREEMPT=n and CONFIG_PREEMPT_COUNT=y kernel, after a exp
> > > grace period begins, if detected current CPU enters idle in
> > > rcu_exp_handler() IPI handler, will immediately report the exp QS of the
> > > current cpu, at this time, maybe not being in an RCU read-side critical
> > > section, but need wait until rcu-softirq or sched-clock irq or sched-switch
> > > occurs on current CPU to check and report exp QS.
> > >
> >
> > >I think the idea is OK, however, this "optimization" is based on the
> > >implementation detail that rcu_read_lock() counts preempt_count when
> > >CONFIG_PREEMPT_COUNT=y, right? It's a little bit dangerous because the
> > >preempt_count when CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n is mostly
> > >for debugging purposes IIUC, and in other words, _it could be gone_.
> > >
> >
> > Yes, for CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n kernel
> > The rcu_read_lock/unlock are replaced by preempt_disbale/enable, and the
> > preempt-count is exists, so can report exp QS when not being an RCU
> > read-side critical(preempt_count & (PREEMPT_MASK | SOFTIRQ_MASK )return zero).
> > in IPI handler.
> >
> > For CONFIG_PREEMPT_COUNT=n and CONFIG_PREEMPT=n kernel,
> > The rcu_read_lock/unlock is just barrier().
> >
> >
> > So I add IS_ENABLED(CONFIG_PREEMPT_COUNT) check in code.
> >
> > Of course, for CONFIG_PREEMPT_COUNT=n kernel, in RCU softirq, the
> > preempt-count is also checked
> >
> > /* Report any deferred quiescent states if preemption enabled. */
> > if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && (!(preempt_count() & PREEMPT_MASK))) {
> > rcu_preempt_deferred_qs(current);
> >
> > but the RCU softirq may not be triggered in time and reported exp QS, for
> > example a kernel loop exist on NO_HZ_FULL CPU
> >
> > this change, It is to capture the exp QS state earlier and report it.
> >
> >
> > >Also I'm not aware of any but there could be someone assuming that RCU
> > >read-side critical sections can be formed without
> > >rcu_read_{lock,unlock}() in CONFIG_PREEMPT=n kernel. For example, there
> > >might be "creative" code like the following:
> > >
> > > void do_something_only_in_nonpreempt(void)
> > > {
> > > int *p;
> > >
> > > // This function only gets called in PREEMPT=n kernel,
> > > // which means everywhere is a RCU read-side critical
> > > // section, let's save some lines of code.
> > >
> > rcu_read_lock();
> > > p = rcu_dereference_check(gp, !IS_ENABLED(PREEMPT));
> > > ... // of course no schedule() here.
> > > <access p>
> > rcu_read_unlock();
> > > }
> > >
> >
> > Usually access to pointers of type rcu needs to be protected.
Yes, _ususally_ they are, but what about the special cases? Because
in PREEMPT=n kernel, almost everywhere is a RCU read-side critical
section, some one might have been "creative" enough to omit these
rcu_read_lock() and rcu_read_unlock().
>
> Indeed, lockdep would normally complain about this sort of thing.
> But in kernels built with (say) CONFIG_PREEMPT_NONE=y but without
> CONFIG_PREEMPT_COUNT=N, can lockdep really tell the difference?
>
>Actually with the rcu_dereference_check() above, lockdep cannot detect
>even CONFIG_PREEMPT_COUNT=y, that rcu_dereference_check() basically says
>"I know I'm in a read-side critical section if it's a non-preempt
>kernel, so don't bother to check". ;-)
> > Any thoughts?
>
> It would be good to have some performance data on this change to expedited
> grace periods. It is adding code, so it needs some real motivation.
I used rcuscale to test it, and count the average time of writer-duration
no applied patch(org.txt)
writer avg time 29690.39 29670.78 29770.65 29423.25
applied patch(new.txt)
writer avg time 28989.99 29003.54 29281.39 28986.58
or Is there a better way to test?
Thanks
Zqiang
>Agreed.
>
>Regards,
>Boqun
> So, does this change make a system-level difference in (say) expedited
> RCU grace-period latency, and if so, under what conditions?
>
> Thanx, Paul
>
> > >Again, I'm not aware of any existing code that does this but we need to
> > >be sure.
> > >
> > >Regards,
> > >Boqun
> > >
> > > This commit add a exp QS check in rcu_exp_handler(), when not being
> > > in an RCU read-side critical section, report exp QS earlier.
> > >
> > > Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> > > ---
> > > kernel/rcu/tree_exp.h | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index be667583a554..34f08267410f 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -828,11 +828,14 @@ static void rcu_exp_handler(void *unused)
> > > {
> > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > struct rcu_node *rnp = rdp->mynode;
> > > + bool preempt_bh_disabled =
> > > + !!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
> > >
> > > if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> > > __this_cpu_read(rcu_data.cpu_no_qs.b.exp))
> > > return;
> > > - if (rcu_is_cpu_rrupt_from_idle()) {
> > > + if (rcu_is_cpu_rrupt_from_idle() ||
> > > + (IS_ENABLED(CONFIG_PREEMPT_COUNT) && !preempt_bh_disabled)) {
> > > rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
> > > return;
> > > }
> > > --
> > > 2.25.1
> > >
View attachment "new.txt" of type "text/plain" (26105 bytes)
View attachment "org.txt" of type "text/plain" (26102 bytes)
Powered by blists - more mailing lists