Message-ID: <PH0PR11MB58802CD4DFF62E426A5B36D9DA809@PH0PR11MB5880.namprd11.prod.outlook.com>
Date: Wed, 6 Jul 2022 02:06:08 +0000
From: "Zhang, Qiang1" <qiang1.zhang@...el.com>
To: "paulmck@...nel.org" <paulmck@...nel.org>
CC: Boqun Feng <boqun.feng@...il.com>,
"quic_neeraju@...cinc.com" <quic_neeraju@...cinc.com>,
"frederic@...nel.org" <frederic@...nel.org>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] rcu: Add exp QS check in rcu_exp_handler() for
no-preemptible expedited RCU
On Tue, Jun 28, 2022 at 04:32:58AM +0000, Zhang, Qiang1 wrote:
> On Thu, Jun 23, 2022 at 03:23:19AM +0000, Zhang, Qiang1 wrote:
> > On Wed, Jun 22, 2022 at 05:34:09PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 22, 2022 at 11:34:15PM +0000, Zhang, Qiang1 wrote:
> > > >
> > > > On Wed, Jun 22, 2022 at 06:35:49PM +0800, Zqiang wrote:
> > > > > In kernels built with CONFIG_PREEMPT=n and CONFIG_PREEMPT_COUNT=y, after an
> > > > > expedited grace period begins, if the rcu_exp_handler() IPI handler detects
> > > > > that the current CPU is idle, it immediately reports the expedited QS for
> > > > > that CPU.  Otherwise, even if the CPU is not within an RCU read-side
> > > > > critical section, it must wait until the RCU softirq, a scheduling-clock
> > > > > interrupt, or a context switch occurs on that CPU before the expedited QS
> > > > > can be checked and reported.
> > > > >
> > > >
> > > > >I think the idea is OK; however, this "optimization" is based on the
> > > > >implementation detail that rcu_read_lock() bumps preempt_count() when
> > > > >CONFIG_PREEMPT_COUNT=y, right? It's a little bit dangerous because the
> > > > >preempt_count() maintained with CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n
> > > > >is mostly for debugging purposes IIUC; in other words, _it could be gone_.
> > > > >
> > > >
> > > > Yes, in kernels built with CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n,
> > > > rcu_read_lock()/rcu_read_unlock() map to preempt_disable()/preempt_enable(),
> > > > and preempt_count() is maintained, so the IPI handler can report the exp QS
> > > > when the CPU is not in an RCU read-side critical section, that is, when
> > > > preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK) is zero.
> > > >
> > > > In kernels built with CONFIG_PREEMPT_COUNT=n and CONFIG_PREEMPT=n,
> > > > rcu_read_lock()/rcu_read_unlock() are just barrier().
> > > >
> > > >
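For reference, a minimal sketch of that mapping (simplified and from memory; the
real definitions live in include/linux/rcupdate.h and carry extra debugging
hooks, and the my_rcu_* names below are placeholders):

	/* Illustrative only, not the verbatim kernel code. */
	#ifndef CONFIG_PREEMPT_RCU
	static inline void my_rcu_read_lock(void)
	{
		/*
		 * With CONFIG_PREEMPT_COUNT=y this increments preempt_count();
		 * with CONFIG_PREEMPT_COUNT=n it is just barrier().
		 */
		preempt_disable();
	}

	static inline void my_rcu_read_unlock(void)
	{
		preempt_enable();
	}
	#endif

So, with CONFIG_PREEMPT_COUNT=y, "not in a reader" can be detected as
!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)).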
> > > > So I added an IS_ENABLED(CONFIG_PREEMPT_COUNT) check to the code.
> > > >
> > > > Of course, in a CONFIG_PREEMPT_COUNT=y kernel, the preempt count is also
> > > > checked in the RCU softirq:
> > > >
> > > > /* Report any deferred quiescent states if preemption enabled. */
> > > > if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && (!(preempt_count() & PREEMPT_MASK))) {
> > > > 	rcu_preempt_deferred_qs(current);
> > > >
> > > > But the RCU softirq may not be triggered in time to report the exp QS, for
> > > > example when a kernel loop runs on a NO_HZ_FULL CPU.
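A loop like the following would keep an expedited grace period waiting on such
a CPU (hypothetical code; do_some_work() is a made-up helper):

	/* Hypothetical long-running, tick-free kernel loop on a nohz_full CPU. */
	static void long_kernel_loop(void)
	{
		unsigned long i;

		/* No cond_resched(), no softirq, no context switch here... */
		for (i = 0; i < (1UL << 30); i++)
			do_some_work(i);	/* made-up helper */
	}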
> > > >
> > > > This change captures the exp QS state earlier and reports it promptly.
> > > >
> > > >
> > > > >Also, I'm not aware of any, but there could be someone assuming that RCU
> > > > >read-side critical sections can be formed without rcu_read_{lock,unlock}()
> > > > >in a CONFIG_PREEMPT=n kernel. For example, there might be "creative" code
> > > > >like the following:
> > > > >
> > > > > void do_something_only_in_nonpreempt(void)
> > > > > {
> > > > > 	int *p;
> > > > >
> > > > > 	// This function only gets called in PREEMPT=n kernels,
> > > > > 	// which means everywhere is an RCU read-side critical
> > > > > 	// section, so let's save some lines of code.
> > > > >
> > > > 	rcu_read_lock();
> > > > > 	p = rcu_dereference_check(gp, !IS_ENABLED(CONFIG_PREEMPTION));
> > > > > 	... // of course no schedule() here.
> > > > > 	<access p>
> > > > 	rcu_read_unlock();
> > > > > }
> > > > >
> > > >
> > > > Usually, accesses to RCU-protected pointers need to be protected.
> >
> > Yes, _usually_ they are, but what about the special cases? Because
> > in a PREEMPT=n kernel almost everywhere is an RCU read-side critical
> > section, someone might have been "creative" enough to omit these
> > rcu_read_lock() and rcu_read_unlock() calls.
> >
> > >
> > > Indeed, lockdep would normally complain about this sort of thing.
> > > But in kernels built with (say) CONFIG_PREEMPT_NONE=y but without
> > > CONFIG_PREEMPT_COUNT, can lockdep really tell the difference?
> > >
> >
> > >Actually, with the rcu_dereference_check() above, lockdep cannot detect
> > >the problem even with CONFIG_PREEMPT_COUNT=y, because that
> > >rcu_dereference_check() basically says "I know I'm in a read-side
> > >critical section if this is a non-preempt kernel, so don't bother to
> > >check". ;-)
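For illustration (not code from this patch; gp is assumed to be some
RCU-protected pointer), the difference in what lockdep gets to see:

	/* The extra condition is always true on !PREEMPT kernels, so lockdep
	 * never complains there, even if no rcu_read_lock() is held. */
	p = rcu_dereference_check(gp, !IS_ENABLED(CONFIG_PREEMPTION));

	/* Without that escape hatch, lockdep warns unless a marked RCU
	 * reader (or other stated protection) is in effect. */
	p = rcu_dereference(gp);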
> >
> > > > Any thoughts?
> > >
> > > It would be good to have some performance data on this change to expedited
> > > grace periods. It is adding code, so it needs some real motivation.
> >
> > I used rcuscale to test it, and measured the average writer-duration time:
> >
> > without the patch applied (org.txt):
> > writer avg time 29690.39 29670.78 29770.65 29423.25
> >
> > with the patch applied (new.txt):
> > writer avg time 28989.99 29003.54 29281.39 28986.58
> >
> > Or is there a better way to test?
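For what it's worth, one way to drive rcuscale is the in-tree rcutorture
scripting, roughly along these lines (flags and module parameters quoted from
memory, so please double-check them against the tree):

	tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale \
		--allcpus --duration 10 \
		--bootargs "rcuscale.gp_exp=1 rcuscale.nwriters=16"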
>
> >If I understand your measurements correctly, you are getting about a 2%
> >improvement in expedited grace-period latency in !PREEMPT kernels.
> >Do we have a situation for which that 2% improvement is important?
>
> To be honest, the measurements are also affected by work-scheduling delay.
> Indeed, my current test results are not very impressive, but it is still
> beneficial to detect the QS earlier and report it promptly. Of course, I
> will do more testing in the future.
>
> >We would be taking some risk due to the issues Boqun points out or we
> >would be adding some complexity to avoid those issues. So this
>
> As for code like the scenario Boqun proposed above, it is true that lockdep
> will not complain, and I do not deny that someone may write code this way.
>
> If the situation Boqun mentioned really exists, then the following code in
> PREEMPT=n and PREEMPT_COUNT=y kernels is also at risk:
>
> /* Report any deferred quiescent states if preemption enabled. */
> if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && (!(preempt_count() & PREEMPT_MASK)))
> 	rcu_preempt_deferred_qs(current);
>
>Fair point.
>
>I have applied the commit shown below. Does that look correct?
Thanks, this description is sufficient.
>
> Thanx, Paul
>
>------------------------------------------------------------------------
>
>commit 9db96a8acda6837e5d4cd6a1cd7323709810e418
>Author: Zqiang <qiang1.zhang@...el.com>
>Date: Tue Jul 5 12:09:51 2022 -0700
>
> rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels
>
> Kernels built with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y maintain
> preempt_count() state. Because such kernels map __rcu_read_lock()
> and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
> respectively, this allows the expedited grace period's !CONFIG_PREEMPT_RCU
> version of the rcu_exp_handler() IPI handler function to use
> preempt_count() to detect quiescent states.
>
> This preempt_count() usage might seem to risk failures due to
> use of implicit RCU readers in portions of the kernel under #ifndef
> CONFIG_PREEMPTION, except that rcu_core() already disallows such implicit
> RCU readers. The moral of this story is that you must use explicit
> read-side markings such as rcu_read_lock() or preempt_disable() even if
> the code knows that this kernel does not support preemption.
>
> This commit therefore adds a preempt_count()-based check for a quiescent
> state in the !CONFIG_PREEMPT_RCU version of the rcu_exp_handler()
> function for kernels built with CONFIG_PREEMPT_COUNT=y, reporting an
> immediate quiescent state when the interrupted code had both preemption
> and softirqs enabled.
>
> This change results in about a 2% reduction in expedited grace-period
> latency in kernels built with both CONFIG_PREEMPT_RCU=n and
> CONFIG_PREEMPT_COUNT=y.
>
> Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> Link: https://lore.kernel.org/all/20220622103549.2840087-1-qiang1.zhang@intel.com/
>
>diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>index be667583a5547..b07998159d1fa 100644
>--- a/kernel/rcu/tree_exp.h
>+++ b/kernel/rcu/tree_exp.h
>@@ -828,11 +828,13 @@ static void rcu_exp_handler(void *unused)
> {
> struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> struct rcu_node *rnp = rdp->mynode;
>+ bool preempt_bh_enabled = !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
>
> if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> __this_cpu_read(rcu_data.cpu_no_qs.b.exp))
> return;
>- if (rcu_is_cpu_rrupt_from_idle()) {
>+ if (rcu_is_cpu_rrupt_from_idle() ||
>+ (IS_ENABLED(CONFIG_PREEMPT_COUNT) && preempt_bh_enabled)) {
> rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
> return;
> }