linux-kernel - Re: dyntick-idle CPU and node's qsmask

Message-Id: <20181121143952.GK4170@linux.ibm.com>
Date:   Wed, 21 Nov 2018 06:39:52 -0800
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     linux-kernel@...r.kernel.org, josh@...htriplett.org,
        rostedt@...dmis.org, mathieu.desnoyers@...icios.com,
        jiangshanlai@...il.com
Subject: Re: dyntick-idle CPU and node's qsmask

On Tue, Nov 20, 2018 at 08:37:22PM -0800, Joel Fernandes wrote:
> On Tue, Nov 20, 2018 at 06:41:07PM -0800, Paul E. McKenney wrote:
> [...] 
> > > > > I was thinking if we could simplify rcu_note_context_switch (the parts that
> > > > > call rcu_momentary_dyntick_idle), if we did the following in
> > > > > rcu_implicit_dynticks_qs.
> > > > > 
> > > > > Since we already call rcu_qs in rcu_note_context_switch, that would clear the
> > > > > rdp->cpu_no_qs flag. Then there should be no need to call
> > > > > > rcu_momentary_dyntick_idle from rcu_note_context_switch.
> > > > 
> > > > But does this also work for the rcu_all_qs() code path?
> > > 
> > > Could we not do something like this in rcu_all_qs? as some over-simplified
> > > pseudo code:
> > > 
> > > rcu_all_qs() {
> > >   if (!urgent_qs || !heavy_qs)
> > >      return;
> > > 
> > >   rcu_qs();   // This clears the rdp->cpu_no_qs flags which we can monitor in
> > >               //  the diff in my last email (from rcu_implicit_dynticks_qs)
> > > }
> > 
> > Except that rcu_qs() doesn't necessarily report the quiescent state to
> > the RCU core.  Keeping down context-switch overhead and all that.
> 
> Sure yeah, but I think the QS will be reported indirectly anyway by the
> force_qs_rnp() path if we detect that rcu_qs() happened on the CPU?

The force_qs_rnp() path won't see anything that has not already been
reported to the RCU core.
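
A toy userspace sketch of the capture-versus-report distinction may help
here.  It is not the kernel's code: every toy_* name is hypothetical, and
the model assumes only that capturing a quiescent state clears a CPU-local
flag while reporting clears the CPU's bit in a shared per-node mask.

```c
#include <stdbool.h>

/* Toy stand-in for a leaf node: one bit per CPU still owing a quiescent state. */
struct toy_node {
	unsigned long qsmask;
};

/* Toy stand-in for per-CPU data (hypothetical layout). */
struct toy_cpu {
	bool cpu_no_qs;          /* true until this CPU notes a quiescent state */
	unsigned long grpmask;   /* this CPU's bit in node->qsmask */
	struct toy_node *node;
};

/* Capture: cheap, touches only CPU-local state. */
static void toy_qs(struct toy_cpu *c)
{
	c->cpu_no_qs = false;
}

/* Report: run by the CPU itself; clears its bit in the shared mask. */
static bool toy_report_qs(struct toy_cpu *c)
{
	if (c->cpu_no_qs)
		return false;    /* nothing captured yet, nothing to report */
	c->node->qsmask &= ~c->grpmask;
	return true;
}

/* Scan: consults shared state only, never the CPU-local flag. */
static bool toy_node_done(const struct toy_node *n)
{
	return n->qsmask == 0;
}
```

In this model, calling toy_qs() alone leaves toy_node_done() false.  Only
after the CPU itself runs toy_report_qs() does the scan see any change.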

> > > > > I think this would simplify cond_resched as well.  Could this avoid the need
> > > > > > for having an rcu_all_qs at all? Hopefully I didn't miss some Tasks-RCU corner cases..
> > > > 
> > > > There is also the code path from cond_resched() in PREEMPT=n kernels.
> > > > This needs rcu_all_qs().  Though it is quite possible that some additional
> > > > code collapsing is possible.
> > > > 
> > > > > Basically for some background, I was thinking can we simplify the code that
> > > > > calls "rcu_momentary_dyntick_idle" since we already register a qs in other
> > > > > ways (like by resetting cpu_no_qs).
> > > > 
> > > > One complication is that rcu_all_qs() is invoked with interrupts
> > > > and preemption enabled, while rcu_note_context_switch() is
> > > > invoked with interrupts disabled.  Also, as you say, Tasks RCU.
> > > > Plus rcu_all_qs() wants to exit immediately if there is nothing to
> > > > do, while rcu_note_context_switch() must unconditionally do rcu_qs()
> > > > -- yes, it could check, but that would be redundant with the checks
> > > 
> > > > This immediate exit is taken care of in the above pseudo code, would that
> > > help the cond_resched performance?
> > 
> > > It looks like you are cautiously edging towards the two wrapper functions
> > calling common code, relying on inlining and simplification.  Why not just
> > try doing it?  ;-)
> 
> Sure yeah. I was more thinking of the ambitious goal of getting rid of the
> complexity and exploring the general design idea, than containing/managing
> the complexity with reducing code duplication. :D
> 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index c818e0c91a81..5aa0259c014d 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -1063,7 +1063,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> > > > >  	 * read-side critical section that started before the beginning
> > > > >  	 * of the current RCU grace period.
> > > > >  	 */
> > > > > -	if (rcu_dynticks_in_eqs_since(rdp, rdp->dynticks_snap)) {
> > > > > +	if (rcu_dynticks_in_eqs_since(rdp, rdp->dynticks_snap) || !rdp->cpu_no_qs.b.norm) {
> > > > 
> > > > If I am not too confused, this change could cause trouble for
> > > > nohz_full CPUs looping in the kernel.  Such CPUs don't necessarily take
> > > > scheduler-clock interrupts, last I checked, and this could prevent the
> > > > CPU from reporting its quiescent state to core RCU.
> > > 
> > > Would that still be a problem if rcu_all_qs called rcu_qs? Also the above
> > > diff is an OR condition so it is more relaxed than before.
> > 
> > Yes, because rcu_qs() is only guaranteed to capture the quiescent
> > state on the current CPU, not necessarily report it to the RCU core.
> 
> The reporting to the core is necessary to call rcu_report_qs_rnp so that the
> QS information is propagating up the tree, right?
> 
> Wouldn't that reporting be done anyway by:
> 
> force_qs_rnp
>   -> rcu_implicit_dynticks_qs  (which returns 1 because rdp->cpu_no_qs.b.norm
> 				was cleared by rcu_qs() and we detect that
> 				with help of above diff)

Ah.  It is not safe to sample rdp->cpu_no_qs.b.norm off-CPU, and that
is what your patch would do.  This is intentional -- if it were safe to
sample off-CPU, then it would be more expensive to read/update on-CPU.
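
The cost tradeoff can be sketched in userspace C11.  This is an
illustration under stated assumptions, not the kernel's implementation:
the toy_* names and the structure layout are hypothetical.  The plain
flag is meant for the owning CPU only, while the atomic counter is the
kind of state another CPU may snapshot and compare later.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_cpu_state {
	bool local_no_qs;        /* owning CPU only: plain loads and stores */
	atomic_ulong shared_seq; /* may be sampled from any CPU */
};

/* On-CPU fast path: a plain store, no ordering or atomicity cost. */
static void toy_local_qs(struct toy_cpu_state *s)
{
	s->local_no_qs = false;
}

/* Slower path: an atomic read-modify-write that other CPUs can observe. */
static void toy_publish_qs(struct toy_cpu_state *s)
{
	atomic_fetch_add_explicit(&s->shared_seq, 1, memory_order_seq_cst);
}

/* Off-CPU: take a snapshot of the shared counter. */
static unsigned long toy_snapshot(struct toy_cpu_state *s)
{
	return atomic_load_explicit(&s->shared_seq, memory_order_acquire);
}

/* Off-CPU: has the counter moved since the snapshot was taken? */
static bool toy_qs_since(struct toy_cpu_state *s, unsigned long snap)
{
	return toy_snapshot(s) != snap;
}
```

An off-CPU observer in this sketch calls only toy_snapshot() and
toy_qs_since().  It never reads local_no_qs, so toy_local_qs() can stay
a plain store.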

>   -> rcu_report_qs_rnp is called with mask bit set for corresponding CPU that
>   				has the !rdp->cpu_no_qs.b.norm
> 
> 
> I think that's what I am missing - why the above scheme wouldn't work.
> The only difference is reporting to the RCU core might invoke pending
> callbacks but I'm not sure if that matters for this. I'll try these changes,
> and try tracing it out and study it more.  thanks for the patience,

There are a lot of moving parts and you have not yet gotten to all
of them.  I suggest next taking a look at the relationship between
rcu_check_callbacks() and rcu_process_callbacks(), including the
open_softirq().  These have old names -- they handle the interface
between the CPU and RCU code, among other things.  Including invoking
callbacks, but only for some configurations.  :-/

							Thanx, Paul