[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220905125949.GA173859@lothringen>
Date: Mon, 5 Sep 2022 14:59:49 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
rushikesh.s.kadam@...el.com, urezki@...il.com,
neeraj.iitr10@...il.com, paulmck@...nel.org, rostedt@...dmis.org,
vineeth@...byteword.org, boqun.feng@...il.com
Subject: Re: [PATCH v5 06/18] rcu: Introduce call_rcu_lazy() API
implementation
On Fri, Sep 02, 2022 at 07:09:39PM -0400, Joel Fernandes wrote:
> On 9/2/2022 11:21 AM, Frederic Weisbecker wrote:
> >> @@ -3904,7 +3943,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
> >> rdp->barrier_head.func = rcu_barrier_callback;
> >> debug_rcu_head_queue(&rdp->barrier_head);
> >> rcu_nocb_lock(rdp);
> >> - WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> >> + WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false,
> >> + /* wake gp thread */ true));
> >
> > It's a bad sign when you need to start commenting your boolean parameters :)
> > Perhaps use a single two-bit flag instead of two booleans, for readability?
>
> That's fair, what do you mean 2-bit flag? Are you saying, we encode the last 2
> parameters to flush bypass in a u*?
Yeah exactly. Such as rcu_nocb_flush_bypass(rdp, NULL, jiffies, FLUSH_LAZY | FLUSH_WAKE)
>
> > Also that's a subtle change which purpose isn't explained. It means that
> > rcu_barrier_entrain() used to wait for the bypass timer in the worst case
> > but now we force rcuog into it immediately. Should that be a separate change?
>
> It could be split, but it is laziness that amplifies the issue so I thought of
> keeping it in the same patch. I don't mind one way or the other.
Ok then lets keep it here but please add a comment for the reason to
force wake here.
> >> + case RCU_NOCB_WAKE_BYPASS:
> >> + mod_jif = 2;
> >> + break;
> >> +
> >> + case RCU_NOCB_WAKE:
> >> + case RCU_NOCB_WAKE_FORCE:
> >> + // For these, make it wake up the soonest if we
> >> + // were in a bypass or lazy sleep before.
> >> if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
> >> - mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
> >> - if (rdp_gp->nocb_defer_wakeup < waketype)
> >> - WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> >> + mod_jif = 1;
> >> + break;
> >> }
> >>
> >> + if (mod_jif)
> >> + mod_timer(&rdp_gp->nocb_timer, jiffies + mod_jif);
> >> +
> >> + if (rdp_gp->nocb_defer_wakeup < waketype)
> >> + WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> >
> > So RCU_NOCB_WAKE_BYPASS and RCU_NOCB_WAKE_LAZY don't override the timer state
> > anymore? Looks like something is missing.
>
> My goal was to make sure that NOCB_WAKE_LAZY wake keeps the timer lazy. If I
> don't do this, then when CPU enters idle, it will immediately do a wake up via
> this call:
>
> rcu_nocb_need_deferred_wakeup(rdp_gp, RCU_NOCB_WAKE)
But if the timer is in RCU_NOCB_WAKE_LAZY mode, that shouldn't be a problem.
>
> That was almost always causing lazy CBs to be non-lazy thus negating all the
> benefits.
>
> Note that bypass will also have same issue where the bypass CB will not wait for
> intended bypass duration. To make it consistent with lazy, I made bypass also
> not override nocb_defer_wakeup.
I'm surprised because rcu_nocb_flush_deferred_wakeup() should only do the wake up
if the timer is RCU_NOCB_WAKE or RCU_NOCB_WAKE_FORCE. Or is that code buggy
somehow?
Actually your change is modifying the timer delay without changing the timer
mode, which may shortcut rcu_nocb_flush_deferred_wakeup() check and actually
make it perform early upon idle loop entry.
Or am I missing something?
>
> I agree its not pretty, but it works and I could not find any code path where it
> does not work. That said, I am open to ideas for changing this and perhaps some
> of these unneeded delays with bypass CBs can be split into separate patches.
>
> Regarding your point about nocb_defer_wakeup state diverging from the timer
> programming, that happens anyway here in current code:
>
> 283 } else {
> 284 if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
> 285 mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
> 286 if (rdp_gp->nocb_defer_wakeup < waketype)
> 287 WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> 288 }
How so?
> >> @@ -705,12 +816,21 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> >> my_rdp->nocb_gp_gp = needwait_gp;
> >> my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
> >>
> >> - if (bypass && !rcu_nocb_poll) {
> >> - // At least one child with non-empty ->nocb_bypass, so set
> >> - // timer in order to avoid stranding its callbacks.
> >> - wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
> >> - TPS("WakeBypassIsDeferred"));
> >> + // At least one child with non-empty ->nocb_bypass, so set
> >> + // timer in order to avoid stranding its callbacks.
> >> + if (!rcu_nocb_poll) {
> >> + // If bypass list only has lazy CBs. Add a deferred
> >> + // lazy wake up.
> >> + if (lazy && !bypass) {
> >> + wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_LAZY,
> >> + TPS("WakeLazyIsDeferred"));
> >
> > What if:
> >
> > 1) rdp(1) has lazy callbacks
> > 2) all other rdp's have no callback at all
> > 3) nocb_gp_wait() runs through all rdp's, everything is handled, except for
> > these lazy callbacks
> > 4) It reaches the above path, ready to arm the RCU_NOCB_WAKE_LAZY timer,
> > but it hasn't yet called wake_nocb_gp_defer()
> > 5) Oh but rdp(2) queues a non-lazy callback. interrupts are disabled so it defers
> > the wake up to nocb_gp_wait() with arming the timer in RCU_NOCB_WAKE.
> > 6) nocb_gp_wait() finally calls wake_nocb_gp_defer() and override the timeout
> > to several seconds ahead.
> > 7) No more callbacks queued, the non-lazy callback will have to wait several
> > seconds to complete.
> >
> > Or did I miss something?
>
> In theory, I can see this being an issue. In practice, I have not seen it to
> be.
What matters is that the issue looks plausible.
> In my view, the nocb GP thread should not go to sleep in the first place if
> there are any non-bypass CBs being queued. If it does, then that seems an
> optimization-related bug.
Yeah, it's a constraint introduced by the optimized delayed wake up.
By why would RCU_NOCB_WAKE_LAZY need to overwrite RCU_NOCB_WAKE in the first
place?
>
> That said, we can make wake_nocb_gp_defer() more robust perhaps by making it not
> overwrite the timer if the wake-type requested is weaker than RCU_NOCB_WAKE,
> however that should not cause the going-into-idle issues I pointed. Whether the
> idle time issue will happen, I have no idea. But in theory, will that address
> your concern above?
Yes, but I'm still confused by this idle time issue.
>
> > Note that the race exists with RCU_NOCB_WAKE_BYPASS
> > but it's only about one jiffy delay, not seconds.
>
> Well, 2 jiffies. But yeah.
>
> Thanks, so far I do not see anything that cannot be fixed on top of this patch
> but you raised some good points. Maybe we ought to rewrite the idle path to not
> disturb lazy CBs in a different way, or something (while keeping the timer state
> consistent with the programming of the timer in wake_nocb_gp_defer()).
I'd rather see an updated patch (not the whole patchset but just this one) rather
than deltas, just to make sure I'm not missing something in the whole picture.
Thanks.
Powered by blists - more mailing lists