Message-ID: <8812ea75-ef14-0d5d-19d8-bda70394b41a@joelfernandes.org>
Date: Tue, 6 Sep 2022 22:56:01 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
rushikesh.s.kadam@...el.com, urezki@...il.com,
neeraj.iitr10@...il.com, paulmck@...nel.org, rostedt@...dmis.org,
vineeth@...byteword.org, boqun.feng@...il.com
Subject: Re: [PATCH v5 06/18] rcu: Introduce call_rcu_lazy() API
implementation
On 9/6/2022 3:11 PM, Frederic Weisbecker wrote:
> On Tue, Sep 06, 2022 at 12:43:52PM -0400, Joel Fernandes wrote:
>> On 9/6/2022 12:38 PM, Joel Fernandes wrote:
>> Ah, now I know why I got confused. I *used* to flush the bypass list when
>> !lazy CBs showed up. Paul suggested this is overkill. In this old overkill
>> method, I was missing a wakeup, which was likely causing the boot regression.
>> Forcing a wakeup fixed that. Now, in v5, I don't do the flush on a !lazy
>> rate-limit.
>>
>> I am sorry for the confusion. Either way, in my defense, this is just an extra
>> bit of code that I have to delete. This code is hard. I have mostly relied on
>> test-driven development. But now, thanks to this review, I am learning the
>> code more and more...
>
> Yeah this code is hard.
>
> Especially as it's possible to flush from both sides and queue the timer
> from both sides. And both sides read the bypass/lazy counter locklessly.
> But only call_rcu_*() can queue/increase the bypass size whereas only
> nocb_gp_wait() can cancel the timer. Phew!
>
Haha, indeed ;-)

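To make the ownership rules above a bit more concrete, here is a minimal userspace C11 sketch, not kernel code: a hypothetical enqueue path (standing in for call_rcu*()) is the only writer that grows the bypass and lazy counters, while either side may read them locklessly. All names (struct bypass_sketch, sketch_enqueue(), sketch_all_lazy()) are made up for illustration, and the release/acquire ordering is an assumption of the sketch, not a description of the kernel's actual synchronization.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified bypass state (not struct rcu_data). */
struct bypass_sketch {
	atomic_long len;      /* total callbacks sitting in the bypass list */
	atomic_long lazy_len; /* how many of those callbacks are lazy */
};

static void sketch_init(struct bypass_sketch *b)
{
	atomic_init(&b->len, 0);
	atomic_init(&b->lazy_len, 0);
}

/*
 * Enqueue side: the only code in this model that increases the counters.
 * The total is bumped before the lazy count, so a reader that observes a
 * lazy increment also observes the matching total increment.
 */
static void sketch_enqueue(struct bypass_sketch *b, bool lazy)
{
	atomic_fetch_add_explicit(&b->len, 1, memory_order_release);
	if (lazy)
		atomic_fetch_add_explicit(&b->lazy_len, 1, memory_order_release);
}

/*
 * Lockless reader, usable from either side. Loading lazy_len first
 * (acquire) and len second keeps lazy <= len in the snapshot, although
 * the snapshot may already be stale by the time it is acted upon.
 */
static bool sketch_all_lazy(struct bypass_sketch *b)
{
	long lazy = atomic_load_explicit(&b->lazy_len, memory_order_acquire);
	long len = atomic_load_explicit(&b->len, memory_order_acquire);

	assert(lazy <= len);
	return len > 0 && lazy == len;
}
```

A single-threaded run only exercises the bookkeeping; the interesting part is that any decision taken from a possibly stale snapshot (such as which timer to arm) has to stay safe.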
> Among the many possible dances between rcu_nocb_try_bypass()
> and nocb_gp_wait(), I haven't found a way yet for the timer to be
> set to LAZY when it should be BYPASS (or some other kind of accident, such
> as an ignored callback).
> In the worst case we may arm an earlier timer than necessary
> (RCU_NOCB_WAKE_BYPASS instead of RCU_NOCB_WAKE_LAZY for example).
>
> Famous last words...
Agreed.
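The worst case described above (arming a BYPASS timer where a LAZY one would do) is harmless because it only moves the wakeup earlier, whereas the opposite mistake would delay non-lazy callbacks. The minimal C sketch below models that asymmetry. The enum, the delay values (2 and 1000 ticks), and the helper names are hypothetical and do not mirror the kernel's RCU_NOCB_WAKE_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wake levels, loosely modeled on RCU_NOCB_WAKE_BYPASS/LAZY. */
enum wake_sketch { WAKE_SKETCH_NOT, WAKE_SKETCH_BYPASS, WAKE_SKETCH_LAZY };

/* Assumed timer delays in ticks; illustrative values only. */
#define SKETCH_BYPASS_DELAY 2L
#define SKETCH_LAZY_DELAY 1000L

/* Pick the timer level from a snapshot of the bypass counters. */
static enum wake_sketch sketch_pick_wake(long len, long lazy_len)
{
	assert(lazy_len >= 0 && lazy_len <= len);
	if (len == 0)
		return WAKE_SKETCH_NOT; /* nothing queued: no timer needed */
	/* Any non-lazy callback forces the short BYPASS deadline. */
	return lazy_len == len ? WAKE_SKETCH_LAZY : WAKE_SKETCH_BYPASS;
}

static long sketch_delay(enum wake_sketch w)
{
	assert(w != WAKE_SKETCH_NOT);
	return w == WAKE_SKETCH_LAZY ? SKETCH_LAZY_DELAY : SKETCH_BYPASS_DELAY;
}

/*
 * A race that arms 'armed' instead of 'correct' is benign if the armed
 * timer fires no later than the correct one would have: callbacks are
 * then processed early, never late.
 */
static bool sketch_race_is_benign(enum wake_sketch armed,
				  enum wake_sketch correct)
{
	return sketch_delay(armed) <= sketch_delay(correct);
}
```

In this model, sketch_race_is_benign(WAKE_SKETCH_BYPASS, WAKE_SKETCH_LAZY) holds while the reverse does not, which is the property the "worst case" argument relies on.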

On the issue of regressions from non-lazy things being treated as lazy, I was
thinking of adding a bounded-time check to:

[PATCH v5 08/18] rcu: Add per-CB tracing for queuing, flush and invocation.

If a non-lazy CB then takes an abnormally long time to execute (say, because it
was subject to a race condition), the check would splat. This is possible
because that patch tracks the queue time in the rcu_head.
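The bounded-time check could look roughly like the following userspace C sketch. It assumes each callback carries its queue timestamp and a lazy flag, as a stand-in for the per-CB tracking in patch 08/18; struct cb_sketch, check_cb_latency() and the 1000 ms limit are all hypothetical, and a message on stderr stands in for a kernel splat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bound; the right value would need tuning. */
#define NONLAZY_MAX_DELAY_MS 1000u

/* Hypothetical stand-in for the queue-time info tracked per callback. */
struct cb_sketch {
	uint64_t queued_ms; /* timestamp recorded when the CB was queued */
	bool lazy;          /* true if queued through the lazy API */
};

/*
 * Called at invocation time. Returns true (and warns) if a non-lazy
 * callback waited longer than the bound. Lazy callbacks are expected
 * to wait, so they are never flagged.
 */
static bool check_cb_latency(const struct cb_sketch *cb, uint64_t now_ms)
{
	assert(now_ms >= cb->queued_ms);
	uint64_t waited = now_ms - cb->queued_ms;

	if (cb->lazy || waited <= NONLAZY_MAX_DELAY_MS)
		return false;
	fprintf(stderr, "WARNING: non-lazy CB waited %llu ms (limit %u ms)\n",
		(unsigned long long)waited, NONLAZY_MAX_DELAY_MS);
	return true;
}
```

The threshold would have to sit comfortably above normal grace-period latency so that ordinary non-lazy callbacks never trigger false positives.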

On another note, boot-time regressions show up pretty quickly (at least on
ChromeOS) when non-lazy things become lazy, and so far the latest code has
fortunately been pretty well behaved.

Thanks,
- Joel