[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221215013938.GF4001@paulmck-ThinkPad-P17-Gen-1>
Date: Wed, 14 Dec 2022 17:39:38 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: boqun.feng@...il.com, frederic@...nel.org, neeraj.iitr10@...il.com,
urezki@...il.com, rcu@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] srcu: Yet more detail for
srcu_readers_active_idx_check() comments
On Wed, Dec 14, 2022 at 08:34:03PM -0500, Joel Fernandes wrote:
> On Wed, Dec 14, 2022 at 7:04 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > On Wed, Dec 14, 2022 at 11:14:48PM +0000, Joel Fernandes wrote:
> > > On Wed, Dec 14, 2022 at 11:10 PM Joel Fernandes <joel@...lfernandes.org> wrote:
> > > >
> > > > On Wed, Dec 14, 2022 at 11:07 PM Joel Fernandes <joel@...lfernandes.org> wrote:
> > > > >
> > > > > On Wed, Dec 14, 2022 at 9:24 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> > > > > > > I also did not get why you care about readers that come and ago (you
> > > > > > > mentioned the first reader seeing incorrect idx and the second reader
> > > > > > > seeing the right flipped one, etc). Those readers are irrelevant
> > > > > > > AFAICS since they came and went, and need not be waited on , right?.
> > > > > >
> > > > > > The comment is attempting to show (among other things) that we don't
> > > > > > need to care about readers that come and go more than twice during that
> > > > > > critical interval of time during the counter scans.
> > > > >
> > > > > Why do we need to care about readers that come and go even once? Once
> > > > > they are gone, they have already done an unlock() and their RSCS is
> > > > > over, so they need to be considered AFAICS.
> > > > >
> > > >
> > > > Aargh, I meant: "so they need to be considered AFAICS".
> > >
> > > Trying again: "so they need not be considered AFAICS".
> >
> > Give or take counter wrap, which can make it appear that still-present
> > readers have finished.
>
> Ah you mean those flood of readers affect the counter wrapping and not
> that those readers have to be waited on or anything, they just happen
> to have a side-effect on *existing readers* which need to be waited
> on.
Exactly, that flood of readers could potentially result in a
counter-wrap-induced too-short SRCU grace period, and it is SRCU's job
to avoid that, and specifically the job of the code that this comment
lives in.
> Thanks a lot for this explanation, this part I agree. Readers that
> sampled the idx before the flip happened, and then did their
> lock+unlock counter increments both after the flip, and after the
> second unlock counter scan (second scan), can mess up the lock
> counters such that the second scan found lock==unlock, even though it
> is not to be due to pre-existing readers. But as you pointed out,
> there have to be a substantially large number of these to cause the
> equality check to match. This might be another reason why it is
> important to scan the unlocks first, because the locks are what have
> to cause the wrap around of the lock counter. Instead if you counted
> locks first, then the unlocks would have to do the catching up to the
> locks which are much fewer than a full wrap around.
True enough!
> I still don't see why this affects only the first reader. There could
> be more than 1 reader that needs to be waited on (the readers that
> started before the grace period started). Say there are 5 of them.
> When the grace period starts, the interfering readers (2^32 of them or
> so) could have sampled the old idx before the flip, and then do
> lock+unlock (on that old pre-flip() idx) in quick succession after the
> smp_mb() in the second srcu_readers_active_idx_check(). That causes
> those 5 poor readers to not be waited on. Granted, any new readers
> after this thundering herd should see the new idx and will not be
> affected, thanks to the memory barriers.
Yes, there could be quite a few such readers, but it only takes one
messed-up reader to constitute a bug in SRCU. ;-)
> Still confused, but hey I'll take it little at a time ;-) Also thanks
> for the suggestions for litmus tests.
Agreed, setting this sort of thing aside for a bit and then coming back
to it can be helpful.
Thanx, Paul
> Cheers,
>
> - Joel
>
> > > Anyway, my 1 year old son is sick so signing off for now. Thanks.
> >
> > Ouch! I hope he recovers quickly and completely!!!
> >
> > Thanx, Paul
Powered by blists - more mailing lists