Message-ID: <20101108031136.0766149f@laptop.hypervisor.org>
Date: Mon, 8 Nov 2010 03:11:36 +0100
From: "Udo A. Steinberg" <udo@...ervisor.org>
To: paulmck@...ux.vnet.ibm.com
Cc: Joe Korty <joe.korty@...r.com>, fweisbec@...il.com,
mathieu.desnoyers@...icios.com, dhowells@...hat.com,
loic.minier@...aro.org, dhaval.giani@...il.com, tglx@...utronix.de,
peterz@...radead.org, linux-kernel@...r.kernel.org,
josh@...htriplett.org
Subject: Re: [PATCH] a local-timer-free version of RCU
On Sat, 6 Nov 2010 12:28:12 -0700 Paul E. McKenney (PEM) wrote:
PEM> > + * rcu_quiescent() is called from rcu_read_unlock() when a
PEM> > + * RCU batch was started while the rcu_read_lock/rcu_read_unlock
PEM> > + * critical section was executing.
PEM> > + */
PEM> > +
PEM> > +void rcu_quiescent(int cpu)
PEM> > +{
PEM>
PEM> What prevents two different CPUs from calling this concurrently?
PEM> Ah, apparently nothing -- the idea being that
PEM> rcu_grace_period_complete() sorts it out. Though if the second CPU was
PEM> delayed, it seems like it might incorrectly end a subsequent grace
PEM> period as follows:
PEM>
PEM> o CPU 0 clears the second-to-last bit.
PEM>
PEM> o CPU 1 clears the last bit.
PEM>
PEM> o CPU 1 sees that the mask is empty, so invokes
PEM> rcu_grace_period_complete(), but is delayed in the function
PEM> preamble.
PEM>
PEM> o CPU 0 sees that the mask is empty, so invokes
PEM> rcu_grace_period_complete(), ending the grace period.
PEM> Because the RCU_NEXT_PENDING is set, it also starts
PEM> a new grace period.
PEM>
PEM> o CPU 1 continues in rcu_grace_period_complete(), incorrectly
PEM> ending the new grace period.
PEM>
PEM> Or am I missing something here?
The scenario you describe seems possible. However, it should be easy to fix
by passing the perceived batch number as another parameter to rcu_set_state()
and making it part of the cmpxchg. That way, an attempt to set state bits
on a stale batch number (i.e., batch != rcu_batch) can be detected and
rejected.
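Something along these lines should work (a sketch only; the names and the
packed batch/state layout below are mine, not taken from Joe's patch):

/*
 * Sketch: pack the batch number and the state flags into a single
 * word so that one cmpxchg() covers both.  All identifiers here are
 * hypothetical.
 */
#define RCU_COMPLETE		0x1
#define RCU_NEXT_PENDING	0x2
#define RCU_BATCH_SHIFT		2

static unsigned long rcu_batch_state;	/* batch << 2 | state flags */

/*
 * Atomically set @bits, but only while the global batch number still
 * matches the @batch the caller observed.  Returns 0 on success, -1
 * if the batch has moved on or the update raced.
 */
static int rcu_set_state(unsigned long batch, unsigned long bits)
{
	unsigned long old = rcu_batch_state;

	if ((old >> RCU_BATCH_SHIFT) != batch)
		return -1;	/* stale batch: caller must not act */
	if (cmpxchg(&rcu_batch_state, old, old | bits) != old)
		return -1;	/* raced with a concurrent update */
	return 0;
}

In your scenario, CPU 1's delayed call into rcu_grace_period_complete()
would then fail the batch check instead of ending the grace period that
CPU 0 had already started.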
There is a similar, although harmless, issue in call_rcu(): two CPUs can
concurrently add callbacks to their respective nxt lists and compute the same
value for nxtbatch. One CPU succeeds in setting the PENDING bit while
observing COMPLETE to be clear, so it starts a new batch. Afterwards, the
other CPU also sets the PENDING bit, but this time for the next batch, so it
ends up requesting nxtbatch+1 although there is no need to. This, too, would
be fixed by making the batch number part of the cmpxchg.
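With the batch number in the same word, the call_rcu() path could pass along
the batch it observed when it computed nxtbatch, roughly like this (again
only a sketch with hypothetical names):

/*
 * Sketch: set RCU_NEXT_PENDING only while the global batch number
 * still matches the one the caller observed when computing nxtbatch.
 * If another CPU has already started the new batch, the number has
 * advanced, so the request is dropped instead of being reissued for
 * nxtbatch+1.
 */
static void rcu_request_batch(unsigned long batch)
{
	unsigned long old, new;

	do {
		old = rcu_batch_state;
		if ((old >> RCU_BATCH_SHIFT) != batch)
			return;	/* batch moved on: request is moot */
		if (old & RCU_NEXT_PENDING)
			return;	/* next batch already requested */
		new = old | RCU_NEXT_PENDING;
	} while (cmpxchg(&rcu_batch_state, old, new) != old);
}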
Cheers,
- Udo