[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20101108031936.63a13ff2@laptop.hypervisor.org>
Date: Mon, 8 Nov 2010 03:19:36 +0100
From: "Udo A. Steinberg" <udo@...ervisor.org>
To: paulmck@...ux.vnet.ibm.com
Cc: Joe Korty <joe.korty@...r.com>, fweisbec@...il.com,
mathieu.desnoyers@...icios.com, dhowells@...hat.com,
loic.minier@...aro.org, dhaval.giani@...il.com, tglx@...utronix.de,
peterz@...radead.org, linux-kernel@...r.kernel.org,
josh@...htriplett.org
Subject: Re: [PATCH] a local-timer-free version of RCU
On Mon, 8 Nov 2010 03:11:36 +0100 Udo A. Steinberg (UAS) wrote:
UAS> On Sat, 6 Nov 2010 12:28:12 -0700 Paul E. McKenney (PEM) wrote:
UAS>
UAS> PEM> > + * rcu_quiescent() is called from rcu_read_unlock() when a
UAS> PEM> > + * RCU batch was started while the rcu_read_lock/rcu_read_unlock
UAS> PEM> > + * critical section was executing.
UAS> PEM> > + */
UAS> PEM> > +
UAS> PEM> > +void rcu_quiescent(int cpu)
UAS> PEM> > +{
UAS> PEM>
UAS> PEM> What prevents two different CPUs from calling this concurrently?
UAS> PEM> Ah, apparently nothing -- the idea being that
UAS> PEM> rcu_grace_period_complete() sorts it out. Though if the second
UAS> PEM> CPU was delayed, it seems like it might incorrectly end a
UAS> PEM> subsequent grace period as follows:
UAS> PEM>
UAS> PEM> o CPU 0 clears the second-to-last bit.
UAS> PEM>
UAS> PEM> o CPU 1 clears the last bit.
UAS> PEM>
UAS> PEM> o CPU 1 sees that the mask is empty, so invokes
UAS> PEM> rcu_grace_period_complete(), but is delayed in the function
UAS> PEM> preamble.
UAS> PEM>
UAS> PEM> o CPU 0 sees that the mask is empty, so invokes
UAS> PEM> rcu_grace_period_complete(), ending the grace period.
UAS> PEM> Because the RCU_NEXT_PENDING is set, it also starts
UAS> PEM> a new grace period.
UAS> PEM>
UAS> PEM> o CPU 1 continues in rcu_grace_period_complete(),
UAS> PEM> incorrectly ending the new grace period.
UAS> PEM>
UAS> PEM> Or am I missing something here?
UAS>
UAS> The scenario you describe seems possible. However, it should be easily
UAS> fixed by passing the perceived batch number as another parameter to
UAS> rcu_set_state() and making it part of the cmpxchg. So if the caller
UAS> tries to set state bits on a stale batch number (e.g., batch !=
UAS> rcu_batch), it can be detected.
UAS>
UAS> There is a similar, although harmless, issue in call_rcu(): Two CPUs can
UAS> concurrently add callbacks to their respective nxt list and compute the
UAS> same value for nxtbatch. One CPU succeeds in setting the PENDING bit
UAS> while observing COMPLETE to be clear, so it starts a new batch.
Correction: while observing COMPLETE to be set!
UAS> Afterwards, the other CPU also sets the PENDING bit, but this time for
UAS> the next batch. So it ends up requesting nxtbatch+1, although there is
UAS> no need to. This also would be fixed by making the batch number part of
UAS> the cmpxchg.
Cheers,
- Udo
Download attachment "signature.asc" of type "application/pgp-signature" (199 bytes)
Powered by blists - more mailing lists