Message-ID: <20110313055627.GW2234@linux.vnet.ibm.com>
Date: Sat, 12 Mar 2011 21:56:27 -0800
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Joe Korty <joe.korty@...r.com>
Cc: Frederic Weisbecker <fweisbec@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Lai Jiangshan <laijs@...fujitsu.com>,
"mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>,
"dhowells@...hat.com" <dhowells@...hat.com>,
"loic.minier@...aro.org" <loic.minier@...aro.org>,
"dhaval.giani@...il.com" <dhaval.giani@...il.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"josh@...htriplett.org" <josh@...htriplett.org>,
"houston.jim@...cast.net" <houston.jim@...cast.net>,
"corbet@....net" <corbet@....net>
Subject: Re: JRCU Theory of Operation
On Sat, Mar 12, 2011 at 07:43:36PM -0500, Joe Korty wrote:
> On Sat, Mar 12, 2011 at 09:36:29AM -0500, Paul E. McKenney wrote:
> > On Thu, Mar 10, 2011 at 02:50:45PM -0500, Joe Korty wrote:
> >>
> >> A longer answer, on a slightly expanded topic, goes as follows. The heart
> >> of jrcu is in this (slightly edited) line,
> >>
> >> rcu_data[cpu].wait = preempt_count_cpu(cpu) > idle_cpu(cpu);
> >
> > So, if we are idle, then the preemption count must be 2 or greater
> > to make the current grace period wait on a CPU. But if we are not
> > idle, then the preemption count need only be 1 or greater to make
> > the current grace period wait on a CPU.
> >
> > But why should an idle CPU block the current RCU grace period
> > in any case? The idle loop is defined to be a quiescent state
> > for rcu_sched. (Not that permitting RCU read-side critical sections
> > in the idle loop would be a bad thing, as long as the associated
> > pitfalls were all properly avoided.)
>
> Amazingly enough, the base preemption level for idle is '1', not '0'.
> This surprised me deeply, but on reflection it made sense. When idle
> needs to be preempted, there is no need to actually preempt it .. one
> just kick starts it and it will go execute the schedule for you.
Ah, got it, thank you!
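
For readers following along, the rule just discussed can be written out as a
small standalone sketch. This is only a user-space model of the quoted
expression, not JRCU code: the struct and the function name must_wait() are
hypothetical, and the sampled values stand in for whatever
preempt_count_cpu() and idle_cpu() would return.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one CPU's sampled state; not a kernel structure. */
struct cpu_sample {
    int  preempt_count;  /* sampled preemption depth of the task on that CPU */
    bool idle;           /* true if that CPU is running its idle task */
};

/*
 * Models the quoted expression: preempt_count_cpu(cpu) > idle_cpu(cpu).
 * The idle task's base preemption level is 1, so an idle CPU holds up the
 * grace period only at depth 2 or more, while a non-idle CPU does so at
 * depth 1 or more.
 */
static bool must_wait(struct cpu_sample s)
{
    return s.preempt_count > (s.idle ? 1 : 0);
}
```

With this model, an idle CPU sampled at depth 1 does not block the batch,
while a non-idle CPU sampled at depth 1 does.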
> >> Here, the garbage collector is making an attempt to deduce, at the
> >> start of the current batch, whether or not some cpu is executing code
> >> in a quiescent region. If it is, then that cpu's wait state can be set
> >> to zero right away -- we don't have to wait for that cpu to execute a
> >> quiescent point tap later on to discover that fact. This nicely covers
> >> the user app and idle cpu situations discussed above.
> >>
> >> Now, we all know that fetching the preempt_count of some process running on
> >> another cpu is guaranteed to return a stale (obsolete) value, and may even
> >> be dangerous (pointers are being followed after all). Putting aside the
> >> question of safety, for now, leaves us with a trio of questions: are there
> >> times when this inherently unstable value is in fact stable and useful?
> >> When it is not stable, is that fact relevant or irrelevant to the correct
> >> operation of jrcu? And finally, is the fact that we cannot tell when
> >> it is stable and when it is not also relevant?
> >
> > And there is also the ordering of the preempt_disable() and the accesses
> > within the critical section... Just because you recently saw a quiescent
> > state doesn't mean that the preceding critical section has completed --
> > even x86 is happy to spill stores out of a critical section ended by
> > preempt_enable. If one of those stores is to an RCU protected
> > data structure, you might end up freeing the structure before the
> > store completed.
> >
> > Or is the idea that you would wait 50 milliseconds after detecting
> > the quiescent state before invoking the corresponding RCU callbacks?
>
> Yep.
OK.
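
As I understand the answer, the delay is what covers stores that leak past
preempt_enable(): callbacks run only once a full sampling period has elapsed
since the quiescent state was observed. A minimal sketch of that rule,
assuming a millisecond clock; struct batch, callbacks_may_run() and
SAMPLE_PERIOD_MS are hypothetical names for illustration, not taken from the
JRCU patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_PERIOD_MS 50  /* the 50 ms figure used in this discussion */

/* Hypothetical record of one batch of callbacks. */
struct batch {
    uint64_t quiesced_at_ms;  /* time at which all CPUs were seen quiescent */
    bool     quiesced;        /* false until that observation has been made */
};

/*
 * Callbacks may run only after a full sampling period has passed since the
 * quiescent state was observed, giving late stores time to become visible.
 */
static bool callbacks_may_run(const struct batch *b, uint64_t now_ms)
{
    return b->quiesced && now_ms - b->quiesced_at_ms >= SAMPLE_PERIOD_MS;
}
```

Whether one period is in fact long enough is the open question raised
further down.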
> > I am missing how ->which switching is safe, given the possibility of
> > access from other CPUs.
>
> JRCU allows writes to continue through the old '->which'
> value for a period of time. All it requires is that
> within 50 msecs the writes have ceased and that
> the writing cpu has executed a smp_wmb() and the effects
> of the smp_wmb() have propagated throughout the system.
>
> Even though I keep saying 50msecs for everything, I
> suspect that the Q switching meets all the above quiescent
> requirements in a few tens of microseconds. Thus even
> a 1 msec JRCU sampling period is expected to be safe,
> at least in regard to Q switching.
I would feel better about this if the CPU vendors were willing to give
an upper bound...
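
To spell out the scheme as I read it, here is a single-threaded model of the
'->which' switch. It is a sketch under assumptions: the two-queue layout,
the fixed capacity, and the names enqueue(), flip() and drain() are all
hypothetical, and the model deliberately omits the memory barriers and the
timing that the real question is about.

```c
#include <stddef.h>

#define QLEN 8  /* hypothetical fixed capacity, for illustration only */

/* Hypothetical model of two callback queues selected by a 'which' index. */
struct two_queues {
    int    which;           /* queue that new callbacks should be added to */
    int    items[2][QLEN];  /* queued callback ids */
    size_t count[2];        /* number of entries in each queue */
};

/* Writer side: enqueue through whatever 'which' value the writer observed. */
static int enqueue(struct two_queues *q, int observed_which, int id)
{
    if (q->count[observed_which] >= QLEN)
        return -1;  /* queue full */
    q->items[observed_which][q->count[observed_which]++] = id;
    return 0;
}

/* Collector side: switch 'which' and return the index of the retired queue. */
static int flip(struct two_queues *q)
{
    int old = q->which;

    q->which = !old;
    return old;
}

/*
 * Collector side: empty a retired queue and return how many entries it held.
 * In the scheme described above this would be called only after a full
 * sampling period has passed since flip(), so that late writers are done.
 */
static size_t drain(struct two_queues *q, int idx)
{
    size_t n = q->count[idx];

    q->count[idx] = 0;
    return n;
}
```

A writer that read 'which' just before flip() still lands its entry in the
retired queue, which is why drain() has to wait; the model shows the
dependency, not a bound on how long that wait must be.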
Thanx, Paul