Message-ID: <20110313004336.GA14518@tsunami.ccur.com>
Date: Sat, 12 Mar 2011 19:43:36 -0500
From: Joe Korty <joe.korty@...r.com>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Lai Jiangshan <laijs@...fujitsu.com>,
"mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>,
"dhowells@...hat.com" <dhowells@...hat.com>,
"loic.minier@...aro.org" <loic.minier@...aro.org>,
"dhaval.giani@...il.com" <dhaval.giani@...il.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"josh@...htriplett.org" <josh@...htriplett.org>,
"houston.jim@...cast.net" <houston.jim@...cast.net>,
"corbet@....net" <corbet@....net>
Subject: Re: JRCU Theory of Operation

On Sat, Mar 12, 2011 at 09:36:29AM -0500, Paul E. McKenney wrote:
> On Thu, Mar 10, 2011 at 02:50:45PM -0500, Joe Korty wrote:
>>
>> A longer answer, on a slightly expanded topic, goes as follows. The heart
>> of jrcu is in this (slightly edited) line,
>>
>> rcu_data[cpu].wait = preempt_count_cpu(cpu) > idle_cpu(cpu);
>
> So, if we are idle, then the preemption count must be 2 or greater
> to make the current grace period wait on a CPU. But if we are not
> idle, then the preemption count need only be 1 or greater to make
> the current grace period wait on a CPU.
>
> But why should an idle CPU block the current RCU grace period
> in any case? The idle loop is defined to be a quiescent state
> for rcu_sched. (Not that permitting RCU read-side critical sections
> in the idle loop would be a bad thing, as long as the associated
> pitfalls were all properly avoided.)

Amazingly enough, the base preemption level for idle is '1', not '0'.
This surprised me deeply, but on reflection it made sense. When idle
needs to be preempted, there is no need to actually preempt it: one
just kick-starts it and it will go execute the schedule for you.

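To make the comparison concrete, here is a minimal userspace sketch of
the truth table behind that line. It is not the jrcu source: struct
cpu_model and must_wait() are made-up names for illustration, and the
sketch assumes idle_cpu() yields 0 or 1 and that the idle task's base
preempt count is 1, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one cpu; not a kernel structure. */
struct cpu_model {
	int preempt_count;	/* stand-in for preempt_count_cpu(cpu) */
	bool idle;		/* stand-in for idle_cpu(cpu): 0 or 1 */
};

/*
 * Mirrors the quoted line: wait = preempt_count > idle.
 * Idle cpu:     base count is 1, so only a count of 2 or more waits.
 * Non-idle cpu: base count is 0, so a count of 1 or more waits.
 */
static bool must_wait(const struct cpu_model *c)
{
	assert(c->preempt_count >= 0);
	return c->preempt_count > (int)c->idle;
}
```

With this model, an idle cpu at its base count of 1 and a user-mode cpu
at count 0 both come out as "no need to wait", which is the intended
effect of the one-line test.
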
>> Here, the garbage collector is making an attempt to deduce, at the
>> start of the current batch, whether or not some cpu is executing code
>> in a quiescent region. If it is, then that cpu's wait state can be set
>> to zero right away -- we don't have to wait for that cpu to execute a
>> quiescent point tap later on to discover that fact. This nicely covers
>> the user app and idle cpu situations discussed above.
>>
>> Now, we all know that fetching the preempt_count of some process running on
>> another cpu is guaranteed to return a stale (obsolete) value, and may even
>> be dangerous (pointers are being followed after all). Putting aside the
>> question of safety, for now, leaves us with a trio of questions: are there
>> times when this inherently unstable value is in fact stable and useful?
>> When it is not stable, is that fact relevant or irrelevant to the correct
>> operation of jrcu? And finally, is the fact that we cannot tell when
>> it is stable and when it is not also relevant?
>
> And there is also the ordering of the preempt_disable() and the accesses
> within the critical section... Just because you recently saw a quiescent
> state doesn't mean that the preceding critical section has completed --
> even x86 is happy to spill stores out of a critical section ended by
> preempt_enable. If one of those stores is to an RCU protected
> data structure, you might end up freeing the structure before the
> store completed.
>
> Or is the idea that you would wait 50 milliseconds after detecting
> the quiescent state before invoking the corresponding RCU callbacks?

Yep.

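As a rough sketch of that deferral rule, assuming a millisecond clock
and a single flag recording that every cpu has been seen quiescent
(struct batch_model and callbacks_may_run() are hypothetical names,
not jrcu code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GRACE_DELAY_MS 50	/* the 50 msec figure used in this thread */

/* Hypothetical record of the current batch; not a jrcu structure. */
struct batch_model {
	uint64_t quiescent_seen_ms;	/* when quiescence was detected */
	bool quiescent;			/* all cpus seen quiescent? */
};

/*
 * Callbacks run only once the delay has elapsed since detection, so
 * that stores spilled out of a just-ended critical section have had
 * time to become visible before anything is freed.
 */
static bool callbacks_may_run(const struct batch_model *b, uint64_t now_ms)
{
	assert(now_ms >= b->quiescent_seen_ms);
	return b->quiescent &&
	       now_ms - b->quiescent_seen_ms >= GRACE_DELAY_MS;
}
```
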
> I am missing how ->which switching is safe, given the possibility of
> access from other CPUs.

JRCU allows writes to continue through the old '->which'
value for a period of time. All it requires is that,
within 50 msecs, the writes have ceased, the writing cpu
has executed a smp_wmb(), and the effects of the smp_wmb()
have propagated throughout the system.

Even though I keep saying 50 msecs for everything, I
suspect that the Q switching meets all the above quiescent
requirements in a few tens of microseconds. Thus even
a 1 msec JRCU sampling period is expected to be safe,
at least in regard to Q switching.

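The switching scheme can be sketched in userspace roughly as follows.
This is a single-threaded model only, with made-up names (struct
cbq_model, cb_enqueue(), cb_switch(), cb_drain()); it leaves out the
smp_wmb() and the real time delay, and just shows that a write made
through a stale '->which' still lands in the retired queue and gets
picked up when that queue is drained one sampling period later.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 8

/* Hypothetical pair of callback queues selected by 'which'. */
struct cbq_model {
	int q[2][QLEN];	/* callback ids; real code would hold lists */
	size_t n[2];	/* entries in each queue */
	int which;	/* queue that new callbacks should go to */
};

/* Writer side: append to the queue named by the 'which' it sampled. */
static void cb_enqueue(struct cbq_model *m, int which_seen, int id)
{
	assert(which_seen == 0 || which_seen == 1);
	assert(m->n[which_seen] < QLEN);
	m->q[which_seen][m->n[which_seen]++] = id;
}

/* Collector side: flip 'which' and return the retired queue's index. */
static int cb_switch(struct cbq_model *m)
{
	int old = m->which;

	m->which = old ^ 1;
	return old;
}

/*
 * Collector side, one sampling period later: by now writers using the
 * stale index are assumed to have finished. Returns how many callbacks
 * were taken from the retired queue.
 */
static size_t cb_drain(struct cbq_model *m, int old)
{
	size_t n = m->n[old];

	m->n[old] = 0;
	return n;
}
```

A writer that sampled 'which' just before cb_switch() may call
cb_enqueue() just after it; the entry is not lost, because the retired
queue is not drained until the following period.
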
Regards,
Joe