linux-kernel - Re: JRCU Theory of Operation

Message-ID: <20110313055627.GW2234@linux.vnet.ibm.com>
Date:	Sat, 12 Mar 2011 21:56:27 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Joe Korty <joe.korty@...r.com>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	"mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>,
	"dhowells@...hat.com" <dhowells@...hat.com>,
	"loic.minier@...aro.org" <loic.minier@...aro.org>,
	"dhaval.giani@...il.com" <dhaval.giani@...il.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"josh@...htriplett.org" <josh@...htriplett.org>,
	"houston.jim@...cast.net" <houston.jim@...cast.net>,
	"corbet@....net" <corbet@....net>
Subject: Re: JRCU Theory of Operation

On Sat, Mar 12, 2011 at 07:43:36PM -0500, Joe Korty wrote:
> On Sat, Mar 12, 2011 at 09:36:29AM -0500, Paul E. McKenney wrote:
> > On Thu, Mar 10, 2011 at 02:50:45PM -0500, Joe Korty wrote:
> >>
> >> A longer answer, on a slightly expanded topic, goes as follows.  The heart
> >> of jrcu is in this (slightly edited) line,
> >>
> >>   rcu_data[cpu].wait = preempt_count_cpu(cpu) > idle_cpu(cpu);
> > 
> > So, if we are idle, then the preemption count must be 2 or greater
> > to make the current grace period wait on a CPU.  But if we are not
> > idle, then the preemption count need only be 1 or greater to make
> > the current grace period wait on a CPU.
> > 
> > But why should an idle CPU block the current RCU grace period
> > in any case?  The idle loop is defined to be a quiescent state
> > for rcu_sched.  (Not that permitting RCU read-side critical sections
> > in the idle loop would be a bad thing, as long as the associated
> > pitfalls were all properly avoided.)
> 
> Amazingly enough, the base preemption level for idle is '1', not '0'.
> This surprised me deeply, but on reflection it made sense.  When idle
> needs to be preempted, there is no need to actually preempt it; one
> just kick-starts it and it will go execute schedule() for you.

Ah, got it, thank you!
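A minimal userspace sketch of the wait predicate discussed above, assuming (as Joe describes) that the idle task runs with a base preempt count of 1. Only the expression `preempt_count_cpu(cpu) > idle_cpu(cpu)` comes from the thread; `struct cpu_snapshot` and `jrcu_must_wait()` are hypothetical names, and the remotely sampled values are modeled as plain fields rather than real kernel reads.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical snapshot of one CPU as seen by the JRCU collector.
 * In the kernel these would come from preempt_count_cpu(cpu) and
 * idle_cpu(cpu); here they are plain fields for illustration.
 */
struct cpu_snapshot {
	int preempt_count;	/* remotely sampled, possibly stale */
	bool idle;		/* true if the CPU is running the idle task */
};

/*
 * Mirrors: rcu_data[cpu].wait = preempt_count_cpu(cpu) > idle_cpu(cpu);
 * idle_cpu() yields 0 or 1, so an idle CPU (base count 1) must be at
 * 2 or more to hold up the batch, while a non-idle CPU needs only 1.
 */
static bool jrcu_must_wait(const struct cpu_snapshot *s)
{
	assert(s->preempt_count >= 0);
	return s->preempt_count > (int)s->idle;
}
```

With this model, a user-mode CPU (count 0) and a plain idle CPU (count 1, idle) are both treated as quiescent at batch start, matching Paul's reading of the expression.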

> >> Here, the garbage collector is making an attempt to deduce, at the
> >> start of the current batch, whether or not some cpu is executing code
> >> in a quiescent region.  If it is, then that cpu's wait state can be set
> >> to zero right away -- we don't have to wait for that cpu to execute a
> >> quiescent point tap later on to discover that fact.  This nicely covers
> >> the user app and idle cpu situations discussed above.
> >>
> >> Now, we all know that fetching the preempt_count of some process running on
> >> another cpu is guaranteed to return a stale (obsolete) value, and may even
> >> be dangerous (pointers are being followed after all).  Putting aside the
> >> question of safety, for now, leaves us with a trio of questions: are there
> >> times when this inherently unstable value is in fact stable and useful?
> >> When it is not stable, is that fact relevant or irrelevant to the correct
> >> operation of jrcu? And finally, is the fact that we cannot tell when
> >> it is stable and when it is not also relevant?
> > 
> > And there is also the ordering of the preempt_disable() and the accesses
> > within the critical section...  Just because you recently saw a quiescent
> > state doesn't mean that the preceding critical section has completed --
> > even x86 is happy to spill stores out of a critical section ended by
> > preempt_enable.  If one of those stores is to an RCU protected
> > data structure, you might end up freeing the structure before the
> > store completed.
> > 
> > Or is the idea that you would wait 50 milliseconds after detecting
> > the quiescent state before invoking the corresponding RCU callbacks?
> 
> Yep.  

OK.
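The batch lifecycle agreed on in this exchange (seed per-CPU wait flags at batch start, clear them on quiescent taps, then hold callbacks for a further 50 ms after the last flag clears) can be modeled roughly as follows. This is a single-threaded sketch, not JRCU's code: `struct batch`, `batch_start()`, `batch_tap()`, `batch_callbacks_ready()`, the fixed CPU count, and the integer millisecond clock are all hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count */
#define JRCU_SETTLE_MS 50	/* post-detection delay discussed above */

/* Hypothetical per-batch bookkeeping kept by the collector. */
struct batch {
	bool wait[NR_CPUS_SKETCH];	/* CPU may still be in a critical section */
	long qs_detected_ms;		/* when the last wait flag cleared, or -1 */
};

/* Record the detection time once no CPU is still being waited on. */
static void batch_note_progress(struct batch *b, long now_ms)
{
	if (b->qs_detected_ms >= 0)
		return;				/* already detected */
	for (size_t i = 0; i < NR_CPUS_SKETCH; i++)
		if (b->wait[i])
			return;			/* a CPU still owes a quiescent point */
	b->qs_detected_ms = now_ms;
}

/*
 * Start of batch: seed each flag from the sampled predicate, so CPUs
 * already seen in a quiescent region (user mode, idle) need no tap.
 */
static void batch_start(struct batch *b, const bool must_wait[NR_CPUS_SKETCH],
			long now_ms)
{
	for (size_t i = 0; i < NR_CPUS_SKETCH; i++)
		b->wait[i] = must_wait[i];
	b->qs_detected_ms = -1;
	batch_note_progress(b, now_ms);
}

/* A CPU executed a quiescent point ("tap"). */
static void batch_tap(struct batch *b, size_t cpu, long now_ms)
{
	assert(cpu < NR_CPUS_SKETCH);
	b->wait[cpu] = false;
	batch_note_progress(b, now_ms);
}

/*
 * Callbacks become eligible only after a further settle delay, which,
 * per the exchange above, is meant to cover stores that leaked out of
 * a critical section past preempt_enable().
 */
static bool batch_callbacks_ready(const struct batch *b, long now_ms)
{
	return b->qs_detected_ms >= 0 &&
	       now_ms - b->qs_detected_ms >= JRCU_SETTLE_MS;
}
```

Whether a fixed delay is actually long enough for such stores to drain is exactly the hardware question Paul raises about CPU vendors giving an upper bound.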

> > I am missing how ->which switching is safe, given the possibility of
> > access from other CPUs.
> 
> JRCU allows writes to continue through the old '->which'
> value for a period of time.  All it requires is that
> within 50 msecs the writes have ceased and that
> the writing cpu has executed a smp_wmb() and the effects
> of the smp_wmb() have propagated throughout the system.
> 
> Even though I keep saying 50 msecs for everything, I
> suspect that the Q switching meets all the above quiescent
> requirements in a few tens of microseconds.  Thus even
> a 1 msec JRCU sampling period is expected to be safe,
> at least in regard to Q switching.

I would feel better about this if the CPU vendors were willing to give
an upper bound...
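Joe's ->which argument can be illustrated with a double-buffered callback queue in which a writer may keep using a stale `which` value for a while after the collector switches, and the retired queue is processed only after a settle period. This is a hypothetical, single-threaded model (`struct jrcu_q`, `q_enqueue()`, `q_switch()`, `q_drain_old()` and the fixed queue size are invented for illustration); it does not itself provide the cross-CPU ordering (smp_wmb() and its propagation) that the argument depends on.

```c
#include <assert.h>
#include <stddef.h>

#define JRCU_SETTLE_MS 50	/* settle period assumed in the thread */
#define QLEN_MAX 16		/* hypothetical fixed queue capacity */

struct cbq {
	int items[QLEN_MAX];	/* stand-ins for queued RCU callbacks */
	size_t len;
};

struct jrcu_q {
	struct cbq q[2];	/* double-buffered queues */
	int which;		/* queue that new callbacks should go to */
	long switched_ms;	/* time of the last switch, or -1 */
};

/* A writer enqueues using the 'which' it last observed, possibly stale. */
static void q_enqueue(struct jrcu_q *jq, int which_seen, int cb)
{
	struct cbq *q = &jq->q[which_seen];

	assert(q->len < QLEN_MAX);
	q->items[q->len++] = cb;
}

/* Collector: point new writers at the other queue; return the old index. */
static int q_switch(struct jrcu_q *jq, long now_ms)
{
	int old = jq->which;

	jq->which = !old;
	jq->switched_ms = now_ms;
	return old;
}

/*
 * Take the retired queue only after the settle period, by which time
 * (by assumption) stale writers have stopped and their stores are
 * visible.  Returns the number of callbacks taken, or -1 if too early.
 */
static int q_drain_old(struct jrcu_q *jq, int old, long now_ms)
{
	if (jq->switched_ms < 0 || now_ms - jq->switched_ms < JRCU_SETTLE_MS)
		return -1;
	int n = (int)jq->q[old].len;
	jq->q[old].len = 0;
	return n;
}
```

In the model, a straggler that writes to the old queue shortly after the switch is still picked up, because draining waits out the full settle period; on real hardware that only holds if the window is in fact bounded.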

							Thanx, Paul