linux-kernel - Re: [PATCH RFC 3/9] RCU: Preemptible RCU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070921234446.GH9059@linux.vnet.ibm.com>
Date:	Fri, 21 Sep 2007 16:44:46 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>, akpm@...ux-foundation.org,
	dipankar@...ibm.com, josht@...ux.vnet.ibm.com, tytso@...ibm.com,
	dvhltc@...ibm.com, Thomas Gleixner <tglx@...utronix.de>,
	bunk@...nel.org, ego@...ibm.com, oleg@...sign.ru
Subject: Re: [PATCH RFC 3/9] RCU: Preemptible RCU

On Fri, Sep 21, 2007 at 07:23:09PM -0400, Steven Rostedt wrote:
> --
> On Fri, 21 Sep 2007, Paul E. McKenney wrote:
> 
> > If you do a synchronize_rcu() it might well have to wait through the
> > following sequence of states:
> >
> > Stage 0: (might have to wait through part of this to get out of "next" queue)
> > 	rcu_try_flip_idle_state,	/* "I" */
> > 	rcu_try_flip_waitack_state, 	/* "A" */
> > 	rcu_try_flip_waitzero_state,	/* "Z" */
> > 	rcu_try_flip_waitmb_state	/* "M" */
> > Stage 1:
> > 	rcu_try_flip_idle_state,	/* "I" */
> > 	rcu_try_flip_waitack_state, 	/* "A" */
> > 	rcu_try_flip_waitzero_state,	/* "Z" */
> > 	rcu_try_flip_waitmb_state	/* "M" */
> > Stage 2:
> > 	rcu_try_flip_idle_state,	/* "I" */
> > 	rcu_try_flip_waitack_state, 	/* "A" */
> > 	rcu_try_flip_waitzero_state,	/* "Z" */
> > 	rcu_try_flip_waitmb_state	/* "M" */
> > Stage 3:
> > 	rcu_try_flip_idle_state,	/* "I" */
> > 	rcu_try_flip_waitack_state, 	/* "A" */
> > 	rcu_try_flip_waitzero_state,	/* "Z" */
> > 	rcu_try_flip_waitmb_state	/* "M" */
> > Stage 4:
> > 	rcu_try_flip_idle_state,	/* "I" */
> > 	rcu_try_flip_waitack_state, 	/* "A" */
> > 	rcu_try_flip_waitzero_state,	/* "Z" */
> > 	rcu_try_flip_waitmb_state	/* "M" */
> >
> > So yes, grace periods do indeed have some latency.
> 
> Yes they do. I'm now at the point that I'm just "trusting" you that you
> understand that each of these stages are needed. My IQ level only lets me
> understand next -> wait -> done, but not the extra 3 shifts in wait.
> 
> ;-)

In the spirit of full disclosure, I am not -absolutely- certain that
they are needed, only that they are sufficient.  Just color me paranoid.

> > > True, but the "me" confused me. Since that task struct is not me ;-)
> >
> > Well, who is it, then?  ;-)
> 
> It's the app I watch sitting there waiting it's turn for it's callback to
> run.

:-)

> > > > > Isn't the GP detection done via a tasklet/softirq. So wouldn't a
> > > > > local_bh_disable be sufficient here? You already cover NMIs, which would
> > > > > also handle normal interrupts.
> > > >
> > > > This is also my understanding, but I think this disable is an
> > > > 'optimization' in that it avoids the regular IRQs from jumping through
> > > > these hoops outlined below.
> > >
> > > But isn't disabling irqs slower than doing a local_bh_disable? So the
> > > majority of times (where irqs will not happen) we have this overhead.
> >
> > The current code absolutely must exclude the scheduling-clock hardirq
> > handler.
> 
> ACKed,
> The reasoning you gave in Peter's reply most certainly makes sense.
> 
> > > > > > + *
> > > > > > + * If anyone is nuts enough to run this CONFIG_PREEMPT_RCU implementation
> > > > >
> > > > > Oh, come now! It's not "nuts" to use this ;-)
> > > > >
> > > > > > + * on a large SMP, they might want to use a hierarchical organization of
> > > > > > + * the per-CPU-counter pairs.
> > > > > > + */
> > > >
> > > > Its the large SMP case that's nuts, and on that I have to agree with
> > > > Paul, its not really large SMP friendly.
> > >
> > > Hmm, that could be true. But on large SMP systems, you usually have a
> > > large amounts of memory, so hopefully a really long synchronize_rcu
> > > would not be a problem.
> >
> > Somewhere in the range from 64 to a few hundred CPUs, the global lock
> > protecting the try_flip state machine would start sucking air pretty
> > badly.  But the real problem is synchronize_sched(), which loops through
> > all the CPUs --  this would likely cause problems at a few tens of
> > CPUs, perhaps as early as 10-20.
> 
> hehe, From someone who's largest box is 4 CPUs, to me 16 CPUS is large.
> But I can see hundreds, let alone thousands of CPUs would make a huge
> grinding halt on things like synchronize_sched. God, imaging if all CPUs
> did that approximately at the same time. The system would should a huge
> jitter.

Well, the first time the SGI guys tried to boot a 1024-CPU Altix, I got
an email complaining about RCU overheads.  ;-)  Manfred Spraul fixed
things up for them, though.

							Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/