linux-kernel - Re: [PATCH tip/core/rcu 6/7] rcu: Drive quiescent-state-forcing delay from HZ

Message-ID: <20130413195336.GA14799@leaf>
Date:	Sat, 13 Apr 2013 12:53:36 -0700
From:	Josh Triplett <josh@...htriplett.org>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
	dipankar@...ibm.com, akpm@...ux-foundation.org,
	mathieu.desnoyers@...ymtl.ca, niv@...ibm.com, tglx@...utronix.de,
	peterz@...radead.org, rostedt@...dmis.org, Valdis.Kletnieks@...edu,
	dhowells@...hat.com, edumazet@...gle.com, darren@...art.com,
	fweisbec@...il.com, sbw@....edu
Subject: Re: [PATCH tip/core/rcu 6/7] rcu: Drive quiescent-state-forcing
 delay from HZ

On Sat, Apr 13, 2013 at 12:34:25PM -0700, Paul E. McKenney wrote:
> On Sat, Apr 13, 2013 at 11:18:00AM -0700, Josh Triplett wrote:
> > On Fri, Apr 12, 2013 at 11:38:04PM -0700, Paul E. McKenney wrote:
> > > On Fri, Apr 12, 2013 at 04:54:02PM -0700, Josh Triplett wrote:
> > > > On Fri, Apr 12, 2013 at 04:19:13PM -0700, Paul E. McKenney wrote:
> > > > > From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
> > > > > 
> > > > > Systems with HZ=100 can have slow bootup times due to the default
> > > > > three-jiffy delays between quiescent-state forcing attempts.  This
> > > > > commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based
> > > > > on the value of HZ.  However, this would break very large systems that
> > > > > require more time between quiescent-state forcing attempts.  This
> > > > > commit therefore also ups the default delay by one jiffy for each
> > > > > 256 CPUs that might be on the system (based off of nr_cpu_ids at
> > > > > runtime, -not- NR_CPUS at build time).
> > > > > 
> > > > > Reported-by: Paul Mackerras <paulus@....ibm.com>
> > > > > Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> > > > 
> > > > Something seems very wrong if RCU regularly hits the fqs code during
> > > > boot; feels like there's some more straightforward solution we're
> > > > missing.  What causes these CPUs to fall under RCU's scrutiny during
> > > > boot yet not actually hit the RCU codepaths naturally?
> > > 
> > > The problem is that they are running HZ=100, so that RCU will often
> > > take 30-60 milliseconds per grace period.  At that point, you only
> > > need 16-30 grace periods to chew up a full second, so it is not all
> > > that hard to eat up the additional 8-12 seconds of boot time that
> > > they were seeing.  IIRC, UP boot was costing them 4 seconds.
> > > 
> > > For HZ=1000, this would translate to 800ms to 1.2s, which is nowhere
> > > near as annoying.
> > 
> > That raises two questions, though.  First, who calls synchronize_rcu()
> > repeatedly during boot, and could they call call_rcu() instead to avoid
> > blocking for an RCU grace period?  Second, why does RCU need 3-6 jiffies
> > to resolve a grace period during boot?  That suggests that RCU doesn't
> > actually resolve a grace period until the force-quiescent-state
> > machinery kicks in, meaning that the normal quiescent-state mechanism
> > didn't work.
> 
> Indeed, converting synchronize_rcu() to call_rcu() might also be
> helpful.  The reason that RCU often does not resolve grace periods until
> force_quiescent_state() is that it is often the case during boot that
> all but one CPU is idle.  RCU tries hard to avoid waking up idle CPUs,
> so it must scan them.  Scanning is relatively expensive, so there is
> reason to wait.

How are those CPUs going idle without first telling RCU that they're
quiesced?  Seems like, during boot at least, you want RCU to use its
idle==quiesced logic to proactively note continuously-quiescent states.
Ideally, you should not hit the FQS code at all during boot.

> One thing that could be done would be to scan immediately during boot,
> and then back off once boot has completed.  Of course, RCU has no idea
> when boot has completed, but one way to get this effect is to boot
> with rcutree.jiffies_till_first_fqs=0, and then use sysfs to set it
> to 3 once boot has completed.

What do you mean by "boot has completed" here?  The kernel's early
initialization, the kernel's initialization up to running /sbin/init, or
userspace initialization up through supporting user login?

In any case, I don't think it makes sense to do this with FQS.

- Josh Triplett
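
The timing figures quoted in this thread can be checked with a little
arithmetic.  What follows is a minimal sketch, not kernel code: the helper
names are hypothetical, the 3-6 jiffy range and the 8-12 second boot cost
are the numbers quoted above, and reading "one jiffy for each 256 CPUs" as
an integer division of nr_cpu_ids by 256 is an assumption.

```python
# Illustrative arithmetic only. These helpers are hypothetical and are
# not kernel APIs; the constants come from the figures quoted in the thread.

CPUS_PER_EXTRA_JIFFY = 256  # patch description: +1 jiffy per 256 CPUs


def jiffies_to_ms(jiffies: int, hz: int) -> float:
    """Convert a jiffy count to milliseconds for a given HZ."""
    return jiffies * 1000.0 / hz


def fqs_delay_jiffies(base_jiffies: int, nr_cpu_ids: int) -> int:
    """Base delay plus one jiffy per full 256 possible CPUs (assumed rounding)."""
    return base_jiffies + nr_cpu_ids // CPUS_PER_EXTRA_JIFFY


def rescale_boot_cost(seconds: float, old_hz: int, new_hz: int) -> float:
    """Rescale a cost that is a fixed number of jiffies to a different HZ."""
    return seconds * old_hz / new_hz


if __name__ == "__main__":
    for hz in (100, 1000):
        low, high = jiffies_to_ms(3, hz), jiffies_to_ms(6, hz)
        print(f"HZ={hz}: {low:.0f}-{high:.0f} ms per grace period")

    # The 8-12 s of extra boot time seen at HZ=100, rescaled to HZ=1000.
    for cost in (8.0, 12.0):
        print(f"{cost:.0f} s at HZ=100 -> "
              f"{rescale_boot_cost(cost, 100, 1000):.1f} s at HZ=1000")

    # Effect of the per-256-CPU increment on a three-jiffy base delay.
    for cpus in (4, 256, 4096):
        print(f"nr_cpu_ids={cpus}: {fqs_delay_jiffies(3, cpus)} jiffies")
```

The rescaling assumes that the boot cost is dominated by grace periods
whose length is a fixed number of jiffies, which is the premise of the
HZ=100 versus HZ=1000 comparison quoted above.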