lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140813182030.GD4752@linux.vnet.ibm.com>
Date:	Wed, 13 Aug 2014 11:20:30 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org, mingo@...nel.org,
	laijs@...fujitsu.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
	dhowells@...hat.com, edumazet@...gle.com, dvhart@...ux.intel.com,
	fweisbec@...il.com, oleg@...hat.com, bobby.prani@...il.com,
	rafael@...nel.org
Subject: Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle
 tasks

On Wed, Aug 13, 2014 at 04:42:19PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2014 at 07:12:17AM -0700, Paul E. McKenney wrote:
> > > That's not an excuse for doing horrible things. And inventing new infra
> > > that needs to wake all CPUs is horrible.
> > 
> > Does your patch even work? 
> 
> Haven't even tried compiling it, but making it work shouldn't be too
> hard.
> 
> > Looks like it should, and yes, the idle loop
> > seems quite a bit simpler than it was a few years ago, but we really
> > don't need some strange thing that leaves a CPU idle but not visible as
> > such to RCU.
> 
> There's slightly more to it though; things like the x86 mwait idle wait
> functions tend to do far too much; for instance look at:
> 
> drivers/idle/intel_idle.c:intel_idle()

OK, let's see if I can follow the idly bouncing ball.

Looks like most arches call cpu_startup_entry(), which calls
arch_cpu_idle_prepare() followed by cpu_idle_loop().

arch_cpu_idle_prepare() is local_fiq_enable() on ARM and empty elsewhere.

cpu_idle_loop() does tick_nohz_idle_enter(), which does some nohz stuff.
It also checks for offline, and invokes arch_cpu_idle_enter(), which
is empty except on x86 and ARM.  On ARM, it messes with LEDs, and on
x86 it appears to disable an NMI-based watchdog timer.
Interrupts are disabled, and we do either cpu_idle_poll() if specified
or cpuidle_idle_call().  cpu_idle_poll() is the old-time idle loop,
with rcu_idle_enter()/_exit() and enabling interrupts prior to spinning.
No sign of stop_critical_timings(), though.  So let's not bury
rcu_idle_enter() in stop_critical_timings().

cpuidle_idle_call() does a fastpath irq-enable/exit if need-resched,
then does stop_critical_timings() and rcu_idle_enter().  Then we
have the buried complexity with cpuidle_select(), but a negative
return says to check need-resched and enable interrupts or to
invoke arch_cpu_idle(), which executes various sleep instructions
on various architectures.  Some notable variants:

o	ARM has an arm_pm_idle() function pointer so that different
	SoCs can have different idle-power-down sequences.  Alternatively,
	some SoCs have functions named CPUNAME_do_idle(), for example,
	imx5_cpu_do_idle().  These seem to invoke processor.do_idle().

	Pushing rcu_idle_enter() and rcu_idle_exit() down below
	arch_cpu_idle() on ARM looks to be asking for it.

o	ARM64 does cpu_do_idle, which does a wait-for-interrupt instruction.

o	AVR calls into assembly, hand-coding the need-resched check.
	Not sure why that would still be needed.

o	CRIS has one function that just enables interrupts and returns,
	and anohter that enables interrupts and halts.

o	UM times the duration of the idle time based on what appears to
	be the time until the next event.

o	Unicore does a string of what appear to be no-ops.

o	x86 does a couple of levels of indirection, one being the
	x86_idle() function pointer and another being the safe_halt()
	function pointer.

	amd_e400_idle() is a bit ornate, but still does default_idle()
	which wrappers the safe_halt() pointer.

And various other architectures seem to work similarly, but lots of
hair here.  So Steven, you OK with the underlying arch_cpu_idle()
functions being off-limits to tracing?

Now, if cpuidle_select() returns non-negative, we are dealing with
the CPU-idle governor, which is invoked at the later cpuidle_enter().

Hmmm...  On the CPU-idle drivers...

o	apm_idle_driver puts the idle loop into the ->enter() function,
	apm_cpu_idle().

o	ACPI puts the idle loop in acpi_idle_do_entry(), and does call
	stop_critical_timings(), but not rcu_idle_enter().
	So presumably stop_critical_timings() can nest?  Not clear
	from the code.

o	The CPS driver is even stranger...  Is cps_gen_entry_code()
	really depositing assembly instructions into a buffer that is
	passed back as a function?

o	The intel_idle driver is the one with mwait_idle_with_hints(),
	so you covered it below.

Your patch covers the cpuidle_enter() transition, which means
that functions like cpuidle_enter(), acpi_idle_enter_c1(), and
acpi_idle_do_entry() would be off-limits to trampolining.  In the case
of CPS, quite a bit of code.

> We should push the rcu_idle_{enter,exit}() down to around
> mwait_idle_with_hints(), so we don't call half the word with RCU
> disabled.

That would be for the intel_idle.c CPU-idle driver.  The other drivers
also need rcu_idle_{enter,exit}().

> > I have already said that I will be happy to rip out the wakeup code
> > when it is no longer needed, and I agree that it would be way better if
> > not needed.
> 
> I'd prefer to dtrt now and not needing to fix it later.

Once it works, I might consider it "right" and adjust accordingly.
At the moment, speculation.

> Auditing all idle functions will be somewhat of a pain, but its entirely
> doable. Looking at this stuff, it appears we can clean it up massively;
> see how the generic cpuidle code already has the broadcast logic in, so
> we can remove that from the drivers by setting the right flags.

There is certainly quite a bit of hair in a number of these drivers,
no two ways about it.

> We can similarly pull out the leave_mm() call by adding a
> CPUIDLE_FLAG_TLB_FLUSH. At which point all we'd need to do is mark the
> intel_idle (and all other cpuidle_state::enter functions with __notrace.

That one seems to be specific to intel_idle.  But yes, nice to avoid
waking an idle CPU for TLB flushes.

> > But I won't base a patch on hypotheticals.  You have already
> > drawn way too much water from -that- well over the past years!  ;-)
> 
> not entirely sure what you're referring to there ;-)

Heh!

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ