Message-Id: <20170712184617.GZ2393@linux.vnet.ibm.com>
Date: Wed, 12 Jul 2017 11:46:17 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Frederic Weisbecker <fweisbec@...il.com>,
Christoph Lameter <cl@...ux.com>,
"Li, Aubrey" <aubrey.li@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Aubrey Li <aubrey.li@...el.com>, tglx@...utronix.de,
len.brown@...el.com, rjw@...ysocki.net, tim.c.chen@...ux.intel.com,
arjan@...ux.intel.com, yang.zhang.wz@...il.com, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods
On Wed, Jul 12, 2017 at 07:17:56PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2017 at 08:54:58AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 12, 2017 at 02:22:49PM +0200, Peter Zijlstra wrote:
> > > On Tue, Jul 11, 2017 at 11:09:31AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jul 11, 2017 at 06:34:22PM +0200, Peter Zijlstra wrote:
> > > > > Also, RCU_FAST_NO_HZ will make a fairly large difference here.. Paul
> > > > > what's the state of that thing, do we actually want that or not?
> > > >
> > > > If you are battery powered and don't have tight real-time latency
> > > > constraints, you want it -- it has represented a 30-40% boost in battery
> > > > lifetime for some low-utilization battery-powered devices. Otherwise,
> > > > probably not.
> > >
> > > Would it make sense to hook that off of tick_nohz_idle_enter(); in
> > > specific the part where we actually stop the tick; instead of every
> > > idle?
> >
> > The actions RCU takes on RCU_FAST_NO_HZ depend on the current state of
> > the CPU's callback lists, so it seems to me that the decision has to
> > be made on each idle entry.
> >
> > Now it might be possible to make the checks more efficient, and doing
> > that is on my list.
> >
> > Or am I missing your point?
>
> Could be I'm just not remembering how all that works.. But I was
> wondering if we can do the expensive bits if we've decided to actually
> go NOHZ and avoid doing it on every idle entry.
>
> IIRC the RCU fast NOHZ bits try and flush the callback list (or paw it
> off to another CPU?) such that we can go NOHZ sooner. Having a !empty
> callback list prevents NOHZ from happening.
The code did indeed attempt to flush the callback list back in the day,
but that proved not to actually save any power. There were several
variations in the meantime, but what it does now is to check to see if
there are callbacks at rcu_needs_cpu() time:
1. If there are none, RCU tells the caller that it doesn't need
the CPU.
2. If there are some, and some of them are non-lazy (as in doing
something other than just freeing memory), RCU updates its idea
of which grace period the callbacks are waiting for, but otherwise
leaves the callbacks alone, and returns saying that it needs the
CPU in about four jiffies (by default), rounded to allow one
wakeup to handle all CPUs in the power domain. Use the
rcu_idle_gp_delay boot/sysfs parameter to adjust the wait
duration if required. (I haven't heard of adjustment ever
being required.)
Note that a non-lazy callback might well be synchronize_rcu(),
so we cannot wait too long, or we will be delaying things
too much.
3. If there are some callbacks, and all of them are lazy, RCU
again updates its idea of which grace period the callbacks are
waiting for, but otherwise leaves the callbacks alone, and
returns saying that it needs the CPU in about six seconds (by
default), again using round_jiffies() to share wakeups within
a power domain. Use the rcu_idle_lazy_gp_delay boot/sysfs
parameter to adjust the wait, and again, as far as I know
adjustment has never been necessary.
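The three cases above can be sketched as a stand-alone decision
function. This is only a model of the logic described in this message,
not the kernel's rcu_needs_cpu() implementation; the constants mirror
the defaults mentioned above, assuming HZ=1000:

```c
#include <stdbool.h>

#define HZ			1000
#define RCU_IDLE_GP_DELAY	4		/* jiffies; non-lazy case */
#define RCU_IDLE_LAZY_GP_DELAY	(6 * HZ)	/* jiffies; all-lazy case */

/*
 * Hypothetical model of the three-way decision: returns true if RCU
 * needs the CPU, and if so sets *next_wake to the delay in jiffies.
 */
static bool model_rcu_needs_cpu(int ncbs, int nonlazy,
				unsigned long *next_wake)
{
	if (ncbs == 0) {
		/* Case 1: no callbacks, the CPU may sleep indefinitely. */
		return false;
	}
	if (nonlazy > 0) {
		/* Case 2: some non-lazy callbacks, wake in ~4 jiffies. */
		*next_wake = RCU_IDLE_GP_DELAY;
	} else {
		/* Case 3: all callbacks lazy, wake in ~6 seconds. */
		*next_wake = RCU_IDLE_LAZY_GP_DELAY;
	}
	return true;
}
```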
When the CPU is awakened, it will update its callbacks based on any
grace periods that have elapsed in the meantime. There is a bit
of work later at rcu_idle_enter() time, but it is quite small.
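The round_jiffies()-style rounding mentioned in cases 2 and 3 can be
modeled in a simplified way: round the wakeup to a whole-second jiffy
boundary so that nearby timers fire together. This sketch omits the
per-CPU skew that the kernel's round_jiffies() also applies:

```c
/*
 * Simplified model of rounding a jiffies value to the nearest
 * whole-second boundary, assuming hz jiffies per second.  A sketch
 * only, not the kernel's round_jiffies().
 */
static unsigned long model_round_jiffies(unsigned long j, unsigned long hz)
{
	unsigned long rem = j % hz;	/* distance past the last boundary */

	if (rem < hz / 2)
		return j - rem;		/* round down... */
	return j - rem + hz;		/* ...or up, to the nearest second */
}
```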
> Now if we've already decided we can't in fact go NOHZ due to other
> concerns, flushing the callback list is pointless work. So I'm thinking
> we can find a better place to do this.
True, if the tick will still be happening, there is little point
in bothering RCU about it. And if CPUs tend to go idle with RCU
callbacks, then it would be cheaper to check arch_needs_cpu() and
irq_work_needs_cpu() first. If CPUs tend to be free of callbacks
when they go idle, this won't help, and might be counterproductive.
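The ordering point can be illustrated with a short-circuit OR: if the
cheap checks come first, the more expensive rcu_needs_cpu() is only
evaluated when the cheap checks all say the CPU is free. All functions
below are hypothetical stand-ins, not the kernel's:

```c
#include <stdbool.h>

static int expensive_calls;	/* counts how often we pay the cost */

/* Stand-ins for the cheap checks. */
static bool cheap_arch_needs_cpu(void)     { return true; }
static bool cheap_irq_work_needs_cpu(void) { return false; }

/* Stand-in for the comparatively expensive RCU check. */
static bool expensive_rcu_needs_cpu(void)
{
	expensive_calls++;
	return false;
}

static bool cpu_is_needed(void)
{
	/* Cheap checks first: if one fires, the RCU check is skipped. */
	return cheap_arch_needs_cpu() ||
	       cheap_irq_work_needs_cpu() ||
	       expensive_rcu_needs_cpu();
}
```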
But if rcu_needs_cpu() or rcu_prepare_for_idle() is showing up on
profiles, I could adjust things. This would include making
rcu_prepare_for_idle() no longer expect that rcu_needs_cpu() had
previously been called on the current path to idle. (Not a big
deal, just that the obvious change to tick_nohz_stop_sched_tick()
won't necessarily do what you want.)
So please let me know if rcu_needs_cpu() or rcu_prepare_for_idle() are
prominent contributors to to-idle latency.
Thanx, Paul