linux-kernel - Re: [PATCH 2/4] timer: relax tick stop in idle entry

Message-ID: <20151116232640.GM5184@linux.vnet.ibm.com>
Date:	Mon, 16 Nov 2015 15:26:40 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Josh Triplett <josh@...htriplett.org>
Cc:	Jacob Pan <jacob.jun.pan@...ux.intel.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	John Stultz <john.stultz@...aro.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
	Len Brown <len.brown@...el.com>,
	Rafael Wysocki <rafael.j.wysocki@...el.com>,
	Eduardo Valentin <edubezval@...il.com>,
	Paul Turner <pjt@...gle.com>
Subject: Re: [PATCH 2/4] timer: relax tick stop in idle entry

On Mon, Nov 16, 2015 at 02:32:11PM -0800, Josh Triplett wrote:
> On Mon, Nov 16, 2015 at 01:51:26PM -0800, Jacob Pan wrote:
> > On Mon, 16 Nov 2015 16:06:57 +0100 (CET)
> > Thomas Gleixner <tglx@...utronix.de> wrote:
> > 
> > > >           <idle>-0     [000]    30.093474: bprint: __tick_nohz_idle_enter: JPAN: tick_nohz_stop_sched_tick 609 delta 1000000
> > > > [JP] but sees delta is exactly 1 tick away. didn't stop tick.
> > > 
> > > If the delta is 1 tick then it is not supposed to stop it. Did you
> > > ever try to figure out WHY it is 1 tick?
> > > 
> > > There are two code paths which can set it to basemono + TICK_NSEC:
> > > 
> > >         if (rcu_needs_cpu(basemono, &next_rcu) ||
> > >             arch_needs_cpu() || irq_work_needs_cpu()) {
> > >                 next_tick = basemono + TICK_NSEC;
> > >         } else {
> > >                 next_tmr = get_next_timer_interrupt(basejiff, basemono);
> > >                 ts->next_timer = next_tmr;
> > >                 /* Take the next rcu event into account */
> > >                 next_tick = next_rcu < next_tmr ? next_rcu : next_tmr;
> > >         }
> > > 
> > > Can you please figure out WHY the tick is requested to continue
> > > instead of blindly wrecking the logic in that code?
> > 
> > Looks like it hits both cases during forced idle.
> > + Josh
> > + Paul
> > 
> > For the first case, it is always related to RCU. I found two CONFIG
> > options that avoid this undesired tick in the idle loop:
> > 1. enable CONFIG_RCU_NOCB_CPU_ALL, offloading callbacks to rcuo kthreads
> > 2. or enable CONFIG_RCU_FAST_NO_HZ (enter dyntick idle with RCU callbacks pending)
> > 
> > Either one works, but my concern is that users may not realize the
> > intricate CONFIG_ options and how they translate into energy savings.
> > After consulting with Josh, it seems we could add a check here to
> > recognize the forced idle state and relax rcu_needs_cpu() to return
> > false even if it has callbacks. Since we are blocking everybody for a
> > short time (5 ticks by default), it should not impact synchronize_rcu()
> > and kfree_rcu().
> 
> Right; as long as you're blocking *everybody*, and RCU priority boosting
> doesn't come into play (meaning a real-time task is waiting on RCU
> callbacks), then I don't see any harm in blocking RCU callbacks for a
> while.  You'd block completion of synchronize_rcu() and similar, as well
> as memory reclamation, but since you've blocked *every* CPU systemwide
> then that doesn't cause a problem.

True enough.  But how does RCU distinguish between this being a
normal idle cycle that might last indefinitely on the one hand and the
five-jiffy system-wide throttling on the other?  OK, maybe there is a
global variable that says that the just-now-starting idle period is
system-wide throttling.  But then what about the CPU that just went
idle 10 microseconds ago, and therefore left its timer tick running?
Fine and well, we could IPI it to wake it up and let it see that we
are now doing thermal throttling.  But then we presumably also have to
IPI it at the end of the thermal-throttling interval in order for it to
re-evaluate whether or not it should have the tick going.  :-/

On the one hand, I am sure that all of this can be made to work,
but simply having systems using thermal throttling enable either
CONFIG_RCU_NOCB_CPU_ALL or CONFIG_RCU_FAST_NO_HZ seems -way- simpler.
CONFIG_RCU_FAST_NO_HZ is probably the better choice for generic workloads,
but CONFIG_RCU_NOCB_CPU_ALL is the better choice for embedded workloads
where it is less likely that RCU callbacks will be posted with continuous
wild abandon.

Or am I missing something subtle here?

							Thanx, Paul
