linux-kernel - Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods


Message-ID: <20170711160926.GA18805@lerouge>
Date:   Tue, 11 Jul 2017 18:09:27 +0200
From:   Frederic Weisbecker <fweisbec@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Christoph Lameter <cl@...ux.com>
Cc:     "Li, Aubrey" <aubrey.li@...ux.intel.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Aubrey Li <aubrey.li@...el.com>, tglx@...utronix.de,
        len.brown@...el.com, rjw@...ysocki.net, tim.c.chen@...ux.intel.com,
        arjan@...ux.intel.com, paulmck@...ux.vnet.ibm.com,
        yang.zhang.wz@...il.com, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods

On Tue, Jul 11, 2017 at 11:41:57AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 11, 2017 at 12:40:06PM +0800, Li, Aubrey wrote:
> > > On Mon, Jul 10, 2017 at 06:42:06PM +0200, Peter Zijlstra wrote:
> 
> > >> Data to indicate what hurts how much would be a very good addition to
> > >> the Changelogs. Clearly you have some, you really should have shared.
> 
> > In the idle loop,
> > 
> > - quiet_vmstat costs 5562ns - 6296ns
> 
> Urgh, that thing is horrible, also I think it's placed wrong. The comment
> near that function says it should be called when we enter NOHZ.
> 
> Which suggests something like so:
> 
> ---
>  kernel/sched/idle.c      | 1 -
>  kernel/time/tick-sched.c | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 6c23e30c0e5c..ef63adce0c9c 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -219,7 +219,6 @@ static void do_idle(void)
>  	 */
>  
>  	__current_set_polling();
> -	quiet_vmstat();
>  	tick_nohz_idle_enter();
>  
>  	while (!need_resched()) {
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index c7a899c5ce64..eb0e9753db8f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -787,6 +787,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  	if (!ts->tick_stopped) {
>  		calc_load_nohz_start();
>  		cpu_load_update_nohz_start();
> +		quiet_vmstat();

This patch seems to make sense. Christoph?

>  
>  		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
>  		ts->tick_stopped = 1;
> 
> 
> > - tick_nohz_idle_enter costs 7058ns - 10726ns
> > - tick_nohz_idle_exit costs 8372ns - 20850ns
> 
> Right, those are horribly expensive, but skipping them isn't 'hard', the
> only tricky bit is finding a condition that makes sense.

Note that you can statically disable it with the nohz=0 boot parameter.

> 
> See Mike's patch: https://patchwork.kernel.org/patch/2839221/
> 
> Combined with the above, and possibly a better condition, that should
> get rid of most of this.

Such a patch could work well if the decision from the scheduler to not stop the tick
happens on idle entry.

Now if sched_needs_cpu() first allows the tick to be stopped and then refuses it
later, at the end of an idle IRQ, this won't have the desired effect. Once
ts->tick_stopped=1, it stays that way until we really restart the tick, so the
whole costly nohz machinery stays on.

I guess it doesn't matter though: we are talking about making idle entry fast, so
the decision not to stop the tick is likely to be made once, on idle entry, when
ts->tick_stopped=0.

One exception though: the tick may already be stopped when we enter idle (full nohz
case). And BTW, stopping the tick outside idle shouldn't be affected here.

So I'd rather put that in can_stop_idle_tick().
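
To illustrate why the timing of the refusal matters, here is a minimal userspace
sketch. It is a toy model, not kernel code: struct toy_tick_sched, the
sched_wants_tick field, the nohz_ops counter and every toy_* function are
hypothetical names made up for this example. Only the tick_stopped flag and the
idea of a gate in can_stop_idle_tick() come from the discussion above.

```c
#include <stdbool.h>

/* Toy model only; all names below are hypothetical, not kernel API. */
struct toy_tick_sched {
	bool tick_stopped;	/* mirrors ts->tick_stopped */
	bool sched_wants_tick;	/* assumed scheduler hint: "keep the tick" */
	unsigned int nohz_ops;	/* counts costly stop/restart operations */
};

/* Models a gate placed in can_stop_idle_tick(). */
static bool toy_can_stop_idle_tick(const struct toy_tick_sched *ts)
{
	return !ts->sched_wants_tick;
}

/* Called on idle entry and again at the end of each idle IRQ. */
static void toy_try_stop_tick(struct toy_tick_sched *ts)
{
	if (!toy_can_stop_idle_tick(ts))
		return;	/* a refusal only prevents a new stop */

	if (!ts->tick_stopped) {
		ts->tick_stopped = true;
		ts->nohz_ops++;	/* pay for stopping the tick */
	}
}

/* Idle exit: restarting the tick is the only way tick_stopped is cleared. */
static void toy_idle_exit(struct toy_tick_sched *ts)
{
	if (ts->tick_stopped) {
		ts->tick_stopped = false;
		ts->nohz_ops++;	/* pay for restarting the tick */
	}
}
```

In this model, if sched_wants_tick is set before the first
toy_try_stop_tick() call, nohz_ops stays at 0 across the whole idle period.
If it is only set after the tick has already been stopped, the later refusal
changes nothing and toy_idle_exit() still pays for the restart.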

> 
> > - totally from arch_cpu_idle_enter entry to arch_cpu_idle_exit return costs
> >   9122ns - 15318ns.
> >   --In this period, rcu_idle_enter costs 1985ns - 2262ns, rcu_idle_exit costs
> >     1813ns - 3507ns
> 
> Is that the POPF being painful? or something else?

Probably that and the atomic_add_return().

Thanks.