linux-kernel - Re: [RFT][PATCH v5 7/7] cpuidle: menu: Avoid selecting shallow states with stopped tick

Message-ID: <abc9a820-3d67-3269-f1c2-bb205c97763b@tu-dresden.de>
Date:   Mon, 19 Mar 2018 13:47:16 +0100
From:   Thomas Ilsche <thomas.ilsche@...dresden.de>
To:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        "Frederic Weisbecker" <fweisbec@...il.com>
CC:     Thomas Gleixner <tglx@...utronix.de>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        Doug Smythies <dsmythies@...us.net>,
        "Rik van Riel" <riel@...riel.com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        "Mike Galbraith" <mgalbraith@...e.de>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFT][PATCH v5 7/7] cpuidle: menu: Avoid selecting shallow states
 with stopped tick

On 2018-03-15 23:19, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> 
> If the scheduler tick has been stopped already and the governor
> selects a shallow idle state, the CPU can spend a long time in that
> state if the selection is based on an inaccurate prediction of idle
> time.  That effect turns out to be noticeable, so it needs to be
> mitigated.

What are some common causes for that situation?
How could I trigger this for testing?

> To that end, modify the menu governor to discard the result of the
> idle time prediction if the tick is stopped and the predicted idle
> time is less than the tick period length, unless the tick timer is
> going to expire soon.

This seems dangerous. Using a C-state that is too deep could be
problematic for soft latency, caches and overall energy.

Would it be viable to re-enable the sched tick to act as a fallback?
More generally, would it be feasible to reprogram the upcoming sched tick
timer so that it fires at a better time for a fallback wakeup in certain
situations?
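
For reference, the rule described in the quoted paragraph can be modeled in
user space roughly as follows. This is only a sketch of the behavior under
discussion, not the kernel code: clamp_prediction(), TICK_US (set here to
4000 us, i.e. HZ=250) and the parameter names are hypothetical stand-ins for
data->predicted_us, TICK_USEC_HZ and ktime_to_us(tick_time) in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tick period in microseconds (HZ=250); hypothetical constant. */
#define TICK_US 4000u

/*
 * Model of the rule in the quoted patch: when the tick is stopped and the
 * prediction is shorter than one tick period, replace the prediction with
 * the smaller of the tick period and the time left until the tick timer.
 * Otherwise the prediction is left alone.
 */
static inline unsigned int clamp_prediction(unsigned int predicted_us,
					     unsigned int tick_delta_us,
					     bool tick_stopped)
{
	if (tick_stopped && predicted_us < TICK_US)
		return tick_delta_us < TICK_US ? tick_delta_us : TICK_US;
	return predicted_us;
}

/* A few sample inputs showing when the prediction is overridden. */
static inline void clamp_prediction_examples(void)
{
	/* Tick stopped, next tick far away: raised to one tick period. */
	assert(clamp_prediction(200, 50000, true) == TICK_US);
	/* Tick stopped, tick timer expires soon: bounded by that expiry. */
	assert(clamp_prediction(200, 1000, true) == 1000);
	/* Tick still running: prediction is used unchanged. */
	assert(clamp_prediction(200, 50000, false) == 200);
	/* Prediction already longer than a tick: unchanged. */
	assert(clamp_prediction(6000, 50000, true) == 6000);
}
```

The first case is the one the concern above is about: a 200 us prediction is
treated as a full tick period, which may select a deeper state than the
prediction alone would justify.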

> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
> 
> v4 -> v5:
>    * Rebase on top of the new [1-6/7].
>    * Never use the interactivity factor when the tick is stopped.
> 
> ---
>   drivers/cpuidle/governors/menu.c |   29 ++++++++++++++++++++++-------
>   1 file changed, 22 insertions(+), 7 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -353,13 +353,28 @@ static int menu_select(struct cpuidle_dr
>   	 */
>   	data->predicted_us = min(data->predicted_us, expected_interval);
>   
> -	/*
> -	 * Use the performance multiplier and the user-configurable
> -	 * latency_req to determine the maximum exit latency.
> -	 */
> -	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> -	if (latency_req > interactivity_req)
> -		latency_req = interactivity_req;
> +	if (tick_nohz_tick_stopped()) {
> +		/*
> +		 * If the tick is already stopped, the cost of possible short
> +		 * idle duration misprediction is much higher, because the CPU
> +		 * may be stuck in a shallow idle state for a long time as a
> +		 * result of it.  In that case say we might mispredict and try
> +		 * to force the CPU into a state for which we would have stopped
> +		 * the tick, unless the tick timer is going to expire really
> +		 * soon anyway.
> +		 */
> +		if (data->predicted_us < TICK_USEC_HZ)
> +			data->predicted_us = min_t(unsigned int, TICK_USEC_HZ,
> +						   ktime_to_us(tick_time));

This applies to both the heuristic (expected_interval) and the
(heuristically corrected) next timer. Should this modification be applied
only to expected_interval, under the assumption that next_timer_us *
correction is never totally wrong?
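
A user-space sketch of that alternative, to show where the two approaches
differ. Everything here is an assumption made for illustration:
predict_selective(), predict_patch(), TICK_US (4000 us, HZ=250) and the
parameter names are hypothetical, with timer_based_us standing in for the
corrected next_timer_us and typical_interval_us for expected_interval.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tick period in microseconds (HZ=250); hypothetical constant. */
#define TICK_US 4000u

static inline unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Patch behavior: combine both estimates first, then override the result. */
static inline unsigned int predict_patch(unsigned int timer_based_us,
					  unsigned int typical_interval_us,
					  unsigned int tick_delta_us,
					  bool tick_stopped)
{
	unsigned int predicted = min_u(timer_based_us, typical_interval_us);

	if (tick_stopped && predicted < TICK_US)
		predicted = min_u(TICK_US, tick_delta_us);
	return predicted;
}

/* Alternative: override only the heuristic, keep the timer-based estimate. */
static inline unsigned int predict_selective(unsigned int timer_based_us,
					      unsigned int typical_interval_us,
					      unsigned int tick_delta_us,
					      bool tick_stopped)
{
	if (tick_stopped && typical_interval_us < TICK_US)
		typical_interval_us = min_u(TICK_US, tick_delta_us);
	return min_u(timer_based_us, typical_interval_us);
}

/* Sample inputs: the two variants diverge when a timer is due shortly. */
static inline void predict_examples(void)
{
	/* Timer due in 300 us: the patch rule ignores it, the variant keeps it. */
	assert(predict_patch(300, 100, 50000, true) == TICK_US);
	assert(predict_selective(300, 100, 50000, true) == 300);
	/* Distant timer, short heuristic: both raise the result to one tick. */
	assert(predict_patch(10000, 100, 50000, true) == TICK_US);
	assert(predict_selective(10000, 100, 50000, true) == TICK_US);
}
```

With a timer known to fire in 300 us, the selective variant still predicts
300 us, while the rule from the patch would predict a full tick period.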

> +	} else {
> +		/*
> +		 * Use the performance multiplier and the user-configurable
> +		 * latency_req to determine the maximum exit latency.
> +		 */
> +		interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> +		if (latency_req > interactivity_req)
> +			latency_req = interactivity_req;
> +	}
>   
>   	expected_interval = data->predicted_us;
>   	/*
>