lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1872489.HpEenvS50o@vostro.rjw.lan>
Date:	Sun, 03 Jan 2016 01:56:31 +0100
From:	"Rafael J. Wysocki" <rjw@...ysocki.net>
To:	Chen Yu <yu.c.chen@...el.com>
Cc:	viresh.kumar@...aro.org, lenb@...nel.org,
	srinivas.pandruvada@...ux.intel.com, rui.zhang@...el.com,
	linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC

On Wednesday, December 16, 2015 12:20:29 PM Chen Yu wrote:
> It is reported that, with CONFIG_HZ_PERIODIC=y cpu stays at the
> lowest frequency even if the usage goes to 100%, neither ondemand
> nor conservative governor works, however performance and
> userspace work as expected. If set with CONFIG_NO_HZ_FULL=y,
> everything goes well.
> 
> This problem is caused by improper calculation of the idle_time
> when the load is extremely high(near 100%). Firstly, cpufreq_governor
> uses get_cpu_idle_time to get the total idle time for specific cpu, then:
> 
> 1.If the system is configured with CONFIG_NO_HZ_FULL, the idle time is
>   returned by ktime_get, which is always increasing, it's OK.
> 2.However, if the system is configured with CONFIG_HZ_PERIODIC,
>   get_cpu_idle_time might not guarantee to be always increasing,
>   because it will leverage get_cpu_idle_time_jiffy to calculate the
>   idle_time, consider the following scenario:
> 
> At T1:
> idle_tick_1 = total_tick_1 - user_tick_1
> 
> sample period(80ms)...
> 
> At T2: ( T2 = T1 + 80ms):
> idle_tick_2 = total_tick_2 - user_tick_2
> 
> Currently the algorithm is using (idle_tick_2 - idle_tick_1) to
> get the delta idle_time during the past sample period, however
> it CAN NOT guarantee that idle_tick_2 >= idle_tick_1, especially
> when cpu load is high.
> (Yes, total_tick_2 >= total_tick_1, and user_tick_2 >= user_tick_1,
> but how about idle_tick_2 and idle_tick_1? No guarantee.)
> So governor might get a negative value of idle_time during the past
> sample period, which might mislead the system that the idle time is
> very big(converted to unsigned int), and the busy time is nearly zero,
> which causes the governor to always choose the lowest cpufreq,
> then cause this problem.
> 
> In theory there are two solutions:
> 
> 1.The logic should not rely on the idle tick during every sample period,
>   but be based on the busy tick directly, as this is how 'top' is
>   implemented.
> 
> 2.Or the logic must make sure that the idle_time is strictly increasing
>   during each sample period, then there would be no negative idle_time
>   anymore. This solution requires minimum modification to current code
>   and this patch uses method 2.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821
> Reported-by: Jan Fikar <j.fikar@...il.com>
> Signed-off-by: Chen Yu <yu.c.chen@...el.com>

Applied, thanks!

> ---
>  drivers/cpufreq/cpufreq_governor.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index b260576..363445f 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -84,6 +84,9 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>  			(cur_wall_time - j_cdbs->prev_cpu_wall);
>  		j_cdbs->prev_cpu_wall = cur_wall_time;
>  
> +		if (cur_idle_time < j_cdbs->prev_cpu_idle)
> +			cur_idle_time = j_cdbs->prev_cpu_idle;
> +
>  		idle_time = (unsigned int)
>  			(cur_idle_time - j_cdbs->prev_cpu_idle);
>  		j_cdbs->prev_cpu_idle = cur_idle_time;
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ