[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20100122142454.2767a4e7.akpm@linux-foundation.org>
Date: Fri, 22 Jan 2010 14:24:54 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Mike Chan <mike@...roid.com>
Cc: cpufreq@...r.kernel.org, linux-kernel@...r.kernel.org,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH] cpufreq: stats: Do not account for idle time when
tracking time_in_state
On Tue, 12 Jan 2010 17:15:50 -0800
Mike Chan <mike@...roid.com> wrote:
> Setting ignore_idle to 1 ignores idle time from time_in_state accounting.
>
> Currently cpufreq stats accounts for idle time time_in_state for each
> cpu speed. For cpu's that have a low power idle state this improperly
> accounts for time spent at each speed.
>
> The most relevant case is when the system is idle yet cpu time is still
> accounted for at the lowest speed. This results in heavily skewed statistics
> (towards the lowest speed) which makes these statistics useless when tuning
> cpufreq scaling with cpuidle.
>
Is there somewhere where this new userspace interface should have been
documented?
> +static ssize_t store_ignore_idle(struct cpufreq_policy *policy, char *buf)
> +{
> + int input;
> + if (sscanf(buf, "%d", &input) != 1)
> + return -EINVAL;
> +
> + ignore_idle = input;
> + return 1;
> +}
No bounds checking is needed?
This function will accept input of the form "42foo", which is sloppy.
Use of strict_strtoul() will fix that.
> +static ssize_t show_ignore_idle(struct cpufreq_policy *policy, char *buf)
> +{
> + return sprintf(buf, "%d\n", ignore_idle);
> +}
> +
> +CPUFREQ_STATDEVICE_ATTR(total_trans, 0444, show_total_trans, NULL);
> +CPUFREQ_STATDEVICE_ATTR(time_in_state, 0444, show_time_in_state, NULL);
> +CPUFREQ_STATDEVICE_ATTR(ignore_idle, 0664, show_ignore_idle, store_ignore_idle);
>
> static struct attribute *default_attrs[] = {
> &_attr_total_trans.attr,
> @@ -304,6 +323,21 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb,
> return 0;
> }
>
> +void cpufreq_exit_idle(int cpu, unsigned long ticks)
> +{
> + struct cpufreq_stats *stat;
> + stat = per_cpu(cpufreq_stats_table, cpu);
> +
> + /* Wait until cpu stats is initalized */
> + if (!ignore_idle || !stat || !stat->time_in_state)
> + return;
> +
> + spin_lock(&cpufreq_stats_lock);
> + stat->time_in_state[stat->last_index] =
> + cputime_sub(stat->time_in_state[stat->last_index], ticks);
> + spin_unlock(&cpufreq_stats_lock);
> +}
cpufreq_stats_lock is not an irq-safe lock, so if cpufreq_exit_idle()
gets called from an interrupt then this function is deadlockable. It
doesn't _look_ like cpufreq_exit_idle() is presently called from a
clock interrupt, but I didn't look too closely.
> static int __cpuinit cpufreq_stat_cpu_callback(struct notifier_block *nfb,
> unsigned long action,
> void *hcpu)
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 4de02b1..e3f1363 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -256,6 +256,11 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
>
> void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state);
>
> +#ifdef CONFIG_CPU_FREQ_STAT
> +extern void cpufreq_exit_idle(int cpu, unsigned long ticks);
> +#else
> +#define cpufreq_exit_idle(int cpu, unsigned long ticks) do {} while (0)
This didn't need to be a macro, I think. A static inline provides
typechecking and is hence preferred.
> +#endif
>
> static inline void cpufreq_verify_within_limits(struct cpufreq_policy *policy, unsigned int min, unsigned int max)
> {
> diff --git a/kernel/sched.c b/kernel/sched.c
> index c535cc4..feef94c 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -54,6 +54,7 @@
> #include <linux/rcupdate.h>
> #include <linux/cpu.h>
> #include <linux/cpuset.h>
> +#include <linux/cpufreq.h>
> #include <linux/percpu.h>
> #include <linux/kthread.h>
> #include <linux/proc_fs.h>
> @@ -5187,6 +5188,7 @@ void account_steal_ticks(unsigned long ticks)
> */
> void account_idle_ticks(unsigned long ticks)
> {
> + cpufreq_exit_idle(smp_processor_id(), ticks);
> account_idle_time(jiffies_to_cputime(ticks));
> }
This only gets called if CONFIG_VIRT_CPU_ACCOUNTING=y, which is mostly
a Xen thing. But your changelog didn't describe this being a
xen-specific fix so I wonder whether that was intended.
Please cc the sched developers on patches of this nature. And the Xen
developers if appropriate.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists