Message-ID: <20150128175606.GB27003@developer.hsd1.ca.comcast.net>
Date:	Wed, 28 Jan 2015 13:56:08 -0400
From:	Eduardo Valentin <edubezval@...il.com>
To:	Javi Merino <javi.merino@....com>
Cc:	linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
	punit.agrawal@....com, broonie@...nel.org,
	Zhang Rui <rui.zhang@...el.com>
Subject: Re: [PATCH v1 3/7] thermal: cpu_cooling: implement the power cooling
 device API

On Wed, Jan 28, 2015 at 05:00:34PM +0000, Javi Merino wrote:
> Add a basic power model to the cpu cooling device to implement the
> power cooling device API.  The power model uses the current frequency,
> current load and OPPs for the power calculations.  The cpus must have
> registered their OPPs using the OPP library.
> 
> Cc: Zhang Rui <rui.zhang@...el.com>
> Cc: Eduardo Valentin <edubezval@...il.com>
> Signed-off-by: Punit Agrawal <punit.agrawal@....com>
> Signed-off-by: Javi Merino <javi.merino@....com>
> ---
>  Documentation/thermal/cpu-cooling-api.txt | 156 +++++++++-
>  drivers/thermal/cpu_cooling.c             | 480 +++++++++++++++++++++++++++++-
>  include/linux/cpu_cooling.h               |  39 +++
>  3 files changed, 670 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/thermal/cpu-cooling-api.txt b/Documentation/thermal/cpu-cooling-api.txt
> index 753e47cc2e20..71653584cd03 100644
> --- a/Documentation/thermal/cpu-cooling-api.txt
> +++ b/Documentation/thermal/cpu-cooling-api.txt
> @@ -36,8 +36,162 @@ the user. The registration APIs returns the cooling device pointer.
>      np: pointer to the cooling device device tree node
>      clip_cpus: cpumask of cpus where the frequency constraints will happen.
>  
> -1.1.3 void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
> +1.1.3 struct thermal_cooling_device *cpufreq_power_cooling_register(
> +    const struct cpumask *clip_cpus, u32 capacitance,
> +    get_static_t plat_static_func)
> +
> +Similar to cpufreq_cooling_register, this function registers a cpufreq
> +cooling device.  Using this function, the cooling device will
> +implement the power extensions by using a simple cpu power model.  The
> +cpus must have registered their OPPs using the OPP library.
> +
> +The additional parameters are needed for the power model (See 2. Power
> +models).  "capacitance" is the dynamic power coefficient (See 2.1
> +Dynamic power).  "plat_static_func" is a function to calculate the
> +static power consumed by these cpus (See 2.2 Static power).
> +
> +1.1.4 struct thermal_cooling_device *of_cpufreq_power_cooling_register(
> +    struct device_node *np, const struct cpumask *clip_cpus, u32 capacitance,
> +    get_static_t plat_static_func)
> +
> +Similar to cpufreq_power_cooling_register, this function registers a
> +cpufreq cooling device with power extensions using the device tree
> +information supplied by the np parameter.
> +
> +1.1.5 void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>  
>      This interface function unregisters the "thermal-cpufreq-%x" cooling device.
>  
>      cdev: Cooling device pointer which has to be unregistered.
> +
> +2. Power models
> +
> +The power API registration functions provide a simple power model for
> +CPUs.  The current power is calculated as dynamic + (optionally)
> +static power.  This power model requires that the operating-points of
> +the CPUs are registered using the kernel's opp library and the
> +`cpufreq_frequency_table` is assigned to the `struct device` of the
> +cpu.  If you are using CONFIG_CPUFREQ_DT then the
> +`cpufreq_frequency_table` should already be assigned to the cpu
> +device.
> +
> +The `plat_static_func` parameter of `cpufreq_power_cooling_register()`
> +and `of_cpufreq_power_cooling_register()` is optional.  If you don't
> +provide it, only dynamic power will be considered.
> +
> +2.1 Dynamic power
> +
> +The dynamic power consumption of a processor depends on many factors.
> +For a given processor implementation the primary factors are:
> +
> +- The time the processor spends running, consuming dynamic power, as
> +  compared to the time in idle states where dynamic consumption is
> +  negligible.  Herein we refer to this as 'utilisation'.
> +- The voltage and frequency levels as a result of DVFS.  The DVFS
> +  level is a dominant factor governing power consumption.
> +- In running time the 'execution' behaviour (instruction types, memory
> +  access patterns and so forth) causes, in most cases, a second order
> +  variation.  In pathological cases this variation can be significant,
> +  but typically it is of a much lesser impact than the factors above.
> +
> +A high level dynamic power consumption model may then be represented as:
> +
> +Pdyn = f(run) * Voltage^2 * Frequency * Utilisation
> +
> +f(run) here represents the described execution behaviour and its
> +result has units of Watts/Hz/Volt^2 (this is often expressed in
> +mW/MHz/uVolt^2).
> +
> +The detailed behaviour for f(run) could be modelled on-line.  However,
> +in practice, such an on-line model has dependencies on a number of
> +implementation specific processor support and characterisation
> +factors.  Therefore, in the initial implementation that contribution is
> +represented as a constant coefficient.  This is a simplification
> +consistent with the relative contribution to overall power variation.
> +
> +In this simplified representation our model becomes:
> +
> +Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation
> +
> +Where `capacitance` is a constant that represents an indicative
> +running time dynamic power coefficient in fundamental units of
> +mW/MHz/uVolt^2.  Typical values for mobile CPUs might lie in the range
> +from 100 to 500.  For reference, the approximate values for the SoC in
> +ARM's Juno Development Platform are 530 for the Cortex-A57 cluster and
> +140 for the Cortex-A53 cluster.
> +
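
To make the arithmetic in 2.1 concrete, here is a minimal user-space
sketch of the same calculation that build_dyn_power_table() does further
down (MHz and millivolts, divided by 10^9 to get mW), followed by the
load scaling from get_dynamic_power().  The function names and the OPP
values used in the usage note are assumptions for illustration only;
they are not part of the patch.

```c
#include <stdint.h>

/*
 * Dynamic power in mW at 100% utilisation for one OPP, using the same
 * units as the patch: capacitance coefficient, frequency in MHz and
 * voltage in mV.  The names here are hypothetical.
 */
static uint32_t dyn_power_mw(uint32_t capacitance, uint32_t freq_mhz,
			     uint32_t voltage_mv)
{
	/* Widen to 64 bits before multiplying, as the patch does */
	uint64_t power = (uint64_t)capacitance * freq_mhz *
			 voltage_mv * voltage_mv;

	return (uint32_t)(power / 1000000000ULL);
}

/* Scale the 100% figure by a load percentage (may exceed 100 when the
 * loads of several cpus are summed, as in the patch). */
static uint32_t scale_by_load(uint32_t raw_power_mw, uint32_t load_pct)
{
	return (uint32_t)(((uint64_t)raw_power_mw * load_pct) / 100);
}
```

With the Cortex-A57 coefficient quoted above (530) and a made-up OPP of
1100 MHz at 1000 mV, dyn_power_mw() gives 583 mW, and at 50% load
scale_by_load() gives 291 mW.
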
> +
> +2.2 Static power
> +
> +Static leakage power consumption depends on a number of factors.  For a
> +given circuit implementation the primary factors are:
> +
> +- Time the circuit spends in each 'power state'
> +- Temperature
> +- Operating voltage
> +- Process grade
> +
> +The time the circuit spends in each 'power state' for a given
> +evaluation period at first order means OFF or ON.  However,
> +'retention' states can also be supported that reduce power during
> +inactive periods without loss of context.
> +
> +Note: The visibility of state entries to the OS can vary, according to
> +platform specifics, and this can then impact the accuracy of a model
> +based on OS state information alone.  It might be possible in some
> +cases to extract more accurate information from system resources.
> +
> +The temperature, operating voltage and process 'grade' (slow to fast)
> +of the circuit are all significant factors in static leakage power
> +consumption.  All of these have complex relationships to static power.
> +
> +Circuit implementation specific factors include the chosen silicon
> +process as well as the type, number and size of transistors in both
> +the logic gates and any RAM elements included.
> +
> +The static power consumption modelling must take into account the
> +power managed regions that are implemented.  Taking the example of an
> +ARM processor cluster, the modelling would take into account whether
> +each CPU can be powered OFF separately or if only a single power
> +region is implemented for the complete cluster.
> +
> +In one view (there are others), a static power consumption model can
> +then start from a set of reference values for each power managed
> +region (e.g. CPU, Cluster/L2) in each state (e.g. ON, OFF) at an
> +arbitrary process grade, voltage and temperature point.  These values
> +are then scaled for all of the following: the time in each state, the
> +process grade, the current temperature and the operating voltage.
> +However, since both implementation specific and complex relationships
> +dominate the estimate, the appropriate interface to the model from the
> +cpu cooling device is to provide a function callback that calculates
> +the static power in this platform.  When registering the cpu cooling
> +device pass a function pointer that follows the `get_static_t`
> +prototype:
> +
> +    int plat_get_static(cpumask_t *cpumask, int interval,
> +                        unsigned long voltage, u32 *power);
> +
> +`cpumask` is the cpumask of the cpus involved in the calculation.
> +`voltage` is the voltage at which they are operating.  The function
> +should calculate the average static power for the last `interval`
> +milliseconds.  It returns 0 on success, -E* on error.  If it
> +succeeds, it should store the static power in `power`.  Reading the
> +temperature of the cpus described by `cpumask` is left for
> +plat_get_static() to do as the platform knows best which thermal
> +sensor is closest to the cpu.
> +
> +If `plat_static_func` is NULL, static power is considered to be
> +negligible for this platform and only dynamic power is considered.
> +
> +The platform specific callback can then use any combination of tables
> +and/or equations to compute the estimated value.  Process grade
> +information is not passed to the model since access to such data, from
> +on-chip measurement capability or manufacture time data, is platform
> +specific.
> +
> +Note: the significance of static power for CPUs in comparison to
> +dynamic power is highly dependent on implementation.  Given the
> +potential complexity in implementation, the importance and accuracy of
> +its inclusion when using cpu cooling devices should be assessed on a
> +case by case basis.
> +
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index f65f0d109fc8..a639aaf228f5 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -26,6 +26,7 @@
>  #include <linux/thermal.h>
>  #include <linux/cpufreq.h>
>  #include <linux/err.h>
> +#include <linux/pm_opp.h>
>  #include <linux/slab.h>
>  #include <linux/cpu.h>
>  #include <linux/cpu_cooling.h>
> @@ -45,6 +46,19 @@
>   */
>  
>  /**
> + * struct power_table - frequency to power conversion
> + * @frequency:	frequency in KHz
> + * @power:	power in mW
> + *
> + * This structure is built when the cooling device registers and helps
> + * in translating frequency to power and vice versa.
> + */
> +struct power_table {
> +	u32 frequency;
> +	u32 power;
> +};
> +
> +/**
>   * struct cpufreq_cooling_device - data for cooling device with cpufreq
>   * @id: unique integer value corresponding to each cpufreq_cooling_device
>   *	registered.
> @@ -58,6 +72,15 @@
>   *	cpufreq frequencies.
>   * @allowed_cpus: all the cpus involved for this cpufreq_cooling_device.
>   * @node: list_head to link all cpufreq_cooling_device together.
> + * @last_load: load measured by the latest call to cpufreq_get_actual_power()
> + * @time_in_idle: previous reading of the absolute time that this cpu was idle
> + * @time_in_idle_timestamp: wall time of the last invocation of
> + *	get_cpu_idle_time()
> + * @dyn_power_table: array of struct power_table for frequency to power
> + *	conversion, sorted in ascending order.
> + * @dyn_power_table_entries: number of entries in the @dyn_power_table array
> + * @cpu_dev: the first cpu_device from @allowed_cpus that has OPPs registered
> + * @plat_get_static_power: callback to calculate the static power
>   *
>   * This structure is required for keeping information of each registered
>   * cpufreq_cooling_device.
> @@ -71,6 +94,13 @@ struct cpufreq_cooling_device {
>  	unsigned int *freq_table;	/* In descending order */
>  	struct cpumask allowed_cpus;
>  	struct list_head node;
> +	u32 last_load;
> +	u64 time_in_idle[NR_CPUS];
> +	u64 time_in_idle_timestamp[NR_CPUS];
> +	struct power_table *dyn_power_table;
> +	int dyn_power_table_entries;
> +	struct device *cpu_dev;
> +	get_static_t plat_get_static_power;
>  };
>  static DEFINE_IDR(cpufreq_idr);
>  static DEFINE_MUTEX(cooling_cpufreq_lock);
> @@ -205,6 +235,210 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>  	return 0;
>  }
>  
> +/**
> + * build_dyn_power_table() - create a dynamic power to frequency table
> + * @cpufreq_device:	the cpufreq cooling device in which to store the table
> + * @capacitance: dynamic power coefficient for these cpus
> + *
> + * Build a dynamic power to frequency table for this cpu and store it
> + * in @cpufreq_device.  This table will be used in cpu_power_to_freq() and
> + * cpu_freq_to_power() to convert between power and frequency
> + * efficiently.  Power is stored in mW, frequency in KHz.  The
> + * resulting table is in ascending order.
> + *
> + * Return: 0 on success, -E* on error.
> + */
> +static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
> +				 u32 capacitance)
> +{
> +	struct power_table *power_table;
> +	struct dev_pm_opp *opp;
> +	struct device *dev = NULL;
> +	int num_opps = 0, cpu, i, ret = 0;
> +	unsigned long freq;
> +
> +	rcu_read_lock();
> +
> +	for_each_cpu(cpu, &cpufreq_device->allowed_cpus) {
> +		dev = get_cpu_device(cpu);
> +		if (!dev) {
> +			dev_warn(&cpufreq_device->cool_dev->device,
> +				 "No cpu device for cpu %d\n", cpu);
> +			continue;
> +		}
> +
> +		num_opps = dev_pm_opp_get_opp_count(dev);
> +		if (num_opps > 0) {
> +			break;
> +		} else if (num_opps < 0) {
> +			ret = num_opps;
> +			goto unlock;
> +		}
> +	}
> +
> +	if (num_opps == 0) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	power_table = kcalloc(num_opps, sizeof(*power_table), GFP_KERNEL);
> +
> +	for (freq = 0, i = 0;
> +	     opp = dev_pm_opp_find_freq_ceil(dev, &freq), !IS_ERR(opp);
> +	     freq++, i++) {
> +		u32 freq_mhz, voltage_mv;
> +		u64 power;
> +
> +		freq_mhz = freq / 1000000;
> +		voltage_mv = dev_pm_opp_get_voltage(opp) / 1000;
> +
> +		/*
> +		 * Do the multiplication with MHz and millivolt so as
> +		 * to not overflow.
> +		 */
> +		power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv;
> +		do_div(power, 1000000000);
> +
> +		/* frequency is stored in power_table in KHz */
> +		power_table[i].frequency = freq / 1000;
> +
> +		/* power is stored in mW */
> +		power_table[i].power = power;
> +	}
> +
> +	if (i == 0) {
> +		ret = PTR_ERR(opp);
> +		goto unlock;
> +	}
> +
> +	cpufreq_device->cpu_dev = dev;
> +	cpufreq_device->dyn_power_table = power_table;
> +	cpufreq_device->dyn_power_table_entries = i;
> +
> +unlock:
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_device,
> +			     u32 freq)
> +{
> +	int i;
> +	struct power_table *pt = cpufreq_device->dyn_power_table;
> +
> +	for (i = 1; i < cpufreq_device->dyn_power_table_entries; i++)
> +		if (freq < pt[i].frequency)
> +			break;
> +
> +	return pt[i - 1].power;
> +}
> +
> +static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_device,
> +			     u32 power)
> +{
> +	int i;
> +	struct power_table *pt = cpufreq_device->dyn_power_table;
> +
> +	for (i = 1; i < cpufreq_device->dyn_power_table_entries; i++)
> +		if (power < pt[i].power)
> +			break;
> +
> +	return pt[i - 1].frequency;
> +}
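
For readers following the two lookups above, a minimal user-space sketch
of the same loops, with the boundary behaviour spelled out.  The table
contents and the helper names are hypothetical; only the loop shape is
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

struct power_table {
	uint32_t frequency;	/* KHz */
	uint32_t power;		/* mW */
};

/* Hypothetical table, ascending in both frequency and power */
static const struct power_table example_table[] = {
	{  500000, 100 },
	{ 1000000, 300 },
	{ 1500000, 700 },
};

/* Power of the highest entry whose frequency is <= freq */
static uint32_t freq_to_power(const struct power_table *pt, int entries,
			      uint32_t freq)
{
	int i;

	assert(entries > 0);
	for (i = 1; i < entries; i++)
		if (freq < pt[i].frequency)
			break;

	return pt[i - 1].power;
}

/* Frequency of the highest entry whose power is <= power */
static uint32_t power_to_freq(const struct power_table *pt, int entries,
			      uint32_t power)
{
	int i;

	assert(entries > 0);
	for (i = 1; i < entries; i++)
		if (power < pt[i].power)
			break;

	return pt[i - 1].frequency;
}
```

Because the loops start at index 1, a request below the first entry
still returns the first entry.  With this table a budget of 50 mW maps
to 500000 KHz, even though that entry costs 100 mW.
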
> +
> +/**
> + * get_load() - get load for a cpu since last updated
> + * @cpufreq_device:	&struct cpufreq_cooling_device for this cpu
> + * @cpu:	cpu number
> + *
> + * Return: The average load of cpu @cpu in percentage since this
> + * function was last called.
> + */
> +static u32 get_load(struct cpufreq_cooling_device *cpufreq_device, int cpu)
> +{
> +	u32 load;
> +	u64 now, now_idle, delta_time, delta_idle;
> +
> +	now_idle = get_cpu_idle_time(cpu, &now, 0);
> +	delta_idle = now_idle - cpufreq_device->time_in_idle[cpu];
> +	delta_time = now - cpufreq_device->time_in_idle_timestamp[cpu];
> +
> +	if (delta_time <= delta_idle)
> +		load = 0;
> +	else
> +		load = div64_u64(100 * (delta_time - delta_idle), delta_time);
> +
> +	cpufreq_device->time_in_idle[cpu] = now_idle;
> +	cpufreq_device->time_in_idle_timestamp[cpu] = now;
> +
> +	return load;
> +}
> +
> +/**
> + * get_static_power() - calculate the static power consumed by the cpus
> + * @cpufreq_device:	struct &cpufreq_cooling_device for this cpu cdev
> + * @tz:		thermal zone device in which we're operating
> + * @freq:	frequency in KHz
> + * @power:	pointer in which to store the calculated static power
> + *
> + * Calculate the static power consumed by the cpus described by
> + * @cpufreq_device running at frequency @freq.  This function relies on a
> + * platform specific function that should have been provided when the
> + * cooling device was registered.  If it wasn't, the static power is assumed to
> + * be negligible.  The calculated static power is stored in @power.
> + *
> + * Return: 0 on success, -E* on failure.
> + */
> +static int get_static_power(struct cpufreq_cooling_device *cpufreq_device,
> +			    struct thermal_zone_device *tz, unsigned long freq,
> +			    u32 *power)
> +{
> +	struct dev_pm_opp *opp;
> +	unsigned long voltage;
> +	struct cpumask *cpumask = &cpufreq_device->allowed_cpus;
> +	unsigned long freq_hz = freq * 1000;
> +
> +	if (!cpufreq_device->plat_get_static_power) {
> +		*power = 0;
> +		return 0;
> +	}
> +
> +	rcu_read_lock();
> +
> +	opp = dev_pm_opp_find_freq_exact(cpufreq_device->cpu_dev, freq_hz,
> +					 true);
> +	voltage = dev_pm_opp_get_voltage(opp);
> +
> +	rcu_read_unlock();
> +
> +	if (voltage == 0) {
> +		dev_warn_ratelimited(cpufreq_device->cpu_dev,
> +				     "Failed to get voltage for frequency %lu: %ld\n",
> +				     freq_hz, IS_ERR(opp) ? PTR_ERR(opp) : 0);
> +		return -EINVAL;
> +	}
> +
> +	return cpufreq_device->plat_get_static_power(cpumask, tz->passive_delay,
> +						     voltage, power);
> +}
> +
> +/**
> + * get_dynamic_power() - calculate the dynamic power
> + * @cpufreq_device:	&cpufreq_cooling_device for this cdev
> + * @freq:	current frequency
> + *
> + * Return: the dynamic power consumed by the cpus described by
> + * @cpufreq_device.
> + */
> +static u32 get_dynamic_power(struct cpufreq_cooling_device *cpufreq_device,
> +			     unsigned long freq)
> +{
> +	u32 raw_cpu_power;
> +
> +	raw_cpu_power = cpu_freq_to_power(cpufreq_device, freq);
> +	return (raw_cpu_power * cpufreq_device->last_load) / 100;
> +}
> +
>  /* cpufreq cooling device callback functions are defined below */
>  
>  /**
> @@ -280,8 +514,161 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
>  	return 0;
>  }
>  
> +/**
> + * cpufreq_get_requested_power() - get the current power
> + * @cdev:	&thermal_cooling_device pointer
> + * @tz:		a valid thermal zone device pointer
> + * @power:	pointer in which to store the resulting power
> + *
> + * Calculate the current power consumption of the cpus in milliwatts
> + * and store it in @power.  This function should actually calculate
> + * the requested power, but it's hard to get the frequency that
> + * cpufreq would have assigned if there were no thermal limits.
> + * Instead, we calculate the current power on the assumption that the
> + * immediate future will look like the immediate past.
> + *
> + * Return: 0 on success, -E* if getting the static power failed.
> + */
> +static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
> +				       struct thermal_zone_device *tz,
> +				       u32 *power)
> +{
> +	unsigned long freq;
> +	int cpu, ret;
> +	u32 static_power, dynamic_power, total_load = 0;
> +	struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
> +
> +	freq = cpufreq_quick_get(cpumask_any(&cpufreq_device->allowed_cpus));
> +
> +	for_each_cpu(cpu, &cpufreq_device->allowed_cpus) {
> +		u32 load;
> +
> +		if (cpu_online(cpu))
> +			load = get_load(cpufreq_device, cpu);
> +		else
> +			load = 0;
> +
> +		total_load += load;
> +	}
> +
> +	cpufreq_device->last_load = total_load;
> +
> +	dynamic_power = get_dynamic_power(cpufreq_device, freq);
> +	ret = get_static_power(cpufreq_device, tz, freq, &static_power);
> +	if (ret)
> +		return ret;
> +
> +	*power = static_power + dynamic_power;
> +	return 0;
> +}

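Repeating the question I've just asked on v5: do we care if the system
switches between different OPPs during the load sampling interval?

In that case, (1 - idle) might not reflect the correct load: the busy
time may have been spent at several frequencies, while the power is
computed only for the frequency returned by cpufreq_quick_get().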
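
To illustrate the concern with numbers, here is a minimal user-space
sketch that compares the estimate above (busy fraction times the power
of the current OPP) with a residency-weighted average.  The struct, the
function names and the sample figures are all hypothetical; the patch
does not track per-OPP busy time, so this is an assumption about what
such data could look like and not a proposed implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-OPP record for one sampling window */
struct opp_residency {
	uint32_t power_mw;	/* power of this OPP at 100% load */
	uint64_t busy_us;	/* busy time spent at this OPP */
};

/* Estimate in the style of the patch: total busy fraction multiplied
 * by the power of whichever OPP is current at sampling time. */
static uint32_t power_single_opp(uint32_t cur_power_mw, uint64_t busy_us,
				 uint64_t window_us)
{
	assert(window_us > 0);
	return (uint32_t)((uint64_t)cur_power_mw * busy_us / window_us);
}

/* Residency-weighted average over all OPPs used in the window */
static uint32_t power_weighted(const struct opp_residency *r, int n,
			       uint64_t window_us)
{
	uint64_t energy = 0;	/* accumulated in mW * us */
	int i;

	assert(window_us > 0);
	for (i = 0; i < n; i++)
		energy += (uint64_t)r[i].power_mw * r[i].busy_us;

	return (uint32_t)(energy / window_us);
}
```

Take a 100 ms window with 40 ms busy at a 100 mW OPP and 20 ms busy at
a 700 mW OPP.  The weighted average is 180 mW, but the single-OPP
estimate is 420 mW if the sample lands on the fast OPP and 60 mW if it
lands on the slow one.
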


> +
> +/**
> + * cpufreq_state2power() - convert a cpu cdev state to power consumed
> + * @cdev:	&thermal_cooling_device pointer
> + * @tz:		a valid thermal zone device pointer
> + * @state:	cooling device state to be converted
> + * @power:	pointer in which to store the resulting power
> + *
> + * Convert cooling device state @state into power consumption in
> + * milliwatts assuming 100% load.  Store the calculated power in
> + * @power.
> + *
> + * Return: 0 on success, -EINVAL if the cooling device state could not
> + * be converted into a frequency or other -E* if there was an error
> + * when calculating the static power.
> + */
> +static int cpufreq_state2power(struct thermal_cooling_device *cdev,
> +			       struct thermal_zone_device *tz,
> +			       unsigned long state, u32 *power)
> +{
> +	unsigned int freq, num_cpus;
> +	cpumask_t cpumask;
> +	u32 static_power, dynamic_power;
> +	int ret;
> +	struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
> +
> +	cpumask_and(&cpumask, &cpufreq_device->allowed_cpus, cpu_online_mask);
> +	num_cpus = cpumask_weight(&cpumask);
> +
> +	/* None of our cpus are online, so no power */
> +	if (num_cpus == 0) {
> +		*power = 0;
> +		return 0;
> +	}
> +
> +	freq = cpufreq_device->freq_table[state];
> +	if (!freq)
> +		return -EINVAL;
> +
> +	dynamic_power = cpu_freq_to_power(cpufreq_device, freq) * num_cpus;
> +	ret = get_static_power(cpufreq_device, tz, freq, &static_power);
> +	if (ret)
> +		return ret;
> +
> +	*power = static_power + dynamic_power;
> +	return 0;
> +}
> +
> +/**
> + * cpufreq_power2state() - convert power to a cooling device state
> + * @cdev:	&thermal_cooling_device pointer
> + * @tz:		a valid thermal zone device pointer
> + * @power:	power in milliwatts to be converted
> + * @state:	pointer in which to store the resulting state
> + *
> + * Calculate a cooling device state for the cpus described by @cdev
> + * that would allow them to consume at most @power mW and store it in
> + * @state.  Note that this calculation depends on external factors
> + * such as the cpu load or the current static power.  Calling this
> + * function with the same power as input can yield different cooling
> + * device states depending on those external factors.
> + *
> + * Return: 0 on success, -ENODEV if no cpus are online or -EINVAL if
> + * the calculated frequency could not be converted to a valid state.
> + * The latter should not happen unless the frequencies available to
> + * cpufreq have changed since the initialization of the cpu cooling
> + * device.
> + */
> +static int cpufreq_power2state(struct thermal_cooling_device *cdev,
> +			       struct thermal_zone_device *tz, u32 power,
> +			       unsigned long *state)
> +{
> +	unsigned int cpu, cur_freq, target_freq;
> +	int ret;
> +	s32 dyn_power;
> +	u32 last_load, normalised_power, static_power;
> +	struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
> +
> +	cpu = cpumask_any_and(&cpufreq_device->allowed_cpus, cpu_online_mask);
> +
> +	/* None of our cpus are online */
> +	if (cpu >= nr_cpu_ids)
> +		return -ENODEV;
> +
> +	cur_freq = cpufreq_quick_get(cpu);
> +	ret = get_static_power(cpufreq_device, tz, cur_freq, &static_power);
> +	if (ret)
> +		return ret;
> +
> +	dyn_power = power - static_power;
> +	dyn_power = dyn_power > 0 ? dyn_power : 0;
> +	last_load = cpufreq_device->last_load ?: 1;
> +	normalised_power = (dyn_power * 100) / last_load;
> +	target_freq = cpu_power_to_freq(cpufreq_device, normalised_power);
> +
> +	*state = cpufreq_cooling_get_level(cpu, target_freq);
> +	if (*state == THERMAL_CSTATE_INVALID) {
> +		dev_warn_ratelimited(&cdev->device,
> +				     "Failed to convert %dKHz for cpu %d into a cdev state\n",
> +				     target_freq, cpu);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  /* Bind cpufreq callbacks to thermal cooling device ops */
> -static struct thermal_cooling_device_ops const cpufreq_cooling_ops = {
> +static struct thermal_cooling_device_ops cpufreq_cooling_ops = {
>  	.get_max_state = cpufreq_get_max_state,
>  	.get_cur_state = cpufreq_get_cur_state,
>  	.set_cur_state = cpufreq_set_cur_state,
> @@ -311,6 +698,9 @@ static unsigned int find_next_max(struct cpufreq_frequency_table *table,
>   * @np: a valid struct device_node to the cooling device device tree node
>   * @clip_cpus: cpumask of cpus where the frequency constraints will happen.
>   * Normally this should be same as cpufreq policy->related_cpus.
> + * @capacitance: dynamic power coefficient for these cpus
> + * @plat_static_func: function to calculate the static power consumed by these
> + *                    cpus (optional)
>   *
>   * This interface function registers the cpufreq cooling device with the name
>   * "thermal-cpufreq-%x". This api can support multiple instances of cpufreq
> @@ -322,7 +712,8 @@ static unsigned int find_next_max(struct cpufreq_frequency_table *table,
>   */
>  static struct thermal_cooling_device *
>  __cpufreq_cooling_register(struct device_node *np,
> -			   const struct cpumask *clip_cpus)
> +			const struct cpumask *clip_cpus, u32 capacitance,
> +			get_static_t plat_static_func)
>  {
>  	struct thermal_cooling_device *cool_dev;
>  	struct cpufreq_cooling_device *cpufreq_dev;
> @@ -357,6 +748,20 @@ __cpufreq_cooling_register(struct device_node *np,
>  
>  	cpumask_copy(&cpufreq_dev->allowed_cpus, clip_cpus);
>  
> +	if (capacitance) {
> +		cpufreq_cooling_ops.get_requested_power =
> +			cpufreq_get_requested_power;
> +		cpufreq_cooling_ops.state2power = cpufreq_state2power;
> +		cpufreq_cooling_ops.power2state = cpufreq_power2state;
> +		cpufreq_dev->plat_get_static_power = plat_static_func;
> +
> +		ret = build_dyn_power_table(cpufreq_dev, capacitance);
> +		if (ret) {
> +			cool_dev = ERR_PTR(ret);
> +			goto free_table;
> +		}
> +	}
> +
>  	ret = get_idr(&cpufreq_idr, &cpufreq_dev->id);
>  	if (ret) {
>  		cool_dev = ERR_PTR(ret);
> @@ -422,7 +827,7 @@ free_cdev:
>  struct thermal_cooling_device *
>  cpufreq_cooling_register(const struct cpumask *clip_cpus)
>  {
> -	return __cpufreq_cooling_register(NULL, clip_cpus);
> +	return __cpufreq_cooling_register(NULL, clip_cpus, 0, NULL);
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_cooling_register);
>  
> @@ -446,11 +851,78 @@ of_cpufreq_cooling_register(struct device_node *np,
>  	if (!np)
>  		return ERR_PTR(-EINVAL);
>  
> -	return __cpufreq_cooling_register(np, clip_cpus);
> +	return __cpufreq_cooling_register(np, clip_cpus, 0, NULL);
>  }
>  EXPORT_SYMBOL_GPL(of_cpufreq_cooling_register);
>  
>  /**
> + * cpufreq_power_cooling_register() - create cpufreq cooling device with power extensions
> + * @clip_cpus:	cpumask of cpus where the frequency constraints will happen
> + * @capacitance:	dynamic power coefficient for these cpus
> + * @plat_static_func:	function to calculate the static power consumed by these
> + *			cpus (optional)
> + *
> + * This interface function registers the cpufreq cooling device with
> + * the name "thermal-cpufreq-%x".  This api can support multiple
> + * instances of cpufreq cooling devices.  Using this function, the
> + * cooling device will implement the power extensions by using a
> + * simple cpu power model.  The cpus must have registered their OPPs
> + * using the OPP library.
> + *
> + * An optional @plat_static_func may be provided to calculate the
> + * static power consumed by these cpus.  If the platform's static
> + * power consumption is unknown or negligible, make it NULL.
> + *
> + * Return: a valid struct thermal_cooling_device pointer on success,
> + * on failure, it returns a corresponding ERR_PTR().
> + */
> +struct thermal_cooling_device *
> +cpufreq_power_cooling_register(const struct cpumask *clip_cpus, u32 capacitance,
> +			       get_static_t plat_static_func)
> +{
> +	return __cpufreq_cooling_register(NULL, clip_cpus, capacitance,
> +				plat_static_func);
> +}
> +EXPORT_SYMBOL(cpufreq_power_cooling_register);
> +
> +/**
> + * of_cpufreq_power_cooling_register() - create cpufreq cooling device with power extensions
> + * @np:	a valid struct device_node to the cooling device device tree node
> + * @clip_cpus:	cpumask of cpus where the frequency constraints will happen
> + * @capacitance:	dynamic power coefficient for these cpus
> + * @plat_static_func:	function to calculate the static power consumed by these
> + *			cpus (optional)
> + *
> + * This interface function registers the cpufreq cooling device with
> + * the name "thermal-cpufreq-%x".  This api can support multiple
> + * instances of cpufreq cooling devices.  Using this API, the cpufreq
> + * cooling device will be linked to the device tree node provided.
> + * Using this function, the cooling device will implement the power
> + * extensions by using a simple cpu power model.  The cpus must have
> + * registered their OPPs using the OPP library.
> + *
> + * An optional @plat_static_func may be provided to calculate the
> + * static power consumed by these cpus.  If the platform's static
> + * power consumption is unknown or negligible, make it NULL.
> + *
> + * Return: a valid struct thermal_cooling_device pointer on success,
> + * on failure, it returns a corresponding ERR_PTR().
> + */
> +struct thermal_cooling_device *
> +of_cpufreq_power_cooling_register(struct device_node *np,
> +				  const struct cpumask *clip_cpus,
> +				  u32 capacitance,
> +				  get_static_t plat_static_func)
> +{
> +	if (!np)
> +		return ERR_PTR(-EINVAL);
> +
> +	return __cpufreq_cooling_register(np, clip_cpus, capacitance,
> +				plat_static_func);
> +}
> +EXPORT_SYMBOL(of_cpufreq_power_cooling_register);
> +
> +/**
>   * cpufreq_cooling_unregister - function to remove cpufreq cooling device.
>   * @cdev: thermal cooling device pointer.
>   *
> diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h
> index bd955270d5aa..c156f5082758 100644
> --- a/include/linux/cpu_cooling.h
> +++ b/include/linux/cpu_cooling.h
> @@ -28,6 +28,9 @@
>  #include <linux/thermal.h>
>  #include <linux/cpumask.h>
>  
> +typedef int (*get_static_t)(cpumask_t *cpumask, int interval,
> +			    unsigned long voltage, u32 *power);
> +
>  #ifdef CONFIG_CPU_THERMAL
>  /**
>   * cpufreq_cooling_register - function to create cpufreq cooling device.
> @@ -36,6 +39,10 @@
>  struct thermal_cooling_device *
>  cpufreq_cooling_register(const struct cpumask *clip_cpus);
>  
> +struct thermal_cooling_device *
> +cpufreq_power_cooling_register(const struct cpumask *clip_cpus,
> +			       u32 capacitance, get_static_t plat_static_func);
> +
>  /**
>   * of_cpufreq_cooling_register - create cpufreq cooling device based on DT.
>   * @np: a valid struct device_node to the cooling device device tree node.
> @@ -45,6 +52,12 @@ cpufreq_cooling_register(const struct cpumask *clip_cpus);
>  struct thermal_cooling_device *
>  of_cpufreq_cooling_register(struct device_node *np,
>  			    const struct cpumask *clip_cpus);
> +
> +struct thermal_cooling_device *
> +of_cpufreq_power_cooling_register(struct device_node *np,
> +				  const struct cpumask *clip_cpus,
> +				  u32 capacitance,
> +				  get_static_t plat_static_func);
>  #else
>  static inline struct thermal_cooling_device *
>  of_cpufreq_cooling_register(struct device_node *np,
> @@ -52,6 +65,15 @@ of_cpufreq_cooling_register(struct device_node *np,
>  {
>  	return ERR_PTR(-ENOSYS);
>  }
> +
> +static inline struct thermal_cooling_device *
> +of_cpufreq_power_cooling_register(struct device_node *np,
> +				  const struct cpumask *clip_cpus,
> +				  u32 capacitance,
> +				  get_static_t plat_static_func)
> +{
> +	return NULL;
> +}
>  #endif
>  
>  /**
> @@ -68,11 +90,28 @@ cpufreq_cooling_register(const struct cpumask *clip_cpus)
>  	return ERR_PTR(-ENOSYS);
>  }
>  static inline struct thermal_cooling_device *
> +cpufreq_power_cooling_register(const struct cpumask *clip_cpus,
> +			       u32 capacitance, get_static_t plat_static_func)
> +{
> +	return NULL;
> +}
> +
> +static inline struct thermal_cooling_device *
>  of_cpufreq_cooling_register(struct device_node *np,
>  			    const struct cpumask *clip_cpus)
>  {
>  	return ERR_PTR(-ENOSYS);
>  }
> +
> +static inline struct thermal_cooling_device *
> +of_cpufreq_power_cooling_register(struct device_node *np,
> +				  const struct cpumask *clip_cpus,
> +				  u32 capacitance,
> +				  get_static_t plat_static_func)
> +{
> +	return NULL;
> +}
> +
>  static inline
>  void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>  {
> -- 
> 1.9.1
> 
