Date: Thu, 15 Feb 2024 15:09:55 +0530
From: Dhruva Gole <d-gole@...com>
To: Daniel Lezcano <daniel.lezcano@...aro.org>
CC: <rafael@...nel.org>, <caleb.connolly@...aro.org>, <lina.iyer@...aro.org>,
        <lukasz.luba@....com>, <quic_manafm@...cinc.com>,
        <quic_priyjain@...cinc.com>, <linux-pm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <vibhore@...com>,
        <khilman@...libre.com>
Subject: Re: [PATCH v1 2/2] PM: QoS: Add a performance QoS

Hi,

On Dec 13, 2023 at 18:58:18 +0100, Daniel Lezcano wrote:
> Currently cpufreq and devfreq are using the freq QoS to aggregate the
> requests for frequency ranges.
> 
> However, there are new devices that want to act not on a frequency
> range but on a performance index range. These also need to expose to
> userspace a knob to act on their performance limits.
> 
> This change provides a performance limiter QoS based on minimum /
> maximum performance values. At init time, the limits of the interval
> are 0 / 1024. It is up to the backend to map 1024 to its maximum
> performance state. So if the performance must be limited to 50%, the
> maximum limit should be set to 512, which the backend will convert to
> (max performance index / 2). The same applies to the minimum.
> Obviously, the min cannot be greater than the max.
> 
>  1. With the example above, if there is an odd number of performance
>  indexes, say 5, and we ask for 512 (so 50%), what performance index
>  is computed: 5/2=2 or 5/2=3? (I would say the lower one, otherwise we
>  end up with a performance limit greater than what we actually asked
>  for.)
> 
>  2. The conversion from 1024 to a performance index will inevitably
>  end up at a state above or below the percentage given. Shall that be
>  reflected in the value set? E.g. we want to apply a maximum
>  performance limit of 33%, so 1024 x 0.333333 = 314. If there are 20
>  performance indexes, that gives (20 x 314) / 1024 = 6.13, so index 6.
>  Shall we convert this index back to the requested performance limit,
>  i.e. (6 x 1024) / 20 = 307? (So the request was 314 but the effective
>  limit is 307.)
> 
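To make sure I understand the rounding you describe above, here is roughly
how I picture the backend conversion (the function and names below are made
up for illustration, they are not taken from this patch):

	/*
	 * Map a QoS limit in the [0..1024] range to a performance index,
	 * rounding down so the effective limit never exceeds the request.
	 */
	static unsigned int perf_limit_to_index(unsigned int limit,
						unsigned int nr_indexes)
	{
		return (nr_indexes * limit) / 1024;
	}

With integer division this rounds down, so 5 indexes at 512 gives index 2,
and 20 indexes at 314 gives index 6, which maps back to (6 * 1024) / 20 = 307
as you note above.
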
> The end goal is to make the freq QoS and the perf QoS co-exist in the
> different backends in subsequent changes. A change to one of the QoS
> values impacts the other. For instance, if there are 5 performance
> states and we set a performance limit of 80%, then the maximum state
> will be 4.
> 
> In the longer term, once those can co-exist, we can implement a
> cooling device based on the performance QoS which will be generic for
> all devices using this QoS. That includes the CPUs, the GPUs and any
> devfreq devices. The devfreq and cpufreq cooling devices can then be
> merged into a single performance cooling device, generic for all
> devices with a performance limit QoS.
> 
> Similarly, a power QoS and a power-based cooling device could be added
> in the future. Any device with an energy model and a power capping
> feature could then become a cooling device, and the power computation
> part of the cooling devices would move to the backends. We would end
> up with a generic power cooling device compatible with all
> power-capable devices.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@...aro.org>
> ---
>  drivers/base/power/power.h |   2 +
>  drivers/base/power/qos.c   | 158 +++++++++++++++++++++++++-
>  drivers/base/power/sysfs.c |  92 +++++++++++++++
>  include/linux/cpufreq.h    |   2 +
>  include/linux/pm_qos.h     |  42 +++++++
>  kernel/power/qos.c         | 225 +++++++++++++++++++++++++++++++++++++
>  6 files changed, 517 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/power/power.h b/drivers/base/power/power.h
> index 922ed457db19..eb1a77a7a0f4 100644
> --- a/drivers/base/power/power.h
> +++ b/drivers/base/power/power.h
> @@ -78,6 +78,8 @@ extern int pm_qos_sysfs_add_flags(struct device *dev);
>  extern void pm_qos_sysfs_remove_flags(struct device *dev);
>  extern int pm_qos_sysfs_add_latency_tolerance(struct device *dev);
>  extern void pm_qos_sysfs_remove_latency_tolerance(struct device *dev);
> +extern int pm_qos_sysfs_add_perf_limit(struct device *dev);
> +extern void pm_qos_sysfs_remove_perf_limit(struct device *dev);
>  extern int dpm_sysfs_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid);
>  
>  #else /* CONFIG_PM */
> diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> index ae0b9d2573ec..a71cff1f8048 100644
> --- a/drivers/base/power/qos.c
> +++ b/drivers/base/power/qos.c
> @@ -128,6 +128,14 @@ s32 dev_pm_qos_read_value(struct device *dev, enum dev_pm_qos_req_type type)
>  		ret = IS_ERR_OR_NULL(qos) ? PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE
>  			: freq_qos_read_value(&qos->freq, FREQ_QOS_MAX);
>  		break;
> +	case DEV_PM_QOS_MIN_PERF:
> +		ret =  IS_ERR_OR_NULL(qos) ? PM_QOS_MIN_PERF_DEFAULT_VALUE
> +			: perf_qos_read_value(&qos->perf, RANGE_QOS_MIN);
> +		break;
> +	case DEV_PM_QOS_MAX_PERF:
> +		ret =  IS_ERR_OR_NULL(qos) ? PM_QOS_MAX_PERF_DEFAULT_VALUE
> +			: perf_qos_read_value(&qos->perf, RANGE_QOS_MAX);
> +		break;
>  	default:
>  		WARN_ON(1);
>  		ret = 0;
> @@ -177,6 +185,10 @@ static int apply_constraint(struct dev_pm_qos_request *req,
>  		ret = pm_qos_update_flags(&qos->flags, &req->data.flr,
>  					  action, value);
>  		break;
> +	case DEV_PM_QOS_MIN_PERF:
> +	case DEV_PM_QOS_MAX_PERF:
> +		ret = perf_qos_apply(&req->data.perf, action, value);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}
> @@ -223,6 +235,20 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
>  	c->no_constraint_value = PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT;
>  	c->type = PM_QOS_MIN;
>  
> +	c = &qos->perf.lower_bound;
> +	plist_head_init(&c->list);
> +	c->target_value = PM_QOS_MIN_PERF_DEFAULT_VALUE;
> +	c->default_value = PM_QOS_MIN_PERF_DEFAULT_VALUE;
> +	c->no_constraint_value = PM_QOS_MIN_PERF_DEFAULT_VALUE;
> +	c->type = PM_QOS_MAX;
> +
> +	c = &qos->perf.upper_bound;
> +	plist_head_init(&c->list);
> +	c->target_value = PM_QOS_MAX_PERF_DEFAULT_VALUE;
> +	c->default_value = PM_QOS_MAX_PERF_DEFAULT_VALUE;
> +	c->no_constraint_value = PM_QOS_MAX_PERF_DEFAULT_VALUE;
> +	c->type = PM_QOS_MIN;
> +
>  	freq_constraints_init(&qos->freq);
>  
>  	INIT_LIST_HEAD(&qos->flags.list);
> @@ -299,6 +325,20 @@ void dev_pm_qos_constraints_destroy(struct device *dev)
>  		memset(req, 0, sizeof(*req));
>  	}
>  
> +	c = &qos->perf.lower_bound;
> +	plist_for_each_entry_safe(req, tmp, &c->list, data.freq.pnode) {
> +		apply_constraint(req, PM_QOS_REMOVE_REQ,
> +				 PM_QOS_MIN_PERF_DEFAULT_VALUE);
> +		memset(req, 0, sizeof(*req));
> +	}
> +
> +	c = &qos->perf.upper_bound;
> +	plist_for_each_entry_safe(req, tmp, &c->list, data.freq.pnode) {
> +		apply_constraint(req, PM_QOS_REMOVE_REQ,
> +				 PM_QOS_MAX_PERF_DEFAULT_VALUE);
> +		memset(req, 0, sizeof(*req));
> +	}
> +	
>  	f = &qos->flags;
>  	list_for_each_entry_safe(req, tmp, &f->list, data.flr.node) {
>  		apply_constraint(req, PM_QOS_REMOVE_REQ, PM_QOS_DEFAULT_VALUE);
> @@ -349,17 +389,32 @@ static int __dev_pm_qos_add_request(struct device *dev,
>  
>  	req->dev = dev;
>  	req->type = type;
> -	if (req->type == DEV_PM_QOS_MIN_FREQUENCY)
> +
> +	switch (type) {
> +	case DEV_PM_QOS_MIN_FREQUENCY:
>  		ret = freq_qos_add_request(&dev->power.qos->freq,
>  					   &req->data.freq,
>  					   FREQ_QOS_MIN, value);
> -	else if (req->type == DEV_PM_QOS_MAX_FREQUENCY)
> +		break;
> +	case DEV_PM_QOS_MAX_FREQUENCY:
>  		ret = freq_qos_add_request(&dev->power.qos->freq,
>  					   &req->data.freq,
>  					   FREQ_QOS_MAX, value);
> -	else
> +		break;
> +	case DEV_PM_QOS_MIN_PERF:
> +		ret = perf_qos_add_request(&dev->power.qos->perf,
> +					   &req->data.perf,
> +					   RANGE_QOS_MIN, value);
> +		break;
> +	case DEV_PM_QOS_MAX_PERF:
> +		ret = perf_qos_add_request(&dev->power.qos->perf,
> +					   &req->data.perf,
> +					   RANGE_QOS_MAX, value);
> +		break;
> +	default:
>  		ret = apply_constraint(req, PM_QOS_ADD_REQ, value);
> -
> +		break;
> +	}
>  	return ret;
>  }
>  
> @@ -427,6 +482,10 @@ static int __dev_pm_qos_update_request(struct dev_pm_qos_request *req,
>  	case DEV_PM_QOS_MAX_FREQUENCY:
>  		curr_value = req->data.freq.pnode.prio;
>  		break;
> +	case DEV_PM_QOS_MIN_PERF:
> +	case DEV_PM_QOS_MAX_PERF:
> +		curr_value = req->data.perf.pnode.prio;
> +		break;
>  	case DEV_PM_QOS_FLAGS:
>  		curr_value = req->data.flr.flags;
>  		break;
> @@ -674,6 +733,14 @@ static void __dev_pm_qos_drop_user_request(struct device *dev,
>  		req = dev->power.qos->flags_req;
>  		dev->power.qos->flags_req = NULL;
>  		break;
> +	case DEV_PM_QOS_MIN_PERF:
> +		req = dev->power.qos->perf_min_req;
> +		dev->power.qos->perf_min_req = NULL;
> +		break;
> +	case DEV_PM_QOS_MAX_PERF:
> +		req = dev->power.qos->perf_max_req;
> +		dev->power.qos->perf_max_req = NULL;
> +		break;
>  	default:
>  		WARN_ON(1);
>  		return;
> @@ -980,3 +1047,86 @@ void dev_pm_qos_hide_latency_tolerance(struct device *dev)
>  	pm_runtime_put(dev);
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_qos_hide_latency_tolerance);
> +
> +int dev_pm_qos_expose_perf_limit(struct device *dev)
> +{
> +	struct dev_pm_qos_request *req_min;
> +	struct dev_pm_qos_request *req_max;
> +	int ret;
> +
> +	if (!device_is_registered(dev))
> +		return -EINVAL;
> +
> +	req_min = kzalloc(sizeof(*req_min), GFP_KERNEL);
> +	if (!req_min)
> +		return -ENOMEM;
> +
> +	req_max = kzalloc(sizeof(*req_max), GFP_KERNEL);
> +	if (!req_max) {
> +		kfree(req_min);
> +		return -ENOMEM;
> +	}
> +	

Oops, looks like we forgot to run checkpatch ;)

There are many more errors like this in these patches, so I'd urge you to
run checkpatch, fix the reported issues and re-spin.
Do keep me in CC from the next rev onwards.
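
In case it helps, something like this from the top of the tree should catch
the trailing whitespace and friends (assuming the series is the top two
commits):

	./scripts/checkpatch.pl --strict --git HEAD~2..HEAD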

> +	ret = dev_pm_qos_add_request(dev, req_min, DEV_PM_QOS_MIN_PERF,
> +				     PM_QOS_MIN_PERF_DEFAULT_VALUE);
> +	if (ret < 0) {
> +		kfree(req_min);
> +		kfree(req_max);
> +		return ret;
> +	}
> +
> +	ret = dev_pm_qos_add_request(dev, req_max, DEV_PM_QOS_MAX_PERF,
> +				     PM_QOS_MAX_PERF_DEFAULT_VALUE);
> +	if (ret < 0) {
> +		dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_MIN_PERF);
> +		return ret;
> +	}
> +
> +	mutex_lock(&dev_pm_qos_sysfs_mtx);
> +
> +	mutex_lock(&dev_pm_qos_mtx);
> +
> +	if (IS_ERR_OR_NULL(dev->power.qos))
> +		ret = -ENODEV;
> +	else if (dev->power.qos->perf_min_req || dev->power.qos->perf_max_req)
> +		ret = -EEXIST;
> +
> +	if (ret < 0) {
> +		__dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_MIN_PERF);
> +		__dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_MAX_PERF);
> +		mutex_unlock(&dev_pm_qos_mtx);
> +		goto out;
> +	}
> +
> +	dev->power.qos->perf_min_req = req_min;
> +	dev->power.qos->perf_max_req = req_max;
> +
> +	mutex_unlock(&dev_pm_qos_mtx);
> +
> +	ret = pm_qos_sysfs_add_perf_limit(dev);
> +	if (ret) {
> +		dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_MIN_PERF);
> +		dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_MAX_PERF);
> +	}
> +out:
> +	mutex_unlock(&dev_pm_qos_sysfs_mtx);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(dev_pm_qos_expose_perf_limit);
> +
> +void dev_pm_qos_hide_perf_limit(struct device *dev)
> +{
> +	mutex_lock(&dev_pm_qos_sysfs_mtx);
> +
> +	pm_qos_sysfs_remove_perf_limit(dev);
> +
> +	mutex_lock(&dev_pm_qos_mtx);
> +
> +	__dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_MIN_PERF);
> +	__dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_MAX_PERF);
> +	

Trailing whitespace here as well ^^

> +	mutex_unlock(&dev_pm_qos_mtx);
> +
> +	mutex_unlock(&dev_pm_qos_sysfs_mtx);
> +}
> +EXPORT_SYMBOL_GPL(dev_pm_qos_hide_perf_limit);
> diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
> index a1474fb67db9..5a45191006c1 100644
> --- a/drivers/base/power/sysfs.c
> +++ b/drivers/base/power/sysfs.c
> @@ -317,6 +317,76 @@ static ssize_t pm_qos_no_power_off_store(struct device *dev,
>  
>  static DEVICE_ATTR_RW(pm_qos_no_power_off);
>  
> +
> +static ssize_t pm_qos_perf_limit_min_max_show(struct device *dev,
> +					      struct device_attribute *attr,
> +					      char *buf, bool max)
> +{
> +	s32 value = dev_pm_qos_read_value(dev, max ? DEV_PM_QOS_MAX_PERF :
> +					  DEV_PM_QOS_MIN_PERF);
> +
> +	return sysfs_emit(buf, "%d\n", value);
> +}
> +
> +static ssize_t pm_qos_perf_limit_min_max_store(struct device *dev,
> +					       struct device_attribute *attr,
> +					       const char *buf, size_t n, bool max)
> +{
> +	int ret;

Your function's return type is ssize_t; do you want to change the type of
ret to match?
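
i.e. something like:

	ssize_t ret;

so that the value returned below stays in ssize_t throughout.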

> +	s32 min_value = dev_pm_qos_read_value(dev, DEV_PM_QOS_MIN_PERF);
> +	s32 max_value = dev_pm_qos_read_value(dev, DEV_PM_QOS_MAX_PERF);
> +	s32 new_value;
> +
> +	if (kstrtoint(buf, 0, &new_value))
> +		return -EINVAL;
> +
> +	if (new_value < PM_QOS_MIN_PERF_DEFAULT_VALUE ||
> +	    new_value > PM_QOS_MAX_PERF_DEFAULT_VALUE)
> +		return -EINVAL;
> +
> +	if (max && (new_value < min_value))
> +		return -EINVAL;
> +
> +	if (!max && (new_value > max_value))
> +		return -EINVAL;

No strong opinion, but it might help debugging if you print why each
-EINVAL is returned?
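
Something like this, for example (untested, just to illustrate the idea):

	if (new_value < PM_QOS_MIN_PERF_DEFAULT_VALUE ||
	    new_value > PM_QOS_MAX_PERF_DEFAULT_VALUE) {
		dev_dbg(dev, "perf limit %d out of range [%d..%d]\n",
			new_value, PM_QOS_MIN_PERF_DEFAULT_VALUE,
			PM_QOS_MAX_PERF_DEFAULT_VALUE);
		return -EINVAL;
	}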

> +
> +	ret = dev_pm_qos_update_request(max ? dev->power.qos->perf_max_req :
> +					dev->power.qos->perf_min_req, new_value);
> +
> +	return ret < 0 ? ret : n;
> +}
> +
> +static ssize_t pm_qos_perf_limit_min_show(struct device *dev,
> +						 struct device_attribute *attr,
> +						 char *buf)
> +{
> +	return pm_qos_perf_limit_min_max_show(dev, attr, buf, false);
> +}
> +
> +static ssize_t pm_qos_perf_limit_min_store(struct device *dev,
> +						  struct device_attribute *attr,
> +						  const char *buf, size_t n)
> +{
> +	return pm_qos_perf_limit_min_max_store(dev, attr, buf, n, false);
> +}
> +
> +static ssize_t pm_qos_perf_limit_max_show(struct device *dev,
> +						 struct device_attribute *attr,
> +						 char *buf)
> +{
> +	return pm_qos_perf_limit_min_max_show(dev, attr, buf, true);
> +}
> +
> +static ssize_t pm_qos_perf_limit_max_store(struct device *dev,
> +						  struct device_attribute *attr,
> +						  const char *buf, size_t n)
> +{
> +	return pm_qos_perf_limit_min_max_store(dev, attr, buf, n, true);
> +}
> +
> +static DEVICE_ATTR_RW(pm_qos_perf_limit_min);
> +static DEVICE_ATTR_RW(pm_qos_perf_limit_max);
> +
[...]

-- 
Best regards,
Dhruva Gole <d-gole@...com>
