Message-ID: <019bbcd9-7bbc-45bb-9c05-f59a4c90c26e@nvidia.com>
Date: Tue, 9 Dec 2025 22:08:19 +0530
From: Sumit Gupta <sumitg@...dia.com>
To: Pierre Gondois <pierre.gondois@....com>
Cc: linux-kernel@...r.kernel.org, acpica-devel@...ts.linux.dev,
linux-doc@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-pm@...r.kernel.org, zhanjie9@...ilicon.com, ionela.voinescu@....com,
perry.yuan@....com, mario.limonciello@....com, gautham.shenoy@....com,
rdunlap@...radead.org, zhenglifeng1@...wei.com, corbet@....net,
robert.moore@...el.com, lenb@...nel.org, viresh.kumar@...aro.org,
linux-tegra@...r.kernel.org, treding@...dia.com, jonathanh@...dia.com,
vsethi@...dia.com, ksitaraman@...dia.com, sanjayc@...dia.com,
nhartman@...dia.com, bbasu@...dia.com, rafael@...nel.org, ray.huang@....com,
sumitg@...dia.com
Subject: Re: [PATCH v4 4/8] ACPI: CPPC: add APIs and sysfs interface for
min/max_perf
On 27/11/25 20:24, Pierre Gondois wrote:
> On 11/5/25 12:38, Sumit Gupta wrote:
>> CPPC allows platforms to specify minimum and maximum performance
>> limits that constrain the operating range for CPU performance scaling
>> when Autonomous Selection is enabled. These limits can be dynamically
>> adjusted to implement power management policies or workload-specific
>> optimizations.
>>
>> Add cppc_get_min_perf() and cppc_set_min_perf() functions to read and
>> write the MIN_PERF register, allowing dynamic adjustment of the minimum
>> performance floor.
>>
>> Add cppc_get_max_perf() and cppc_set_max_perf() functions to read and
>> write the MAX_PERF register, enabling dynamic ceiling control for
>> maximum performance.
>>
>> Expose these capabilities through cpufreq sysfs attributes that accept
>> frequency values in kHz (which are converted to/from performance values
>> internally):
>> - /sys/.../cpufreq/policy*/min_perf: Read/write min perf as freq (kHz)
>> - /sys/.../cpufreq/policy*/max_perf: Read/write max perf as freq (kHz)
>>
>> The frequency-based interface provides a user-friendly abstraction which
>> is similar to other cpufreq sysfs interfaces, while the driver handles
>> conversion to hardware performance values.
>>
>> Also update EPP constants for better clarity:
>> - Rename CPPC_ENERGY_PERF_MAX to CPPC_EPP_ENERGY_EFFICIENCY_PREF
>> - Add CPPC_EPP_PERFORMANCE_PREF for the performance-oriented setting
>>
>> Signed-off-by: Sumit Gupta <sumitg@...dia.com>
>> ---
>> drivers/acpi/cppc_acpi.c | 55 ++++++++++-
>> drivers/cpufreq/cppc_cpufreq.c | 166 +++++++++++++++++++++++++++++++++
>> include/acpi/cppc_acpi.h | 23 ++++-
>> 3 files changed, 242 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
>> index 757e8ce87e9b..ef53eb8a1feb 100644
>> --- a/drivers/acpi/cppc_acpi.c
>> +++ b/drivers/acpi/cppc_acpi.c
>> @@ -1634,7 +1634,7 @@ EXPORT_SYMBOL_GPL(cppc_set_epp_perf);
>> */
>> int cppc_set_epp(int cpu, u64 epp_val)
>> {
>> - if (epp_val > CPPC_ENERGY_PERF_MAX)
>> + if (epp_val > CPPC_EPP_ENERGY_EFFICIENCY_PREF)
>> return -EINVAL;
>>
>> return cppc_set_reg_val(cpu, ENERGY_PERF, epp_val);
>> @@ -1757,6 +1757,59 @@ int cppc_set_enable(int cpu, bool enable)
>> return cppc_set_reg_val(cpu, ENABLE, enable);
>> }
>> EXPORT_SYMBOL_GPL(cppc_set_enable);
>> +
>> +/**
>> + * cppc_get_min_perf - Get the min performance register value.
>> + * @cpu: CPU from which to get min performance.
>> + * @min_perf: Return address.
>> + *
>> + * Return: 0 for success, -EIO on register access failure, -EOPNOTSUPP if not supported.
>> + */
>> +int cppc_get_min_perf(int cpu, u64 *min_perf)
>> +{
>> + return cppc_get_reg_val(cpu, MIN_PERF, min_perf);
>> +}
>> +EXPORT_SYMBOL_GPL(cppc_get_min_perf);
>> +
>> +/**
>> + * cppc_set_min_perf() - Write the min performance register.
>> + * @cpu: CPU on which to write register.
>> + * @min_perf: Value to write to the MIN_PERF register.
>> + *
>> + * Return: 0 for success, -EIO otherwise.
>> + */
>> +int cppc_set_min_perf(int cpu, u64 min_perf)
>> +{
>> + return cppc_set_reg_val(cpu, MIN_PERF, min_perf);
>> +}
>> +EXPORT_SYMBOL_GPL(cppc_set_min_perf);
>> +
>> +/**
>> + * cppc_get_max_perf - Get the max performance register value.
>> + * @cpu: CPU from which to get max performance.
>> + * @max_perf: Return address.
>> + *
>> + * Return: 0 for success, -EIO on register access failure, -EOPNOTSUPP if not supported.
>> + */
>> +int cppc_get_max_perf(int cpu, u64 *max_perf)
>> +{
>> + return cppc_get_reg_val(cpu, MAX_PERF, max_perf);
>> +}
>> +EXPORT_SYMBOL_GPL(cppc_get_max_perf);
>> +
>> +/**
>> + * cppc_set_max_perf() - Write the max performance register.
>> + * @cpu: CPU on which to write register.
>> + * @max_perf: Value to write to the MAX_PERF register.
>> + *
>> + * Return: 0 for success, -EIO otherwise.
>> + */
>> +int cppc_set_max_perf(int cpu, u64 max_perf)
>> +{
>> + return cppc_set_reg_val(cpu, MAX_PERF, max_perf);
>> +}
>> +EXPORT_SYMBOL_GPL(cppc_set_max_perf);
>> +
>> /**
>> * cppc_get_perf - Get a CPU's performance controls.
>> * @cpu: CPU for which to get performance controls.
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>> index cf3ed6489a4f..cde6202e9c51 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -23,10 +23,12 @@
>> #include <uapi/linux/sched/types.h>
>>
>> #include <linux/unaligned.h>
>> +#include <linux/cleanup.h>
>>
>> #include <acpi/cppc_acpi.h>
>>
>> static struct cpufreq_driver cppc_cpufreq_driver;
>> +static DEFINE_MUTEX(cppc_cpufreq_update_autosel_config_lock);
>>
>> #ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
>> static enum {
>> @@ -582,6 +584,68 @@ static void cppc_cpufreq_put_cpu_data(struct cpufreq_policy *policy)
>> policy->driver_data = NULL;
>> }
>>
>> +/**
>> + * cppc_cpufreq_set_mperf_limit - Generic function to set min/max performance limit
>> + * @policy: cpufreq policy
>> + * @val: performance value to set
>> + * @update_reg: whether to update hardware register
>
> I'm not sure I see in which case we might not want to update the
> hardware register.
> Aren't the min/max_perf values relevant even when autonomous selection is
> disabled/absent?
>
This is explained in my reply to patch 7/8; a brief summary here as well.
When auto_sel is disabled, only the policy limits are reset; the values
in the min/max_perf registers are preserved.
When auto_sel is re-enabled, the preserved values are restored to both
the hardware registers and the policy.
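
To make the preserve/restore behaviour concrete, here is a small user-space
model of it. It is only a sketch under assumptions: struct model_cpu and the
model_*() helpers are hypothetical names, not driver code, and resetting the
policy limits to the full lowest/highest range on disable is an assumption
made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the driver's data structures. */
struct model_cpu {
	bool auto_sel;
	uint32_t reg_min_perf;    /* models the MIN_PERF register */
	uint32_t reg_max_perf;    /* models the MAX_PERF register */
	uint32_t policy_min_perf; /* models the policy lower limit */
	uint32_t policy_max_perf; /* models the policy upper limit */
	uint32_t lowest_perf;
	uint32_t highest_perf;
};

/* Mirrors the update_reg/update_policy split of the quoted helper. */
static void model_set_limit(struct model_cpu *c, uint32_t perf,
			    bool update_reg, bool update_policy, bool is_min)
{
	assert(c->lowest_perf <= c->highest_perf);

	/* Clamp the requested value to the supported range. */
	if (perf < c->lowest_perf)
		perf = c->lowest_perf;
	if (perf > c->highest_perf)
		perf = c->highest_perf;

	if (update_reg) {
		if (is_min)
			c->reg_min_perf = perf;
		else
			c->reg_max_perf = perf;
	}

	if (update_policy) {
		if (is_min)
			c->policy_min_perf = perf;
		else
			c->policy_max_perf = perf;
	}
}

/* Disable: reset only the policy limits (assumed: to the full range). */
static void model_disable_auto_sel(struct model_cpu *c)
{
	c->auto_sel = false;
	model_set_limit(c, c->lowest_perf, false, true, true);
	model_set_limit(c, c->highest_perf, false, true, false);
}

/* Re-enable: push the preserved values to registers and policy. */
static void model_enable_auto_sel(struct model_cpu *c)
{
	c->auto_sel = true;
	model_set_limit(c, c->reg_min_perf, true, true, true);
	model_set_limit(c, c->reg_max_perf, true, true, false);
}
```

In this model, setting min/max to 40/80, then calling
model_disable_auto_sel(), leaves reg_min_perf/reg_max_perf at 40/80 while the
policy limits go back to the full range; model_enable_auto_sel() brings the
policy limits back to 40/80.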
>
>> + * @update_policy: whether to update policy constraints
>> + * @is_min: true for min_perf, false for max_perf
>> + */
>> +static int cppc_cpufreq_set_mperf_limit(struct cpufreq_policy *policy, u64 val,
>> + bool update_reg, bool update_policy, bool is_min)
>> +{
>> + struct cppc_cpudata *cpu_data = policy->driver_data;
>> + struct cppc_perf_caps *caps = &cpu_data->perf_caps;
>> + unsigned int cpu = policy->cpu;
>> + struct freq_qos_request *req;
>> + unsigned int freq;
>> + u32 perf;
>> + int ret;
>> +
>> + perf = clamp(val, caps->lowest_perf, caps->highest_perf);
>> + freq = cppc_perf_to_khz(caps, perf);
>> +
>> + pr_debug("cpu%d, %s_perf:%llu, update_reg:%d, update_policy:%d\n", cpu,
>> + is_min ? "min" : "max", (u64)perf, update_reg, update_policy);
>> +
>> + guard(mutex)(&cppc_cpufreq_update_autosel_config_lock);
>> +
>> + if (update_reg) {
>> + ret = is_min ? cppc_set_min_perf(cpu, perf) : cppc_set_max_perf(cpu, perf);
>> + if (ret) {
>> + if (ret != -EOPNOTSUPP)
>> + pr_warn("Failed to set %s_perf (%llu) on CPU%d (%d)\n",
>> + is_min ? "min" : "max", (u64)perf, cpu, ret);
>> + return ret;
>> + }
>> +
>> + if (is_min)
>> + cpu_data->perf_ctrls.min_perf = perf;
>> + else
>> + cpu_data->perf_ctrls.max_perf = perf;
>> + }
>> +
>> + if (update_policy) {
>> + req = is_min ? policy->min_freq_req : policy->max_freq_req;
>> +
>> + ret = freq_qos_update_request(req, freq);
>
> IIUC, we are adding a qos constraint to the min_freq_req or
> max_freq_req. However these constraints should match the
> scaling_min/max_freq sysfs interface. So doesn't it mean that if we set
> the 'max_perf', we are overwriting the max_freq_req constraint?
>
Yes.
> If you have frequencies between 600000:1200000
> # Init state:
> max_perf:1200000 scaling_max_freq:1200000
> # echo 10000000 > max_perf
> max_perf:1000000 scaling_max_freq:1000000
> # echo 900000 > scaling_max_freq
> max_perf:1000000 scaling_max_freq:900000
> # echo 1200000 > scaling_max_freq
> max_perf:1000000 scaling_max_freq:1200000
>
> The 2 values are not in sync. Is it the desired behaviour ?
>
>
Making scaling_min/max_freq read-only in auto_sel mode would solve this.
We can do that by setting the policy limits to the min/max_perf bounds in
cppc_verify_policy() when auto_sel is enabled.
In autonomous mode the hardware controls performance within these
bounds, so scaling_min/max_freq becomes effectively read-only and
users must change the limits through the min_perf/max_perf sysfs files.
Please let me know if you have other thoughts or would prefer a
different approach. Something like:

cppc_verify_policy(struct cpufreq_policy_data *policy_data)
{
    ...
    if (caps->auto_sel) {
        min_perf = cpu_data->perf_ctrls.min_perf ?:
                   caps->lowest_nonlinear_perf;
        max_perf = cpu_data->perf_ctrls.max_perf ?: caps->nominal_perf;

        /* set min/max_perf bounds (read-only behavior) */
        policy_data->min = cppc_perf_to_khz(caps, min_perf);
        policy_data->max = cppc_perf_to_khz(caps, max_perf);
    } else {
        cpufreq_verify_within_limits(policy_data, min_freq, max_freq);
    }
    ...
}

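
The effect of pinning the limits in the verify step can be checked with a
small user-space model. This is a sketch only: struct model_policy,
struct model_limits and model_verify_policy() are hypothetical names, and the
non-autonomous branch is a simplified stand-in for clamping the request to the
hardware range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the policy request and the driver's limits. */
struct model_policy {
	unsigned int min_khz;
	unsigned int max_khz;
};

struct model_limits {
	bool auto_sel;
	unsigned int hw_min_khz;   /* lowest frequency the CPU supports */
	unsigned int hw_max_khz;   /* highest frequency the CPU supports */
	unsigned int min_perf_khz; /* current min_perf, expressed in kHz */
	unsigned int max_perf_khz; /* current max_perf, expressed in kHz */
};

static unsigned int clamp_khz(unsigned int v, unsigned int lo, unsigned int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

static void model_verify_policy(const struct model_limits *l,
				struct model_policy *req)
{
	assert(l->hw_min_khz <= l->hw_max_khz);

	if (l->auto_sel) {
		/* Ignore the request: pin the limits to min/max_perf. */
		req->min_khz = l->min_perf_khz;
		req->max_khz = l->max_perf_khz;
		return;
	}

	/* Otherwise keep the request, clamped to the hardware range. */
	req->max_khz = clamp_khz(req->max_khz, l->hw_min_khz, l->hw_max_khz);
	req->min_khz = clamp_khz(req->min_khz, l->hw_min_khz, req->max_khz);
}
```

With a 600000:1200000 range and max_perf at 1000000, a request for a
scaling_max_freq of 900000 or 1200000 both come back as 1000000 while
auto_sel is set in this model, so the two values can no longer drift apart.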
>> + if (ret < 0) {
>> + pr_warn("Failed to update %s_freq constraint for CPU%d: %d\n",
>> + is_min ? "min" : "max", cpu, ret);
>> + return ret;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +#define cppc_cpufreq_set_min_perf(policy, val, update_reg, update_policy) \
>> + cppc_cpufreq_set_mperf_limit(policy, val, update_reg, update_policy, true)
>> +
>> +#define cppc_cpufreq_set_max_perf(policy, val, update_reg, update_policy) \
>> + cppc_cpufreq_set_mperf_limit(policy, val, update_reg, update_policy, false)
>> +
>> static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>> {
>> unsigned int cpu = policy->cpu;
>> @@ -881,16 +945,118 @@ static ssize_t store_energy_performance_preference_val(struct cpufreq_policy *po
>> return cppc_cpufreq_sysfs_store_u64(policy->cpu, cppc_set_epp,
>> buf, count);
>> }
>>
>> +/**
>> + * show_min_perf - Show minimum performance as frequency (kHz)
>> + *
>> + * Reads the MIN_PERF register and converts the performance value to
>> + * frequency (kHz) for user-space consumption.
>> + */
>> +static ssize_t show_min_perf(struct cpufreq_policy *policy, char *buf)
>> +{
>> + struct cppc_cpudata *cpu_data = policy->driver_data;
>> + u64 perf;
>> + int ret;
>> +
>> + ret = cppc_get_min_perf(policy->cpu, &perf);
>> + if (ret == -EOPNOTSUPP)
>> + return sysfs_emit(buf, "<unsupported>\n");
>> + if (ret)
>> + return ret;
>> +
>> + /* Convert performance to frequency (kHz) for user */
>> + return sysfs_emit(buf, "%u\n", cppc_perf_to_khz(&cpu_data->perf_caps, perf));
>> +}
>> +
>> +/**
>> + * store_min_perf - Set minimum performance from frequency (kHz)
>> + *
>> + * Converts the user-provided frequency (kHz) to a performance value
>> + * and writes it to the MIN_PERF register.
>> + */
>> +static ssize_t store_min_perf(struct cpufreq_policy *policy, const char *buf, size_t count)
>> +{
>> + struct cppc_cpudata *cpu_data = policy->driver_data;
>> + unsigned int freq_khz;
>> + u64 perf;
>> + int ret;
>> +
>> + ret = kstrtouint(buf, 0, &freq_khz);
>> + if (ret)
>> + return ret;
>> +
>> + /* Convert frequency (kHz) to performance value */
>> + perf = cppc_khz_to_perf(&cpu_data->perf_caps, freq_khz);
>> +
>> + ret = cppc_cpufreq_set_min_perf(policy, perf, true, cpu_data->perf_caps.auto_sel);
>> + if (ret)
>> + return ret;
>> +
>> + return count;
>> +}
>> +
>> +/**
>> + * show_max_perf - Show maximum performance as frequency (kHz)
>> + *
>> + * Reads the MAX_PERF register and converts the performance value to
>> + * frequency (kHz) for user-space consumption.
>> + */
>> +static ssize_t show_max_perf(struct cpufreq_policy *policy, char *buf)
>
> I think it might collide with the scaling_min/max_freq.
> I saw that you answered this point at:
> https://lore.kernel.org/lkml/b2bd3258-51bd-462a-ae29-71f1d6f823f3@nvidia.com/
>
>
> But I'm not sure I understood why it is needed to have 2 interfaces.
> Would it be possible to explain it again ?
The separate min_perf/max_perf interfaces are kept because they write to
dedicated CPPC hardware registers of the same names (MIN_PERF and MAX_PERF).
>
> I don't see any case where we would like to make a distinction between:
> - scaling_max_freq, i.e. the maximal freq. the cpufreq driver is allowed
> to set
> - max_perf, i.e. the maximal perf. level the firmware will set
>
> ------------
>
> Another point is that the min/max_perf interface actually uses freq.
> values.
The min/max_perf interfaces were changed from perf values to frequency
values, to match the scale of the other cpufreq sysfs interfaces, after
the discussion in [1].
[1]
https://lore.kernel.org/lkml/80e16de0-63e4-4ead-9577-4ebba9b1a02d@nvidia.com/
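
For readers unfamiliar with the two scales, the conversion between an
abstract perf value and a frequency can be sketched as a simple proportion
around the nominal point. This is an illustrative assumption only:
struct model_caps and the model_*() helpers are hypothetical, and the kernel's
cppc_perf_to_khz()/cppc_khz_to_perf() may use more capability data than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability data: one known (perf, frequency) pair. */
struct model_caps {
	uint32_t nominal_perf;     /* abstract perf at the nominal point */
	uint32_t nominal_freq_khz; /* frequency at the nominal point, kHz */
};

/* perf -> kHz, assuming frequency scales linearly with perf. */
static uint32_t model_perf_to_khz(const struct model_caps *caps, uint32_t perf)
{
	assert(caps->nominal_perf != 0);
	return (uint32_t)(((uint64_t)perf * caps->nominal_freq_khz) /
			  caps->nominal_perf);
}

/* kHz -> perf, the inverse proportion (integer division rounds down). */
static uint32_t model_khz_to_perf(const struct model_caps *caps, uint32_t khz)
{
	assert(caps->nominal_freq_khz != 0);
	return (uint32_t)(((uint64_t)khz * caps->nominal_perf) /
			  caps->nominal_freq_khz);
}
```

Because of the integer division, a value written in kHz may read back
slightly lower after a round trip through the perf scale in this model.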
Thank you,
Sumit Gupta