[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7641627f-ab0d-478c-9d52-6ed4745bd61b@nvidia.com>
Date: Thu, 4 Sep 2025 17:31:56 +0530
From: Sumit Gupta <sumitg@...dia.com>
To: Mario Limonciello <superm1@...nel.org>, rafael@...nel.org,
viresh.kumar@...aro.org, lenb@...nel.org, robert.moore@...el.com,
corbet@....net, pierre.gondois@....com, zhenglifeng1@...wei.com,
ray.huang@....com, gautham.shenoy@....com, perry.yuan@....com,
linux-pm@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-doc@...r.kernel.org, acpica-devel@...ts.linux.dev,
linux-kernel@...r.kernel.org
Cc: linux-tegra@...r.kernel.org, treding@...dia.com, jonathanh@...dia.com,
vsethi@...dia.com, ksitaraman@...dia.com, sanjayc@...dia.com,
bbasu@...dia.com, sumitg@...dia.com
Subject: Re: [PATCH v2 7/7] cpufreq: CPPC: add autonomous mode boot parameter
support
On 03/09/25 01:18, Mario Limonciello wrote:
> External email: Use caution opening links or attachments
>
>
> On 9/1/2025 8:37 AM, Sumit Gupta wrote:
>>
>> On 26/08/25 00:10, Mario Limonciello wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 8/23/2025 3:01 PM, Sumit Gupta wrote:
>>>> Add kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable CPPC
>>>> autonomous performance selection at system startup. When autonomous
>>>> mode
>>>> is enabled, the hardware automatically adjusts CPU performance
>>>> based on
>>>> workload demands using Energy Performance Preference (EPP) hints from
>>>> the OS.
>>>>
>>>> This parameter allows to configure the autonomous mode on all CPUs
>>>> without requiring runtime sysfs manipulation if the 'auto_sel'
>>>> register
>>>> is present.
>>>>
>>>> When auto_sel_mode=1:
>>>> - All CPUs are configured for autonomous operation during driver init
>>>> - EPP is set to performance preference (0x0) by default
>>>> - Min/max performance bounds use defaults
>>>> - CPU frequency scaling is handled by hardware rather than OS
>>>>
>>>> Also ensure that when autonomous mode is active, the set_target
>>>> callback
>>>> returns early since hardware controls frequency scaling directly.
>>>>
>>>> Signed-off-by: Sumit Gupta <sumitg@...dia.com>
>>>> ---
>>>> .../admin-guide/kernel-parameters.txt | 12 ++
>>>> drivers/cpufreq/cppc_cpufreq.c | 171
>>>> ++++++++++++++++--
>>>> 2 files changed, 168 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/
>>>> Documentation/admin-guide/kernel-parameters.txt
>>>> index 86f395f2933b..ea58deb88c36 100644
>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>> @@ -911,6 +911,18 @@
>>>> Format:
>>>> <first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
>>>>
>>>> + cppc_cpufreq.auto_sel_mode=
>>>> + [CPU_FREQ] Autonomous Performance Level
>>>> Selection.
>>>> + When Autonomous selection is enabled, then the
>>>> hardware is
>>>> + allowed to autonomously select the CPU
>>>> frequency.
>>>> + In Autonomous mode, Energy Performance
>>>> Preference(EPP)
>>>> + provides input to the hardware to favour
>>>> performance (0x0)
>>>> + or energy efficiency (0xff).
>>>> + Format: <bool>
>>>> + Default: disabled.
>>>> + 0: force disabled
>>>> + 1: force enabled
>>>
>>> I don't think you can actually force enable. If the hardware doesn't
>>> support it, setting 1 won't do anything.
>>>
>>> IoW really setting 1 is "enable if supported".
>>>
>>
>> Yes, will change in v3.
>>
>>
>>>> +
>>>> cpuidle.off=1 [CPU_IDLE]
>>>> disable the cpuidle sub-system
>>>>
>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/
>>>> cppc_cpufreq.c
>>>> index 5e1bbb5f67b8..bbf654c56ff9 100644
>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>> @@ -27,6 +27,8 @@
>>>> #include <acpi/cppc_acpi.h>
>>>>
>>>> static struct cpufreq_driver cppc_cpufreq_driver;
>>>> +/* Autonomous Selection */
>>>> +static bool auto_sel_mode;
>>>>
>>>> #ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
>>>> static enum {
>>>> @@ -272,6 +274,14 @@ static int cppc_cpufreq_set_target(struct
>>>> cpufreq_policy *policy,
>>>> freqs.old = policy->cur;
>>>> freqs.new = target_freq;
>>>>
>>>> + /*
>>>> + * In autonomous mode, hardware handles frequency scaling
>>>> directly
>>>> + * based on workload demands and EPP hints, so OS frequency
>>>> requests
>>>> + * are not needed.
>>>> + */
>>>> + if (cpu_data->perf_caps.auto_sel)
>>>> + return ret;
>>>> +
>>>> cpufreq_freq_transition_begin(policy, &freqs);
>>>> ret = cppc_set_perf_ctrls(cpu, &cpu_data->perf_ctrls);
>>>> cpufreq_freq_transition_end(policy, &freqs, ret != 0);
>>>> @@ -555,6 +565,12 @@ static struct cppc_cpudata
>>>> *cppc_cpufreq_get_cpu_data(unsigned int cpu)
>>>> goto free_mask;
>>>> }
>>>>
>>>> + ret = cppc_get_perf_ctrls(cpu, &cpu_data->perf_ctrls);
>>>> + if (ret) {
>>>> + pr_debug("Err reading CPU%d perf ctrls: ret:%d\n", cpu,
>>>> ret);
>>>> + goto free_mask;
>>>> + }
>>>> +
>>>> return cpu_data;
>>>>
>>>> free_mask:
>>>> @@ -642,6 +658,79 @@ static int cppc_cpufreq_set_max_perf(struct
>>>> cpufreq_policy *policy, u64 val,
>>>> return (ret == -EOPNOTSUPP) ? 0 : ret;
>>>> }
>>>>
>>>> +static int cppc_cpufreq_update_autosel_epp(struct cpufreq_policy
>>>> *policy, int auto_sel, u32 epp)
>>>> +{
>>>> + struct cppc_cpudata *cpu_data = policy->driver_data;
>>>> + unsigned int cpu = policy->cpu;
>>>> + int ret;
>>>> +
>>>> + pr_debug("cpu%d: curr epp:%u, curr mode:%u, new epp:%u, new
>>>> mode:%d\n", cpu,
>>>> + cpu_data->perf_ctrls.energy_perf, cpu_data-
>>>> >perf_caps.auto_sel, epp, auto_sel);
>>>> +
>>>> + mutex_lock(&cppc_cpufreq_update_autosel_config_lock);
>>>
>>> As I noticed below a case you missed the mutex unlock, this feels
>>> like a
>>> good candidate for
>>>
>>> guard(mutex)();
>>>
>>
>> I will check about 'guard(mutex)()' and change accordingly in v3.
>>
>>>> +
>>>> + ret = cppc_set_epp(cpu, epp);
>>>> + if (ret) {
>>>> + pr_warn("failed to set energy_perf for cpu:%d (%d)\n",
>>>> cpu, ret);
>>>> + goto out;
>>>> + }
>>>> + cpu_data->perf_ctrls.energy_perf = epp;
>>>> +
>>>> + ret = cppc_set_auto_sel(cpu, auto_sel);
>>>> + if (ret) {
>>>> + pr_warn("failed to set auto_sel for cpu:%d (%d)\n",
>>>> cpu, ret);
>>>> + return ret;
>>>
>>> Looks like a case that you didn't unlock the mutex.
>>
>> ACK.
>>
>>>
>>>> + }
>>>> + cpu_data->perf_caps.auto_sel = auto_sel;
>>>> +
>>>> +out:
>>>> + mutex_unlock(&cppc_cpufreq_update_autosel_config_lock);
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static int cppc_cpufreq_update_autosel_mperf_ctrls(struct
>>>> cpufreq_policy *policy, u32 min_p,
>>>> + u32 max_p, bool
>>>> update_reg, bool update_policy)
>>>> +{
>>>> + struct cppc_cpudata *cpu_data = policy->driver_data;
>>>> + unsigned int cpu = policy->cpu;
>>>> + int ret;
>>>> +
>>>> + pr_debug("cpu%d: curr max_perf:%u, curr min_perf:%u, new
>>>> max_perf:%u, new min_perf:%u\n",
>>>> + cpu, cpu_data->perf_ctrls.max_perf, cpu_data-
>>>> >perf_ctrls.min_perf, max_p, min_p);
>>>> +
>>>> + ret = cppc_cpufreq_set_min_perf(policy, min_p, update_reg,
>>>> update_policy);
>>>> + if (ret) {
>>>> + pr_debug("failed to set min_perf for cpu:%d (%d)\n",
>>>> cpu, ret);
>>>> + return ret;
>>>> + }
>>>> +
>>>> + ret = cppc_cpufreq_set_max_perf(policy, max_p, update_reg,
>>>> update_policy);
>>>> + if (ret) {
>>>> + pr_debug("failed to set max_perf for cpu:%d (%d)\n",
>>>> cpu, ret);
>>>> + return ret;
>>>> + }
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static int cppc_cpufreq_update_autosel_configs(struct cpufreq_policy
>>>> *policy, int auto_sel,
>>>> + u32 epp, u32 min_perf,
>>>> u32 max_perf,
>>>> + bool update_reg, bool
>>>> update_policy)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + ret = cppc_cpufreq_update_autosel_mperf_ctrls(policy, min_perf,
>>>> max_perf,
>>>> + update_reg,
>>>> update_policy);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + ret = cppc_cpufreq_update_autosel_epp(policy, auto_sel, epp);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>>> {
>>>> unsigned int cpu = policy->cpu;
>>>> @@ -710,11 +799,28 @@ static int cppc_cpufreq_cpu_init(struct
>>>> cpufreq_policy *policy)
>>>> policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
>>>> cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>>>>
>>>> - ret = cppc_set_perf_ctrls(cpu, &cpu_data->perf_ctrls);
>>>> - if (ret) {
>>>> - pr_debug("Err setting perf value:%d on CPU:%d.
>>>> ret:%d\n",
>>>> - caps->highest_perf, cpu, ret);
>>>> - goto out;
>>>> + if (cpu_data->perf_caps.auto_sel) {
>>>> + ret = cppc_set_enable(cpu, true);
>>>> + if (ret) {
>>>> + pr_err("Failed to enable CPPC on cpu%d (%d)\n",
>>>> cpu, ret);
>>>> + goto out;
>>>> + }
>>>> +
>>>> + ret = cppc_cpufreq_update_autosel_configs(policy, true,
>>>> + CPPC_EPP_PERFORMANCE_PREF,
>>>> + caps->lowest_nonlinear_perf,
>>>> + caps->nominal_perf, true, false);
>>>> + if (ret) {
>>>> + pr_debug("Failed to update autosel configs on
>>>> CPU%d(%d)\n", cpu, ret);
>>>> + goto out;
>>>> + }
>>>> + } else {
>>>> + ret = cppc_set_perf_ctrls(cpu, &cpu_data->perf_ctrls);
>>>> + if (ret) {
>>>> + pr_debug("Err setting perf value:%d on CPU:%d.
>>>> ret:%d\n",
>>>> + caps->highest_perf, cpu, ret);
>>>> + goto out;
>>>> + }
>>>> }
>>>>
>>>> cppc_cpufreq_cpu_fie_init(policy);
>>>> @@ -736,6 +842,13 @@ static void cppc_cpufreq_cpu_exit(struct
>>>> cpufreq_policy *policy)
>>>>
>>>> cpu_data->perf_ctrls.desired_perf = caps->lowest_perf;
>>>>
>>>> + if (cpu_data->perf_caps.auto_sel) {
>>>> + ret = cppc_cpufreq_update_autosel_epp(policy, false,
>>>> + CPPC_EPP_ENERGY_EFFICIENCY_PREF);
>>>> + if (ret)
>>>> + return;
>>>> + }
>>>> +
>>>> ret = cppc_set_perf_ctrls(cpu, &cpu_data->perf_ctrls);
>>>> if (ret)
>>>> pr_debug("Err setting perf value:%d on CPU:%d.
>>>> ret:%d\n",
>>>> @@ -920,17 +1033,10 @@ static ssize_t store_auto_select(struct
>>>> cpufreq_policy *policy,
>>>> * On enabling auto_select: set min/max_perf register and
>>>> update policy.
>>>> * On disabling auto_select: update only policy.
>>>> */
>>>> - ret = cppc_cpufreq_set_min_perf(policy, min_perf, update_reg,
>>>> true);
>>>> - if (ret) {
>>>> - pr_warn("failed to %s update min policy for cpu:%d
>>>> (%d)\n",
>>>> - val > 0 ? "set min_perf and" : "", cpu, ret);
>>>> - return ret;
>>>> - }
>>>> -
>>>> - ret = cppc_cpufreq_set_max_perf(policy, max_perf, update_reg,
>>>> true);
>>>> + ret = cppc_cpufreq_update_autosel_mperf_ctrls(policy, min_perf,
>>>> max_perf, update_reg, true);
>>>> if (ret) {
>>>> - pr_warn("failed to %s update max policy for cpu:%d
>>>> (%d)\n",
>>>> - val > 0 ? "set max_perf and" : "", cpu, ret);
>>>> + pr_warn("failed to %s update policy for cpu:%d (%d)\n",
>>>> + val > 0 ? "set min/max_perf and" : "", cpu,
>>>> ret);
>>>> return ret;
>>>> }
>>>>
>>>> @@ -1139,13 +1245,44 @@ static struct cpufreq_driver
>>>> cppc_cpufreq_driver = {
>>>> .name = "cppc_cpufreq",
>>>> };
>>>>
>>>> +static void cppc_cpufreq_set_epp_autosel_allcpus(bool auto_sel, u64
>>>> epp)
>>>> +{
>>>> + int cpu, ret;
>>>> +
>>>> + for_each_present_cpu(cpu) {
>>>> + ret = cppc_set_epp(cpu, epp);
>>>> + if (ret)
>>>> + pr_debug("failed to set energy_perf for cpu:%d
>>>> (%d)\n", cpu, ret);
>>>> +
>>>> + ret = cppc_set_auto_sel(cpu, auto_sel);
>>>> + if (ret)
>>>> + pr_debug("failed to set auto_sel for cpu:%d
>>>> (%d)\n", cpu, ret);
>>>> + }
>>>> +}
>>>> +
>>>> static int __init cppc_cpufreq_init(void)
>>>> {
>>>> + bool auto_sel;
>>>> int ret;
>>>>
>>>> if (!acpi_cpc_valid())
>>>> return -ENODEV;
>>>>
>>>> + if (auto_sel_mode) {
>>>> + /*
>>>> + * Check if autonomous selection is supported by
>>>> testing CPU 0.
>>>> + * If supported, enable autonomous mode on all CPUs.
>>>> + */
>>>> + ret = cppc_get_auto_sel(0, &auto_sel);
>>>> + if (!ret) {
>>>> + pr_info("Enabling autonomous mode on all
>>>> CPUs\n");
>>>> + cppc_cpufreq_set_epp_autosel_allcpus(true,
>>>> CPPC_EPP_PERFORMANCE_PREF);
>>>> + } else {
>>>> + pr_warn("Autonomous selection not supported,
>>>> disabling auto_sel_mode\n");
>>>> + auto_sel_mode = false;
>>>> + }
>>>> + }
>>>> +
>>>> cppc_freq_invariance_init();
>>>> populate_efficiency_class();
>>>>
>>>> @@ -1160,8 +1297,12 @@ static void __exit cppc_cpufreq_exit(void)
>>>> {
>>>> cpufreq_unregister_driver(&cppc_cpufreq_driver);
>>>> cppc_freq_invariance_exit();
>>>> + auto_sel_mode = 0;
>>>> }
>>>>
>>>> +module_param(auto_sel_mode, bool, 0000);
>>>> +MODULE_PARM_DESC(auto_sel_mode, "Enable Autonomous Performance Level
>>>> Selection");
>>>
>>> Why default to disabled? As a precaution? We enable EPP by default in
>>> the *-pstate drivers if the hardware supports it, I would think it
>>> makes
>>> sense here too.
>>>
>>
>> Kept disabled by default both as precaution and to enable it based on
>> preference.
>> Someone may want to try different config values during bringup,
>> verification etc.
>
> I personally don't think that bringup software should dictate the steady
> state decision. Bringup can trivially have a custom kernel command line
> that disables it.
>
> Otherwise this means that the "worries of stability" translate into a
> custom kernel command line option on all production hardware.
>
> So I feel once you're happy with it on a variety of hardware you should
> consider a patch for enabling it by default at some point.
>
>
Sure.
Thank you,
Sumit Gupta
Powered by blists - more mailing lists