[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <f3849033-e883-4296-abb7-eb04e8c2a03c@amd.com>
Date: Fri, 7 Mar 2025 22:30:25 -0600
From: Mario Limonciello <mario.limonciello@....com>
To: Mario Limonciello <superm1@...nel.org>,
"Gautham R. Shenoy" <gautham.shenoy@....com>
Cc: Perry Yuan <perry.yuan@....com>,
Dhananjay Ugwekar <Dhananjay.Ugwekar@....com>,
"open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
<linux-kernel@...r.kernel.org>,
"open list:CPU FREQUENCY SCALING FRAMEWORK" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH v2 3/5] cpufreq/amd-pstate: Add support for platform
profile class
On 3/7/2025 10:55, Mario Limonciello wrote:
> On 3/7/2025 10:22, Gautham R. Shenoy wrote:
>> On Tue, Mar 04, 2025 at 09:23:25AM -0600, Mario Limonciello wrote:
>>> From: Mario Limonciello <mario.limonciello@....com>
>>>
>>> The platform profile core allows multiple drivers and devices to
>>> register platform profile support.
>>>
>>> When the legacy platform profile interface is used all drivers will
>>> adjust the platform profile as well.
>>>
>>> Add support for registering every CPU with the platform profile handler
>>> when dynamic EPP is enabled.
>>>
>>> The end result will be that changing the platform profile will modify
>>> EPP accordingly.
>>>
>>> Signed-off-by: Mario Limonciello <mario.limonciello@....com>
>>> ---
>>> Documentation/admin-guide/pm/amd-pstate.rst | 4 +-
>>> drivers/cpufreq/Kconfig.x86 | 1 +
>>> drivers/cpufreq/amd-pstate.c | 142 +++++++++++++++++---
>>> drivers/cpufreq/amd-pstate.h | 10 ++
>>> 4 files changed, 140 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/
>>> Documentation/admin-guide/pm/amd-pstate.rst
>>> index 8424e7119dd7e..36950fb6568c0 100644
>>> --- a/Documentation/admin-guide/pm/amd-pstate.rst
>>> +++ b/Documentation/admin-guide/pm/amd-pstate.rst
>>> @@ -321,7 +321,9 @@ Whether this behavior is enabled by default with
>>> the kernel config option
>>> at runtime by the sysfs file ``/sys/devices/system/cpu/cpufreq/
>>> policyX/dynamic_epp``.
>>> When set to enabled, the driver will select a different energy
>>> performance
>>> -profile when the machine is running on battery or AC power.
>>> +profile when the machine is running on battery or AC power. The
>>> driver will
>>> +also register with the platform profile handler to receive
>>> notifications of
>>> +user desired power state and react to those.
>>> When set to disabled, the driver will not change the energy
>>> performance profile
>>> based on the power source and will not react to user desired power
>>> state.
>>> diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
>>> index c5ef92634ddf4..905eab998b836 100644
>>> --- a/drivers/cpufreq/Kconfig.x86
>>> +++ b/drivers/cpufreq/Kconfig.x86
>>> @@ -40,6 +40,7 @@ config X86_AMD_PSTATE
>>> select ACPI_PROCESSOR
>>> select ACPI_CPPC_LIB if X86_64
>>> select CPU_FREQ_GOV_SCHEDUTIL if SMP
>>> + select ACPI_PLATFORM_PROFILE
>>> help
>>> This driver adds a CPUFreq driver which utilizes a fine grain
>>> processor performance frequency control range instead of legacy
>>> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
>>> index 9911808fe0bcf..28c02edf6e40b 100644
>>> --- a/drivers/cpufreq/amd-pstate.c
>>> +++ b/drivers/cpufreq/amd-pstate.c
>>> @@ -105,6 +105,7 @@ static struct quirk_entry *quirks;
>>> * 2 balance_performance
>>> * 3 balance_power
>>> * 4 power
>>> + * 5 custom (for raw EPP values)
>>> */
>>> enum energy_perf_value_index {
>>> EPP_INDEX_DEFAULT = 0,
>>> @@ -112,6 +113,7 @@ enum energy_perf_value_index {
>>> EPP_INDEX_BALANCE_PERFORMANCE,
>>> EPP_INDEX_BALANCE_POWERSAVE,
>>> EPP_INDEX_POWERSAVE,
>>> + EPP_INDEX_CUSTOM,
>>> };
>>> static const char * const energy_perf_strings[] = {
>>> @@ -120,6 +122,7 @@ static const char * const energy_perf_strings[] = {
>>> [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance",
>>> [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power",
>>> [EPP_INDEX_POWERSAVE] = "power",
>>> + [EPP_INDEX_CUSTOM] = "custom",
>>> NULL
>>> };
>>> @@ -1073,6 +1076,10 @@ static int
>>> amd_pstate_power_supply_notifier(struct notifier_block *nb,
>>> if (event != PSY_EVENT_PROP_CHANGED)
>>> return NOTIFY_OK;
>>> + /* dynamic actions are only applied while platform profile is in
>>> balanced */
>>> + if (cpudata->current_profile != PLATFORM_PROFILE_BALANCED)
>>> + return 0;
>>> +
>>> epp = amd_pstate_get_balanced_epp(policy);
>>> ret = amd_pstate_set_epp(policy, epp);
>>> @@ -1081,14 +1088,84 @@ static int
>>> amd_pstate_power_supply_notifier(struct notifier_block *nb,
>>> return NOTIFY_OK;
>>> }
>>> -static void amd_pstate_clear_dynamic_epp(struct cpufreq_policy *policy)
>>> +
>>> +static int amd_pstate_profile_probe(void *drvdata, unsigned long
>>> *choices)
>>> +{
>>> + set_bit(PLATFORM_PROFILE_LOW_POWER, choices);
>>> + set_bit(PLATFORM_PROFILE_BALANCED, choices);
>>> + set_bit(PLATFORM_PROFILE_PERFORMANCE, choices);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int amd_pstate_profile_get(struct device *dev,
>>> + enum platform_profile_option *profile)
>>> +{
>>> + struct amd_cpudata *cpudata = dev_get_drvdata(dev);
>>> +
>>> + *profile = cpudata->current_profile;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int amd_pstate_profile_set(struct device *dev,
>>> + enum platform_profile_option profile)
>>> +{
>>> + struct amd_cpudata *cpudata = dev_get_drvdata(dev);
>>> + struct cpufreq_policy *policy __free(put_cpufreq_policy) =
>>> cpufreq_cpu_get(cpudata->cpu);
>>> + int ret;
>>> +
>>> + switch (profile) {
>>> + case PLATFORM_PROFILE_LOW_POWER:
>>> + if (cpudata->policy != CPUFREQ_POLICY_POWERSAVE)
>>> + cpudata->policy = CPUFREQ_POLICY_POWERSAVE;
>>
>> So prior to the patch, cpudata->policy is supposed to mirror
>> policy->policy. With this patch, this assumption is no longer
>> true. So it is possible for the user to again override the choice of
>> EPP set via platform profile by changing the cpufreq governor ?
>>
>> Is this the expected behaviour?
>>
>> The bigger concern is, if the governor was previously "performance"
>> and then the platform profile requested "low power", "cat
>> /sys/devices/system/cpu/cpuX/cpufreq/scaling_governor" would still
>> show "performance", which is inconsistent with the behaviour.
>>
>>
>
> This ties back to the previous patches for dynamic EPP. My expectation
> was that when dynamic EPP is enabled that users can't manually set the
> EPP anymore (it will return -EBUSY) and likewise turning on dynamic EPP
> should keep the governor as powersave.
>
> I'll double check all those are properly enforced; but that's at least
> the intent.
FWIW - I double checked and confirmed that this is working as intended.
* I couldn't change from powersave to performance when dynamic_epp was
enabled (-EBUSY)
* I couldn't change energy_performance_preference when dynamic_epp was
enabled (-EBUSY)
>
> IMO this "should" all work because turning on Dynamic EPP sysfs file
> forces the driver to go through a state transition that it will tear
> everything down and back up. The policy will come back up in
> "powersave" even if it was previously in "performance" when the dynamic
> EPP sysfs file was turned on.
>
> Longer term; I also envision the scheduler influencing EPP values when
> dynamic_epp is turned on. The "platform profile" would be an "input" to
> that decision making process (maybe giving a weighting?), not the only
> lever.
>
> I haven't given any serious look at how to do this with the scheduler, I
> wanted to lay the foundation first being dynamic EPP and raw EPP.
>
> So even if dynamic_epp isn't interesting "right now" for server because
> the focus is around behavior for AC/DC, don't write it off just yet.
Powered by blists - more mailing lists