[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <9f5f8181-7d0e-415d-b473-0e3c6601ccc3@amd.com>
Date: Thu, 23 Jan 2025 11:05:29 -0600
From: Mario Limonciello <mario.limonciello@....com>
To: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
"zhenglifeng (A)" <zhenglifeng1@...wei.com>,
Pierre Gondois <pierre.gondois@....com>,
Russell Haley <yumpusamongus@...il.com>, rafael@...nel.org, lenb@...nel.org,
robert.moore@...el.com, viresh.kumar@...aro.org
Cc: acpica-devel@...ts.linux.dev, linux-acpi@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org, linuxarm@...wei.com,
jonathan.cameron@...wei.com, gautham.shenoy@....com, ray.huang@....com,
zhanjie9@...ilicon.com, lihuisong@...wei.com, hepeng68@...wei.com,
fanghao11@...wei.com
Subject: Re: [PATCH v4 6/6] cpufreq: CPPC: Support for autonomous selection in
cppc_cpufreq
On 1/23/2025 10:46, Srinivas Pandruvada wrote:
>
> On 1/20/25 18:42, zhenglifeng (A) wrote:
>> On 2025/1/21 1:44, Mario Limonciello wrote:
>>
>>> On 1/20/2025 08:49, Pierre Gondois wrote:
>>>>
>>>> On 1/20/25 04:15, zhenglifeng (A) wrote:
>>>>> On 2025/1/17 22:30, Mario Limonciello wrote:
>>>>>
>>>>>> On 1/16/2025 21:11, zhenglifeng (A) wrote:
>>>>>>> On 2025/1/16 19:39, Russell Haley wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I noticed something here just as a user casually browsing the
>>>>>>>> mailing list.
>>>>>>>>
>>>>>>>> On 1/13/25 6:21 AM, Lifeng Zheng wrote:
>>>>>>>>> Add sysfs interfaces for CPPC autonomous selection in the
>>>>>>>>> cppc_cpufreq
>>>>>>>>> driver.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Lifeng Zheng <zhenglifeng1@...wei.com>
>>>>>>>>> ---
>>>>>>>>> .../ABI/testing/sysfs-devices-system-cpu | 54 +++++++++
>>>>>>>>> drivers/cpufreq/cppc_cpufreq.c | 109 +++++++
>>>>>>>>> ++++ +++++++
>>>>>>>>> 2 files changed, 163 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu
>>>>>>>>> b/ Documentation/ABI/testing/sysfs-devices-system-cpu
>>>>>>>>> index 206079d3bd5b..3d87c3bb3fe2 100644
>>>>>>>>> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
>>>>>>>>> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
>>>>>>>>> @@ -268,6 +268,60 @@ Description: Discover CPUs in the same
>>>>>>>>> CPU frequency coordination domain
>>>>>>>>> This file is only present if the acpi-cpufreq or
>>>>>>>>> the cppc-cpufreq
>>>>>>>>> drivers are in use.
>>>>>>>> [...snip...]
>>>>>>>>
>>>>>>>>> +What: /sys/devices/system/cpu/cpuX/cpufreq/energy_perf
>>>>>>>>> +Date: October 2024
>>>>>>>>> +Contact: linux-pm@...r.kernel.org
>>>>>>>>> +Description: Energy performance preference
>>>>>>>>> +
>>>>>>>>> + Read/write an 8-bit integer from/to this file. This file
>>>>>>>>> + represents a range of values from 0 (performance
>>>>>>>>> preference) to
>>>>>>>>> + 0xFF (energy efficiency preference) that influences
>>>>>>>>> the rate of
>>>>>>>>> + performance increase/decrease and the result of the
>>>>>>>>> hardware's
>>>>>>>>> + energy efficiency and performance optimization policies.
>>>>>>>>> +
>>>>>>>>> + Writing to this file only has meaning when Autonomous
>>>>>>>>> Selection is
>>>>>>>>> + enabled.
>>>>>>>>> +
>>>>>>>>> + This file only presents if the cppc-cpufreq driver is
>>>>>>>>> in use.
>>>>>>>> In intel_pstate driver, there is file with near-identical
>>>>>>>> semantics:
>>>>>>>>
>>>>>>>> /sys/devices/system/cpu/cpuX/cpufreq/energy_performance_preference
>>>>>>>>
>>>>>>>> It also accepts a few string arguments and converts them to
>>>>>>>> integers.
>>>>>>>>
>>>>>>>> Perhaps the same name should be used, and the semantics made
>>>>>>>> exactly
>>>>>>>> identical, and then it could be documented as present for either
>>>>>>>> cppc_cpufreq OR intel_pstate?
>>>>>>>>
>>>>>>>> I think would be more elegant if userspace tooling could Just
>>>>>>>> Work with
>>>>>>>> either driver.
>>>>>>>>
>>>>>>>> One might object that the frequency selection behavior that
>>>>>>>> results from
>>>>>>>> any particular value of the register itself might be different,
>>>>>>>> but they
>>>>>>>> are *already* different between Intel's P and E-cores in the
>>>>>>>> same CPU
>>>>>>>> package. (Ugh.)
>>>>>>> Yes, I should use the same name. Thanks.
>>>>>>>
>>>>>>> As for accepting string arguments and converting them to
>>>>>>> integers, I don't
>>>>>>> think it is necessary. It'll be a litte confused if someone
>>>>>>> writes a raw
>>>>>>> value and reads a string I think. I prefer to let users freely
>>>>>>> set this
>>>>>>> value.
>>>>>>>
>>>>>>> In addition, there are many differences between the
>>>>>>> implementations of
>>>>>>> energy_performance_preference in intel_pstate and cppc_cpufreq (and
>>>>>>> amd-pstate...). It is really difficult to explain all this
>>>>>>> differences in
>>>>>>> this document. So I'll leave it to be documented as present for
>>>>>>> cppc_cpufreq only.
>>>>>> At least the interface to userspace I think we should do the best
>>>>>> we can to be the same between all the drivers if possible.
>>>>>>
>>>>>> For example; I've got a patch that I may bring up in a future
>>>>>> kernel cycle that adds raw integer writes to amd-pstates
>>>>>> energy_performance_profile to behave the same way intel-pstate does.
>>>>> I agree that it's better to keep this interface consistent across
>>>>> different
>>>>> drivers. But in my opinion, the implementation of intel_pstate
>>>>> energy_performance_preference is not really nice. Someone may write
>>>>> a raw
>>>>> value but read a string, or read strings for some values and read raw
>>>>> values for some other values. It is inconsistent. It may be better
>>>>> to use
>>>>> some other implementation, such as seperating the operations of r/w
>>>>> strings
>>>>> and raw values into two files.
>>>> I agree it would be better to be sure of the type to expect when
>>>> reading the
>>>> energy_performance_preference file. The epp values in the range
>>>> 0-255 with 0
>>>> being the performance value for all interfaces.
>>>>
>>>> In the current epp strings, it seems there is a big gap between the
>>>> PERFORMANCE
>>>> and the BALANCE_PERFORMANCE strings. Maybe it would be good to
>>>> complete it:
>>>> EPP_PERFORMANCE 0x00
>>>> EPP_BALANCE_PERFORMANCE 0x40 // state value changed
>>>> EPP_BALANCE 0x80 // new state
>>>> EPP_BALANCE_POWERSAVE 0xC0
>>>> EPP_POWERSAVE 0xFF
>>>>
>>>> NIT: The mapping seems to be slightly different for intel_pstate and
>>>> amd-pstate
>>>> currently:
>>>> drivers/cpufreq/amd-pstate.c
>>>> #define AMD_CPPC_EPP_PERFORMANCE 0x00
>>>> #define AMD_CPPC_EPP_BALANCE_PERFORMANCE 0x80
>>>> #define AMD_CPPC_EPP_BALANCE_POWERSAVE 0xBF
>>>> #define AMD_CPPC_EPP_POWERSAVE 0xFF
>>>>
>>>> arch/x86/include/asm/msr-index.h
>>>> #define HWP_EPP_PERFORMANCE 0x00
>>>> #define HWP_EPP_BALANCE_PERFORMANCE 0x80
>>>> #define HWP_EPP_BALANCE_POWERSAVE 0xC0 <------ Different from
>>>> AMD_CPPC_EPP_BALANCE_POWERSAVE
>>>> #define HWP_EPP_POWERSAVE 0xFF
>>>>
>>>>> I think it's better to consult Rafael and Viresh about how this should
>>>>> evolve.
>>>> Yes indeed
>>> Maybe it's best to discuss what the goal of raw EPP number writes is
>>> to decide what to do with it.
>>>
>>> IE in intel-pstate is it for userspace to be able to actually utilize
>>> something besides the strings all the time? Or is it just for
>>> debugging to find better values for strings in the future?
>>>
>>> If the former maybe we're better off splitting to
>>> 'energy_performance_preference' and 'energy_performance_preference_int'.
>>>
>>> If the latter maybe we're better off putting the integer writes and
>>> reads into debugfs instead and making 'energy_performance_preference'
>>> return -EINVAL while a non-predefined value is in use.
>
> In Intel case EPP values can be different based on processor. In some
> case they they end up sharing the same CPU model. So strings are not
> suitable for all cases. Also there is different preference of EPP
> between Chrome systems and non chrome distro. For example Chrome has
> some resource manager which can change and same on Intel distros with LPMD.
>
Thanks for confirming it is intentional and changing it would break
existing userspace.
And FWIW even in Windows there are more than 4 situational values used
like we have in Linux today.
As the status quo is there I personally feel that we should do the exact
same for other implementation of energy_performance_preference.
>
> Thanks,
>
> Srinivas
>
>
>> I think it's the former.
>>
>> I added the author of the patch that allows raw energy performance
>> preference value in intel_pstate to ask about what the goal is and if
>> this
>> would be ok to do the modification mentioned above.
>>
>> To see the patch from https://lore.kernel.org/
>> all/20200626183401.1495090-3-srinivas.pandruvada@...ux.intel.com/
>>
>> Anyway, the purpose of this patch is to allow users write and read raw
>> EPP
>> number. So maybe I can just rename the file to
>> 'energy_performance_preference_int'?
>>
Powered by blists - more mailing lists