lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <be69bd58-e5f0-4b4c-8eaf-37eefaef4f90@huawei.com>
Date: Wed, 5 Feb 2025 14:13:14 +0800
From: "zhenglifeng (A)" <zhenglifeng1@...wei.com>
To: srinivas pandruvada <srinivas.pandruvada@...ux.intel.com>, Mario
 Limonciello <mario.limonciello@....com>, Pierre Gondois
	<pierre.gondois@....com>, Russell Haley <yumpusamongus@...il.com>,
	<rafael@...nel.org>, <lenb@...nel.org>, <robert.moore@...el.com>,
	<viresh.kumar@...aro.org>
CC: <acpica-devel@...ts.linux.dev>, <linux-acpi@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <linux-pm@...r.kernel.org>,
	<linuxarm@...wei.com>, <jonathan.cameron@...wei.com>,
	<gautham.shenoy@....com>, <ray.huang@....com>, <zhanjie9@...ilicon.com>,
	<lihuisong@...wei.com>, <hepeng68@...wei.com>, <fanghao11@...wei.com>
Subject: Re: [PATCH v4 6/6] cpufreq: CPPC: Support for autonomous selection in
 cppc_cpufreq

Hello,

Sorry for my delay due to the recent holiday.

On 2025/1/24 22:18, srinivas pandruvada wrote:

> On Fri, 2025-01-24 at 11:53 +0800, zhenglifeng (A) wrote:
>> On 2025/1/24 1:05, Mario Limonciello wrote:
>>
>>> On 1/23/2025 10:46, Srinivas Pandruvada wrote:
>>>>
>>>> On 1/20/25 18:42, zhenglifeng (A) wrote:
>>>>> On 2025/1/21 1:44, Mario Limonciello wrote:
>>>>>
>>>>>> On 1/20/2025 08:49, Pierre Gondois wrote:
>>>>>>>
>>>>>>> On 1/20/25 04:15, zhenglifeng (A) wrote:
>>>>>>>> On 2025/1/17 22:30, Mario Limonciello wrote:
>>>>>>>>
>>>>>>>>> On 1/16/2025 21:11, zhenglifeng (A) wrote:
>>>>>>>>>> On 2025/1/16 19:39, Russell Haley wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I noticed something here just as a user casually
>>>>>>>>>>> browsing the mailing list.
>>>>>>>>>>>
>>>>>>>>>>> On 1/13/25 6:21 AM, Lifeng Zheng wrote:
>>>>>>>>>>>> Add sysfs interfaces for CPPC autonomous
>>>>>>>>>>>> selection in the cppc_cpufreq
>>>>>>>>>>>> driver.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Lifeng Zheng
>>>>>>>>>>>> <zhenglifeng1@...wei.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>     .../ABI/testing/sysfs-devices-system-cpu     
>>>>>>>>>>>> |  54 +++++++++
>>>>>>>>>>>>     drivers/cpufreq/cppc_cpufreq.c               
>>>>>>>>>>>> | 109 +++++++ ++++ +++++++
>>>>>>>>>>>>     2 files changed, 163 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/Documentation/ABI/testing/sysfs-
>>>>>>>>>>>> devices-system-cpu b/
>>>>>>>>>>>> Documentation/ABI/testing/sysfs-devices-system-
>>>>>>>>>>>> cpu
>>>>>>>>>>>> index 206079d3bd5b..3d87c3bb3fe2 100644
>>>>>>>>>>>> --- a/Documentation/ABI/testing/sysfs-devices-
>>>>>>>>>>>> system-cpu
>>>>>>>>>>>> +++ b/Documentation/ABI/testing/sysfs-devices-
>>>>>>>>>>>> system-cpu
>>>>>>>>>>>> @@ -268,6 +268,60 @@ Description:    Discover
>>>>>>>>>>>> CPUs in the same CPU frequency coordination
>>>>>>>>>>>> domain
>>>>>>>>>>>>             This file is only present if the
>>>>>>>>>>>> acpi-cpufreq or the cppc-cpufreq
>>>>>>>>>>>>             drivers are in use.
>>>>>>>>>>> [...snip...]
>>>>>>>>>>>
>>>>>>>>>>>> +What:       
>>>>>>>>>>>> /sys/devices/system/cpu/cpuX/cpufreq/energy_perf
>>>>>>>>>>>> +Date:        October 2024
>>>>>>>>>>>> +Contact:    linux-pm@...r.kernel.org
>>>>>>>>>>>> +Description:    Energy performance preference
>>>>>>>>>>>> +
>>>>>>>>>>>> +        Read/write an 8-bit integer from/to this
>>>>>>>>>>>> file. This file
>>>>>>>>>>>> +        represents a range of values from 0
>>>>>>>>>>>> (performance preference) to
>>>>>>>>>>>> +        0xFF (energy efficiency preference) that
>>>>>>>>>>>> influences the rate of
>>>>>>>>>>>> +        performance increase/decrease and the
>>>>>>>>>>>> result of the hardware's
>>>>>>>>>>>> +        energy efficiency and performance
>>>>>>>>>>>> optimization policies.
>>>>>>>>>>>> +
>>>>>>>>>>>> +        Writing to this file only has meaning
>>>>>>>>>>>> when Autonomous Selection is
>>>>>>>>>>>> +        enabled.
>>>>>>>>>>>> +
>>>>>>>>>>>> +        This file only presents if the cppc-
>>>>>>>>>>>> cpufreq driver is in use.
>>>>>>>>>>> In intel_pstate driver, there is file with near-
>>>>>>>>>>> identical semantics:
>>>>>>>>>>>
>>>>>>>>>>> /sys/devices/system/cpu/cpuX/cpufreq/energy_perform
>>>>>>>>>>> ance_preference
>>>>>>>>>>>
>>>>>>>>>>> It also accepts a few string arguments and converts
>>>>>>>>>>> them to integers.
>>>>>>>>>>>
>>>>>>>>>>> Perhaps the same name should be used, and the
>>>>>>>>>>> semantics made exactly
>>>>>>>>>>> identical, and then it could be documented as
>>>>>>>>>>> present for either
>>>>>>>>>>> cppc_cpufreq OR intel_pstate?
>>>>>>>>>>>
>>>>>>>>>>> I think would be more elegant if userspace tooling
>>>>>>>>>>> could Just Work with
>>>>>>>>>>> either driver.
>>>>>>>>>>>
>>>>>>>>>>> One might object that the frequency selection
>>>>>>>>>>> behavior that results from
>>>>>>>>>>> any particular value of the register itself might
>>>>>>>>>>> be different, but they
>>>>>>>>>>> are *already* different between Intel's P and E-
>>>>>>>>>>> cores in the same CPU
>>>>>>>>>>> package. (Ugh.)
>>>>>>>>>> Yes, I should use the same name. Thanks.
>>>>>>>>>>
>>>>>>>>>> As for accepting string arguments and converting them
>>>>>>>>>> to integers, I don't
>>>>>>>>>> think it is necessary. It'll be a litte confused if
>>>>>>>>>> someone writes a raw
>>>>>>>>>> value and reads a string I think. I prefer to let
>>>>>>>>>> users freely set this
>>>>>>>>>> value.
>>>>>>>>>>
>>>>>>>>>> In addition, there are many differences between the
>>>>>>>>>> implementations of
>>>>>>>>>> energy_performance_preference in intel_pstate and
>>>>>>>>>> cppc_cpufreq (and
>>>>>>>>>> amd-pstate...). It is really difficult to explain all
>>>>>>>>>> this differences in
>>>>>>>>>> this document. So I'll leave it to be documented as
>>>>>>>>>> present for
>>>>>>>>>> cppc_cpufreq only.
>>>>>>>>> At least the interface to userspace I think we should
>>>>>>>>> do the best we can to be the same between all the
>>>>>>>>> drivers if possible.
>>>>>>>>>
>>>>>>>>> For example; I've got a patch that I may bring up in a
>>>>>>>>> future kernel cycle that adds raw integer writes to
>>>>>>>>> amd-pstates energy_performance_profile to behave the
>>>>>>>>> same way intel-pstate does.
>>>>>>>> I agree that it's better to keep this interface
>>>>>>>> consistent across different
>>>>>>>> drivers. But in my opinion, the implementation of
>>>>>>>> intel_pstate
>>>>>>>> energy_performance_preference is not really nice. Someone
>>>>>>>> may write a raw
>>>>>>>> value but read a string, or read strings for some values
>>>>>>>> and read raw
>>>>>>>> values for some other values. It is inconsistent. It may
>>>>>>>> be better to use
>>>>>>>> some other implementation, such as seperating the
>>>>>>>> operations of r/w strings
>>>>>>>> and raw values into two files.
>>>>>>> I agree it would be better to be sure of the type to expect
>>>>>>> when reading the
>>>>>>> energy_performance_preference file. The epp values in the
>>>>>>> range 0-255 with 0
>>>>>>> being the performance value for all interfaces.
>>>>>>>
>>>>>>> In the current epp strings, it seems there is a big gap
>>>>>>> between the PERFORMANCE
>>>>>>> and the BALANCE_PERFORMANCE strings. Maybe it would be good
>>>>>>> to complete it:
>>>>>>> EPP_PERFORMANCE        0x00
>>>>>>> EPP_BALANCE_PERFORMANCE    0x40      // state value changed
>>>>>>> EPP_BALANCE        0x80      // new state
>>>>>>> EPP_BALANCE_POWERSAVE    0xC0
>>>>>>> EPP_POWERSAVE        0xFF
>>>>>>>
>>>>>>> NIT: The mapping seems to be slightly different for
>>>>>>> intel_pstate and amd-pstate
>>>>>>> currently:
>>>>>>> drivers/cpufreq/amd-pstate.c
>>>>>>> #define AMD_CPPC_EPP_PERFORMANCE        0x00
>>>>>>> #define AMD_CPPC_EPP_BALANCE_PERFORMANCE    0x80
>>>>>>> #define AMD_CPPC_EPP_BALANCE_POWERSAVE        0xBF
>>>>>>> #define AMD_CPPC_EPP_POWERSAVE            0xFF
>>>>>>>
>>>>>>> arch/x86/include/asm/msr-index.h
>>>>>>> #define HWP_EPP_PERFORMANCE        0x00
>>>>>>> #define HWP_EPP_BALANCE_PERFORMANCE    0x80
>>>>>>> #define HWP_EPP_BALANCE_POWERSAVE    0xC0   <------
>>>>>>> Different from AMD_CPPC_EPP_BALANCE_POWERSAVE
>>>>>>> #define HWP_EPP_POWERSAVE        0xFF
>>>>>>>
>>>>>>>> I think it's better to consult Rafael and Viresh about
>>>>>>>> how this should
>>>>>>>> evolve.
>>>>>>> Yes indeed
>>>>>> Maybe it's best to discuss what the goal of raw EPP number
>>>>>> writes is to decide what to do with it.
>>>>>>
>>>>>> IE in intel-pstate is it for userspace to be able to actually
>>>>>> utilize something besides the strings all the time?  Or is it
>>>>>> just for debugging to find better values for strings in the
>>>>>> future?
>>>>>>
>>>>>> If the former maybe we're better off splitting to
>>>>>> 'energy_performance_preference' and
>>>>>> 'energy_performance_preference_int'.
>>>>>>
>>>>>> If the latter maybe we're better off putting the integer
>>>>>> writes and reads into debugfs instead and making
>>>>>> 'energy_performance_preference' return -EINVAL while a non-
>>>>>> predefined value is in use.
>>>>
>>>> In Intel case EPP values can be different based on processor. In
>>>> some case they they end up sharing the same CPU model. So strings
>>>> are not suitable for all cases. Also there is different
>>>> preference of EPP between Chrome systems and non chrome distro.
>>>> For example Chrome has some resource manager which can change and
>>>> same on Intel distros with LPMD.
>>>>
>>>
>>> Thanks for confirming it is intentional and changing it would break
>>> existing userspace.
>>>
>>> And FWIW even in Windows there are more than 4 situational values
>>> used like we have in Linux today.
>>>
>>> As the status quo is there I personally feel that we should do the
>>> exact same for other implementation of
>>> energy_performance_preference.
>>
>> I still don't think this implementation is nice, for the following
>> reasons:
>>
>> 1. Users may write raw value but read string. It's odd.
>>
>> 2. Sometimes a raw value is read and sometimes a character string is
>> read.
>> The userspace tool needs to adapt this.
>>
>> 3. Reading and writing EPP strings is not really general in driver.
>> It is
>> more reasonable to use the userspace tool to implement it.
>>
>> In order not to break existing userspace, I'll rename the file to
>> 'energy_performance_preference_int' or
>> 'energy_performance_preference_val'
>> in cppc_cpufreq and only support reading and writing raw value. As
>> for
>> accepting string arguments, it's not necessary for cppc_cpufreq for
>> now.
>> It's easy to add this feature but hard to remove, so I'll leave it to
>> the
>> future if it is really needed.
>>
>> As for amd-pstate and intel_pstate, you can decide how
>> energy_performance_preference should evolve. But I strongly recommend
>> splitting it.
> 
> No. User space can deal with this already. 

I know that. What I mean is that user space had to do more to adapt this
"sometims value sometimes string" situation. This could have been avoided.

> Atleast this has one
> interface. If you split you need to keep them consistent. You can write
> a raw value  to new attribute and then can read a string from the other
> attribute, which means different.

I won't do any change in amd-pstate and intel_pstate because it involves
too much. I'll only add a new file in cppc_cpufreq for reading and writing
EPP raw value.

> 
> This is not the only place where strings and raw values can be written
> in sysfs. Also true for energy_perf_bias.
> 
> 
> Thanks,
> Srinivas
> 
> 
> 
>>
>> Regards,
>>
>> Lifeng
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Srinivas
>>>>
>>>>
>>>>> I think it's the former.
>>>>>
>>>>> I added the author of the patch that allows raw energy
>>>>> performance
>>>>> preference value in intel_pstate to ask about what the goal is
>>>>> and if this
>>>>> would be ok to do the modification mentioned above.
>>>>>
>>>>> To see the patch from
>>>>> https://lore.kernel.org/ all/20200626183401.1495090-3-srinivas.pandruvada@...ux.intel.c
>>>>> om/
>>>>>
>>>>> Anyway, the purpose of this patch is to allow users write and
>>>>> read raw EPP
>>>>> number. So maybe I can just rename the file to
>>>>> 'energy_performance_preference_int'?
>>>>>
>>>
>>
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ