linux-kernel - Re: [PATCH v2] cpufreq/amd-pstate: Refactor max frequency calculation

Message-ID: <0d8bfa42-8155-4b12-ad33-ab76c4e78a88@amd.com>
Date: Wed, 8 Jan 2025 09:33:59 +0530
From: Dhananjay Ugwekar <Dhananjay.Ugwekar@....com>
To: Mario Limonciello <mario.limonciello@....com>,
 "Gautham R. Shenoy" <gautham.shenoy@....com>,
 Naresh Solanki <naresh.solanki@...ements.com>
Cc: Huang Rui <ray.huang@....com>, Perry Yuan <perry.yuan@....com>,
 "Rafael J. Wysocki" <rafael@...nel.org>,
 Viresh Kumar <viresh.kumar@...aro.org>, linux-pm@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] cpufreq/amd-pstate: Refactor max frequency calculation

On 1/8/2025 12:36 AM, Mario Limonciello wrote:
> On 12/26/2024 23:49, Dhananjay Ugwekar wrote:
>> On 12/20/2024 11:46 AM, Gautham R. Shenoy wrote:
>>> On Fri, Dec 20, 2024 at 12:51:43AM +0530, Naresh Solanki wrote:
>>>> The previous approach introduced roundoff errors during division when
>>>> calculating the boost ratio. This, in turn, affected the maximum
>>>> frequency calculation, often resulting in reporting lower frequency
>>>> values.
>>>>
>>>> For example, on the Glinda SoC based board with the following
>>>> parameters:
>>>>
>>>> max_perf = 208
>>>> nominal_perf = 100
>>>> nominal_freq = 2600 MHz
>>>>
>>>> The Linux kernel previously calculated the frequency as:
>>>> freq = ((max_perf * 1024 / nominal_perf) * nominal_freq) / 1024
>>>> freq = 5405 MHz  // Integer arithmetic.
>>>>
>>>> With the updated formula:
>>>> freq = (max_perf * nominal_freq) / nominal_perf
>>>> freq = 5408 MHz
>>>>
>>>> This change ensures more accurate frequency calculations by eliminating
>>>> unnecessary shifts and divisions, thereby improving precision.
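
The arithmetic in the example above can be reproduced outside the kernel. What follows is a minimal userspace sketch, not the driver code: the function names old_max_freq() and new_max_freq() are made up for illustration, SCHED_CAPACITY_SHIFT is assumed to be 10 as in the kernel, and plain 64-bit division stands in for div_u64().

```c
#include <stdint.h>

/* Assumed to match the kernel's value (1024 == 1 << 10). */
#define SCHED_CAPACITY_SHIFT 10

/* Old scheme: scale to a fixed-point boost ratio first, then multiply.
 * The intermediate division truncates, and that error is then
 * multiplied by nominal_freq. */
static uint32_t old_max_freq(uint32_t highest_perf, uint32_t nominal_perf,
                             uint32_t nominal_freq)
{
    uint32_t boost_ratio =
        (uint32_t)(((uint64_t)highest_perf << SCHED_CAPACITY_SHIFT) / nominal_perf);

    return (uint32_t)(((uint64_t)nominal_freq * boost_ratio) >> SCHED_CAPACITY_SHIFT);
}

/* New scheme: multiply first in 64 bits, divide once at the end. */
static uint32_t new_max_freq(uint32_t highest_perf, uint32_t nominal_perf,
                             uint32_t nominal_freq)
{
    return (uint32_t)(((uint64_t)highest_perf * nominal_freq) / nominal_perf);
}
```

With max_perf = 208, nominal_perf = 100 and nominal_freq = 2600, old_max_freq() gives 5405 (the truncated ratio is 2129 rather than 2129.92) and new_max_freq() gives 5408.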
>>>>
>>>> Signed-off-by: Naresh Solanki <naresh.solanki@...ements.com>
>>>>
>>>> Changes in V2:
>>>> 1. Rebase on superm1.git/linux-next branch
>>>> ---
>>>>   drivers/cpufreq/amd-pstate.c | 9 ++++-----
>>>>   1 file changed, 4 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
>>>> index d7b1de97727a..02a851f93fd6 100644
>>>> --- a/drivers/cpufreq/amd-pstate.c
>>>> +++ b/drivers/cpufreq/amd-pstate.c
>>>> @@ -908,9 +908,9 @@ static int amd_pstate_init_freq(struct amd_cpudata *cpudata)
>>>>   {
>>>>       int ret;
>>>>       u32 min_freq, max_freq;
>>>> -    u32 nominal_perf, nominal_freq;
>>>> +    u32 highest_perf, nominal_perf, nominal_freq;
>>>>       u32 lowest_nonlinear_perf, lowest_nonlinear_freq;
>>>> -    u32 boost_ratio, lowest_nonlinear_ratio;
>>>> +    u32 lowest_nonlinear_ratio;
>>>>       struct cppc_perf_caps cppc_perf;
>>>>
>>>>       ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
>>>> @@ -927,10 +927,9 @@ static int amd_pstate_init_freq(struct amd_cpudata *cpudata)
>>>>       else
>>>>           nominal_freq = cppc_perf.nominal_freq;
>>>>
>>>> +    highest_perf = READ_ONCE(cpudata->highest_perf);
>>>>       nominal_perf = READ_ONCE(cpudata->nominal_perf);
>>>> -
>>>> -    boost_ratio = div_u64(cpudata->highest_perf << SCHED_CAPACITY_SHIFT, nominal_perf);
>>>> -    max_freq = (nominal_freq * boost_ratio >> SCHED_CAPACITY_SHIFT);
>>>
>>>
>>> The patch looks obviously correct to me. And the suggested method
>>> would work because nominal_freq is larger than nominal_perf, and
>>> thus scaling isn't really necessary.
>>>
>>> Besides, before this patch, there was another obvious issue: we
>>> were computing the boost_ratio when we should have been computing the
>>> ratio of nominal_freq to nominal_perf and then multiplying this by
>>> max_perf, without losing precision.
>>>
>>> This is just one instance, but it can be generalized so that any
>>> freq --> perf and perf --> freq can be computed without loss of precision.
>>>
>>> We need two things:
>>>
>>> 1. The mult_factor should be computed as a ratio of nominal_freq and
>>> nominal_perf (and vice versa) as they are always known.
>>>
>>> 2. Use DIV64_U64_ROUND_UP, which rounds up, instead of div64(), which rounds down.
>>>
>>> So if we have the shifts defined as follows:
>>>
>>> #define PERF_SHIFT   12UL //shift used for freq --> perf conversion
>>> #define FREQ_SHIFT   10UL //shift used for perf --> freq conversion.
>>>
>>> And in amd_pstate_init_freq() code, we initialize the two global variables:
>>>
>>> u64 freq_mult_factor = DIV64_U64_ROUND_UP(nominal_freq  << FREQ_SHIFT, nominal_perf);
>>> u64 perf_mult_factor = DIV64_U64_ROUND_UP(nominal_perf  << PERF_SHIFT, nominal_freq);
>>
>> I like this approach, but can we assume the nominal freq/perf values are the same for
>> all CPUs? Otherwise we would need to make these factors per-CPU or per-domain (where
>> all CPUs within a "domain" have the same nominal_freq/perf), at which point the benefit
>> of caching these ratios might diminish.
>>
>> Thoughts, Gautham, Mario?
> 
> No; in this day of heterogeneous designs I don't think you can make that assumption. So yes, if we had helpers they would have to apply to a group of CPUs, and I agree that at that point the caching isn't very beneficial anymore.
> 
> If the main argument is to make it easier to follow we could have some macros though?

Agreed, I'm working on the helper functions patchset, will post it shortly.
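
For illustration only, helpers that take the per-CPU nominal values as input, rather than relying on cached global factors, might look roughly like the sketch below. This is not the patchset mentioned above: struct cpu_nominal, cpu_perf_to_freq() and cpu_freq_to_perf() are hypothetical names, plain 64-bit division stands in for div_u64(), and rounding up in the freq-to-perf direction is an assumption.

```c
#include <stdint.h>

/* Hypothetical container for one CPU's nominal operating point. */
struct cpu_nominal {
    uint32_t nominal_freq;  /* e.g. in MHz */
    uint32_t nominal_perf;  /* abstract CPPC performance units */
};

/* perf --> freq: multiply in 64 bits first, divide once. */
static uint32_t cpu_perf_to_freq(const struct cpu_nominal *c, uint32_t perf)
{
    return (uint32_t)(((uint64_t)perf * c->nominal_freq) / c->nominal_perf);
}

/* freq --> perf: same idea, rounding up (an assumption in this sketch)
 * so that a requested frequency is not mapped to a lower perf level. */
static uint32_t cpu_freq_to_perf(const struct cpu_nominal *c, uint32_t freq)
{
    uint64_t num = (uint64_t)freq * c->nominal_perf;

    return (uint32_t)((num + c->nominal_freq - 1) / c->nominal_freq);
}
```

Because each call reads the nominal values of the CPU in question, two CPUs with different nominal_freq/nominal_perf pairs are handled without any shared state, at the cost of one division per conversion.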

> 
>>
>> Thanks,
>> Dhananjay
>>
>>>
>>> .. and have a couple of helper functions:
>>>
>>> /* perf to freq conversion */
>>> static inline unsigned int perf_to_freq(unsigned int perf)
>>> {
>>>     return (perf * freq_mult_factor) >> FREQ_SHIFT;
>>> }
>>>
>>>
>>> /* freq to perf conversion */
>>> static inline unsigned int freq_to_perf(unsigned int freq)
>>> {
>>>     return (freq * perf_mult_factor) >> PERF_SHIFT;
>>> }
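
The fixed-point scheme proposed above can be tried in userspace. This is a sketch under assumptions: the struct conv_factors wrapper, make_factors() and div_round_up_u64() are made-up names (the last stands in for the kernel's DIV64_U64_ROUND_UP), and the factors are kept in a struct instead of globals so the example is self-contained.

```c
#include <stdint.h>

#define PERF_SHIFT 12  /* shift used for freq --> perf conversion */
#define FREQ_SHIFT 10  /* shift used for perf --> freq conversion */

/* Hypothetical holder for the two precomputed multipliers. */
struct conv_factors {
    uint64_t freq_mult_factor;  /* (nominal_freq << FREQ_SHIFT) / nominal_perf, rounded up */
    uint64_t perf_mult_factor;  /* (nominal_perf << PERF_SHIFT) / nominal_freq, rounded up */
};

/* Userspace stand-in for DIV64_U64_ROUND_UP. */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

static struct conv_factors make_factors(uint32_t nominal_freq, uint32_t nominal_perf)
{
    struct conv_factors f;

    f.freq_mult_factor = div_round_up_u64((uint64_t)nominal_freq << FREQ_SHIFT, nominal_perf);
    f.perf_mult_factor = div_round_up_u64((uint64_t)nominal_perf << PERF_SHIFT, nominal_freq);
    return f;
}

static uint32_t perf_to_freq(const struct conv_factors *f, uint32_t perf)
{
    return (uint32_t)((perf * f->freq_mult_factor) >> FREQ_SHIFT);
}

static uint32_t freq_to_perf(const struct conv_factors *f, uint32_t freq)
{
    return (uint32_t)((freq * f->perf_mult_factor) >> PERF_SHIFT);
}
```

For nominal_freq = 2600 and nominal_perf = 100, freq_mult_factor comes out as exactly 26624 and perf_mult_factor as 158 (157.54 rounded up), so perf_to_freq(208) yields 5408 and freq_to_perf(5408) yields 208. With a rounded-down factor of 157 the reverse conversion would give 207, which is why the rounding direction matters here; for other input values a rounded-up factor could overshoot instead, so this is not a general guarantee.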
>>>
>>>
>>>> +    max_freq = div_u64((u64)highest_perf * nominal_freq, nominal_perf);
>>>
>>> Then,
>>>     max_freq = perf_to_freq(highest_perf);
>>>     min_freq = perf_to_freq(lowest_nonlinear_perf);
>>>
>>>
>>> and so on.
>>>
>>> This should just work.
>>>
>>>
>>>>
>>>>       lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
>>>>       lowest_nonlinear_ratio = div_u64(lowest_nonlinear_perf << SCHED_CAPACITY_SHIFT,
>>>> -- 
>>>
>>> -- 
>>> Thanks and Regards
>>> gautham.
>>
>