[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3e239024-91d8-ea06-25a4-631496576319@os.amperecomputing.com>
Date: Fri, 7 Apr 2023 15:19:25 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Pierre Gondois <pierre.gondois@....com>
Cc: viresh.kumar@...aro.org, scott@...amperecomputing.com,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
"Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH] cpufreq: CPPC: use 10ms delay instead of 2us to avoid
high error
On 4/7/23 1:31 AM, Pierre Gondois wrote:
> Hello Yang,
>
> On 4/6/23 23:52, Yang Shi wrote:
>>
>>
>> On 4/5/23 10:57 AM, Pierre Gondois wrote:
>>>
>>>
>>> On 4/4/23 21:07, Yang Shi wrote:
>>>>
>>>>
>>>> On 3/29/23 11:43 AM, Rafael J. Wysocki wrote:
>>>>> On Tue, Mar 28, 2023 at 9:39 PM Yang Shi
>>>>> <yang@...amperecomputing.com> wrote:
>>>>>> When testing CPPC cpufreq on our platform, we noticed the error may
>>>>>> be quite
>>>>>> high and the high error may happen quite often. For example, on a
>>>>>> platform
>>>>>> with a maximum frequency of 2.8GHz when the CPUs were fully loaded
>>>>>> (100% load),
>>>>>> we saw cpuinfo_cur_freq may show 4GHz, it means the error is >
>>>>>> 40%. And the
>>>>>> high error (> 1%) happened 256 times out of 2127 samples (sampled
>>>>>> every 3
>>>>>> seconds) in an approximate 2hrs test.
>>>>> The description above is a bit cryptic IMV. For example, it is not
>>>>> particularly clear what "high error" means.
>>>>>
>>>>>> We tried to enlarge the delay, and tested with 100us, 1ms and
>>>>>> 10ms. The
>>>>>> below is the results.
>>>>>>
>>>>>> 100us:
>>>>>> The highest error is 4GHz, 22 times out of 3623 samples
>>>>>>
>>>>>> 1ms:
>>>>>> The highest error is 3.3GHz, 3 times out of 2814 samples
>>>>>>
>>>>>> 10ms:
>>>>>> No high error anymore
>>>>>>
>>>>>> Increase the measurement delay in cppc_cpufreq_get_rate to 10ms to
>>>>>> avoid
>>>>>> high measurement errors.
>>>>>>
>>>>>> Signed-off-by: Yang Shi <yang@...amperecomputing.com>
>>>>>> ---
>>>>>> drivers/cpufreq/cppc_cpufreq.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c
>>>>>> b/drivers/cpufreq/cppc_cpufreq.c
>>>>>> index 022e3555407c..c2bf65448d3d 100644
>>>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>>>> @@ -851,7 +851,7 @@ static unsigned int
>>>>>> cppc_cpufreq_get_rate(unsigned int cpu)
>>>>>> if (ret)
>>>>>> return ret;
>>>>>>
>>>>>> - udelay(2); /* 2usec delay between sampling */
>>>>>> + mdelay(10); /* 10msec delay between sampling */
>>>>> This function can be called with interrupts off, so it cannot spin
>>>>> for 10 ms.
>>>>
>>>> Per Pierre's comment, the delay may still be ms. Is it still too
>>>> long? A
>>>> quick look at the code shows cpufreq_quick_get() is the only caller
>>>> with
>>>> irq off IIRC. So can we have another callback for it, for example,
>>>> get_quick() which does spin for shorter time (for example, keep 2us
>>>> delay). Then have ->get() callback use longer delay?
>>>
>>> -
>>> According to the same ACPI 6.5 s8.4.6.1.2.5 "Time Window Register"
>>> paragraph,
>>> and assuming we are in the 'Autonomous Selection is not enabled' case,
>>> the OS is supposed to write (not read) the delta between successive
>>> reads of the
>>> counter, so using this field as is would be bending the definition I
>>> think.
>>>
>>> -
>>> It is correct that the "Time Window Register" field specifies a value
>>> in ms,
>>> but it seems a long time to wait for with irqs off.
>>
>> AFAIK, our platforms don't support "Time Window Register".
>>
>>>
>>> -
>>> Theoretically, the perf/ref counters should accumulate to allow
>>> computing
>>> a correct frequency. Is it possible to know how these counters are
>>> accessed ?
>>> Is it through PCC channels and there is some undesired delay between
>>> the
>>> reads of the perf/ref counters ?
>>
>> The counters are implemented via mmio instead of PCC channels. So the
>> cpc_read() calls should go to ACPI_ADR_SPACE_SYSTEM_MEMORY IIRC.
>>
>>>
>>> -
>>> About making the delay:
>>> max(cppc_cpufreq_get_transition_delay_us(), Time Winder Register)
>>> I think it would be good to know why the values of the counters
>>> don't accumulate correctly, ideally by getting a trace where a
>>> frequency
>>> above the maximum frequency is computed, and with the timestamps at
>>> which
>>> the counters are read.
>>> If the values are coming from PCC channels / the firmware, it might be
>>> difficult
>>> to get.
>>
>> I wrote a bpftrace script to trace the below data:
>> - The CPU number
>> - The frequency
>> - The start and end timestamp of the first cppc_get_perf_ctrs()
>> call
>> - The duration/latency of the first cppc_get_perf_ctrs() call
>> - The start and end timestamp of the second
>> cppc_get_perf_ctrs() call
>> - The duration/latency of the second cppc_get_perf_ctrs() call
>>
>> The typical logs look like below.
>> Good
>> CPU: 1
>> Freq: 2801485KHz
>> First: 2489382384 2489387084 4700ns
>> Second: 2489390824 2489394024 3200ns
>> --------------------------------------------------
>> CPU: 2
>> Freq: 2797956KHz
>> First: 2490406524 2490411204 4680ns
>> Second: 2490414764 2490417684 2920ns
>>
>> Bad:
>> CPU: 55
>> Freq: 3969372KHz
>> First: 875659868 875721568 61700ns
>> Second: 875725148 875727708 2560ns
>> --------------------------------------------------
>> CPU: 65
>> Freq: 3829744KHz
>> First: 3854951136 3854995896 44760ns
>> Second: 3854999416 3855002696 3280ns
>> --------------------------------------------------
>> CPU: 21
>> Freq: 4279242KHz
>> First: 240834204 240910484 76280ns
>> Second: 240914264 240916944 2680ns
>>
>>
>> The first line is cpu number, the second line is frequency returned by
>> cppc_cpufreq_get_rate(), the third line is the start and end timestamps
>> and duration of the first cppc_get_perf_ctrs(), the fourth line is the
>> start and end timestamps and duration of the second
>> cppc_get_perf_ctrs().
>>
>> So per the log I think we can tell basically the longer the duration the
>> higher the error. The 2us delay is not long enough to offset the impact
>> from unexpected latency of reading the counters.
>>
>> In the worst case the frequency is 4279242KHz, comparing 2800000KHz the
>> error is over 50%. So the delay should be 4ms ~ 5ms in order to offset
>> the impact from reading the counters if I do the math correctly.
>>
>> Hope the trace data is helpful to diagnose the problem.
>
>
> Thanks for the data. I was thinking the following was happening:
>
> cppc_get_perf_ctrs()[0] cppc_get_perf_ctrs()[1]
> / \ / \
> ref[0] delivered[0] ref[1] delivered[1]
> | | | |
> v v v v
> ---------------------------------------------------------------------->
> time
> <-delta[0]-> <-------------2us------------> <----delta[1]---->
>
> If delta[0] is really different from delta[1] like above, then the
> reference and delivered counters would have accumulated during different
> intervals, resulting in a wrong frequency.
Yeah, it looks like so.
> If more/less than 2us elapse between the two cppc_get_perf_ctrs() calls,
> then it shouldn't have any impact. So waiting ~10ms should theoretically
> not solve the issue.
I'm not sure whether the 10ms delay really resolved the issue, but it
did reduce the magnitude of the error.
BTW, I don't see irq is disabled when reading cpuinfo_cur_freq, so it
looks like interrupts could easily result in the difference between
delta[0] and delta[1]. And it seems like the difference matters.
And the counters are accessed through an interconnect on our platform,
so the interconnect congestion may result in the difference as well.
>
> freq = ref_freq * (delivered[1] - delivered[0]) / (ref[1] - ref[0])
>
> If the counters are accessed through mmio, I don't see anything that
> would
> make delta[x] vary when calling cppc_get_perf_ctrs(), cf. cpc_read().
> Do you know if the address represents real counters or a place in memory
> representing something else ?
The address does represent real counters.
>
> Would it be possible to try setting the CPU frequency to one unique value
> and get a serie of values like:
> [timestamp, ref_counter_value, deliverd_counter_value]
Could you please elaborate regarding "setting the CPU frequency to one
unique value"? What value is unique?
>
> This would allow to check that the counters are accumulating at a valid
> pace. Also you said there were frequencies above the maximum value, but
> are there also frequencies below the minimum value ?
I've never seen the frequency below the minimum value.
>
> Regards,
> Pierre
Powered by blists - more mailing lists