Message-ID: <ef3f933a-742c-9e8e-9da4-762b33f2de94@hisilicon.com>
Date: Fri, 27 Jun 2025 15:54:59 +0800
From: Jie Zhan <zhanjie9@...ilicon.com>
To: Prashant Malani <pmalani@...gle.com>
CC: Ben Segall <bsegall@...gle.com>, Dietmar Eggemann
<dietmar.eggemann@....com>, Ingo Molnar <mingo@...hat.com>, Juri Lelli
<juri.lelli@...hat.com>, open list <linux-kernel@...r.kernel.org>, "open
list:CPU FREQUENCY SCALING FRAMEWORK" <linux-pm@...r.kernel.org>, Mel Gorman
<mgorman@...e.de>, Peter Zijlstra <peterz@...radead.org>, "Rafael J. Wysocki"
<rafael@...nel.org>, Steven Rostedt <rostedt@...dmis.org>, Valentin Schneider
<vschneid@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, Viresh
Kumar <viresh.kumar@...aro.org>, Ionela Voinescu <ionela.voinescu@....com>,
Beata Michalska <beata.michalska@....com>, z00813676
<zhenglifeng1@...wei.com>
Subject: Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
Hi Prashant,
Sorry for the late reply; I've been busy with other stuff, and this doesn't
seem to be an easy issue to solve.
I can offer some thoughts now, but I'll probably need more time to go through
the history and come up with a good solution.
Actually, the inaccuracy of cppc_cpufreq_get_rate() has been reported and
discussed many times. I believe your issue is just one of the cases.
For the latest kernel, [1] provides a new 'cpuinfo_avg_freq' sysfs file to
reflect the frequency based on AMUs, which is supposed to be more stable.
However, it usually shows 'Resource temporarily unavailable' on my platform
at the moment and looks a bit buggy.
Most of the related discussions can be found in the reference links in [1].
[1] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
As reported, the current frequency sampling method may show a large error
on 1) 100% load, 2) high memory access pressure, 3) idle CPUs (your case).
AFAICS, they may all come from the unstable latency of accessing remote AMUs
4 times while only delaying for a fixed 2us sampling window.
Increasing the sampling window seems to help but also increases the time
overhead, so that's not favoured by people.
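As a rough illustration of why a short window is fragile, here is a toy model
(not the driver code). It assumes that uneven read latency makes each counter's
delta cover a slightly different time span than the nominal window; the
function names and the skew values are made up for illustration only.

```python
def measured_ratio(true_ratio, window_us, ref_skew_us, del_skew_us):
    """Delivered/reference delta ratio seen by the sampler.

    true_ratio:  real delivered ticks per reference tick.
    window_us:   nominal sampling window in microseconds.
    ref_skew_us: extra time covered by the reference delta (hypothetical).
    del_skew_us: extra time covered by the delivered delta (hypothetical).
    """
    ref_delta = window_us + ref_skew_us                 # 1 ref tick per us
    del_delta = (window_us + del_skew_us) * true_ratio  # delivered ticks
    return del_delta / ref_delta


def relative_error(true_ratio, window_us, ref_skew_us, del_skew_us):
    """Relative error of the measured ratio against the true ratio."""
    measured = measured_ratio(true_ratio, window_us, ref_skew_us, del_skew_us)
    return (measured - true_ratio) / true_ratio


if __name__ == "__main__":
    for window in (2.0, 20.0):
        err = relative_error(3.4, window, ref_skew_us=0.0, del_skew_us=0.4)
        print(f"window={window}us error={err:.1%}")
```

In this model, 0.4us of skew on a 2us window gives about 20% error, while the
same skew on a 20us window gives about 2%, at the cost of a 10x longer sample.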
On 20/06/2025 13:07, Prashant Malani wrote:
> Hi Jie,
>
> Thanks for taking a look at the patch.
>
> On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@...ilicon.com> wrote:
>> On 19/06/2025 08:09, Prashant Malani wrote:
>>> AMU performance counters tend to be inaccurate when measured on idle CPUs.
>>> On an idle CPU which is programmed to 3.4 GHz (verified through firmware),
>>> here is a measurement and calculation of operating frequency:
>>>
>>> t0: ref=899127636, del=3012458473
>>> t1: ref=899129626, del=3012466509
>>> perf=40
>>
>> In this case, the target cpu is mostly idle but not fully idle during the
>> sampling window since the counter grows a little bit.
>> Perhaps some interrupts happen to run on the cpu shortly.
Checking back here again, I don't think it is 'mostly idle'.
The diff of the ref counters is around 2000, and I guess the ref counter freq
is 1GHz on your platform? That's roughly 2us, i.e. the whole sampling window,
so the target cpu is mostly busy.
So that might be some other issue. Let's forget the minimum threshold
stuff below for now.
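To spell out the arithmetic, here is a small sketch using the two samples
quoted above. The 1GHz reference counter frequency is only my guess, and the
helper names are just for illustration.

```python
# Counter samples quoted from the report above.
T0 = {"ref": 899127636, "del": 3012458473}
T1 = {"ref": 899129626, "del": 3012466509}

ASSUMED_REF_HZ = 1_000_000_000  # assumption: 1GHz ref counter, not confirmed


def deltas(t0, t1):
    """Return (reference delta, delivered delta) between two samples."""
    return t1["ref"] - t0["ref"], t1["del"] - t0["del"]


def window_us(ref_delta, ref_hz=ASSUMED_REF_HZ):
    """Time the reference counter was running, in microseconds."""
    return ref_delta * 1e6 / ref_hz


def delivered_ratio(ref_delta, del_delta):
    """Delivered ticks per reference tick over the window."""
    return del_delta / ref_delta


if __name__ == "__main__":
    ref_d, del_d = deltas(T0, T1)
    print(ref_d, del_d)                   # 1990 8036
    print(window_us(ref_d))               # time the ref counter advanced
    print(delivered_ratio(ref_d, del_d))  # delivered/reference ratio
```

The ref delta is 1990 ticks, i.e. about 1.99us if the counter really runs at
1GHz, which covers nearly all of the 2us window. The delivered/reference ratio
comes out at about 4.04.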
>>
>> Thus, the actual issue is the accuracy of frequency sampling becomes poor
>> when the delta of counters are too small to obtain a reliable accuracy.
>>
>> Would it be more sensible to put a minimum threshold of the delta of
>> counters when sampling the frequency?
>
> I'm happy to throw together a patch if there is some safe
> threshold the experts here can agree on for the minimum delta for
> the ref counter. I would caution that with this sort of approach we
> start running into the familiar issue:
> - What value is appropriate? Too large and you get false
> positives (falling back to the idle invalid path when we shouldn't), and
> too small and you get false negatives (we still report inaccurate
> counter values).
> - Is the threshold the same across platforms?
> - Will it remain the same 5/10 years from now?
>
>> BTW, that ABI
>> doesn't seem to be synchronous at all, i.e. the cpu might be busy when we
>> check and then become idle when sampling.
>>
>
> I don't think this is necessarily an issue. The ABI doesn't need to be
> synchronous; it is merely a snapshot of the scheduler view of that CPU
> at a point in time. Even the current method of perf counters sampling
> is purely heuristic. The CPU might be idle for the 2 usec the
> sampling is done, and servicing traffic before and after that.
> This is inherent whenever you are sampling any system state.
Then the issue is not totally solved; it just happens less often?
>
> I would imagine it is more reliable to trust the kernel scheduler's view
> of whether a CPU is idle, than relying on counters and a calculation
> method which are sensitive and unreliable for idle systems
> (i.e stray interrupts can throw off the calculations).
>
> That said, I'm happy to go with the approach folks on this list recommend.
>
> Cheers,
>