Message-ID: <CAFivqmLCW2waDnJ0nGbjBd5gs+w+DeszPKe0be3VRLVu06-Ytg@mail.gmail.com>
Date: Fri, 27 Jun 2025 10:07:16 -0700
From: Prashant Malani <pmalani@...gle.com>
To: Jie Zhan <zhanjie9@...ilicon.com>
Cc: Ben Segall <bsegall@...gle.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
open list <linux-kernel@...r.kernel.org>,
"open list:CPU FREQUENCY SCALING FRAMEWORK" <linux-pm@...r.kernel.org>, Mel Gorman <mgorman@...e.de>,
Peter Zijlstra <peterz@...radead.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>, Valentin Schneider <vschneid@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Viresh Kumar <viresh.kumar@...aro.org>,
Ionela Voinescu <ionela.voinescu@....com>, Beata Michalska <beata.michalska@....com>,
z00813676 <zhenglifeng1@...wei.com>
Subject: Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
Hi Jie,
On Fri, 27 Jun 2025 at 00:55, Jie Zhan <zhanjie9@...ilicon.com> wrote:
>
>
> Hi Prashant,
>
> Sorry for a late reply as I'm busy on other stuff and this doesn't seem to
> be an easy issue to solve.
>
No worries, the ping was in general to all the people in the thread :)
> For the latest kernel, [1] provides a new 'cpuinfo_avg_freq' sysfs file to
> reflect the frequency based on AMUs, which is supposed to be more stable.
> Though it usually shows 'Resource temporarily unavailable' on my platform
> at the moment and looks a bit buggy.
>
> Most of the related discussions can be found in the reference links in [1].
> [1] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
>
> As reported, the current frequency sampling method may show a large error
> on 1) 100% load, 2) high memory access pressure, 3) idle cpus in your case.
>
> AFAICS, they may all come from the unstable latency of accessing remote AMUs
> 4 times while delaying for a fixed 2us sampling window.
I tried applying [1] (linked at the end of this mail), which consolidates
the ref and del register reads into 1 IPI, but that did not make a
difference. The values still fluctuate wildly.
>
> Increasing the sampling window seems to help but also increases the time
> overhead, so that's not favoured by people.
>
Increasing the sampling window did not appear to help in our case.
That suggests this method is inherently inaccurate when the CPU is
idle.
> On 20/06/2025 13:07, Prashant Malani wrote:
> > Hi Jie,
> > On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@...ilicon.com> wrote:
> >> On 19/06/2025 08:09, Prashant Malani wrote:
> >>> t0: ref=899127636, del=3012458473
> >>> t1: ref=899129626, del=3012466509
> >>> perf=40
> >>
> >> In this case, the target cpu is mostly idle but not fully idle during the
> >> sampling window since the counter grows a little bit.
> >> Perhaps some interrupts happen to run on the cpu shortly.
>
> Checking back here again, I don't think it's 'mostly idle'.
> Diff of ref counters is around 2000, and I guess the ref counter freq is
> 1GHz on your platform? That's exactly 2us, so the target cpu is mostly
> busy.
I don't think the reference counter increment means that the CPU is
"busy" or "not idle". Per [2], it just means that the "processor is
active".
idle_cpu() returning true means that the CPU is just running the idle
task, and has nothing in its runqueue.
In our experiments, this is always the case at least when the cpu is
being brought online (which kind of makes sense).
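For reference, the arithmetic on the quoted t0/t1 sample can be sketched
as below. This is only an illustration, not the driver code: the struct
and function names are made up for this mail, the 1 GHz reference counter
frequency is Jie's guess from above, and reference_perf = 10 is an
assumption chosen because it reproduces the reported perf=40. The formula
itself (delivered perf = reference perf * delta delivered / delta
reference) is the one described in [2].

```c
#include <stdint.h>

/* One snapshot of the two CPPC feedback counters (illustrative names). */
struct fb_sample {
	uint64_t ref;	/* reference performance counter */
	uint64_t del;	/* delivered performance counter */
};

/*
 * Delivered performance over the window [t0, t1]:
 *   perf = reference_perf * delta_delivered / delta_reference
 * Returns 0 if the reference counter did not advance.
 */
static uint64_t perf_from_samples(struct fb_sample t0, struct fb_sample t1,
				  uint64_t reference_perf)
{
	uint64_t d_ref = t1.ref - t0.ref;
	uint64_t d_del = t1.del - t0.del;

	if (d_ref == 0)
		return 0;
	return reference_perf * d_del / d_ref;
}

/* Window length in ns, given an ASSUMED reference counter frequency. */
static uint64_t window_ns(struct fb_sample t0, struct fb_sample t1,
			  uint64_t ref_hz)
{
	return (t1.ref - t0.ref) * 1000000000ULL / ref_hz;
}
```

With t0 = {899127636, 3012458473} and t1 = {899129626, 3012466509}, the
deltas are 1990 (ref) and 8036 (del), so perf_from_samples(t0, t1, 10)
gives 40 and window_ns(t0, t1, 1000000000) gives 1990, i.e. roughly the
2us window. Note that nothing in this calculation tells us whether the
CPU was running the idle task.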
> > I don't think this is necessarily an issue. The ABI doesn't need to be
> > synchronous; it is merely a snapshot of the scheduler view of that CPU
> > at a point in time. Even the current method of perf counters sampling
> > is purely heuristic. The CPU might be idle for the 2 usec the
> > sampling is done, and servicing traffic before and after that.
> > This is inherent whenever you are sampling any system state.
>
> Then the issue is not totally solved, just less often?
>
Yes. I don't think this can be completely solved, given the inherent
inaccuracy in hardware. What this *does* do is mitigate one of the
scenarios, while not impacting sampling when the CPU is actually doing
something useful; as such I don't see much downside to including it.
Best regards,
[1] https://patchew.org/linux/20240229162520.970986-1-vanshikonda@os.amperecomputing.com/20240229162520.970986-4-vanshikonda@os.amperecomputing.com/
[2] https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?highlight=cppc#performance-counters
--
-Prashant