[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAFivqm+Xi9FYtzPmT0QkAUxC2Kx_AkrH2NuQE_sVnJVuo48ypA@mail.gmail.com>
Date: Wed, 20 Aug 2025 14:25:16 -0700
From: Prashant Malani <pmalani@...gle.com>
To: Beata Michalska <beata.michalska@....com>
Cc: Yang Shi <yang@...amperecomputing.com>, open list <linux-kernel@...r.kernel.org>,
"open list:CPU FREQUENCY SCALING FRAMEWORK" <linux-pm@...r.kernel.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>, Catalin Marinas <catalin.marinas@....com>,
Ionela Voinescu <Ionela.Voinescu@....com>
Subject: Re: [PATCH] cpufreq: CPPC: Increase delay between perf counter reads
Hi Beata,
On Wed, 20 Aug 2025 at 13:49, Beata Michalska <beata.michalska@....com> wrote:
>
> Kinda working on that one.
OK. I'm eager to see what the solution is!
> >
> > Outside of that, I can't think of another mitigation beyond adding delay to make
> > the time deltas not matter so much.
> I'm not entirely sure what 'so much' means in this context.
> How one would quantify whether the added delay is actually mitigating the issue?
>
I alluded to it in the commit description, but here is the my basic
numerical analysis:
The effective timestamps for the 4 readings right now are:
Timestamp t0: del0
Timestamp t0 + m: ref0
(Time delay X us)
Timestamp t1: del1
Timestamp t1 + n: ref1
Timestamp t1 = t0 + m + X
The perf calculation is:
Per = del1 - del0 / ref1 - ref0
= Del_counter_diff_over_time(t1 - t0) /
ref_counter_diff_over_time(t1 + n - (t0 + m))
= Del_counter_diff_over time(t0 + m + X - t0) /
ref_counter_diff_over_time((t0 + m + X + n - t0 - m)
= Del_counter_diff_over_time(m + X) / ref_counter_diff_over_time(n + X)
If X >> (m,n) this becomes:
= Del_counter_diff_over_time(X) / ref_counter_diff_over_time(X)
which is what the actual calculation is supposed to be.
if X ~ (m, N) (which is what the case is right now), the calculation
becomes erratic.
> >
> > Perhaps ARM should introduce a "snapshot" feature which takes the snapshot
> > of the AMU counters on a register write; IAC that's outside the scope
> What do you mean by register write ?
I mean that we should have a snapshot register:
SYS_AMEVCNTR0_SNAPSHOT_EL0
writing to this will save the current values of SYS_AMEVCNTR0_CORE_EL0 and
SYS_AMEVCNTR0_CONST_EL0 into two shadow registers (say
SYS_AMEVCNTR0_CORE_SNAPSHOT_EL0
and SYS_AMEVCNTR0_CONST_SNAPSHOT_EL0)
That way the OS specifics in how the registers are read don't matter.
Of course, I'm not a H/W architect so this proposal is very rough, but hopefully
helps illustrate my idea.
> > of this discussion.
> >
> > Could you kindly explain why adding the udelay here is discouraged?
>
> Would you mind clarifying how the specific value of 200 µs was determined?
> Was it carefully derived from measurements across multiple platforms and
> workloads, or simply observed to “work” on one relatively stable setup?
>
> Understanding the basis for choosing that delay will help put the discussion
> into the right context.
TBH, I just experimented with values on our system and observed the readings of
cores running stress-ng. I tried 100us and that still gave variability
(values greater than
the theoretical max frequency). It's possible that the "optimal value"
is somewhere
between these two.
There have been other observations on this topic [1], that suggest
that even 100us
improves the error rate significantly from what it is with 2us.
BR,
-Prashant
[1] https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.com/
Powered by blists - more mailing lists