[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zh6dSrUnckoa-thV@arm.com>
Date: Tue, 16 Apr 2024 17:46:18 +0200
From: Beata Michalska <beata.michalska@....com>
To: Vanshidhar Konda <vanshikonda@...amperecomputing.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
ionela.voinescu@....com, sudeep.holla@....com, will@...nel.org,
catalin.marinas@....com, vincent.guittot@...aro.org,
sumitg@...dia.com, yang@...amperecomputing.com,
lihuisong@...wei.com, viresh.kumar@...aro.org
Subject: Re: [PATCH v4 4/4] cpufreq: Use arch specific feedback for
cpuinfo_cur_freq
On Mon, Apr 15, 2024 at 09:23:10PM -0700, Vanshidhar Konda wrote:
> On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote:
> > Some architectures provide a way to determine an average frequency over
> > a certain period of time based on available performance monitors (AMU on
> > ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu
> > into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to
> > represent the current frequency of a given CPU, as obtained by the hardware.
> > This is the type of feedback that counters do provide.
> >
>
> --- snip ---
>
> While testing this patch series on AmpereOne system, I simulated CPU
> frequency throttling when the system is under power or thermal
> constraints.
>
> In this scenario, based on the user guilde, I expect scaling_cur_freq
> is the frequency the kernel requests from the hardware; cpuinfo_cur_freq
> is the actual frequency that the hardware is able to run at during the
> power or thermal constraints.
There has been a discussion on scaling_cur_freq vs cpuinfo_cur_freq [1].
The guidelines you are referring here (assuming you mean [2]) are kinda
out-of-sync already as scaling_cur_freq has been wired earlier to use arch
specific feedback. As there was no Arm dedicated implementation of
arch_freq_get_on_cpu, this went kinda unnoticed.
The conclusion of the above mentioned discussion (though rather unstated
explicitly) was to keep the current behaviour of scaling_cur_freq and align
both across different archs: so with the patches, both attributes will provide
hw feedback on current frequency, when available.
Note that if we are to adhere to the docs cpuinfo_cur_freq is the place to use
the counters really.
That change was also requested through [3]
Adding @Viresh in case there was any shift in the tides ....
>
> The AmpereOne system I'm testing on has the following configuration:
> - Max frequency is 3000000
> - Support for AMU registers
> - ACPI CPPC feedback counters use PCC register space
> - Fedora 39 with 6.7.5 kernel
> - Fedora 39 with 6.9.0-rc3 + this patch series
>
> With 6.7.5 kernel:
> Core scaling_cur_freq cpuinfo_cur_freq
> ---- ---------------- ----------------
> 0 3000000 2593000
> 1 3000000 2613000
> 2 3000000 2625000
> 3 3000000 2632000
>
So if I got it right from the info you have provided the numbers above are
obtained without applying the patches. In that case, scaling_cur_freq will
use policy->cur (in your case) showing last frequency set, not necessarily
the actual freq, whereas cpuinfo_cur_freq uses __cpufreq_get and AMU counters.
> With 6.9.0-rc3 + this patch series:
> Core scaling_cur_freq cpuinfo_cur_freq
> ---- ---------------- ----------------
> 0 2671875 2671875
> 1 2589632 2589632
> 2 2648437 2648437
> 3 2698242 2698242
>
With the patches applied both scaling_cur_freq and cpuinfo_cur_freq will use AMU
counters, or fie scale factor obtained based on AMU counters to be more precise:
both should now show similar/same frequency (as discussed in [1])
I'd say due to existing implementation for scaling_cur_freq (which we cannot
change at this point) this is unavoidable.
> In the second case we can't identify that the CPU frequency is
> being throttled by the hardware. I noticed this behavior with
> or without this patch.
>
I am not entirely sure comparing the two should be a way to go about throttling
(whether w/ or w/o the changes).
It would probably be best to refer to thermal sysfs and get a hold of cur_state
which should indicate current throttle state:
/sys/class/thermal/thermal_zone[0-*]/cdev[0-*]/cur_state
with values above '0' implying ongoing throttling.
The appropriate thermal_zone can be identified through 'type' attribute.
Thank you for giving those patches a spin.
---
BR
Beata
---
[1] https://lore.kernel.org/all/20230609043922.eyyqutbwlofqaddz@vireshk-i7/
[2] https://elixir.bootlin.com/linux/latest/source/Documentation/admin-guide/pm/cpufreq.rst#L197
[3] https://lore.kernel.org/lkml/2cfbc633-1e94-d741-2337-e1b0cf48b81b@nvidia.com/
---
> Thanks,
> Vanshi
Powered by blists - more mailing lists