Message-ID: <CAJZ5v0j1bqhmKrJirw+WgEVDdszZ9xQSgmfazVKMVa8H6_5TSw@mail.gmail.com>
Date: Mon, 3 Jun 2024 15:43:12 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Viresh Kumar <viresh.kumar@...aro.org>
Cc: Beata Michalska <beata.michalska@....com>, linux-kernel@...r.kernel.org, 
	linux-pm@...r.kernel.org, rafael@...nel.org, len.brown@...el.com, 
	ionela.voinescu@....com, vanshikonda@...amperecomputing.com, 
	sumitg@...dia.com
Subject: Re: [PATCH 1/1] cpufreq: Rewire arch specific feedback for cpuinfo/scaling_cur_freq

On Mon, Jun 3, 2024 at 1:48 PM Viresh Kumar <viresh.kumar@...aro.org> wrote:
>
> Hi Beata,
>
> Thanks for taking this forward.
>
> On 03-06-24, 09:13, Beata Michalska wrote:
> > Some architectures provide a way to determine an average frequency over
> > a certain period of time, based on available performance monitors (AMU on
> > ARM or APERF/MPERF on x86). With those at hand, enroll arch_freq_get_on_cpu
> > into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to
> > represent the current frequency of a given CPU, as obtained by the hardware.
> > This is the type of feedback that counters do provide.
>
> Please add a blank line between paragraphs, it makes it easier to read
> them.
>
> > At the same time, keep the scaling_cur_freq attribute aligned with the docs
> > and make it provide most recently requested frequency, while still allowing
> > fallback to arch_freq_get_on_cpu for cases when cpuinfo_cur_freq is
> > not available.
>
> Please split this patch into two parts, they are very distinct changes
> and should be kept separate.
>
> > Signed-off-by: Beata Michalska <beata.michalska@....com>
> > ---
> >  drivers/cpufreq/cpufreq.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index a45aac17c20f..3b0eabe4a983 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -758,7 +758,8 @@ static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> >       ssize_t ret;
> >       unsigned int freq;
> >
> > -     freq = arch_freq_get_on_cpu(policy->cpu);
> > +     freq = !cpufreq_driver->get ? arch_freq_get_on_cpu(policy->cpu)
> > +                                 : 0;
>
> This is getting trickier than I thought as I dived into more details
> of all the changes to the file.
>
> Rafael,
>
> We probably need to decide on a policy for these two files, it is
> getting a bit confusing.
>
> cpuinfo_cur_freq:
>
> The purpose of this file is abundantly clear. This returns the best
> possible guess of the current hardware frequency. It should rely on
> arch_freq_get_on_cpu() or ->get() to get the value.

Let me quote the documentation:

"This is expected to be the frequency the hardware actually runs at.
If that frequency cannot be determined, this attribute should not be
present."

In my reading, this has nothing to do with arch_freq_get_on_cpu(), at
least on x86.

> Perhaps we can
> make this available all the time, instead of conditionally on ->get()
> callback (which isn't present for intel-pstate for example).

We could, but then on x86 there is no expectation that this file will
be present and changing this may introduce significant confusion
because of the way it is documented (which would need to be changed,
but people might be forgiven for failing to notice the change of
interpretation of this file).

> scaling_cur_freq:
>
> This should better reflect the last requested frequency, but for a
> significant time now it has been trying to show what cpuinfo_cur_freq shows.

Well, not really.

> commit c034b02e213d ("cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers")
> commit f8475cef9008 ("x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF")

"In the majority of cases, this is the frequency of the last P-state
requested by the scaling driver from the hardware using the scaling
interface provided by it, which may or may not reflect the frequency
the CPU is actually running at (due to hardware design and other
limitations).

Some architectures (e.g. x86) may attempt to provide information more
precisely reflecting the current CPU frequency through this attribute,
but that still may not be the exact current CPU frequency as seen by
the hardware at the moment."

So the problem is that on Intel x86 with HWP and intel_pstate in the
active mode, say, "the frequency of the last P-state requested by the
scaling driver from the hardware" is actually never known, so exposing
it via scaling_cur_freq is not possible.

Moreover, because cpuinfo_cur_freq is not present at all in that case,
scaling_cur_freq is the only way to allow user space to get an idea
about the current CPU frequency.  I don't think it can be changed now
without confusing users.
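
To illustrate the user-space dependency described above, here is a
minimal sketch of a tool that prefers cpuinfo_cur_freq and falls back to
scaling_cur_freq when the former is missing or unreadable. The helper
names read_khz() and cur_freq_khz() are hypothetical, and the directory
is passed in by the caller (on a real system it would be something like
/sys/devices/system/cpu/cpu0/cpufreq); this is an assumption about how
such tools might be written, not a description of any particular one.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one frequency value (in kHz) from <dir>/<attr>.
 * Returns 0 on success, -1 if the file is missing, unreadable or
 * does not start with an unsigned number.
 */
static int read_khz(const char *dir, const char *attr, unsigned long *khz)
{
	char path[512];
	FILE *f;
	int n, ok;

	assert(dir && attr && khz);

	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", khz) == 1);
	fclose(f);

	return ok ? 0 : -1;
}

/*
 * Hypothetical policy of a user-space tool: use cpuinfo_cur_freq when it
 * can be read, otherwise fall back to scaling_cur_freq.
 * Returns the name of the attribute that was used, or NULL if neither
 * could be read.
 */
static const char *cur_freq_khz(const char *dir, unsigned long *khz)
{
	if (read_khz(dir, "cpuinfo_cur_freq", khz) == 0)
		return "cpuinfo_cur_freq";
	if (read_khz(dir, "scaling_cur_freq", khz) == 0)
		return "scaling_cur_freq";
	return NULL;
}
```

A tool written this way silently ends up on scaling_cur_freq whenever
cpuinfo_cur_freq is absent, so any change to the meaning of
scaling_cur_freq changes what it reports.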

> What should we do? I wonder if we will break some userspace tools
> (which may have started relying on these changes).

We will.

IIUC, it is desirable to expose "the frequency of the last P-state
requested by the scaling driver from the hardware" via
scaling_cur_freq on ARM, but it is also desirable to expose an
approximation of the actual current CPU frequency, so the only way to
do that without confusing the heck out of everybody downstream would
be to introduce a new attribute for this purpose and document it
precisely.
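
For reference, the selection made by the quoted show_scaling_cur_freq()
hunk can be modeled outside the kernel as below. This is only a sketch:
struct policy_model and model_scaling_cur_freq() are hypothetical names,
and the fallback to the last requested frequency when the chosen value
is 0 is an assumption taken from the patch description, since the rest
of the function is not shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct policy_model {
	bool driver_has_get;        /* models cpufreq_driver->get != NULL */
	unsigned int arch_khz;      /* models arch_freq_get_on_cpu(), 0 = no data */
	unsigned int requested_khz; /* models the most recently requested frequency */
};

/* Value scaling_cur_freq would report in this model of the patched code. */
static unsigned int model_scaling_cur_freq(const struct policy_model *p)
{
	unsigned int freq;

	assert(p);

	/* Mirrors the hunk: counters are consulted only when ->get() is absent. */
	freq = !p->driver_has_get ? p->arch_khz : 0;

	/* Assumed fallback: report the last requested frequency otherwise. */
	return freq ? freq : p->requested_khz;
}
```

In this model a driver without ->get() (the intel_pstate case mentioned
above) keeps reporting the counter-based value, while a driver with
->get() switches to the last requested frequency, which is the behavior
change user space would observe.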
