Message-ID: <CAJZ5v0h98XXvSOpBFn4vV1QivFtsSzzVg8sJGq4v04uf5bi5Jw@mail.gmail.com>
Date:   Sun, 19 Dec 2021 15:30:34 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Julia Lawall <julia.lawall@...ia.fr>
Cc:     Francisco Jerez <currojerez@...eup.net>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Len Brown <lenb@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: cpufreq: intel_pstate: map utilization into the pstate range

On Sun, Dec 19, 2021 at 3:19 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Sun, Dec 19, 2021 at 7:42 AM Julia Lawall <julia.lawall@...ia.fr> wrote:
> >
> >
> >
> > On Sat, 18 Dec 2021, Francisco Jerez wrote:
> >
> > > Julia Lawall <julia.lawall@...ia.fr> writes:
> > >
> > > > On Sat, 18 Dec 2021, Francisco Jerez wrote:
> > > >
> > > >> Julia Lawall <julia.lawall@...ia.fr> writes:
> > > >>
> > > >> >> As you can see in intel_pstate.c, min_pstate is initialized on core
> > > >> >> platforms from MSR_PLATFORM_INFO[47:40], which is "Maximum Efficiency
> > > >> >> Ratio (R/O)".  However that seems to deviate massively from the most
> > > >> >> efficient ratio on your system, which may indicate a firmware bug, some
> > > >> >> sort of clock gating problem, or an issue with the way that
> > > >> >> intel_pstate.c processes this information.
> > > >> >
> > > >> > I'm not sure to understand the bug part.  min_pstate gives the frequency
> > > >> > that I find as the minimum frequency when I look for the specifications of
> > > >> > the CPU.  Should one expect that it should be something different?
> > > >> >
> > > >>
> > > >> I'd expect the minimum frequency on your processor specification to
> > > >> roughly match the "Maximum Efficiency Ratio (R/O)" value from that MSR,
> > > >> since there's little reason to claim your processor can be clocked down
> > > >> to a frequency which is inherently inefficient /and/ slower than the
> > > >> maximum efficiency ratio -- In fact they both seem to match in your
> > > >> system, they're just nowhere close to the frequency which is actually
> > > >> most efficient, which smells like a bug, like your processor
> > > >> misreporting what the most efficient frequency is, or it deviating from
> > > >> the expected one due to your CPU static power consumption being greater
> > > >> than it would be expected to be under ideal conditions -- E.g. due to
> > > >> some sort of clock gating issue, possibly due to a software bug, or due
> > > >> to our scheduling of such workloads with a large amount of lightly
> > > >> loaded threads being unnecessarily inefficient which could also be
> > > >> preventing most of your CPU cores from ever being clock-gated even
> > > >> though your processor may be sitting idle for a large fraction of their
> > > >> runtime.
> > > >
> > > > The original mail has results from two different machines: Intel 6130
> > > > (skylake) and Intel 5218 (cascade lake).  I have access to another cluster
> > > > of 6130s and 5218s.  I can try them.
> > > >
> > > > I tried 5.9 in which I just commented out the schedutil code to make
> > > > frequency requests.  I only tested avrora (tiny pauses) and h2 (longer
> > > > pauses) and in both cases the execution is almost entirely in the turbo
> > > > frequencies.
> > > >
> > > > I'm not sure to understand the term "clock-gated".  What C state does that
> > > > correspond to?  The turbostat output for one run of avrora is below.
> > > >
> > >
> > > I didn't have any specific C1+ state in mind, most of the deeper ones
> > > implement some sort of clock gating among other optimizations, I was
> > > just wondering whether some sort of software bug and/or the highly
> > > intermittent CPU utilization pattern of these workloads are preventing
> > > most of your CPU cores from entering deep sleep states.  See below.
> > >
> > > > julia
> > > >
> > > > 78.062895 sec
> > > > Package Core  CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ     SMI     POLL    C1      C1E     C6      POLL%   C1%     C1E%    C6%     CPU%c1  CPU%c6  CoreTmp PkgTmp  Pkg%pc2 Pkg%pc6 Pkg_J   RAM_J   PKG_%   RAM_%
> > > > -     -       -       31      2.95    1065    2096    156134  0       1971    155458  2956270 657130  0.00    0.20    4.78    92.26   14.75   82.31   40      41      45.14   0.04    4747.52 2509.05 0.00    0.00
> > > > 0     0       0       13      1.15    1132    2095    11360   0       0       2       39      19209   0.00    0.00    0.01    99.01   8.02    90.83   39      41      90.24   0.04    2266.04 1346.09 0.00    0.00
> > >
> > > This seems suspicious:                                                                                                                                                          ^^^^    ^^^^^^^
> > >
> > > I hadn't understood that you're running this on a dual-socket system
> > > until I looked at these results.
> >
> > Sorry not to have mentioned that.
> >
> > > It seems like package #0 is doing
> > > pretty much nothing according to the stats below, but it's still
> > > consuming nearly half of your energy, apparently because the idle
> > > package #0 isn't entering deep sleep states (Pkg%pc6 above is close to
> > > 0%).  That could explain your unexpectedly high static power consumption
> > > and the deviation of the real maximum efficiency frequency from the one
> > > reported by your processor, since the reported maximum efficiency ratio
> > > cannot possibly take into account the existence of a second CPU package
> > > with dysfunctional idle management.
> >
> > Our assumption was that if anything happens on any core, all of the
> > packages remain in a state that allows them to react in a reasonable
> > amount of time to any memory request.
> >
> > > I'm guessing that if you fully disable one of your CPU packages and
> > > repeat the previous experiment forcing various P-states between 10 and
> > > 37 you should get a maximum efficiency ratio closer to the theoretical
> > > one for this CPU?
> >
> > OK, but that's not really a natural usage context...  I do have a
> > one-socket Intel 5220.  I'll see what happens there.
> >
> > I did some experiments with forcing different frequencies.  I haven't
> > finished processing the results, but I notice that as the frequency goes
> > up, the utilization (specifically the value of
> > map_util_perf(sg_cpu->util) at the point of the call to
> > cpufreq_driver_adjust_perf in sugov_update_single_perf) goes up as well.
> > Is this expected?
>
> It isn't, as long as the scale-invariance mechanism mentioned in my
> previous message works properly.

But even if it doesn't, the utilization should decrease when the
frequency increases.

Increasing the frequency should cause more instructions to be retired per
unit of time, so the same amount of work should finish sooner and leave
more idle time in the workload.
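
To make the expectation above concrete, here is a minimal sketch of a
deliberately simplified model, not the kernel's actual utilization tracking.
It assumes a purely CPU-bound task that needs a fixed number of cycles per
period (no memory stalls), and all of the names and numbers (WORK_CYCLES,
PERIOD_S, the three frequencies) are illustrative assumptions rather than
measurements from this thread.  Under those assumptions the raw busy fraction
falls as the frequency rises, and a frequency-invariant value (the busy
fraction scaled by freq / max_freq) stays constant.  Neither goes up.

```python
def busy_fraction(work_cycles, freq_hz, period_s):
    """Fraction of the period spent busy for a fixed amount of work."""
    # Time needed is work_cycles / freq_hz; cap at 1.0 if the CPU saturates.
    return min(1.0, work_cycles / (freq_hz * period_s))


def invariant_util(work_cycles, freq_hz, max_freq_hz, period_s):
    """Busy fraction scaled by freq / max_freq (simplified scale invariance)."""
    return busy_fraction(work_cycles, freq_hz, period_s) * freq_hz / max_freq_hz


# Illustrative assumptions: 2M cycles of work every 10 ms.
WORK_CYCLES = 2_000_000
PERIOD_S = 0.010
MAX_FREQ_HZ = 3.7e9
FREQS_HZ = [1.0e9, 2.1e9, 3.7e9]

raw = [busy_fraction(WORK_CYCLES, f, PERIOD_S) for f in FREQS_HZ]
inv = [invariant_util(WORK_CYCLES, f, MAX_FREQ_HZ, PERIOD_S) for f in FREQS_HZ]

for f, r, i in zip(FREQS_HZ, raw, inv):
    print(f"{f / 1e9:.1f} GHz: busy fraction {r:.3f}, invariant util {i:.3f}")
```

In this model a utilization that grows with the frequency cannot come from a
fixed amount of CPU-bound work alone, which is why the observation quoted
above looks unexpected.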
