linux-kernel - Re: cpufreq: intel_pstate: map utilization into the pstate range

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.22.394.2201062044340.3098@hadrien>
Date:   Thu, 6 Jan 2022 20:49:08 +0100 (CET)
From:   Julia Lawall <julia.lawall@...ia.fr>
To:     Francisco Jerez <currojerez@...eup.net>
cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Len Brown <lenb@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: cpufreq: intel_pstate: map utilization into the pstate range



On Wed, 5 Jan 2022, Francisco Jerez wrote:

> Julia Lawall <julia.lawall@...ia.fr> writes:
>
> > On Tue, 4 Jan 2022, Rafael J. Wysocki wrote:
> >
> >> On Tue, Jan 4, 2022 at 4:49 PM Julia Lawall <julia.lawall@...ia.fr> wrote:
> >> >
> >> > I tried the whole experiment again on an Intel w2155 (one socket, 10
> >> > physical cores, pstates 12, 33, and 45).
> >> >
> >> > For the CPU there is a small jump a between 32 and 33 - less than for the
> >> > 6130.
> >> >
> >> > For the RAM, there is a big jump between 21 and 22.
> >> >
> >> > Combining them leaves a big jump between 21 and 22.
> >>
> >> These jumps are most likely related to voltage increases.
> >>
> >> > It seems that the definition of efficient is that there is no more cost
> >> > for the computation than the cost of simply having the machine doing any
> >> > computation at all.  It doesn't take into account the time and energy
> >> > required to do some actual amount of work.
> >>
> >> Well, that's not what I wanted to say.
> >
> > I was referring to Francisco's comment that the lowest indicated frequency
> > should be the most efficient one.  Turbostat also reports the lowest
> > frequency as the most efficient one.  In my graph, there are the pstates 7
> > and 10, which give exactly the same energy consumption as 12.  7 and 10
> > are certainly less efficient, because the energy consumption is the same,
> > but the execution speed is lower.
> >
> >> Of course, the configuration that requires less energy to be spent to
> >> do a given amount of work is more energy-efficient.  To measure this,
> >> the system needs to be given exactly the same amount of work for each
> >> run and the energy spent by it during each run needs to be compared.
>
> I disagree that the system needs to be given the exact same amount of
> work in order to measure differences in energy efficiency.  The average
> energy efficiency of Julia's 10s workloads can be calculated easily in
> both cases (e.g. as the W/E ratio below, W will just be a different
> value for each run), and the result will likely approximate the
> instantaneous energy efficiency of the fixed P-states we're comparing,
> since her workload seems to be fairly close to a steady state.
>
> >
> > This is bascially my point of view, but there is a question about it.  If
> > over 10 seconds you consume 10J and by running twice as fast you would
> > consume only 6J, then how do you account for the nest 5 seconds?  If the
> > machine is then idle for the next 5 seconds, maybe you would end up
> > consuming 8J in total over the 10 seconds.  But if you take advantage of
> > the free 5 seconds to pack in another job, then you end up consuming 12J.
> >
>
> Geometrically, such an oscillatory workload with periods of idling and
> periods of activity would give an average power consumption along the
> line that passes through the points corresponding to both states on the
> CPU's power curve -- IOW your average power consumption will just be the
> weighted average of the power consumption of each state (with the duty
> cycle t_i/t_total of each state being its weight):
>
> P_avg = t_0/t_total * P_0 + t_1/t_total * P_1
>
> Your energy usage would just be 10s times that P_avg, since you're
> assuming that the total runtime of the workload is fixed at 10s
> independent of how long the CPU actually takes to complete the
> computation.  In cases where the P-state during the period of activity
> t_1 is equal or lower to the maximum efficiency P-state, that line
> segment is guaranteed to lie below the power curve, indicating that such
> oscillation is more efficient than running the workload fixed to its
> average P-state.
>
> That said, this scenario doesn't really seem very relevant to your case,
> since the last workload you've provided turbostat traces for seems to
> show almost no oscillation.  If there was such an oscillation, your
> total energy usage would still be greater for oscillations between idle
> and some P-state different from the most efficient one.  Such an
> oscillation doesn't explain the anomaly we're seeing on your traces,
> which show more energy-efficient instantaneous behavior for a P-state 2x
> the one reported by your processor as the most energy-efficient.

All the turbostat output and graphs I have sent recently were just for
continuous spinning:

for(;;);

Now I am trying running for the percentage of the time corresponding to
10 / P for pstate P (ie 0.5 of the time for pstate 20), and then sleeping,
to see whether one can just add the sleeping power consumption of the
machine to compute the efficiency as Rafael suggested.

julia

>
> >> However, I think that you are interested in answering a different
> >> question: Given a specific amount of time (say T) to run the workload,
> >> what frequency to run the CPUs doing the work at in order to get the
> >> maximum amount of work done per unit of energy spent by the system (as
> >> a whole)?  Or, given 2 different frequency levels, which of them to
> >> run the CPUs at to get more work done per energy unit?
> >
> > This is the approach where you assume that the machine will be idle in any
> > leftover time.  And it accounts for the energy consumed in that idle time.
> >
> >> The work / energy ratio can be estimated as
> >>
> >> W / E = C * f / P(f)
> >>
> >> where C is a constant and P(f) is the power drawn by the whole system
> >> while the CPUs doing the work are running at frequency f, and of
> >> course for the system discussed previously it is greater in the 2 GHz
> >> case.
> >>
> >> However P(f) can be divided into two parts, P_1(f) that really depends
> >> on the frequency and P_0 that does not depend on it.  If P_0 is large
> >> enough to dominate P(f), which is the case in the 10-20 range of
> >> P-states on the system in question, it is better to run the CPUs doing
> >> the work faster (as long as there is always enough work to do for
> >> them; see below).  This doesn't mean that P(f) is not a convex
> >> function of f, though.
> >>
> >> Moreover, this assumes that there will always be enough work for the
> >> system to do when running the busy CPUs at 2 GHz, or that it can go
> >> completely idle when it doesn't do any work, but let's see what
> >> happens if the amount of work to do is W_1 = C * 1 GHz * T and the
> >> system cannot go completely idle when the work is done.
> >>
> >> Then, nothing changes for the busy CPUs running at 1 GHz, but in the 2
> >> GHz case we get W = W_1 and E = P(2 GHz) * T/2 + P_0 * T/2, because
> >> the busy CPUs are only busy 1/2 of the time, but power P_0 is drawn by
> >> the system regardless.  Hence, in the 2 GHz case (assuming P(2 GHz) =
> >> 120 W and P_0 = 90 W), we get
> >>
> >> W / E = 2 * C * 1 GHz / (P(2 GHz) + P_0) = 0.0095 * C * 1 GHz
> >>
> >> which is slightly less than the W / E ratio at 1 GHz approximately
> >> equal to 0.01 * C * 1 GHz (assuming P(1 GHz) = 100 W), so in these
> >> conditions it would be better to run the busy CPUs at 1 GHz.
> >
> > OK, I'll try to measure this.
> >
> > thanks,
> > julia
>