Message-ID: <1b2be990d5c31f62d9ce33aa2eb2530708d5607a.camel@linux.intel.com>
Date:   Thu, 06 Jan 2022 12:28:26 -0800
From:   Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
To:     Julia Lawall <julia.lawall@...ia.fr>,
        Francisco Jerez <currojerez@...eup.net>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Len Brown <lenb@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: cpufreq: intel_pstate: map utilization into the pstate range

On Thu, 2022-01-06 at 20:49 +0100, Julia Lawall wrote:
> 
> On Wed, 5 Jan 2022, Francisco Jerez wrote:
> 
> > Julia Lawall <julia.lawall@...ia.fr> writes:
> > 
> > > On Tue, 4 Jan 2022, Rafael J. Wysocki wrote:
> > > 
> > > > On Tue, Jan 4, 2022 at 4:49 PM Julia Lawall <julia.lawall@...ia.fr> wrote:
> > > > > I tried the whole experiment again on an Intel w2155 (one socket,
> > > > > 10 physical cores, pstates 12, 33, and 45).
> > > > > 
> > > > > For the CPU there is a small jump between 32 and 33, smaller than
> > > > > for the 6130.
> > > > > 
> > > > > For the RAM, there is a big jump between 21 and 22.
> > > > > 
> > > > > Combining them leaves a big jump between 21 and 22.
> > > > 
> > > > These jumps are most likely related to voltage increases.
> > > > 
> > > > > It seems that the definition of efficient is that there is no more
> > > > > cost for the computation than the cost of simply having the machine
> > > > > do any computation at all.  It doesn't take into account the time
> > > > > and energy required to do some actual amount of work.
> > > > 
> > > > Well, that's not what I wanted to say.
> > > 
> > > I was referring to Francisco's comment that the lowest indicated
> > > frequency should be the most efficient one.  Turbostat also reports
> > > the lowest frequency as the most efficient one.  In my graph, pstates
> > > 7 and 10 give exactly the same energy consumption as 12.  7 and 10 are
> > > certainly less efficient, because the energy consumption is the same,
> > > but the execution speed is lower.
> > > 
> > > > Of course, the configuration that requires less energy to be spent
> > > > to do a given amount of work is more energy-efficient.  To measure
> > > > this, the system needs to be given exactly the same amount of work
> > > > for each run and the energy spent by it during each run needs to be
> > > > compared.
> > 
> > I disagree that the system needs to be given the exact same amount of
> > work in order to measure differences in energy efficiency.  The average
> > energy efficiency of Julia's 10s workloads can be calculated easily in
> > both cases (e.g. as the W/E ratio below; W will just be a different
> > value for each run), and the result will likely approximate the
> > instantaneous energy efficiency of the fixed P-states we're comparing,
> > since her workload seems to be fairly close to a steady state.
> > 
> > > This is basically my point of view, but there is a question about
> > > it.  If over 10 seconds you consume 10J and by running twice as fast
> > > you would consume only 6J, then how do you account for the next 5
> > > seconds?  If the machine is then idle for the next 5 seconds, maybe
> > > you would end up consuming 8J in total over the 10 seconds.  But if
> > > you take advantage of the free 5 seconds to pack in another job, then
> > > you end up consuming 12J.
> > > 
> > 
> > Geometrically, such an oscillatory workload with periods of idling and
> > periods of activity would give an average power consumption along the
> > line that passes through the points corresponding to both states on the
> > CPU's power curve -- IOW your average power consumption will just be the
> > weighted average of the power consumption of each state (with the duty
> > cycle t_i/t_total of each state being its weight):
> > 
> > P_avg = t_0/t_total * P_0 + t_1/t_total * P_1
> > 
> > Your energy usage would just be 10s times that P_avg, since you're
> > assuming that the total runtime of the workload is fixed at 10s
> > independent of how long the CPU actually takes to complete the
> > computation.  In cases where the P-state during the period of activity
> > t_1 is equal to or lower than the maximum efficiency P-state, that line
> > segment is guaranteed to lie below the power curve, indicating that
> > such oscillation is more efficient than running the workload fixed to
> > its average P-state.
> > 
> > That said, this scenario doesn't really seem very relevant to your
> > case, since the last workload you've provided turbostat traces for
> > seems to show almost no oscillation.  If there were such an
> > oscillation, your total energy usage would still be greater for
> > oscillations between idle and some P-state different from the most
> > efficient one.  Such an oscillation doesn't explain the anomaly we're
> > seeing in your traces, which show more energy-efficient instantaneous
> > behavior for a P-state 2x the one reported by your processor as the
> > most energy-efficient.
> 
> All the turbostat output and graphs I have sent recently were just for
> continuous spinning:
> 
> for(;;);
> 
> Now I am trying to run for the fraction of the time corresponding to
> 10 / P for pstate P (i.e. 0.5 of the time for pstate 20) and then sleep,
> to see whether one can just add the sleeping power consumption of the
> machine to compute the efficiency as Rafael suggested.
> 
Before doing the comparison, try freezing the uncore frequency:

wrmsr -a 0x620 0x0808

This freezes the uncore at 800MHz. Any other fixed value is fine too.

Thanks,
Srinivas

> julia
> 
> > > > However, I think that you are interested in answering a different
> > > > question: Given a specific amount of time (say T) to run the
> > > > workload, what frequency to run the CPUs doing the work at in order
> > > > to get the maximum amount of work done per unit of energy spent by
> > > > the system (as a whole)?  Or, given 2 different frequency levels,
> > > > which of them to run the CPUs at to get more work done per energy
> > > > unit?
> > > 
> > > This is the approach where you assume that the machine will be idle
> > > in any leftover time.  And it accounts for the energy consumed in that
> > > idle time.
> > > 
> > > > The work / energy ratio can be estimated as
> > > > 
> > > > W / E = C * f / P(f)
> > > > 
> > > > where C is a constant and P(f) is the power drawn by the whole
> > > > system while the CPUs doing the work are running at frequency f,
> > > > and of course for the system discussed previously it is greater in
> > > > the 2 GHz case.
> > > > 
> > > > However P(f) can be divided into two parts, P_1(f) that really
> > > > depends on the frequency and P_0 that does not depend on it.  If P_0
> > > > is large enough to dominate P(f), which is the case in the 10-20
> > > > range of P-states on the system in question, it is better to run
> > > > the CPUs doing the work faster (as long as there is always enough
> > > > work to do for them; see below).  This doesn't mean that P(f) is not
> > > > a convex function of f, though.
> > > > 
> > > > Moreover, this assumes that there will always be enough work for the
> > > > system to do when running the busy CPUs at 2 GHz, or that it can go
> > > > completely idle when it doesn't do any work, but let's see what
> > > > happens if the amount of work to do is W_1 = C * 1 GHz * T and the
> > > > system cannot go completely idle when the work is done.
> > > > 
> > > > Then, nothing changes for the busy CPUs running at 1 GHz, but in the
> > > > 2 GHz case we get W = W_1 and E = P(2 GHz) * T/2 + P_0 * T/2,
> > > > because the busy CPUs are only busy 1/2 of the time, but power P_0
> > > > is drawn by the system regardless.  Hence, in the 2 GHz case
> > > > (assuming P(2 GHz) = 120 W and P_0 = 90 W), we get
> > > > 
> > > > W / E = 2 * C * 1 GHz / (P(2 GHz) + P_0) = 0.0095 * C * 1 GHz
> > > > 
> > > > which is slightly less than the W / E ratio at 1 GHz, approximately
> > > > equal to 0.01 * C * 1 GHz (assuming P(1 GHz) = 100 W), so in these
> > > > conditions it would be better to run the busy CPUs at 1 GHz.
> > > 
> > > OK, I'll try to measure this.
> > > 
> > > thanks,
> > > julia
