Message-ID: <CAJZ5v0hsCjKA3EisK9s_S8Vb9Tgm4eps1FTKvUSfd9_JPh5wBQ@mail.gmail.com>
Date: Mon, 3 Jan 2022 20:58:48 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Julia Lawall <julia.lawall@...ia.fr>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Francisco Jerez <currojerez@...eup.net>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Len Brown <lenb@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Linux PM <linux-pm@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: cpufreq: intel_pstate: map utilization into the pstate range

On Mon, Jan 3, 2022 at 7:23 PM Julia Lawall <julia.lawall@...ia.fr> wrote:
>
> > > > Can you please run the 32 spinning threads workload (ie. on one
> > > > package) and with P-state locked to 10 and then to 20 under turbostat
> > > > and send me the turbostat output for both runs?
> > >
> > > Attached.
> > >
> > > Pstate 10: spin_minmax_10_dahu-9_5.15.0freq_schedutil_11.turbo
> > > Pstate 20: spin_minmax_20_dahu-9_5.15.0freq_schedutil_11.turbo
> >
> > Well, in both cases there is only 1 CPU running and it is running at
> > 1 GHz (ie. P-state 10) all the time as far as I can say.
>
> It looks better now. I included 1 core (core 0) for pstates 10, 20, and
> 21, and 32 cores (socket 0) for the same pstates.

OK, so let's first consider the runs where 32 cores (entire socket 0)
are doing the work.

This set of data clearly shows that running the busy cores at 1 GHz
takes less energy than running them at 2 GHz (the ratio of these
numbers is roughly 2/3 if I got that right). This means that P-state
10 is more energy efficient than P-state 20, as expected.

However, the cost of running at 2.1 GHz is much greater than the cost
of running at 2 GHz and I'm still thinking that this is attributable
to some kind of voltage increase between P-state 20 and P-state 21
(which, interestingly enough, affects the second "idle" socket too).
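
For reference, the voltage argument above can be written down using the usual first-order CMOS approximation for dynamic power. This is a textbook model assumed here for illustration, not something derived from the turbostat data in this thread; C_eff (effective switched capacitance), V_20 and V_21 (core voltages at P-states 20 and 21) are placeholder symbols whose values are not known from this discussion.

```latex
P_{\mathrm{dyn}} \approx C_{\mathrm{eff}}\, V^{2} f
\quad\Longrightarrow\quad
E_{\mathrm{cycle}} = \frac{P_{\mathrm{dyn}}}{f} \approx C_{\mathrm{eff}}\, V^{2},
\qquad
\frac{E_{\mathrm{cycle}}(\text{P-state 21})}{E_{\mathrm{cycle}}(\text{P-state 20})}
\approx \left(\frac{V_{21}}{V_{20}}\right)^{2}
```

Under this approximation, going from 2 GHz to 2.1 GHz at the same voltage would leave the dynamic energy per cycle roughly unchanged, whereas any voltage step between the two P-states would raise it quadratically.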

In the other set of data, where only 1 CPU is doing the work, P-state
10 is still more energy-efficient than P-state 20, but it takes more
time to do the work at 1 GHz, so the energy lost due to leakage
increases too and it is "leaked" by all of the CPUs in the package
(including the idle ones in core C-states), so overall this loss
offsets the gain from using a more energy-efficient P-state. At the
same time, socket 1 can spend more time in PC2 when the busy CPU is
running at 2 GHz (which means less leakage in that socket), so with 1
CPU doing the work the total cost of running at 2 GHz is slightly
smaller than the total cost of running at 1 GHz. [Note how important
it is to take the other CPUs in the system into account in this case,
because there are simply enough of them to affect one-CPU measurements
in a significant way.]
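
The trade-off described above can be sketched with a toy model. Everything in it is an assumption made for illustration: the function and constant names are hypothetical, the power figures are made up (chosen only so that the dynamic energy ratio between 1 GHz and 2 GHz is 2/3, as estimated above), and the whole system is reduced to a single "package leakage" term that drops once all of the work is done. It is not a model of the real hardware or of the attached turbostat data.

```python
def total_energy_j(cycles, freq_hz, dyn_power_w, n_busy,
                   leak_awake_w, leak_idle_w, window_s):
    """Energy (in joules) used over a fixed window by a toy package model."""
    runtime_s = cycles / freq_hz  # time each busy CPU needs for its work
    if runtime_s > window_s:
        raise ValueError("window must cover the whole run")
    # Dynamic energy is paid only by the busy CPUs, and only while they run.
    dynamic_j = n_busy * dyn_power_w * runtime_s
    # Leakage is paid by the whole package: at the "awake" rate while work is
    # in progress, and at a lower rate once everything can stay in deep idle.
    leakage_j = (leak_awake_w * runtime_s
                 + leak_idle_w * (window_s - runtime_s))
    return dynamic_j + leakage_j


# Hypothetical operating points: frequency (Hz) -> per-CPU dynamic power (W).
# Dynamic energy per cycle is 1.0 W / 1 GHz vs 3.0 W / 2 GHz, a ratio of 2/3.
POINTS = {1e9: 1.0, 2e9: 3.0}
CYCLES = 2e9        # work per busy CPU, in cycles (made up)
WINDOW_S = 2.0      # comparison window, long enough for the slowest run
LEAK_AWAKE_W = 2.0  # leakage while any CPU is busy (made up)
LEAK_IDLE_W = 0.5   # leakage once everything is in deep idle (made up)


def compare(n_busy):
    """Return {frequency: total energy} for the given number of busy CPUs."""
    return {
        freq: total_energy_j(CYCLES, freq, power, n_busy,
                             LEAK_AWAKE_W, LEAK_IDLE_W, WINDOW_S)
        for freq, power in POINTS.items()
    }


if __name__ == "__main__":
    for n_busy in (1, 32):
        for freq, energy in compare(n_busy).items():
            print(f"{n_busy:2d} busy CPU(s) at {freq / 1e9:.0f} GHz: {energy:.1f} J")
```

With these made-up numbers, one busy CPU costs 6.0 J at 1 GHz and 5.5 J at 2 GHz, because the leakage saved by finishing sooner outweighs the extra dynamic energy. With 32 busy CPUs the dynamic term dominates and 1 GHz wins (68.0 J against 98.5 J).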

Still, when going from 2 GHz to 2.1 GHz, the voltage jump causes the
energy to increase significantly again.