Date:   Wed, 30 Oct 2019 08:54:35 -0700
From:   "Doug Smythies" <dsmythies@...us.net>
To:     "'Srinivas Pandruvada'" <srinivas.pandruvada@...ux.intel.com>,
        <vincent.guittot@...aro.org>
Cc:     <linux-kernel@...r.kernel.org>, <linux-pm@...r.kernel.org>,
        "'Peter Zijlstra'" <peterz@...radead.org>,
        "'Ingo Molnar'" <mingo@...nel.org>,
        "'Linus Torvalds'" <torvalds@...ux-foundation.org>,
        "'Thomas Gleixner'" <tglx@...utronix.de>, <sargun@...gun.me>,
        <tj@...nel.org>, <xiexiuqi@...wei.com>, <xiezhipeng1@...wei.com>
Subject: RE: [PATCH] Revert "sched/fair: Fix O(nr_cgroups) in the load balancing path"

On 2019.10.29 13:00 Srinivas Pandruvada wrote:
> On Tue, 2019-10-29 at 12:34 -0700, Srinivas Pandruvada wrote:
>> On Fri, 2019-10-25 at 08:55 -0700, Doug Smythies wrote:
>> 
>> [...]
>> 
>>> Experiment method:
>>> 
>>> enable only idle state 1
>>> Dountil stopped
>>>   apply a 100% load (all CPUs)
>>>   after a while (about 50 seconds) remove the load.
>>>   allow a short transient delay (1 second).
>>>   measure the processor package joules used over the next 149 seconds.
>>> Enduntil
>>> 
>>> Kernel k5.4-rc2 + reversion (this method)
>>> Average processor package power: 9.148 watts (128 samples, > 7 hours)
>>> Minimum: 9.02 watts
>>> Maximum: 9.29 watts
>>> Note: outlier data point group removed, as it was assumed the
>>> computer had something to do and wasn't actually "idle".
>>> 
>>> Kernel 5.4-rc2:
>>> Average processor package power: 9.969 watts (150 samples, > 8 hours)
>>> Or 9% more energy for the idle phases of the work load.
>>> Minimum: 9.15 watts
>>> Maximum: 13.79 watts (51% more power)
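
The percentages and run lengths quoted above can be cross-checked with a few lines of Python. This is only a sketch: it assumes the 51% figure compares the 5.4-rc2 maximum against the reverted kernel's average, and that one loop iteration takes about 200 seconds (roughly 50 seconds of load, 1 second of delay, 149 seconds of measurement). The function names are made up for illustration.

```python
# Figures quoted from the two measurement runs above.
REVERT_AVG_W = 9.148   # k5.4-rc2 + reversion, average package watts
STOCK_AVG_W = 9.969    # 5.4-rc2, average package watts
STOCK_MAX_W = 13.79    # 5.4-rc2, worst sample

# Assumed loop period: ~50 s load + 1 s delay + 149 s measurement.
CYCLE_SECONDS = 50 + 1 + 149


def percent_more(value, baseline):
    """Return how much larger value is than baseline, in percent."""
    return (value / baseline - 1.0) * 100.0


def run_hours(samples, cycle_seconds=CYCLE_SECONDS):
    """Approximate wall-clock duration of a run with this many samples."""
    return samples * cycle_seconds / 3600.0


print(round(percent_more(STOCK_AVG_W, REVERT_AVG_W)))  # 9
print(round(percent_more(STOCK_MAX_W, REVERT_AVG_W)))  # 51
print(round(run_hours(128), 1))                        # 7.1
print(round(run_hours(150), 1))                        # 8.3
```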

>>  Hi Doug,

Hi Srinivas,

>> 
>> Do you have intel_pstate_tracer output?

Yes, I have many runs of intel_pstate_tracer.py
and many plots of pstate and CPU frequency lingering high
for a very long time after the load is removed.
Here is one example (my reference: results/teo041):

The load is removed at test time 539.047 seconds,
and requested pstates do start to fall. For example,
on CPU 4 at time 539.052, the pstate request goes from
38 (the max for an i7-2600K) to 32. The last CPU
to reduce its pstate request is CPU 0 at time
539.714, but only to 25.

Then, CPU 4 doesn't run the driver for another
5.9 seconds, and even then only reduces its request
to pstate 21.

CPU 4 remains the defining CPU, and doesn't run the
driver again until time 577.235 seconds, at which time
its pstate request drops to 18, even with 0 load.
So, 38 seconds, and still only at pstate 18.
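
To make the timeline easier to follow, here is a small Python sketch that converts the trace timestamps quoted above into delays after load removal. Only events with an explicit timestamp in the text are included; the drop to pstate 21 is left out because its exact time is not given. The `delay_after_removal` helper is a hypothetical name.

```python
LOAD_REMOVED = 539.047  # test time (seconds) at which the load is removed

# (test time in seconds, CPU, requested pstate), as quoted from the trace.
EVENTS = [
    (539.052, 4, 32),
    (539.714, 0, 25),
    (577.235, 4, 18),
]


def delay_after_removal(timestamp, removed=LOAD_REMOVED):
    """Seconds elapsed between load removal and a trace event."""
    return timestamp - removed


for timestamp, cpu, pstate in EVENTS:
    delay = delay_after_removal(timestamp)
    print(f"CPU {cpu}: request {pstate} at +{delay:.3f} s")
# CPU 4: request 32 at +0.005 s
# CPU 0: request 25 at +0.667 s
# CPU 4: request 18 at +38.188 s
```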

>> I guess that when started
>> request to measure the measure joules, it started at higher P-state
>> without revert.

No, not really. The main difference is in the time it takes to fully
drop to the lowest pstate.

>> Other way is check by fixing the max and min scaling frequency to
>> some frequency, then we shouldn't see power difference.

Yes, I did that, to learn the numbers.

> I mean not significant power difference.

For idle state 1, at least for my processor (i7-2600K), the difference
is huge. Keep in mind that (at least for my processor) a CPU in idle
state 1 does not relinquish its vote into the CPU frequency PLL, thus
the highest request dictates the CPU frequency.
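
The voting behaviour described above can be modelled as a simple maximum over the per-CPU requests. This is a toy model, not the hardware's actual arbitration logic, and the snapshot values are hypothetical: it assumes seven CPUs have already dropped to pstate 16 (about 42 percent of 38) while CPU 4 still holds a stale request of 32.

```python
def package_pstate(requests):
    """Toy model: the highest per-CPU request sets the package pstate.

    Assumes, as described above, that a CPU sitting in idle state 1
    keeps its vote.
    """
    return max(requests.values())


# Hypothetical snapshot: CPU 4 has not run the driver since the load
# was removed, so its old request of 32 is still in force.
requests = {0: 16, 1: 16, 2: 16, 3: 16, 4: 32, 5: 16, 6: 16, 7: 16}

print(package_pstate(requests))  # 32

# Once CPU 4 finally lowers its request, the package can follow.
requests[4] = 16
print(package_pstate(requests))  # 16
```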

Here are the idle state 1 powers (42 percent is the minimum for my
processor; for reference, with all idle states enabled, the idle
power is 3.68 watts and the processor package temperature is about
25 degrees, independent of the requested pstate):

Min-percent   watts   temp (degrees)
 42            8.7    35
 50           10.0    36
 60           12.0    37
 70           14.4    38
 80           17.3    41
 90           21      43
100           21      43

Note that the 90 (pstate 35) and 100 (pstate 38)
powers are the same due to this (I assume):

cpu5: MSR_TURBO_RATIO_LIMIT: 0x23242526
35 * 100.0 = 3500.0 MHz max turbo 4 active cores
36 * 100.0 = 3600.0 MHz max turbo 3 active cores
37 * 100.0 = 3700.0 MHz max turbo 2 active cores
38 * 100.0 = 3800.0 MHz max turbo 1 active cores
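
The decode shown above can be reproduced by splitting the MSR value into bytes, lowest byte first. The sketch below also models why a request of 38 and a request of 35 end up at the same granted ratio. It assumes a 100 MHz bus clock (as in the output above) and that all 4 cores count as active while the CPUs sit in idle state 1; the function names are hypothetical.

```python
BUS_CLOCK_MHZ = 100.0


def decode_turbo_ratio_limit(msr_value, groups=4):
    """Return {active_cores: max_ratio}; byte 0 is the 1-core limit."""
    return {n + 1: (msr_value >> (8 * n)) & 0xFF for n in range(groups)}


def granted_ratio(requested, active_cores, limits):
    """Toy model: a request is clamped to the limit for the active cores."""
    return min(requested, limits[active_cores])


limits = decode_turbo_ratio_limit(0x23242526)
for cores in sorted(limits, reverse=True):
    mhz = limits[cores] * BUS_CLOCK_MHZ
    print(f"{limits[cores]} * {BUS_CLOCK_MHZ} = {mhz} MHz, {cores} active cores")

# With 4 active cores, requests of 38 and 35 are both granted 35.
print(granted_ratio(38, 4, limits), granted_ratio(35, 4, limits))  # 35 35
```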

This can be verified by looking at the requested
and granted MSRs directly:

At 100% min percent:

Requested:
doug@s15:~/temp-k-git/linux$ sudo rdmsr --bitfield 15:8 -d -a 0x199
38
38
38
38
38
38
38
38

Granted:
doug@s15:~/temp-k-git/linux$ sudo rdmsr --bitfield 15:8 -d -a 0x198
35
35
35
35
35
35
35
35
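
For reference, the `--bitfield 15:8 -d` options above select bits 15 through 8 of each register and print them in decimal. A minimal Python equivalent of that extraction is sketched below. The raw register values are hypothetical and chosen only so that bits 15:8 match the output above; real registers carry other fields as well.

```python
def bitfield(value, high, low):
    """Extract bits high..low (inclusive) from an integer."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


# Hypothetical raw values for MSR 0x199 (request) and 0x198 (granted).
raw_request = 0x2600
raw_granted = 0x2300

print(bitfield(raw_request, 15, 8))  # 38
print(bitfield(raw_granted, 15, 8))  # 35
```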

> Also to get real numbers, need
> to use some power meter measuring CPU power.

Well, I have one, but for the box AC only. It just didn't seem
worth the overhead. Yes, I used the joules MSR directly. I also
do a sanity check by checking that the processor package temperature
makes sense for the calculated processor package watts. I added a
temperature column above.

> If I can get your script,
> I may be able to measure that.

Hmmm... This was actually a saga all by itself, mainly my own
fault. I am running a bit of a mess here so that I could minimize 
the time between the multiple load drops from 100% to 0% so as
to make it easier to follow via the intel_pstate_tracer data.
Load methods aside, the rest is pretty simple:

doug@s15:~/idle$ cat load-no-load-forever2
#!/bin/dash

#
# load-no-load-forever2. Smythies 2019.10.21
#       Just trying to get some debug data.
#       apply load (real, not via disabling all idle
#       states) then no load. loop forever.
#       load version 2.

echo "load-no-load-forever2. Start. Doug Smythies 2019.10.21"

while [ 1 ];
do
  ~/c/waiter 9 2 4 2000000000 0 1 > /dev/null
  sleep 1
  sudo ~/c/measure_energy 149
done

Where "measure_energy.c" just samples the
joules MSR over an interval, sometimes a longer
interval than turbostat will allow (with full
accuracy), because I know that the joules counter
did not wrap around.
In a separate e-mail I'll send you the C programs,
although I seem to recall that Intel strips out
attached C programs from e-mails.
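
Since measure_energy.c itself is not included here, the following Python sketch shows one way such a sampler could work. It is not the actual program. It assumes the Linux msr driver is loaded (/dev/cpu/N/msr, root required), that the package energy counter is the 32-bit MSR 0x611, and that its units come from bits 12:8 of MSR 0x606, as documented for Intel RAPL. The unit value used in the closing comment is a hypothetical example, not a reading from the i7-2600K.

```python
import os
import struct
import time

MSR_RAPL_POWER_UNIT = 0x606    # energy units in bits 12:8
MSR_PKG_ENERGY_STATUS = 0x611  # 32-bit package energy counter
COUNTER_MODULUS = 1 << 32


def read_msr(cpu, reg):
    """Read one 64-bit MSR through the Linux msr driver (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def joules_per_count(power_unit_msr):
    """Energy unit is 1 / 2**ESU joules, with ESU in bits 12:8."""
    return 1.0 / (1 << ((power_unit_msr >> 8) & 0x1F))


def average_watts(start_count, end_count, unit_joules, seconds):
    """Average power between two samples; tolerates a single wrap only."""
    delta = (end_count - start_count) % COUNTER_MODULUS
    return delta * unit_joules / seconds


def seconds_until_wrap(unit_joules, watts):
    """Longest interval that is safe to measure at a given power."""
    return COUNTER_MODULUS * unit_joules / watts


def measure(seconds, cpu=0):
    """Sample the package energy counter across an interval."""
    unit = joules_per_count(read_msr(cpu, MSR_RAPL_POWER_UNIT))
    start = read_msr(cpu, MSR_PKG_ENERGY_STATUS) & 0xFFFFFFFF
    time.sleep(seconds)
    end = read_msr(cpu, MSR_PKG_ENERGY_STATUS) & 0xFFFFFFFF
    return average_watts(start, end, unit, seconds)
```

With a hypothetical unit of 1/65536 joule, `seconds_until_wrap(1 / 65536, 21.0)` comes to roughly 3120 seconds, so under that assumption a 149 second interval would stay well clear of a wrap even at the highest power in the table above.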

Hope this helps.

... Doug

