Date:	Wed, 10 Sep 2014 09:39:30 -0700
From:	Dirk Brandewie <dirk.brandewie@...il.com>
To:	Anup Chenthamarakshan <anupc@...omium.org>,
	Dirk Brandewie <dirk.brandewie@...il.com>
CC:	Sameer Nanda <snanda@...omium.org>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] intel_pstate: track and export frequency residency stats
 via sysfs.

On 09/09/2014 04:22 PM, Anup Chenthamarakshan wrote:
> On Tue, Sep 09, 2014 at 08:15:13AM -0700, Dirk Brandewie wrote:
>> On 09/08/2014 05:10 PM, Anup Chenthamarakshan wrote:
>>> Exported stats appear in
>>> <sysfs>/devices/system/cpu/intel_pstate/time_in_state as follows:
>>>
>>> ## CPU 0
>>> 400000 3647
>>> 500000 24342
>>> 600000 144150
>>> 700000 202469
>>> ## CPU 1
>>> 400000 4813
>>> 500000 22628
>>> 600000 149564
>>> 700000 211885
>>> 800000 173890
>>>
>>> Signed-off-by: Anup Chenthamarakshan <anupc@...omium.org>
>>
>> What is this information being used for?
>
> I'm using P-state residency information in power consumption tests to calculate
> the proportion of time spent in each P-state across all processors (one global
> set of percentages, corresponding to each P-state). This is used to validate new
> changes from the power perspective. Essentially, sanity checks to flag changes
> with a large difference in P-state residency.
>
> So far, we've been using the data exported by acpi-cpufreq to track this.
>
>>
>> Tracking the current P state request for each core is only part of the
>> story.  The processor aggregates the requests from all cores and then decides
>> what frequency the package will run at; this evaluation happens on a ~1ms time
>> frame.  If a core is idle then it loses its vote on what the package frequency
>> will be, and its frequency will be zero even though it may have been requesting
>> a high P state when it went idle.  Tracking the residency of the requested
>> P state doesn't provide much useful information other than ensuring that the
>> requests are changing over time IMHO.
>
> This is exactly why we're trying to track it.

My point is that you are tracking the residency of the request and not
the P state the package was running at.  On a lightly loaded system
it is not unusual for a core that was very busy and requesting a high
P state to go idle for several seconds.  In this case that core would
lose its vote for the package P state but the stats would show that
the P state was high for a very long time when its real frequency
was zero.
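
For reference, the global per-P-state percentages described above could be
computed from the time_in_state format shown earlier roughly as follows.
This is only a sketch: the function names are made up for illustration, the
unit of the time column is not stated in the patch description (the
percentages do not depend on it), and the result is still the residency of
the *request*, not of the frequency the package actually ran at.

```python
from collections import defaultdict


def parse_time_in_state(text):
    """Parse '## CPU n' headers followed by '<freq_khz> <time>' lines.

    Returns {cpu: {freq_khz: time}}.  The layout is taken from the sample
    output quoted in this thread; nothing else is assumed about the file.
    """
    per_cpu = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            # Header line such as "## CPU 0": the CPU number is the last field.
            current = int(line.split()[-1])
            per_cpu[current] = {}
            continue
        if current is None:
            raise ValueError("data line before any CPU header: %r" % line)
        freq_khz, time_units = line.split()
        per_cpu[current][int(freq_khz)] = int(time_units)
    return per_cpu


def global_residency_percent(per_cpu):
    """Sum the time per frequency across all CPUs and convert to percentages."""
    totals = defaultdict(int)
    for states in per_cpu.values():
        for freq, t in states.items():
            totals[freq] += t
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {freq: 100.0 * t / grand_total for freq, t in sorted(totals.items())}
```

A core that sat idle for seconds while its last request was high would still
add all of that time to the high-frequency bucket here, which is the problem
described above.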

There are a couple of ways to get what I consider better information
about what is actually going on.

   The current turbostat provides C state residency and calculates the
   average/effective frequency of the core over its sample time.
   Turbostat will also measure the power consumption from the CPU point
   of view if your processor supports the RAPL registers.

   Reading MSR 0x198 MSR_IA32_PERF_STATUS will tell you what the core
   would run at if it were not idle; this reflects the decision that the
   package made based on current requests.

   Using perf to collect the power:pstate_sample event will give information
   about each sample on the core and give you timestamps to detect idle
   times.

   Using perf to collect power:cpu_frequency will show when the P state
   request was changed on each core and is triggered by intel_pstate and
   acpi_cpufreq.

   Powertop collects the same information as turbostat and a bunch of
   other information useful in seeing where you could be burning power
   for no good reason.
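
As an illustration of the MSR 0x198 option in the list above, here is a rough
sketch of reading the register from user space through the msr driver's
/dev/cpu/<n>/msr interface (needs root and the msr module loaded).  The
helper names are made up.  The decoding is an assumption: it takes the
current ratio from bits 15:8 and multiplies by a 100 MHz bus clock, which
holds for recent Core parts as far as I know but not for every processor
family, so check the SDM for your CPU.

```python
import os
import struct

MSR_IA32_PERF_STATUS = 0x198


def read_msr(cpu, reg):
    """Read one 64-bit MSR on the given CPU via the msr character device."""
    fd = os.open("/dev/cpu/%d/msr" % cpu, os.O_RDONLY)
    try:
        # The file offset selects the MSR address; each read returns 8 bytes.
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


def perf_status_ratio(value):
    """Assumption: the current ratio lives in bits 15:8 of PERF_STATUS."""
    return (value >> 8) & 0xFF


def ratio_to_khz(ratio, bus_clock_khz=100000):
    """Assumption: 100 MHz bus clock; override for parts where it differs."""
    return ratio * bus_clock_khz
```

Something like ratio_to_khz(perf_status_ratio(read_msr(0, MSR_IA32_PERF_STATUS)))
would then give the frequency CPU 0 would run at if it were not idle.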

For getting an idea of real power, turbostat is the easiest to use and
is available on most systems.  Using perf will give you a very fine-grained
view of what is going on as well as point to the culprit for bad
behaviour in most cases.
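
To make the "average/effective frequency" idea concrete: a common way to
derive it is from the deltas of the APERF and MPERF counters over a sample
interval.  APERF advances at the actual clock and MPERF at a fixed reference
clock, and both only count while the core is not idle.  The sketch below
shows the arithmetic only; the function names are made up, base_khz is
whatever the reference frequency of your part is, and this is not a copy of
turbostat's code.

```python
def average_khz(delta_aperf, elapsed_s):
    """Average clock over the whole interval, idle time included.

    APERF does not advance while the core is idle, so idle periods pull
    this number down, unlike the residency of the requested P state.
    """
    return delta_aperf / elapsed_s / 1000.0


def busy_khz(delta_aperf, delta_mperf, base_khz):
    """Average clock over only the time the core was actually running."""
    if delta_mperf == 0:
        # The core never left idle during the interval.
        return 0.0
    return base_khz * delta_aperf / delta_mperf
```

A core that requested a high P state and then sat idle for most of the
interval shows a high busy_khz but a low average_khz, which is the
distinction the residency stats cannot show.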

>
>>
>> This interface will not be supportable with upcoming processors using
>> hardware P states as documented in volume 3 of the current SDM Section 14.4
>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>> The OS will have no way of knowing what the P state requests for a
>> given core are.
>
> Will there be any means to determine the proportion of time spent in different
> HWP-states when HWP gets enabled (maybe at a package level)?
>
Not that I am aware of :-(  There is MSR_PPERF (section 14.4.5.1) that will give
the CPU's view of the amount of productive work/scalability of the current load.

--Dirk