Message-ID: <54107EC2.1010501@intel.com>
Date: Wed, 10 Sep 2014 09:39:30 -0700
From: Dirk Brandewie <dirk.brandewie@...il.com>
To: Anup Chenthamarakshan <anupc@...omium.org>,
Dirk Brandewie <dirk.brandewie@...il.com>
CC: Sameer Nanda <snanda@...omium.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Viresh Kumar <viresh.kumar@...aro.org>,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] intel_pstate: track and export frequency residency stats
via sysfs.
On 09/09/2014 04:22 PM, Anup Chenthamarakshan wrote:
> On Tue, Sep 09, 2014 at 08:15:13AM -0700, Dirk Brandewie wrote:
>> On 09/08/2014 05:10 PM, Anup Chenthamarakshan wrote:
>>> Exported stats appear in
>>> <sysfs>/devices/system/cpu/intel_pstate/time_in_state as follows:
>>>
>>> ## CPU 0
>>> 400000 3647
>>> 500000 24342
>>> 600000 144150
>>> 700000 202469
>>> ## CPU 1
>>> 400000 4813
>>> 500000 22628
>>> 600000 149564
>>> 700000 211885
>>> 800000 173890
>>>
>>> Signed-off-by: Anup Chenthamarakshan <anupc@...omium.org>
>>
>> What is this information being used for?
>
> I'm using P-state residency information in power consumption tests to calculate
> proportion of time spent in each P-state across all processors (one global set
> of percentages, corresponding to each P-state). This is used to validate new
> changes from the power perspective. Essentially, sanity checks to flag changes
> with large difference in P-state residency.
>
> So far, we've been using the data exported by acpi-cpufreq to track this.
>
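
For reference, the aggregation described above (one global set of percentages across all CPUs) can be sketched roughly as follows. This is only an illustration: it assumes the time_in_state layout shown in the patch description, the function name global_residency is made up for the example, and the counts are treated as opaque time units since the patch description does not state them.

```python
from collections import defaultdict

# Sample taken verbatim from the patch description quoted above.
SAMPLE = """\
## CPU 0
400000 3647
500000 24342
600000 144150
700000 202469
## CPU 1
400000 4813
500000 22628
600000 149564
700000 211885
800000 173890
"""


def global_residency(text):
    """Return {frequency: percent of total time}, summed over all CPUs.

    Lines starting with '#' (the per-CPU headers) and blank lines are
    skipped; every other line is expected to be '<frequency> <count>'.
    """
    totals = defaultdict(int)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        freq, count = line.split()
        totals[int(freq)] += int(count)

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {f: 100.0 * t / grand_total for f, t in sorted(totals.items())}


if __name__ == "__main__":
    for freq, pct in global_residency(SAMPLE).items():
        print(f"{freq:>8} {pct:6.2f}%")
```

A state that is missing for one CPU (800000 on CPU 0 in the sample) simply contributes nothing for that CPU.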
>>
>> Tracking the current P state request for each core is only part of the
>> story. The processor aggregates the requests from all cores and then decides
>> what frequency the package will run at; this evaluation happens on a ~1ms time
>> frame. If a core is idle then it loses its vote for what the package frequency
>> will be and its frequency will be zero even though it may have been requesting
>> a high P state when it went idle. Tracking the residency of the requested
>> P state doesn't provide much useful information other than ensuring that the
>> requests are changing over time IMHO.
>
> This is exactly why we're trying to track it.

My point is that you are tracking the residency of the request and not
the P state the package was running at. On a lightly loaded system
it is not unusual for a core that was very busy and requesting a high
P state to go idle for several seconds. In this case that core would
lose its vote for the package P state but the stats would show that
the P state was high for a very long time when its real frequency
was zero.

There are a couple of ways to get what I consider better information
about what is actually going on.

The current turbostat provides C state residency and calculates the
average/effective frequency of the core over its sample time.
Turbostat will also measure the power consumption from the CPU's point
of view if your processor supports the RAPL registers.
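
To make "average/effective frequency" concrete, here is a rough sketch of the usual APERF/MPERF arithmetic. It is not turbostat's actual code. It assumes MPERF ticks at the TSC rate while the core is in C0 and APERF ticks at the actual clock rate while in C0; the function name and the numbers in the usage lines are invented for illustration.

```python
def effective_frequency(d_aperf, d_mperf, d_tsc, tsc_mhz):
    """Derive frequencies from counter deltas over one sample interval.

    d_aperf, d_mperf, d_tsc: differences of the APERF, MPERF and TSC
    counters between two reads on the same core.
    tsc_mhz: TSC frequency in MHz (assumed equal to the MPERF rate).

    Returns (avg_mhz, busy_mhz, busy_fraction):
      busy_fraction: share of the interval the core spent in C0
      busy_mhz:      average clock while the core was not idle
      avg_mhz:       average clock over the whole interval, idle included
    """
    if d_tsc <= 0:
        raise ValueError("d_tsc must be positive")
    busy_fraction = d_mperf / d_tsc
    busy_mhz = tsc_mhz * d_aperf / d_mperf if d_mperf else 0.0
    avg_mhz = busy_mhz * busy_fraction
    return avg_mhz, busy_mhz, busy_fraction


if __name__ == "__main__":
    # Illustrative numbers only: a core that was in C0 half the time
    # and ran at half the TSC rate while busy.
    print(effective_frequency(500_000_000, 1_000_000_000,
                              2_000_000_000, 2000.0))
```

A core that was idle for the whole interval reports zero here no matter what P state it last requested, which is the difference from request residency.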

Reading MSR 0x198 (MSR_IA32_PERF_STATUS) will tell you what the core
would run at if it were not idle; this reflects the decision that the
package made based on current requests.
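
A minimal sketch of reading and decoding that register from user space follows. The assumptions are mine: the msr driver is loaded and exposes /dev/cpu/N/msr (root required), the current ratio sits in bits 15:8 as on Core-family parts (other families such as Atom lay this out differently), and the bus clock is 100 MHz. The helper names are hypothetical.

```python
import os
import struct

MSR_IA32_PERF_STATUS = 0x198


def read_msr(cpu, reg):
    """Read one 64-bit MSR through the msr driver's character device."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The register number is used as the file offset.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def perf_status_ratio(value):
    """Extract the current ratio, assumed to live in bits 15:8."""
    return (value >> 8) & 0xFF


def perf_status_mhz(value, bus_mhz=100.0):
    """Convert the ratio to MHz, assuming a 100 MHz bus clock."""
    return perf_status_ratio(value) * bus_mhz


if __name__ == "__main__":
    raw = read_msr(0, MSR_IA32_PERF_STATUS)
    print(f"cpu0: ratio {perf_status_ratio(raw)}, "
          f"~{perf_status_mhz(raw):.0f} MHz if not idle")
```

The value says nothing about whether the core is actually running; it needs to be combined with idle information to be meaningful.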

Using perf to collect the power:pstate_sample event will give information
about each sample on the core and give you timestamps to detect idle
times.

Using perf to collect power:cpu_frequency will show when the P state
request was changed on each core; it is triggered by both intel_pstate and
acpi-cpufreq.

Powertop collects the same information as turbostat and a bunch of
other information useful in seeing where you could be burning power
for no good reason.

For getting an idea of real power, turbostat is the easiest to use and
is available on most systems. Using perf will give you a very fine-grained
view of what is going on as well as point to the culprit for bad
behaviour in most cases.

>
>>
>> This interface will not be supportable with upcoming processors using
>> hardware P states as documented in volume 3 of the current SDM Section 14.4
>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>> The OS will have no way of knowing what the P state requests for a
>> given core are.
>
> Will there be any means to determine the proportion of time spent in different
> HWP-states when HWP gets enabled (maybe at a package level)?
>

Not that I am aware of :-( There is MSR_PPERF (section 14.4.5.1), which will
give the CPU's view of the amount of productive work/scalability of the
current load.

--Dirk