Message-ID: <4D0E9F20.6080606@sssup.it>
Date: Mon, 20 Dec 2010 01:11:12 +0100
From: Tommaso Cucinotta <tommaso.cucinotta@...up.it>
To: Harald Gustafsson <hgu1972@...il.com>
CC: Peter Zijlstra <peterz@...radead.org>,
Dario Faggioli <raistlin@...ux.it>,
Harald Gustafsson <harald.gustafsson@...csson.com>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>,
Claudio Scordino <claudio@...dence.eu.com>,
Michael Trimarchi <trimarchi@...is.sssup.it>,
Fabio Checconi <fabio@...dalf.sssup.it>,
Juri Lelli <juri.lelli@...il.com>
Subject: Re: [PATCH 1/3] Added runqueue clock normalized with cpufreq
On 17/12/2010 20:31, Harald Gustafsson wrote:
>>> We already did the very same thing (for another EU Project called
>>> FRESCOR), although it was done in an userspace sort of daemon. It was
>>> also able to consider other "high level" parameters like some estimation
>>> of the QoS of each application and of the global QoS of the system.
>>>
>>> However, converting the basic mechanism into a CPUfreq governor should
>>> be easily doable... The only problem is finding the time for that! ;-P
>> Ah, I think Harald will solve that for you,.. :)
> Yes, I don't mind doing that. Could you point me to the right part of
> the FRESCOR code, Dario?
Hi there,
I'm sorry to join this discussion so late, but the unprecedented 20cm of
snow in Pisa had some non-negligible effects on my return flight from
Perth :-).
Let me try to briefly recap the outcomes of FRESCOR w.r.t.
power management (though usually I'm not that brief :-) ):
1. from the requirements analysis phase, it came out that it should be
possible to specify an individual runtime for each possible frequency,
as it is well known that the way computation times scale with CPU
frequency is application-dependent (and platform-dependent); the idea is
that, as a developer, I can specify the possible configurations of
my real-time app, and the OS is then free to pick the CPU frequency
that best suits its power management logic (e.g., the minimum
frequency at which all my deadlines can still be met); a small sketch
of such an interface is further below.
Requirements Analysis:
http://www.frescor.org/index.php?mact=Uploads,cntnt01,getfile,0&cntnt01showtemplate=false&cntnt01upload_id=62&cntnt01returnid=54
Proposed API:
http://www.frescor.org/index.php?mact=Uploads,cntnt01,getfile,0&cntnt01showtemplate=false&cntnt01upload_id=105&cntnt01returnid=54
I also attach the API we implemented; consider, however, that it is a mix
of calls for doing what I wrote above and for building an OS-independent
abstraction layer for dealing with CPU frequency scaling (and not only
that) on the heterogeneous OSes we had in FRESCOR;
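Just to make 1. a bit more concrete, a per-frequency runtime
specification could look roughly like the sketch below (purely
illustrative names and layout, this is *not* the FRSH API from the
attached header):

#include <stdint.h>

/* illustrative only: one runtime per supported CPU frequency */
struct rt_freq_runtime {
        uint32_t freq_khz;      /* CPU frequency the runtime refers to */
        uint64_t runtime_ns;    /* budget needed per period at that frequency */
};

struct rt_params {
        uint64_t period_ns;
        uint64_t deadline_ns;
        int      nr_freqs;      /* 1 => a single runtime was given (see 4. below) */
        struct rt_freq_runtime runtimes[8];  /* at most one entry per P-state */
};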
2. this also assumed, at the API level, quite static settings
(typical of hard RT), in which I configure the system and don't change
its frequency too often; for example, the implications of frequency
switches on hard real-time requirements (i.e., time windows in which the
CPU is not operating during the switch, and limits on the maximum
switching rate sustainable by apps and the like) are not stated through
the API;
3. for soft real-time contexts and Linux (consider that FRESCOR targeted
both hard RT on RT OSes and soft RT on Linux), we played with a much
simpler, trivial linear scaling, which is exactly what has been proposed
and implemented by someone in this thread on top of SCHED_DEADLINE (AFAIU);
however, there is one trick which cannot be neglected, i.e., the *change
protocol* (see 5.); benchmarks on MPEG-2 decoding times showed that the
linear approximation is not that bad, but the best interpolating ratios
between the computing times at different CPU frequencies do not
perfectly match the frequency ratios; we haven't attempted any
extensive evaluation over different workloads so far. See Figure 4.1
in D-AQ2v2:
http://www.frescor.org/index.php?mact=Uploads,cntnt01,getfile,0&cntnt01showtemplate=false&cntnt01upload_id=82&cntnt01returnid=54
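For reference, the trivial linear scaling of 3. amounts to
runtime(f) = runtime(f_ref) * f_ref / f; a minimal sketch (overflow and
rounding ignored):

#include <stdint.h>

/* trivially rescale a runtime specified at the reference frequency f_ref
 * to the current frequency f_cur: at a lower frequency the same work
 * needs proportionally more time */
static uint64_t rescale_runtime(uint64_t runtime_ref_ns,
                                uint32_t f_ref_khz, uint32_t f_cur_khz)
{
        return runtime_ref_ns * f_ref_khz / f_cur_khz;
}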
4. I would say that, given the tendency to over-provision the runtime
(WCET) in hard real-time contexts, it would not be too much of a
burden for a hard RT developer to properly over-provision the required
budget in the presence of a trivial runtime rescaling policy like the one
in 3.; however, in order to make everybody happy, it doesn't seem a bad
idea to have something like:
4a) use the fine-grained runtimes specified by the user, if they are
available;
4b) use the trivially rescaled runtime if the user only specified a
single one; of course, in that case it should be clear through the API
which frequency the user's runtime refers to (e.g., the maximum one?);
something along the lines of the sketch below.
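Putting 4a) and 4b) together, and reusing the illustrative types and
helper from the sketches above (again, not a real API), the selection
could be as simple as:

/* use the per-frequency runtime if one was provided for this frequency,
 * otherwise fall back to trivially rescaling the single reference entry
 * (assumed to be specified at the maximum frequency) */
static uint64_t runtime_for_freq(const struct rt_params *p, uint32_t f_khz)
{
        int i;

        /* 4a) the developer gave a fine-grained table of runtimes */
        for (i = 0; i < p->nr_freqs; i++)
                if (p->runtimes[i].freq_khz == f_khz)
                        return p->runtimes[i].runtime_ns;

        /* 4b) no exact entry: rescale from the reference one */
        return rescale_runtime(p->runtimes[0].runtime_ns,
                               p->runtimes[0].freq_khz, f_khz);
}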
5. Mode Change Protocol: whenever a frequency switch occurs (e.g.,
dictated by fluctuations of the non-RT workload), runtimes cannot simply
be rescaled instantaneously. Keeping it short, the simplest thing we can
do is to rely on the various CBS servers implemented in the scheduler to
apply the change from the next "runtime recharge", i.e., the next
period. This creates the potential problem that RT tasks see a
non-negligible transient in the instances crossing the CPU frequency
switch, during which they do not have enough runtime for their work. Now,
the general "rule of thumb" is straightforward: make room first, then
"pack"; i.e., we need to consider 2 distinct cases (a rough sketch in
pseudo-C follows after 5b):
5a) we want to *increase the CPU frequency*: we can immediately
increase the frequency; the RT applications will then have a temporary
over-provisioning of runtime (still tuned for the slower frequency),
and as soon as we're sure the CPU frequency switch has completed,
we can lower the runtimes to the new values;
5b) we want to *decrease the CPU frequency*: unfortunately, here we
need to proceed the other way round: first we increase the
runtimes of the RT applications to the new values, then, as soon as
we're sure all the scheduling servers have applied the change (waiting at
most for a time equal to the maximum configured RT period), we can
actually perform the frequency switch. Of course, there is an assumption
before switching the frequency: the new runtimes after the frequency
decrease must still be schedulable, so the CPU frequency switching logic
needs to be aware of the allocated RT reservations.
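In pseudo-C, the ordering in 5a)/5b) would be something like the sketch
below; every function called here is a placeholder for whatever mechanism
is actually available, none of them is a real cpufreq or SCHED_DEADLINE
interface:

/* placeholder hooks */
void cpufreq_set(uint32_t f_khz);
void wait_freq_switch_completed(void);
void set_rt_runtimes_for(uint32_t f_khz);
void wait_max_rt_period(void);
int  rt_schedulable_at(uint32_t f_khz);

/* "make room first, then pack" */
static int switch_frequency(uint32_t f_old_khz, uint32_t f_new_khz)
{
        if (f_new_khz > f_old_khz) {
                /* 5a) speeding up: raise the frequency first, then shrink
                 * the runtimes once the switch has really completed */
                cpufreq_set(f_new_khz);
                wait_freq_switch_completed();
                set_rt_runtimes_for(f_new_khz);
        } else {
                /* 5b) slowing down: check schedulability at the new
                 * frequency, enlarge the runtimes, wait until every CBS
                 * server has gone through a recharge (at most the maximum
                 * configured RT period), and only then lower the frequency */
                if (!rt_schedulable_at(f_new_khz))
                        return -1;
                set_rt_runtimes_for(f_new_khz);
                wait_max_rt_period();
                cpufreq_set(f_new_khz);
        }
        return 0;
}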
The protocol in 5. has been implemented completely in user space as a
modification of the powernowd daemon, in the context of an extended
version of a paper in which we were automagically guessing the whole set
of scheduling parameters for periodic RT applications (EuroSys 2010).
The modified powernowd considered both the overall RT utilization
imposed by the RT reservations and the non-RT utilization as measured
on the CPU. The paper will appear in ACM TECS, but who knows when, so
here you can find it (see Section 7.5, "Power Management"):
http://retis.sssup.it/~tommaso/publications/ACM-TECS-2010.pdf
(last remark: no attempt to deal with multi-cores and their various
power switching capabilities in this paper . . .)
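The frequency selection was roughly of the kind sketched below (an
illustration of the idea using the same illustrative types as above, not
the actual powernowd code): pick the lowest frequency at which the
rescaled RT utilization, known from the admitted reservations, plus the
measured non-RT utilization still fits under a threshold.

/* freqs_khz[] is assumed sorted in ascending order; utilizations are
 * expressed relative to the maximum frequency */
static uint32_t pick_frequency(const uint32_t *freqs_khz, int nr_freqs,
                               double u_rt_at_fmax, double u_nonrt_at_fmax,
                               double threshold)
{
        uint32_t f_max = freqs_khz[nr_freqs - 1];
        int i;

        for (i = 0; i < nr_freqs; i++) {
                /* under linear scaling, utilizations grow as f_max / f */
                double scale = (double)f_max / freqs_khz[i];
                if ((u_rt_at_fmax + u_nonrt_at_fmax) * scale <= threshold)
                        return freqs_khz[i];
        }
        return f_max;   /* nothing fits: run at full speed */
}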
Last, but not least, the whole point of the above discussion rests on the
assumption that it is meaningful to have a CPU frequency switching
policy at all, as opposed to merely relying on CPU idling. Perhaps on old
embedded CPUs this is still the case. Unfortunately, from preliminary
measurements made on a few systems I use every day, through a cheap power
meter attached to the power cable, I could actually see that for RT-only
workloads it is worth leaving the system at the maximum frequency
and exploiting the much longer time spent in idle mode(s), except when
the system is completely idle.
If you're interested, I can share the collected data sets.
Bye (and apologies for the length).
T.
--
Tommaso Cucinotta, Computer Engineering PhD, Researcher
ReTiS Lab, Scuola Superiore Sant'Anna, Pisa, Italy
Tel +39 050 882 024, Fax +39 050 882 003
http://retis.sssup.it/people/tommaso
View attachment "frsh_energy_management.h" of type "text/x-chdr" (11690 bytes)