linux-kernel - Re: [PATCH RFC 0/4] Scheduler idle notifiers and users

Message-Id: <84EBD7CD-1085-4B33-BF71-8CE104AE2933@antoniou-consulting.com>
Date:	Tue, 21 Feb 2012 15:31:27 +0200
From:	Pantelis Antoniou <panto@...oniou-consulting.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Russell King - ARM Linux <linux@....linux.org.uk>,
	Saravana Kannan <skannan@...eaurora.org>,
	Ingo Molnar <mingo@...e.hu>, linaro-kernel@...ts.linaro.org,
	Nicolas Pitre <nico@...xnic.net>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Oleg Nesterov <oleg@...hat.com>, cpufreq@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Anton Vorontsov <anton.vorontsov@...aro.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Mike Chan <mike@...roid.com>, Dave Jones <davej@...hat.com>,
	Todd Poynor <toddpoynor@...gle.com>, kernel-team@...roid.com,
	linux-arm-kernel@...ts.infradead.org,
	Arjan Van De Ven <arjan@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH RFC 0/4] Scheduler idle notifiers and users


On Feb 21, 2012, at 2:56 PM, Peter Zijlstra wrote:

> On Tue, 2012-02-21 at 14:38 +0200, Pantelis Antoniou wrote:
>> 
>> If we go to all the trouble of integrating cpufreq/cpuidle/sched into scheduler
>> callbacks, we should place hooks into the thermal framework/PM as well.
>> 
>> It will be pretty common to have per-core temperature readings on most
>> modern SoCs.
>> 
>> It is quite conceivable to have a case with a multi-core CPU where, due
>> to load imbalance, one (or more) of the cores is running at full speed
>> while the rest are mostly idle. What you want to do, for best performance
>> and conceivably better power consumption, is not to throttle the
>> frequency or lower the voltage of the overloaded CPU but to migrate the
>> load to one of the cooler CPUs.
>> 
>> This affects CPU capacity immediately, i.e. you shouldn't schedule more
>> load on a CPU that is too hot, since you'll only end up triggering thermal
>> shutdown. The ideal solution would be to round-robin
>> the load from the hot CPU to the cooler ones, but not so fast that we lose
>> out due to the migration of state from one CPU to the other.
>> 
>> In a nutshell, the processing capacity of a core is not static, i.e. it
>> might degrade over time due to the increase of temperature caused by the
>> previous load.
>> 
>> What do you think? 
> 
> This is called core-hopping, and yes that's a nice goal, although I
> would like to do that after we get the 'simple' bits up and running. I
> suspect it'll end up being slightly more complex than we'd like, due
> to the fact that the goal conflicts with wanting to aggregate things on
> cpu0 due to cpu0 being special for a host of reasons.
> 
> 

Hi Peter,

Agreed. We need to get there step by step, and I think that per-task load tracking
is the first one. We do have other metrics besides load that can influence the
scheduler's decisions, with the most obvious being power consumption.

BTW, since we're going to the trouble of calculating per-task load with
increased accuracy, how about giving some thought to translating the load numbers
into an absolute format? I.e. with the CPUs now having fluctuating performance
(due to cpufreq etc.) one could say that each CPU has a capacity of X bogomips
(or some other absolute unit) per OPP. Perhaps having such a bogomips number
calculated per-task would make things easier.
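
As a rough illustration of the absolute-load idea, here is a minimal sketch
in Python. The OPP table, its capacity values, and the function names are all
made up for illustration; nothing here is taken from an existing kernel
interface.

```python
# Hypothetical OPP table: frequency in MHz -> capacity in an absolute,
# bogomips-like unit. The numbers are invented for illustration only.
OPP_CAPACITY = {300: 600, 600: 1200, 1000: 2000}


def absolute_load(load_pct, freq_mhz):
    """Convert a relative load (percent of CPU time at the given OPP)
    into the absolute unit used by OPP_CAPACITY."""
    return load_pct * OPP_CAPACITY[freq_mhz] // 100


def lowest_opp_for(load_abs):
    """Return the lowest frequency whose capacity covers the absolute
    load, or None if no OPP is large enough."""
    for freq in sorted(OPP_CAPACITY):
        if OPP_CAPACITY[freq] >= load_abs:
            return freq
    return None


# A task at 50% on a 600 MHz OPP and one at 30% on a 1000 MHz OPP
# look different in relative terms but are the same absolute load.
same = absolute_load(50, 600) == absolute_load(30, 1000)
```

With absolute numbers, two per-task loads measured at different OPPs become
directly comparable, and the question "which OPP can carry this task" reduces
to a table lookup.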

Perhaps the same can be done with power/energy, i.e. have a per-task power
consumption figure that we can use for scheduling, according to the available
power budget per CPU.
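
A similar sketch for the power side, again in Python and again under loud
assumptions: the per-task power figure, the per-CPU budgets, the milliwatt
unit, and the function name pick_cpu are hypothetical. It only shows one way a
per-task power number could be checked against a per-CPU budget.

```python
def pick_cpu(task_mw, cpu_budget_mw, cpu_used_mw):
    """Pick the CPU with the most remaining power headroom that can still
    fit the task's estimated power draw. Returns the CPU index, or None
    if no CPU has enough headroom. All values are in milliwatts
    (an assumed unit)."""
    best_cpu = None
    best_headroom = -1
    for cpu, (budget, used) in enumerate(zip(cpu_budget_mw, cpu_used_mw)):
        headroom = budget - used
        # Only consider CPUs that can absorb the task within their budget;
        # on a tie, the lowest-numbered CPU wins.
        if headroom >= task_mw and headroom > best_headroom:
            best_cpu = cpu
            best_headroom = headroom
    return best_cpu
```

For example, with two CPUs budgeted at 1000 mW each and currently using 900 mW
and 500 mW, a 200 mW task would land on the second CPU, while a 600 mW task
would fit on neither.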

Dunno, it might not be feasible ATM, but having a power-aware scheduler would
assume some kind of power measurement, no?

Regards

-- Pantelis


 
