Date:	Wed, 13 May 2009 18:41:00 +0530
From:	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
To:	Linux Kernel <linux-kernel@...r.kernel.org>,
	Suresh B Siddha <suresh.b.siddha@...el.com>,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arjan van de Ven <arjan@...radead.org>
Cc:	Ingo Molnar <mingo@...e.hu>, Dipankar Sarma <dipankar@...ibm.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Vatsa <vatsa@...ux.vnet.ibm.com>,
	Gautham R Shenoy <ego@...ibm.com>,
	Andi Kleen <andi@...stfloor.org>,
	Gregory Haskins <gregory.haskins@...il.com>,
	Mike Galbraith <efault@....de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Arun Bharadwaj <arun@...ux.vnet.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
Subject: [RFC PATCH v2 0/2] Saving power by cpu evacuation
	sched_max_capacity_pct=n

Hi,

The idea of extending the sched_mc_power_savings tunable for cpu
evacuation was discussed at http://lwn.net/Articles/330309/

The summary of the discussion is as follows:

* Using sched_mc=3,4,5 to evacuate 1,2,4 cores is a completely
  non-intuitive and broken interface.  Ingo wanted to see if we can
  model a global percentile tunable that would map to core throttling.

* Peter Zijlstra wanted more justification for throttling at the core
  level.  Throttling may be a resource management problem rather than
  a scheduler/load balancer problem.

* CPU hotplug and cpuset/cgroup based cpu throttling are viable
  alternatives to this approach.  

Changes in v2:

* Created a percentage knob sched_max_capacity_pct=n
  Defaults to 100, can be set to 75 or 50 to evacuate cores

* This patch is still a hack for discussion and has many
  limitations.

v1: http://lkml.org/lkml/2009/4/26/202

Intro and parts from previous post for quick reference:
-------------------------------------------------------

Objective:
----------

* Framework to evacuate tasks from cpus in order to force the cpu
  cores to stay idle.  Forcefully idling cores and packages can
  reduce power consumption.

* Fast response time and low OS overhead to move tasks away from
  selected cpu packages.  CPU hotplug is too heavyweight for this
  purpose.

Use cases:
----------

* Ability to throttle the number of cores used in the system along
  with other power saving controls like cpufreq governors can enable
  the system to operate at a more power efficient operating point and
  still meet the design objectives.
 
* Facilitate thermal management by evacuating cores from hot cpu
  packages

Alternatives:
-------------

* CPU hotplug: Heavyweight and slow.  It involves setting up and
  tearing down data structures.  May need new fast or lightweight
  notifications.

* CPUSets: Exclusive CPU sets and partitioned sched domains involve
  rebuilding sched domains, which is relatively heavyweight for the
  purpose.

The following patch is against 2.6.30-rc5 and will work only in an
under-utilised system (number of tasks <= number of cores).

Test results for ebizzy with 8 threads at various
sched_max_capacity_pct settings.  The test platform is a dual socket
quad core x86 system (pre-Nehalem).

This is an interesting characteristic of the ebizzy benchmark: the
following command line improved in performance as we evacuated
cores!  Perhaps cross-cache traffic... I will verify that next time.

ebizzy -s 4096 -t 8 -S 30

sched_mc_power_savings was set to 2 in the experiment

-----------------------------------------------------------------
sched_max_capacity_pct  Cores   Performance     Avg power
                        used    (records/sec)   (watts)
-----------------------------------------------------------------
100                     8       1.00x           1.00y
 87                     7       1.03x           0.98y
 75                     6       1.06x           0.95y
 62                     5       1.26x           0.91y
 50                     4       1.15x           0.86y
-----------------------------------------------------------------
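
The pct-to-cores relationship in the table above can be sketched in
a few lines of C.  This is only an illustration and is not taken from
the patch: the helper name cores_in_use(), the round-to-nearest rule
and the "keep at least one core" floor are all assumptions that
happen to reproduce the five rows measured on the 8-core test box.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): map a capacity
 * percentage to the number of cores left in use.
 * Assumption: round to nearest, never go below one core.
 */
static unsigned int cores_in_use(unsigned int total_cores,
				 unsigned int pct)
{
	unsigned int cores;

	assert(total_cores > 0);

	/* Treat anything above 100 as "use everything". */
	if (pct > 100)
		pct = 100;

	/* Integer round-to-nearest of total_cores * pct / 100. */
	cores = (total_cores * pct + 50) / 100;

	/* Assumed floor: always keep one core running. */
	return cores ? cores : 1;
}
```

With total_cores = 8 this gives 8, 7, 6, 5 and 4 cores for 100, 87,
75, 62 and 50 percent, matching the table.  Rounding up instead would
give the same values for these five settings, so the data here cannot
tell the two rules apart.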
		
There was wide run-to-run variation with ebizzy.  The purpose of the
above data is to justify the use of core evacuation for power vs
performance trade-offs.  The patch does not yet work for kernbench and
other complex workloads/benchmarks.  I even tried SPECjbb and did not
get the expected CPU utilisation at various settings to reduce power
consumption.  The utilisation/power was much lower than expected.
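
One way to read the ebizzy table as a power vs performance trade-off
is to divide relative performance by relative power for each row.  A
rough sketch follows; the helper name and the fixed-point scaling
(inputs in hundredths, result in thousandths) are my own choices for
illustration, and given the run variation noted above the resulting
ratios should be treated as indicative only.

```c
#include <assert.h>

/*
 * Hypothetical helper: relative performance per watt.
 * Inputs are the table values times 100 (1.26x -> 126, 0.91y -> 91).
 * Result is the ratio times 1000, rounded to nearest, so the
 * baseline row (100, 100) yields 1000.
 */
static unsigned int perf_per_watt_milli(unsigned int perf_centi,
					unsigned int power_centi)
{
	assert(power_centi > 0);

	return (perf_centi * 1000 + power_centi / 2) / power_centi;
}
```

For the rows above this works out to 1000, 1051, 1116, 1385 and 1337
for the 100, 87, 75, 62 and 50 percent settings, i.e. the 62 percent
run delivered roughly 1.39 times the baseline records/sec per watt.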

ToDo:
-----

* Identify a good benchmark to demonstrate the benefits of cpu
  evacuation.

* Make the core evacuation predictable under different system load
  conditions and workload characteristics.  This is turning out to be
  a major challenge in this approach.

* Enhance the framework to control which particular packages/cores
  will be evacuated; this is needed for thermal management.  The
  CPU hotplug/cpuset approach will solve this problem.

I can experiment with different benchmarks/platforms and post results
while the framework is being discussed.

Please let me know your comments and suggestions.

Thanks,
Vaidy

---

Vaidyanathan Srinivasan (2):
      sched: loadbalancer hacks for forced packing of tasks
      sched: add sched_max_capacity_pct


 kernel/sched.c |   65 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 64 insertions(+), 1 deletions(-)
