Message-ID: <20080625191100.GI21892@dirshya.in.ibm.com>
Date:	Thu, 26 Jun 2008 00:41:00 +0530
From:	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
To:	Linux Kernel <linux-kernel@...r.kernel.org>,
	Suresh B Siddha <suresh.b.siddha@...el.com>,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Dipankar Sarma <dipankar@...ibm.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Vatsa <vatsa@...ux.vnet.ibm.com>,
	Gautham R Shenoy <ego@...ibm.com>
Subject: [RFC v1] Tunable sched_mc_power_savings=n

Hi,

The existing power saving load balancer, CONFIG_SCHED_MC, attempts to run
the workload in the system on the minimum number of CPU packages and
tries to keep the rest of the CPU packages idle for longer durations.
Consolidating workloads onto fewer packages thus helps the other
packages stay idle and save power.

echo 1 > /sys/devices/system/cpu/sched_mc_power_savings is used to
turn on this feature.
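
As a minimal userspace sketch of driving the tunable from a script: the sysfs path is the one quoted above, while the helper names read_level() and write_level() are made up for illustration.  Writing to the real file needs root and a kernel built with CONFIG_SCHED_MC.

```python
from pathlib import Path

# Path quoted in the text above; it exists only on kernels that expose it.
SCHED_MC_PATH = Path("/sys/devices/system/cpu/sched_mc_power_savings")


def read_level(path=SCHED_MC_PATH):
    """Return the current tunable value as an integer."""
    return int(Path(path).read_text().strip())


def write_level(level, path=SCHED_MC_PATH):
    """Write a new value, equivalent to `echo <level> > <path>`."""
    if level < 0:
        raise ValueError("level must be non-negative")
    Path(path).write_text(f"{level}\n")
```

Passing an explicit path lets the helpers be exercised against an ordinary file without touching the running system.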

When enabled, this tunable influences the load balancer decision
in find_busiest_group().  Two parameters are extracted at this
time.  group_leader is the group that is almost full and has just
enough capacity to pull a few (one) tasks, while group_min is the group
that has so few tasks that, if we can move them to group_leader,
this group can go completely idle.
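
The selection described above can be modelled in a few lines.  This is a simplified userspace sketch, not the code in find_busiest_group(): the Group class, the function name and the tie-breaking rule are assumptions, and only the idea of "fullest group with spare capacity" versus "least loaded non-idle group" is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    nr_running: int  # runnable tasks currently in the group
    capacity: int    # tasks the group can hold, e.g. one per core


def pick_leader_and_min(groups):
    """Return (group_leader, group_min), or (None, None) if no pair exists."""
    # Idle groups have nothing to give and full groups cannot take more,
    # so the model ignores both.
    candidates = [g for g in groups if 0 < g.nr_running < g.capacity]
    if len(candidates) < 2:
        return None, None
    # group_leader: the fullest group that still has at least one free slot.
    leader = max(candidates, key=lambda g: g.nr_running)
    # group_min: the least loaded of the remaining groups.
    others = [g for g in candidates if g is not leader]
    group_min = min(others, key=lambda g: g.nr_running)
    return leader, group_min
```

For example, with four 4-core packages running 3, 1, 4 and 0 tasks, the model picks the package with 3 tasks as group_leader and the package with 1 task as group_min; the full and the idle package are skipped.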

The default criteria to select group_leader and group_min would catch
long running threads on various packages and pull them to a single
package.  The group_capacity limits the number of tasks being
pulled: we expect to have one task per core in a package, with
all the cores in the package loaded.
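
To illustrate how group_capacity bounds the pull, here is a toy greedy model in which every package has the same number of cores and tasks move from the least loaded package to the fullest package that still has a free core.  The function name and the greedy loop are assumptions made for illustration; the real balancer moves tasks incrementally over many balancing passes.

```python
def consolidate(nr_running, capacity):
    """Return per-package task counts after greedy consolidation.

    nr_running: list of runnable task counts, one entry per package.
    capacity:   tasks one package can hold (one task per core).
    """
    load = list(nr_running)
    while True:
        # Only partially loaded packages can give or receive tasks.
        partial = [i for i, n in enumerate(load) if 0 < n < capacity]
        if len(partial) < 2:
            return load
        leader = max(partial, key=lambda i: load[i])
        source = min((i for i in partial if i != leader),
                     key=lambda i: load[i])
        # The free capacity of the leader limits how many tasks move.
        moved = min(load[source], capacity - load[leader])
        load[leader] += moved
        load[source] -= moved
```

In this model, four dual-core packages with one long running task each end up with two fully loaded packages and two idle ones, which matches the "one task per core" expectation stated above.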

These default selection criteria for sched_mc_power_savings=1 strike
a good balance between power savings and minimal performance impact.  The
conservative approach taken towards consolidation makes the selection
criteria workload dependent.  Long running steady state workloads are
placed correctly, but bursty workloads are not.

The idea being proposed is to enhance the tunable with varied degrees
of consolidation that can work best for different workload
characteristics.  echo 2 > /sys/.../sched_mc_power_savings could
enable more aggressive consolidation than the default.

I am presently working on different criteria that can help consolidate
different types of workload with varied degrees of power savings and
performance impact.  

Advantages:

* Enterprise workloads on large hardware configurations may need
  an aggressive consolidation strategy
* Performance impact on servers is different from desktops or laptops.
  Interactivity is less of a concern on large enterprise servers while
  workload response times and performance per watt are more significant
* Aggressive power savings, even with a marginal performance penalty,
  is a useful tunable for servers since it may provide good
  performance-per-watt at low utilisation
* This tunable can influence other parts of the scheduler, like wakeup
  biasing, for overall task consolidation

Proposed changes:

* Add more values to the sched_mc_power_savings tunable (bit flags?)
* Enable a different consolidation strategy based on the value
* Evaluate different strategies against different workloads and design
  heuristics for auto tuning
* Modify selection of group_leader by changing the spare capacity
  evaluation
* Increase group capacity of the group leader to avoid pulling tasks
  away from group_leader within a short time
* Choose a different load_idx while evaluating and selecting the load
* Use the sched_mc_power_savings settings outside of the load balancer,
  as in task wakeup biasing
* Design the power saving load balancer in combination with process wakeup
  biasing in order to consolidate bursty and short running jobs to
  fewer CPU packages in an idle or under-utilised system.
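
If the tunable were extended with bit flags, as the first item above asks, the value could be decoded along these lines.  The flag names and bit positions are entirely hypothetical; nothing here reflects an agreed interface.

```python
from enum import IntFlag


class McPowerSavings(IntFlag):
    # Hypothetical flags; names and bit positions are placeholders only.
    BASIC = 1        # behaviour of today's sched_mc_power_savings=1
    AGGRESSIVE = 2   # relaxed group_leader / group_min selection
    WAKEUP_BIAS = 4  # also bias task wakeups towards busy packages


def enabled_strategies(value):
    """Return the names of the hypothetical flags set in `value`."""
    return [flag.name for flag in McPowerSavings if value & flag]
```

With this layout a value of 5 would mean BASIC plus WAKEUP_BIAS, and 0 would disable power saving balance altogether.  A plain ordered level (0, 1, 2, ...) is the obvious alternative if the strategies turn out to be strictly nested.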

Disadvantages:

* More tunable settings will lead to sub-optimal performance if not
  exploited correctly.  Once the tunable criteria are established and
  we have good heuristics, we can have a default setting that
  automatically chooses the right technique.

I will send the changes in criteria and their impact in subsequent
RFCs.  I would like to solicit feedback on the overall idea and inputs
from people who have already attempted similar changes.

Thanks,
Vaidy

