Date:	Mon, 8 Feb 2010 15:35:55 +0530
From:	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
To:	Suresh B Siddha <suresh.b.siddha@...el.com>,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Ingo Molnar <mingo@...e.hu>, Gautham R Shenoy <ego@...ibm.com>,
	Arun Bharadwaj <arun@...ux.vnet.ibm.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: BUG: sched_mc_powersavings broken on pre-Nehalem x86 platforms

Hi Peter,

sched_mc_powersavings is broken on pre-Nehalem x86 platforms because of
contradictory SD flags at the MC level and the CPU level.  Setting
SD_PREFER_SIBLING at the MC level is expected to do the following:

a) Disable consolidation of tasks into a single group in the parent sched
domain (generally a single cpu package).

b) Spread tasks equally across groups at the parent sched domain.

SD_POWERSAVINGS_BALANCE set at a sched domain, on the other hand, enables
logic to consolidate tasks into the minimum number of groups at that sched
domain.

Basically, setting SD_POWERSAVINGS_BALANCE at one sched domain while its child
domain has SD_PREFER_SIBLING is contradictory, and it disables the
SD_POWERSAVINGS_BALANCE logic in:

        if (local_group && (sds->this_nr_running >= sgs->group_capacity ||
                                !sds->this_nr_running))
                sds->power_savings_balance = 0;

This happens because sgs.group_capacity is set to '1' by SD_PREFER_SIBLING in
the child sched domain.
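
The interaction described above can be sketched in isolation. This is a
minimal model and not the kernel code: the helper names effective_capacity()
and power_savings_allowed() are hypothetical, and the clamp to 1 only stands in
for what SD_PREFER_SIBLING is described as doing to the group capacity.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: models the group capacity seen by the balancer.
 * Assumption: a child domain with SD_PREFER_SIBLING clamps the capacity
 * to 1, as described in the text above.
 */
static unsigned long effective_capacity(unsigned long capacity,
					bool prefer_sibling)
{
	if (prefer_sibling && capacity > 1)
		return 1;
	return capacity;
}

/*
 * Hypothetical helper: mirrors the quoted check for the local group.
 * Returns 0 when power savings balance would be switched off, 1 otherwise.
 */
static int power_savings_allowed(unsigned long this_nr_running,
				 unsigned long capacity,
				 bool prefer_sibling)
{
	unsigned long cap = effective_capacity(capacity, prefer_sibling);

	/* Same shape as the condition quoted from the scheduler. */
	if (this_nr_running >= cap || !this_nr_running)
		return 0;

	return 1;
}
```

For a quad core package (capacity 4) running one task,
power_savings_allowed(1, 4, false) returns 1, while
power_savings_allowed(1, 4, true) returns 0: with the capacity clamped to 1, a
single running task is already enough to turn power savings balance off.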

The attached patch will fix the expected behavior for sched_mc_powersavings > 0
while objective (b) is still an open issue.

The following condition in find_busiest_group()
	sds.max_load <= sds.busiest_load_per_task

	treats unequally loaded groups as balanced as long as they are below
	capacity.

Test Results:

The following patch was tested on a dual-socket, quad-core, non-threaded Xeon:

Running 4 while(1) loops in shell:

echo 1 > /sys/devices/system/cpu/sched_mc_power_savings

Without Patch:
        Running 1 task in one quad core package and 3 in another.
        This is effectively the baseline behavior with sched_mc=0

With patch:
        All 4 tasks running in one quad core package.
        Expected behavior for sched_mc_powersavings>0

--Vaidy

    Fix for sched_mc_powersavings for pre-Nehalem platforms.
    The child sched domain should clear SD_PREFER_SIBLING if the parent has
    SD_POWERSAVINGS_BALANCE, because the two flags contradict each other.

    Sets the flags correctly based on sched_mc_power_savings.
    
    Signed-off-by: Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6550415..ef6b7cd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -866,7 +866,10 @@ static inline int sd_balance_for_mc_power(void)
 	if (sched_smt_power_savings)
 		return SD_POWERSAVINGS_BALANCE;
 
-	return SD_PREFER_SIBLING;
+	if (!sched_mc_power_savings)
+		return SD_PREFER_SIBLING;
+
+	return 0;
 }
 
 static inline int sd_balance_for_package_power(void)
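
The effect of the hunk above on the MC-level flags can be compared side by side
with a small standalone sketch. The function names mc_flags_before() and
mc_flags_after() are hypothetical, the two settings are passed as parameters
instead of being read from the kernel globals, and the flag values are
placeholders chosen for illustration only.

```c
/* Placeholder flag values for illustration; the kernel defines its own. */
#define SD_POWERSAVINGS_BALANCE	0x0100
#define SD_PREFER_SIBLING	0x1000

/* Hypothetical model of sd_balance_for_mc_power() before the patch. */
static int mc_flags_before(int smt_power_savings, int mc_power_savings)
{
	(void)mc_power_savings;	/* not consulted before the patch */

	if (smt_power_savings)
		return SD_POWERSAVINGS_BALANCE;

	return SD_PREFER_SIBLING;
}

/* Hypothetical model of sd_balance_for_mc_power() after the patch. */
static int mc_flags_after(int smt_power_savings, int mc_power_savings)
{
	if (smt_power_savings)
		return SD_POWERSAVINGS_BALANCE;

	/* Only prefer siblings when MC power savings is disabled. */
	if (!mc_power_savings)
		return SD_PREFER_SIBLING;

	return 0;
}
```

With sched_smt_power_savings = 0 and sched_mc_power_savings = 1,
mc_flags_before(0, 1) returns SD_PREFER_SIBLING, while mc_flags_after(0, 1)
returns 0, so the MC level no longer sets the flag that conflicts with
SD_POWERSAVINGS_BALANCE in the parent domain.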
