Date:	Fri, 17 Apr 2009 13:41:17 +0800
From:	Miao Xie <miaox@...fujitsu.com>
To:	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC:	Linux-Kernel <linux-kernel@...r.kernel.org>
Subject: [RFC][PATCH] sched: fix the nice-unfairness on SMP when offline a
 CPU

I tested the fairness of the scheduler on my multi-core box (2 CPUs * 2 cores)
and found that nice-fairness was broken after I offlined a CPU: half of the
tasks got only half as much CPU time as the others.

A test program which reproduces the problem on the current kernel is attached.
This program forks a lot of child tasks, then every 5 seconds the parent task
reads the loop count of every task and computes the average and standard
deviation. (All of the child tasks do the same work: repeatedly computing sqrt.)

Steps to reproduce:
 # echo 0 > /sys/devices/system/cpu/cpu3/online
 # ./sched-fair -p 8 -i 5 -v

By debugging, we found that it is caused by the __cpu_power of the sched groups.
After I offlined a CPU, the partition of sched groups in the CPU-level sched
domain is:
	+-----------+----------+
	| CPU0 CPU1 |   CPU2   |
	+-----------+----------+
and the __cpu_power of each sched group was 1024. This is strange: the first
sched group has two logical CPUs, so its __cpu_power should be twice that of
the second sched group. With both sched groups' __cpu_power at 1024, the load
balancer splits the load fifty-fifty between the two sched groups, so half of
the test tasks were moved to logical CPU2, and they got less CPU time.

The code that causes this problem is as follows:
static void init_sched_groups_power(int cpu, struct sched_domain *sd)
{
	[snip]
	/*
	 * For perf policy, if the groups in child domain share resources
	 * (for example cores sharing some portions of the cache hierarchy
	 * or SMT), then set this domain groups cpu_power such that each group
	 * can handle only one task, when there are other idle groups in the
	 * same sched domain.
	 */
	if (!child || (!(sd->flags & SD_POWERSAVINGS_BALANCE) &&
		       (child->flags &
			(SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES)))) {
		sg_inc_cpu_power(sd->groups, SCHED_LOAD_SCALE);
		return;
	}
	[snip]
}
According to the comment above, this special case was added for performance
reasons. However, I found no regression after applying this patch.

Test result on multi-core x86_64 box:
Before applying this patch:
AVERAGE		STD-DEV
1297.500	432.518

After applying this patch:
AVERAGE		STD-DEV
1297.250	118.857

Test result on hyper-threading x86_64 box:
Before applying this patch:
AVERAGE		STD-DEV
536.750		176.265

After applying this patch:
AVERAGE		STD-DEV
535.625		53.979

Maybe we need more testing for it.

Signed-off-by: Miao Xie <miaox@...fujitsu.com>
---
 kernel/sched.c |   11 +----------
 1 files changed, 1 insertions(+), 10 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 5724508..07b08b2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7956,16 +7956,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
 
 	sd->groups->__cpu_power = 0;
 
-	/*
-	 * For perf policy, if the groups in child domain share resources
-	 * (for example cores sharing some portions of the cache hierarchy
-	 * or SMT), then set this domain groups cpu_power such that each group
-	 * can handle only one task, when there are other idle groups in the
-	 * same sched domain.
-	 */
-	if (!child || (!(sd->flags & SD_POWERSAVINGS_BALANCE) &&
-		       (child->flags &
-			(SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES)))) {
+	if (!child) {
 		sg_inc_cpu_power(sd->groups, SCHED_LOAD_SCALE);
 		return;
 	}
-- 
1.6.0.3


View attachment "sched-fair.c" of type "text/x-csrc" (9884 bytes)
