Message-Id: <20100409.112013.200619992.igawa@mxs.nes.nec.co.jp>
Date:	Fri, 09 Apr 2010 11:20:13 +0900 (JST)
From:	Masayuki Igawa <igawa@....nes.nec.co.jp>
To:	peterz@...radead.org
Cc:	sjayaraman@...e.de, linux-kernel@...r.kernel.org, mingo@...e.hu
Subject: Re: High priority threads causing severe CPU load imbalances

From: Peter Zijlstra <peterz@...radead.org>
Subject: Re: High priority threads causing severe CPU load imbalances
Date: Thu, 08 Apr 2010 18:15:44 +0200

> On Tue, 2010-04-06 at 22:05 +0530, Suresh Jayaraman wrote:
>> Perhaps there is a chance that with more CPUs and a different number of
>> high priority threads the problem could get worse, as I mentioned above..?
> 
> One thing that could be happening (triggered by what Igawa-san said,
> although his case is more complicated by involving the cgroup stuff) is
> that f_b_g() ends up selecting a group that contains these niced tasks
> and then f_b_q() will not find a suitable source queue because all of
> them will have but a single runnable task on it and hence we simply
> bail.
> 
> We'd somehow have to teach update_*_lb_stats() not to consider groups
> where nr_running <= nr_cpus. I don't currently have a patch for that,
> but I think that is the direction you might need to look in.
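
For reference, here is a minimal, untested sketch of that direction. The
helper name sg_has_pullable_tasks() is invented for illustration;
sg_lb_stats, sched_group_cpus() and cpumask_weight() are the existing
helpers in this era's kernel/sched_fair.c, and the idea is that
update_sd_lb_stats() would only record a group as sds->busiest when the
predicate is true:

===
/*
 * Hypothetical sketch, not a tested patch: skip groups that have no
 * more runnable tasks than CPUs, so find_busiest_group() never picks
 * a group whose runqueues each hold only a single (possibly niced)
 * task that cannot be pulled without emptying its cpu.
 */
static inline int sg_has_pullable_tasks(struct sched_group *group,
					struct sg_lb_stats *sgs)
{
	return sgs->sum_nr_running > cpumask_weight(sched_group_cpus(group));
}
===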

I made a patch to help me understand load_balance()'s behavior.
This patch reduced the CPU load imbalance, but it is not perfect.
---
Cpu0  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 90.1%us,  0.0%sy,  0.0%ni,  9.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 98.7%us,  0.3%sy,  0.0%ni,  1.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  : 96.1%us,  1.0%sy,  0.0%ni,  3.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 99.0%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8032460k total,   807628k used,  7224832k free,    30692k buffers
Swap:        0k total,        0k used,        0k free,   347308k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND         
 9872 root      20   0 66128  632  268 R   99  0.0   0:13.69 4 bash            
 9876 root      20   0 66128  632  268 R   99  0.0   0:10.31 2 bash            
 9877 root      20   0 66128  632  268 R   99  0.0   0:10.79 3 bash            
 9871 root      20   0 66128  632  268 R   99  0.0   0:13.70 0 bash            
 9873 root      20   0 66128  632  268 R   99  0.0   0:13.68 1 bash            
 9874 root      20   0 66128  632  268 R   98  0.0   0:10.00 6 bash            
 9875 root      20   0 66128  632  268 R   92  0.0   0:11.22 4 bash            
 9878 root      20   0 66128  632  268 R   91  0.0   0:10.03 7 bash            
---
Also, this patch caused ping-pong load balancing..

This patch regards a sched_group as an idle sched_group
if the local sched_group's cpu is CPU_IDLE.

But the state is not stable, because active_load_balance() runs in this situation, IIUC.


I'll investigate more.

===
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5a5ea2c..806be90 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2418,6 +2418,7 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 	int i;
 	unsigned int balance_cpu = -1, first_idle_cpu = 0;
 	unsigned long avg_load_per_task = 0;
+	int idle_group = 0;
 
 	if (local_group)
 		balance_cpu = group_first_cpu(group);
@@ -2440,6 +2441,12 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 			}
 
 			load = target_load(i, load_idx);
+			/* This group is idle if it has an idle cpu. */
+			if (idle == CPU_IDLE) {
+				idle_group = 1;
+				sgs->group_load = 0;
+				sgs->sum_weighted_load = 0;
+			}
 		} else {
 			load = source_load(i, load_idx);
 			if (load > max_cpu_load)
@@ -2451,6 +2458,9 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
-		sgs->group_load += load;
 		sgs->sum_nr_running += rq->nr_running;
-		sgs->sum_weighted_load += weighted_cpuload(i);
+		/* Do not accumulate the load of a group marked idle above. */
+		if (!idle_group) {
+			sgs->group_load += load;
+			sgs->sum_weighted_load += weighted_cpuload(i);
+		}
 
 	}
 
===
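
As an aside, the check added above tests the idle type passed to
update_sg_lb_stats() (idle == CPU_IDLE), i.e. whether this balance attempt
was started from an idle cpu, rather than whether cpu i of the group is
currently idle, which is what the comment describes. A hypothetical variant
that matches the comment's wording (untested, same surrounding code, using
the existing idle_cpu() helper) would be:

===
			/*
			 * Hypothetical alternative, not tested: mark the
			 * group idle based on the per-cpu state instead of
			 * the idle type of this balance attempt.
			 */
			if (idle_cpu(i)) {
				idle_group = 1;
				sgs->group_load = 0;
				sgs->sum_weighted_load = 0;
			}
===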


Thanks.
-- 
Masayuki Igawa
