Message-ID: <20070711164708.48907792@frecb000686.frec.bull.fr>
Date:	Wed, 11 Jul 2007 16:47:08 +0200
From:	Sébastien Dugué <sebastien.dugue@...l.net>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	linux-kernel <linux-kernel@...r.kernel.org>,
	Linux RT Users <linux-rt-users@...r.kernel.org>,
	Darren Hart <dvhltc@...ibm.com>,
	john stultz <johnstul@...ibm.com>,
	Jean Pierre Dion <jean-pierre.dion@...l.net>,
	Gilles Carry <Gilles.Carry@....bull.net>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: [Patch RT] Fix CFS load balancing for RT tasks


  Hi Ingo, all

  there seems to be something wrong with the way the CFS balances (or does not
balance) RT tasks. This was evidenced using the sched_football testcase
available from the RT wiki (http://rt.wiki.kernel.org/index.php/IBM_Test_Cases)
which I modified and attached to this mail.

  The testcase starts a number of threads which fall into 3 categories:

	1 referee thread: SCHED_FIFO, RT prio 5
	ncpus defensive threads: SCHED_FIFO, RT prio 4
	ncpus offensive threads: SCHED_FIFO, RT prio 3

	(ncpus being the number of CPUs)
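
  For readers without the attachment at hand, here is a minimal sketch of how
threads in these three categories could be configured. It is not taken from
sched_football.c; the helper name fifo_attr_init() and the enum names are
made up for illustration, and only standard pthread attribute calls are used.

```c
#include <pthread.h>
#include <sched.h>

/* RT priorities as listed in the mail (enum names are hypothetical). */
enum { PRIO_OFFENSE = 3, PRIO_DEFENSE = 4, PRIO_REFEREE = 5 };

/*
 * Hypothetical helper: prepare attributes for a SCHED_FIFO thread running
 * at rt_prio. Returns 0 on success, or the error code reported by the
 * failing pthread_attr_* call.
 */
static int fifo_attr_init(pthread_attr_t *attr, int rt_prio)
{
	struct sched_param sp = { .sched_priority = rt_prio };
	int err;

	err = pthread_attr_init(attr);
	if (err)
		return err;

	/* Without this, the new thread inherits the creator's policy. */
	err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
	if (err)
		return err;

	/* Set the policy first so the priority is checked against it. */
	err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
	if (err)
		return err;

	return pthread_attr_setschedparam(attr, &sp);
}
```

  Passing such an attribute object to pthread_create() normally requires
the privilege to use real-time policies; without it thread creation is
expected to fail with EPERM.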

  To make a long story short, the defensive threads should end up distributed
among all CPUs, but that's not the case. For example, on a dual HT Xeon box,
after task migration stabilizes we have the following running on the different
CPUs:

  CPU 0: defense2
  CPU 1: referee offense2 offense3 offense4 defense3
  CPU 2: offense1
  CPU 3: defense1 defense4

which clearly shows the imbalance between CPU 2 and CPU 3: offense1 should
not be allowed to run while the higher-prio defense1 and defense4 are
sharing the same CPU.
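
  The check being applied by eye to the placement above can be written down
as a small userspace model. This is only a sketch: the struct, the function
names and the rule that the highest-prio task on a CPU is the one running
are assumptions made for illustration, not kernel code.

```c
#include <stddef.h>

/* Hypothetical model of one runnable RT task. */
struct task {
	const char *name;
	int rt_prio;	/* higher value = higher RT priority */
	int cpu;	/* CPU whose runqueue holds the task */
};

/* Placement reported in the mail (4 logical CPUs). */
static const struct task example[] = {
	{ "defense2", 4, 0 },
	{ "referee",  5, 1 }, { "offense2", 3, 1 }, { "offense3", 3, 1 },
	{ "offense4", 3, 1 }, { "defense3", 4, 1 },
	{ "offense1", 3, 2 },
	{ "defense1", 4, 3 }, { "defense4", 4, 3 },
};

/* Index of the task assumed to be running on cpu (first one with the
 * highest priority), or -1 if the CPU has no task. */
static int running_task(const struct task *t, size_t n, int cpu)
{
	int best = -1;

	for (size_t i = 0; i < n; i++)
		if (t[i].cpu == cpu &&
		    (best < 0 || t[i].rt_prio > t[best].rt_prio))
			best = (int)i;
	return best;
}

/*
 * Return the index of the first waiting task that outranks whatever runs
 * on another CPU (or could use an idle CPU), storing that CPU in
 * *target_cpu. Return -1 if no such task exists.
 */
static int find_misplaced(const struct task *t, size_t n, int ncpus,
			  int *target_cpu)
{
	for (size_t i = 0; i < n; i++) {
		if (running_task(t, n, t[i].cpu) == (int)i)
			continue;	/* this task is already running */

		for (int cpu = 0; cpu < ncpus; cpu++) {
			int r = running_task(t, n, cpu);

			if (cpu != t[i].cpu &&
			    (r < 0 || t[r].rt_prio < t[i].rt_prio)) {
				*target_cpu = cpu;
				return (int)i;
			}
		}
	}
	return -1;
}
```

  Run on the example table, the model reports defense3 (waiting behind the
referee on CPU 1) as a task that should preempt offense1 on CPU 2; the same
holds for whichever of defense1/defense4 is waiting on CPU 3.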

  The following patch fixes this by re-enabling the RT overload detection
for the CFS. It may not be the right solution, maybe it should be incorporated
into the other load balancing mechanisms. I did not dig deep enough yet
to make that call ;-)

  P.S. Thanks to Steven Rostedt for logdev which is proving invaluable in
       cases like this.

  Sébastien.


------------------

  The RT overload mechanism of the O(1) scheduler has not been activated
in the new CFS.

  This patch fixes that by inserting calls to inc_rt_tasks() and dec_rt_tasks()
in enqueue_task_rt() and dequeue_task_rt() respectively, which allows
balance_rt_tasks() to run in the rt_overload case.


Signed-off-by: Sébastien Dugué <sebastien.dugue@...l.net>

---
 kernel/sched_rt.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6.21.5-rt20/kernel/sched_rt.c
===================================================================
--- linux-2.6.21.5-rt20.orig/kernel/sched_rt.c	2007-07-11 10:46:26.000000000 +0200
+++ linux-2.6.21.5-rt20/kernel/sched_rt.c	2007-07-11 10:46:50.000000000 +0200
@@ -32,6 +32,8 @@ enqueue_task_rt(struct rq *rq, struct ta
 
 	list_add_tail(&p->run_list, array->queue + p->prio);
 	__set_bit(p->prio, array->bitmap);
+
+	inc_rt_tasks(p, rq);
 }
 
 /*
@@ -44,6 +46,8 @@ dequeue_task_rt(struct rq *rq, struct ta
 
 	update_curr_rt(rq, now);
 
+	dec_rt_tasks(p, rq);
+
 	list_del(&p->run_list);
 	if (list_empty(array->queue + p->prio))
 		__clear_bit(p->prio, array->bitmap);
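
  To illustrate why the two added calls matter, here is a userspace model of
the kind of bookkeeping an overload detector needs. It is a sketch under
assumptions: the model_* names, the bitmask representation and the
"more than one RT task on a runqueue" threshold are illustrative and are not
copied from the -rt tree, whose inc_rt_tasks()/dec_rt_tasks() may differ.

```c
#include <assert.h>

/* Hypothetical per-CPU runqueue model: only what the counter needs. */
struct model_rq {
	int cpu;
	int rt_nr_running;	/* RT tasks currently queued on this CPU */
};

/* Bit n set: CPU n holds more than one RT task (assumed definition). */
static unsigned int model_overload_mask;

/* Called on enqueue: the second RT task makes the CPU overloaded. */
static void model_inc_rt_tasks(struct model_rq *rq)
{
	if (++rq->rt_nr_running == 2)
		model_overload_mask |= 1u << rq->cpu;
}

/* Called on dequeue: back to a single RT task clears the overload bit. */
static void model_dec_rt_tasks(struct model_rq *rq)
{
	assert(rq->rt_nr_running > 0);
	if (--rq->rt_nr_running == 1)
		model_overload_mask &= ~(1u << rq->cpu);
}

/* A balancer would only bother pulling RT tasks when this is true. */
static int model_rt_overloaded(void)
{
	return model_overload_mask != 0;
}
```

  If the enqueue/dequeue paths never call the counter helpers, the mask
stays empty and the balancer has no signal that a CPU is holding back a
runnable RT task, which matches the behaviour described in the report.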

View attachment "sched_football.c" of type "text/x-csrc" (5452 bytes)
