linux-kernel - Re: [RFC] scheduler: improve SMP fairness in CFS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070725120358.GA30755@elte.hu>
Date:	Wed, 25 Jul 2007 14:03:58 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Tong Li <tong.n.li@...el.com>
Cc:	linux-kernel@...r.kernel.org, Chris Snook <csnook@...hat.com>
Subject: Re: [RFC] scheduler: improve SMP fairness in CFS


* Ingo Molnar <mingo@...e.hu> wrote:

> > This patch extends CFS to achieve better fairness for SMPs. For 
> > example, with 10 tasks (same priority) on 8 CPUs, it enables each task 
> > to receive equal CPU time (80%). [...]
> 
> hm, CFS should already offer reasonable long-term SMP fairness. It 
> certainly works on a dual-core box, i just started 3 tasks of the same 
> priority on 2 CPUs, and on vanilla 2.6.23-rc1 the distribution is 
> this:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  7084 mingo     20   0  1576  248  196 R   67  0.0   0:50.13 loop
>  7083 mingo     20   0  1576  244  196 R   66  0.0   0:48.86 loop
>  7085 mingo     20   0  1576  244  196 R   66  0.0   0:49.45 loop
> 
> so each task gets a perfect 66% of CPU time.
> 
> prior CFS, we indeed did a 50%/50%/100% split - so for example on 
> v2.6.22:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2256 mingo     25   0  1580  248  196 R  100  0.0   1:03.19 loop
>  2255 mingo     25   0  1580  248  196 R   50  0.0   0:31.79 loop
>  2257 mingo     25   0  1580  248  196 R   50  0.0   0:31.69 loop
> 
> but CFS has changed that behavior.
> 
> I'll check your 10-tasks-on-8-cpus example on an 8-way box too, maybe 
> we regressed somewhere ...

ok, i just tried it on an 8-cpu box and indeed, unlike the dual-core 
case, the scheduler does not distribute tasks well enough:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2572 mingo     20   0  1576  244  196 R  100  0.0   1:03.61 loop
 2578 mingo     20   0  1576  248  196 R  100  0.0   1:03.59 loop
 2576 mingo     20   0  1576  248  196 R  100  0.0   1:03.52 loop
 2571 mingo     20   0  1576  244  196 R  100  0.0   1:03.46 loop
 2569 mingo     20   0  1576  244  196 R   99  0.0   1:03.36 loop
 2570 mingo     20   0  1576  244  196 R   95  0.0   1:00.55 loop
 2577 mingo     20   0  1576  248  196 R   50  0.0   0:31.88 loop
 2574 mingo     20   0  1576  248  196 R   50  0.0   0:31.87 loop
 2573 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop
 2575 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop

but this is relatively easy to fix - with the patch below applied, it 
looks a lot better:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2681 mingo     20   0  1576  244  196 R   85  0.0   3:51.68 loop
 2688 mingo     20   0  1576  244  196 R   81  0.0   3:46.35 loop
 2682 mingo     20   0  1576  244  196 R   80  0.0   3:43.68 loop
 2685 mingo     20   0  1576  248  196 R   80  0.0   3:45.97 loop
 2683 mingo     20   0  1576  248  196 R   80  0.0   3:40.25 loop
 2679 mingo     20   0  1576  244  196 R   80  0.0   3:33.53 loop
 2680 mingo     20   0  1576  244  196 R   79  0.0   3:43.53 loop
 2686 mingo     20   0  1576  244  196 R   79  0.0   3:39.31 loop
 2687 mingo     20   0  1576  244  196 R   78  0.0   3:33.31 loop
 2684 mingo     20   0  1576  244  196 R   77  0.0   3:27.52 loop

they now nicely converte to the expected 80% long-term CPU usage.

so, could you please try the patch below, does it work for you too?

	Ingo

--------------------------->
Subject: sched: increase SCHED_LOAD_SCALE_FUZZ
From: Ingo Molnar <mingo@...e.hu>

increase SCHED_LOAD_SCALE_FUZZ that adds a small amount of
over-balancing: to help distribute CPU-bound tasks more fairly on SMP
systems.

the problem of unfair balancing was noticed and reported by Tong N Li.

10 CPU-bound tasks running on 8 CPUs, v2.6.23-rc1:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2572 mingo     20   0  1576  244  196 R  100  0.0   1:03.61 loop
 2578 mingo     20   0  1576  248  196 R  100  0.0   1:03.59 loop
 2576 mingo     20   0  1576  248  196 R  100  0.0   1:03.52 loop
 2571 mingo     20   0  1576  244  196 R  100  0.0   1:03.46 loop
 2569 mingo     20   0  1576  244  196 R   99  0.0   1:03.36 loop
 2570 mingo     20   0  1576  244  196 R   95  0.0   1:00.55 loop
 2577 mingo     20   0  1576  248  196 R   50  0.0   0:31.88 loop
 2574 mingo     20   0  1576  248  196 R   50  0.0   0:31.87 loop
 2573 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop
 2575 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop

v2.6.23-rc1 + patch:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2681 mingo     20   0  1576  244  196 R   85  0.0   3:51.68 loop
 2688 mingo     20   0  1576  244  196 R   81  0.0   3:46.35 loop
 2682 mingo     20   0  1576  244  196 R   80  0.0   3:43.68 loop
 2685 mingo     20   0  1576  248  196 R   80  0.0   3:45.97 loop
 2683 mingo     20   0  1576  248  196 R   80  0.0   3:40.25 loop
 2679 mingo     20   0  1576  244  196 R   80  0.0   3:33.53 loop
 2680 mingo     20   0  1576  244  196 R   79  0.0   3:43.53 loop
 2686 mingo     20   0  1576  244  196 R   79  0.0   3:39.31 loop
 2687 mingo     20   0  1576  244  196 R   78  0.0   3:33.31 loop
 2684 mingo     20   0  1576  244  196 R   77  0.0   3:27.52 loop

so they now nicely converte to the expected 80% long-term CPU usage.

Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 include/linux/sched.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -681,7 +681,7 @@ enum cpu_idle_type {
 #define SCHED_LOAD_SHIFT	10
 #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
 
-#define SCHED_LOAD_SCALE_FUZZ	(SCHED_LOAD_SCALE >> 5)
+#define SCHED_LOAD_SCALE_FUZZ	(SCHED_LOAD_SCALE >> 1)
 
 #ifdef CONFIG_SMP
 #define SD_LOAD_BALANCE		1	/* Do load balancing on this domain. */
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/