linux-kernel - Re: [patch] CFS scheduler, -v12

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070521085703.GA18755@elte.hu>
Date:	Mon, 21 May 2007 10:57:03 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	William Lee Irwin III <wli@...omorphy.com>
Cc:	Dmitry Adamushko <dmitry.adamushko@...il.com>,
	Peter Williams <pwil3058@...pond.net.au>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [patch] CFS scheduler, -v12


* William Lee Irwin III <wli@...omorphy.com> wrote:

> cfs should probably consider aggregate lag as opposed to aggregate 
> weighted load. Mainline's convergence to proper CPU bandwidth 
> distributions on SMP (e.g. N+1 tasks of equal nice on N cpus) is 
> incredibly slow and probably also fragile in the presence of arrivals 
> and departures partly because of this. [...]

hm, have you actually tested CFS before coming to this conclusion?

CFS is fair even on SMP. Consider for example the worst-case 
3-tasks-on-2-CPUs workload on a 2-CPU box:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2658 mingo     20   0  1580  248  200 R   67  0.0   0:56.30 loop
 2656 mingo     20   0  1580  252  200 R   66  0.0   0:55.55 loop
 2657 mingo     20   0  1576  248  200 R   66  0.0   0:55.24 loop

66% of CPU time for each task. The 'TIME+' column shows a 2% spread 
between the slowest and the fastest loop after just 1 minute of runtime 
(and the spread gets narrower with time). Mainline does a 50% / 50% / 
100% split:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3121 mingo     25   0  1584  252  204 R  100  0.0   0:13.11 loop
 3120 mingo     25   0  1584  256  204 R   50  0.0   0:06.68 loop
 3119 mingo     25   0  1584  252  204 R   50  0.0   0:06.64 loop

and i fixed that in CFS.

or consider a sleepy workload like massive_intr, 3-tasks-on-2-CPUs:

  europe:~> head -1 /proc/interrupts
             CPU0       CPU1

  europe:~> ./massive_intr 3 10
  002623  00000722
  002621  00000720
  002622  00000721

Or a 5-tasks-on-2-CPS workload:

  europe:~> ./massive_intr 5 50
  002649  00002519
  002653  00002492
  002651  00002478
  002652  00002510
  002650  00002478

that's around 1% of spread.

load-balancing is a performance vs. fairness tradeoff so we wont be able 
to make it precisely fair because that's hideously expensive on SMP 
(barring someone showing a working patch of course) - but in CFS i got 
quite close to having it very fair in practice.

> [...] Tong Li's DWRR repairs the deficit in mainline by synchronizing 
> epochs or otherwise bounding epoch dispersion. This doesn't directly 
> translate to cfs. In cfs cpu should probably try to figure out if its 
> aggregate lag (e.g. via minimax) is above or below average, and push 
> to or pull from the other half accordingly.

i'd first like to see a demonstration of a problem to solve, before 
thinking about more complex solutions ;-)

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/