linux-kernel - scheduler scalability - cgroups, cpusets and load-balancing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <1201600428.28547.87.camel@lappy>
Date:	Tue, 29 Jan 2008 10:53:48 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	linux-kernel <linux-kernel@...r.kernel.org>
Cc:	Ingo Molnar <mingo@...e.hu>, vatsa <vatsa@...ux.vnet.ibm.com>,
	Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
	Paul Jackson <pj@....com>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Steve Grubb <sgrubb@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Gregory Haskins <ghaskins@...ell.com>,
	Dmitry Adamushko <dmitry.adamushko@...il.com>,
	"Li, Tong N" <tong.n.li@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Paul Menage <menage@...gle.com>,
	David Rientjes <rientjes@...gle.com>
Subject: scheduler scalability - cgroups, cpusets and load-balancing

Hi All,

Some of the fancy new scheduler features such as the cgroup load
balancer (load_balance_monitor) and the real-time load balancer are a
bit of an scalability issue. They all seem to want a rather strong
global bound to keep a global fairness (which is quite understandable).

[ my own interest is currently real-time group scheduling on multiple
  cpus, and that seems to require _very_ strong bonds ]

I think the current stuff would scale up to 8 maybe 16 cpus, but after
that I'd be real worried.

Now we want distributions to enable most of these features. Distros seem
to want containers, but distros also need to support 128+ cpu machines,
so how are we going to solve this.

My thoughts were to make stronger use of disjoint cpu-sets. cgroups and
cpusets are related, in that cpusets provide a property to a cgroup.
However, load_balance_monitor()'s interaction with sched domains
confuses me - it might DTRT, but I can't tell.

[ It looks to me it balances a group over the largest SD the current cpu
  has access to, even though that might be larger than the SD associated
  with the cpuset of that particular cgroup. ]

Also the RT load-balance needs to become aware of such these sets, I
think Paul J and Steven once talked about it, but can't quite remember
where that ended. From my POV there should be sched-domain based balance
information, not global.

By cutting the problem into smaller pieces, and adding tunables to
weaken to global fairness, I think we can give administrators enough
freedom to make use of these features, even on the largest of machines.

[ so I'd move the load_balance_monitor() tunables into cpusets as well,
  I can imagine a smaller cpuset wanting a stronger fairness than a much
  larger cpuset. ]

I understand its a somewhat hand-wavey email, but I wanted to start
discussion on the issue, or have someone show me I'm wrong and can stop
worrying :-).

Peter

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/