linux-kernel - [RFC] Splitting scheduler into two halves

Message-ID: <0DA73B5D686AEC4AAEF6054BE04DA1CD116C50EA@SHSMSX102.ccr.corp.intel.com>
Date:	Fri, 28 Feb 2014 02:13:32 +0000
From:	"Du, Yuyang" <yuyang.du@...el.com>
To:	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	"Van De Ven, Arjan" <arjan.van.de.ven@...el.com>,
	"Brown, Len" <len.brown@...el.com>,
	"Wysocki, Rafael J" <rafael.j.wysocki@...el.com>
Subject: [RFC] Splitting scheduler into two halves

Hi Peter/Ingo and all,

With the advent of more cores and heterogeneous architectures, the scheduler is required to be more complex (power efficiency) and more diverse (big.LITTLE). Having the scheduler address those challenges as a whole is costly and unnecessary. This proposal argues that the scheduler be split into two parts: a top half (task scheduling) and a bottom half (load balance). Let the bottom half take charge of the incoming requirements.

The two halves are largely orthogonal in functionality. Task scheduling (top half) aims for *ONE* CPU to execute running tasks fairly (priority included), while load balance (bottom half) aims for *ALL* CPUs to maximize the throughput of the computing power. The goal of task scheduling is single and clear, and CFS and RT already pursue exactly that goal. Load balance, however, has to meet more goals: performance (throughput/responsiveness), power consumption, and architecture differences, to name a few. Those goals are often hard to achieve because they may conflict and are difficult to estimate and plan for. So, shall we declare the independence of the two, and give them the freedom to pursue their own "happiness"?

We take an incremental development approach. As a starting point, we did three things (without changing a single line of real-work code):
	1)	Move load balance out of fair.c into load_balance.c (~3000 lines of code). As a result, fair.c/rt.c and load_balance.c have very little intersection.
	2)	Define struct sched_lb_class, which consists of the following members, as an umbrella for the load balance entry points.
		a.	const struct sched_lb_class *next;
		b.	int (*fork_balance) (struct task_struct *p, int sd_flags, int wake_flags);
		c.	int (*exec_balance) (struct task_struct *p, int sd_flags, int wake_flags);
		d.	int (*wakeup_balance) (struct task_struct *p, int sd_flags, int wake_flags);
		e.	void (*idle_balance) (int this_cpu, struct rq *this_rq);
		f.	void (*periodic_rebalance) (int cpu, enum cpu_idle_type idle);
		g.	void (*nohz_idle_balance) (int this_cpu, enum cpu_idle_type idle);
		h.	void (*start_periodic_balance) (struct rq *rq, int cpu);
		i.	void (*check_nohz_idle_balance) (struct rq *rq, int cpu);
	3)	Insert another layer of indirection to wrap the implemented functions in sched_lb_class. Implement a default load balance class that is just the previous load balance.
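
To make items 2) and 3) concrete, here is a minimal userspace sketch, not the actual patch. The member list is the one given above; everything else is an assumption made for illustration: the stand-in types, the stub default_select_cpu(), the head pointer sched_lb_class_highest, the wrapper name sched_fork_balance(), and the rule that the wrapper walks the `next` chain and calls the first class that implements the hook.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types (hypothetical, illustration only). */
struct task_struct { int cpu; };	/* CPU the task last ran on */
struct rq;				/* opaque in this sketch */
enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

/* Members exactly as listed in the proposal. */
struct sched_lb_class {
	const struct sched_lb_class *next;
	int (*fork_balance)(struct task_struct *p, int sd_flags, int wake_flags);
	int (*exec_balance)(struct task_struct *p, int sd_flags, int wake_flags);
	int (*wakeup_balance)(struct task_struct *p, int sd_flags, int wake_flags);
	void (*idle_balance)(int this_cpu, struct rq *this_rq);
	void (*periodic_rebalance)(int cpu, enum cpu_idle_type idle);
	void (*nohz_idle_balance)(int this_cpu, enum cpu_idle_type idle);
	void (*start_periodic_balance)(struct rq *rq, int cpu);
	void (*check_nohz_idle_balance)(struct rq *rq, int cpu);
};

/* Assumed stub: a real class would pick a CPU; this one keeps the task put. */
static int default_select_cpu(struct task_struct *p, int sd_flags, int wake_flags)
{
	(void)sd_flags;
	(void)wake_flags;
	return p->cpu;
}

/* Default class; hooks not needed by this sketch are left NULL. */
static const struct sched_lb_class default_lb_class = {
	.next		= NULL,
	.fork_balance	= default_select_cpu,
	.exec_balance	= default_select_cpu,
	.wakeup_balance	= default_select_cpu,
};

/* Assumed head of the class chain. */
static const struct sched_lb_class *sched_lb_class_highest = &default_lb_class;

/*
 * Indirection layer (assumed dispatch rule): walk the chain and use the
 * first class that implements fork_balance.
 */
int sched_fork_balance(struct task_struct *p, int sd_flags, int wake_flags)
{
	const struct sched_lb_class *class;

	assert(p != NULL);
	for (class = sched_lb_class_highest; class; class = class->next)
		if (class->fork_balance)
			return class->fork_balance(p, sd_flags, wake_flags);

	return p->cpu;	/* no class implements the hook: stay on the same CPU */
}
```

With this shape, callers in the task scheduling half only ever see wrappers such as sched_fork_balance(), and a new load balance policy can be added by linking another sched_lb_class in front of the default one.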

The next step is to continue redesigning and refactoring to make it easier to build more powerful and diverse load balancing. More importantly, this RFC solicits discussion to get early feedback on this big proposed change.

Thanks,
Yuyang