linux-kernel - Re: [PATCH v4] sched: automated per session task groups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1291747889.2032.985.camel@laptop>
Date:	Tue, 07 Dec 2010 19:51:29 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Colin Walters <walters@...bum.org>, Ray Lee <ray-lk@...rabbit.org>,
	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>,
	Oleg Nesterov <oleg@...hat.com>,
	Markus Trippelsdorf <markus@...ppelsdorf.de>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] sched: automated per session task groups

On Sun, 2010-12-05 at 12:47 -0800, Linus Torvalds wrote:
> Nice levels are _not_ about group scheduling. They're about
> priorities. And since the cgroup code doesn't even support priority
> levels for the groups, it's a really *horrible* match. 

It does in fact, nice maps to a weight, we then schedule so that each
entity (be it task or group) gets a proportional amount of time relative
to the other entities (of the same parent).

The scheduler basically solves the following differential equation:
  dt_i = w_i * dt / \Sum_j w_j

For tasks we map nice to weight like:

static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

For groups we expose the weight directly in cgroupfs://cpu.shares with a
default equivalent to nice-0 (1024).

So 'nice make -j9' will run make and all its children with weight=110,
if this task hierarchy has ~9 runnable tasks it will get about as much
time as a single nice-0 competing task.

[ 9*110 = 990, 1*1024 = 1024, which gives: 49% vs 51% ]

Now group scheduling is in fact closely related to nice, the only thing
group scheduling does is:

  w_i = \unit * \Prod_j { w_i,j / \Sum_k w_k,j }, where:

     j \elem i and its parents
     k \elem entities of group j (where a task is a trivial group)

Where we compute a task's effective weight (w_i) by multiplying it with
the effective weight of their ancestors.

Suppose a grouped make -j9 against 1 competing task (all nice-0 or
equivalent), and make's 9 active children [a..i] in the group G:

        R
      /   \
     t     G
          / \
         a...i

So w_t = 1024, w_G = 1024 and w_[a..i] = 1024.

Now, per the above the effective weight (weight as in the root group) of
each grouped task is:

  w_[a..i] = 1024 * 1024/2048 * 1024/9216 ~= 56
  w_t      = 1024 * 1024/2048             = 512

[ \Sum w_[a..i] = 512, vs 512 gives: 50% vs 50% ]

So effectively: nice make -j9, and stuffing the make -j9 in a group are
roughly equivalent.

The only difference between groups and nice is the interface, with nice
you set the weight directly, with groups you set it implicitly,
depending on the runnable task state.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/