Date:	Tue, 07 Dec 2010 19:51:29 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Colin Walters <walters@...bum.org>, Ray Lee <ray-lk@...rabbit.org>,
	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>,
	Oleg Nesterov <oleg@...hat.com>,
	Markus Trippelsdorf <markus@...ppelsdorf.de>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] sched: automated per session task groups

On Sun, 2010-12-05 at 12:47 -0800, Linus Torvalds wrote:
> Nice levels are _not_ about group scheduling. They're about
> priorities. And since the cgroup code doesn't even support priority
> levels for the groups, it's a really *horrible* match. 

It does, in fact: nice maps to a weight, and we then schedule so that
each entity (be it task or group) gets a proportional amount of time
relative to the other entities (of the same parent).

The scheduler basically solves the following differential equation:
  dt_i = w_i * dt / \Sum_j w_j


For tasks we map nice to weight like:

static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};
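
As a rough illustration of how the table is indexed (a sketch, not the
kernel's actual code path): nice runs from -20 to 19, so nice + 20 gives
the slot. The helper name nice_to_weight() is made up for this example.

```c
#include <assert.h>

/* Copy of the table above so this snippet compiles on its own. */
static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

/*
 * Hypothetical helper: map a nice value in [-20, 19] to its weight.
 * Index 0 corresponds to nice -20, index 20 to nice 0.
 */
static int nice_to_weight(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return prio_to_weight[nice + 20];
}
```

With this, nice_to_weight(0) is the 1024 baseline and nice_to_weight(10)
is 110, the value used in the make example.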

For groups we expose the weight directly in cgroupfs://cpu.shares with a
default equivalent to nice-0 (1024).

So 'nice make -j9' will run make and all its children with weight=110
(nice's default adjustment is +10). If this task hierarchy has ~9
runnable tasks, it will get about as much time as a single nice-0
competing task.

[ 9*110 = 990, 1*1024 = 1024, which gives: 49% vs 51% ]
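
The arithmetic above can be reproduced with a small sketch of the
proportional-share rule (share = own weight / sum of sibling weights).
This assumes a flat set of runnable entities under one parent; the
function name share_of_range() is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fraction of CPU time received by the entities
 * w[first] .. w[first + count - 1] out of all n entities competing
 * under the same parent. Each entity's share is w_i / Sum_j w_j.
 */
static double share_of_range(const int *w, size_t n,
                             size_t first, size_t count)
{
    long total = 0;
    long part = 0;

    assert(n > 0 && first + count <= n);

    for (size_t j = 0; j < n; j++) {
        total += w[j];
        if (j >= first && j < first + count)
            part += w[j];
    }
    return (double)part / (double)total;
}
```

Feeding it nine entries of 110 followed by one entry of 1024 gives
990/2014 for the first nine and 1024/2014 for the last one.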


Now, group scheduling is in fact closely related to nice; the only thing
group scheduling does is:

  w_i = \unit * \Prod_j { w_i,j / \Sum_k w_k,j }, where:

     j \elem i and its parents
     k \elem entities of group j (where a task is a trivial group)

Where we compute a task's effective weight (w_i) by multiplying it with
the effective weight of its ancestors.

Suppose a grouped make -j9 against 1 competing task (all nice-0 or
equivalent), and make's 9 active children [a..i] in the group G:


        R
      /   \
     t     G
          / \
         a...i

So w_t = 1024, w_G = 1024 and w_[a..i] = 1024.

Now, per the above the effective weight (weight as in the root group) of
each grouped task is:

  w_[a..i] = 1024 * 1024/2048 * 1024/9216 ~= 56
  w_t      = 1024 * 1024/2048             = 512

[ \Sum w_[a..i] = 512, vs 512 gives: 50% vs 50% ]
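
The product formula can be sketched as a walk from an entity up through
its parents, multiplying in weight / sum-of-sibling-weights at each
level. This is only an illustration under simplifying assumptions: the
struct and its fields are invented here, the per-level sum is supplied
by the caller, and floating point is used instead of the kernel's
integer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entity: a task or a group in the hierarchy. */
struct entity {
    const struct entity *parent; /* NULL if directly under the root */
    long weight;                 /* own weight at its level */
    long level_sum;              /* sum of weights of all runnable
                                    entities at that level, self included */
};

/*
 * Effective weight as seen from the root:
 *   unit * Prod_j (weight_j / level_sum_j)
 * taken over the entity itself and each of its ancestors.
 */
static double effective_weight(const struct entity *e, double unit)
{
    double w = unit;

    for (; e != NULL; e = e->parent) {
        assert(e->level_sum > 0);
        w *= (double)e->weight / (double)e->level_sum;
    }
    return w;
}
```

For the tree above, with unit = 1024, t and G each have weight 1024 out
of a level sum of 2048, and each of a..i has 1024 out of 9216. That
yields exactly 512 for t and roughly 56.9 for each grouped task, which
the mail truncates to 56.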

So effectively: nice make -j9, and stuffing the make -j9 in a group are
roughly equivalent.

The only difference between groups and nice is the interface: with nice
you set the weight directly; with groups you set it implicitly,
depending on the runnable task state.

