Message-ID: <20161125160456.GP3092@twins.programming.kicks-ass.net>
Date:   Fri, 25 Nov 2016 17:04:56 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
Cc:     Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...nel.org>,
        linux-man <linux-man@...r.kernel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: RFC: documentation of the autogroup feature [v2]

On Fri, Nov 25, 2016 at 04:04:25PM +0100, Michael Kerrisk (man-pages) wrote:
> >>        ┌─────────────────────────────────────────────────────┐
> >>        │FIXME                                                │
> >>        ├─────────────────────────────────────────────────────┤
> >>        │How do the nice value of  a  process  and  the  nice │
> >>        │value of an autogroup interact? Which has priority?  │
> >>        │                                                     │
> >>        │It  *appears*  that the autogroup nice value is used │
> >>        │for CPU distribution between task groups,  and  that │
> >>        │the  process nice value has no effect there.  (I.e., │
> >>        │suppose two  autogroups  each  contain  a  CPU-bound │
> >>        │process,  with  one  process  having nice==0 and the │
> >>        │other having nice==19.  It appears  that  they  each │
> >>        │get  50%  of  the CPU.)  It appears that the process │
> >>        │nice value has effect only with respect to  schedul‐ │
> >>        │ing  relative to other processes in the *same* auto‐ │
> >>        │group.  Is this correct?                             │
> >>        └─────────────────────────────────────────────────────┘
> > 
> > Yup, entity nice level affects distribution among peer entities.
> 
> Huh! I only just learned about this via my experiments while
> investigating autogroups. 
> 
> How long have things been like this? Always? (I don't think
> so.) Since the arrival of CFS? Since the arrival of
> autogrouping? (I'm guessing not.) Since some other point?
> (When?)

It has been like this ever since cfs-cgroup; it is a fundamental design
point of cgroups, and has therefore always been the case for autogroups
(as those are nothing more than an application of the cgroup code).

> It seems to me that this renders the traditional process
> nice pretty much useless. (I bet I'm not the only one who'd 
> be surprised by the current behavior.)

It's really rather fundamental to how the whole hierarchical thing
works.

CFS is a weighted fair queueing scheduler; this means each entity
receives:

               w_i
  dt_i = dt --------
            \Sum w_j


                CPU
          ______/ \______
         /    |     |    \
        A     B     C     D


So if each entity {A,B,C,D} has equal weight, then they will receive
equal time. Explicitly, for C you get:


                      w_C
  dt_C = dt -----------------------
            (w_A + w_B + w_C + w_D)
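
To make that concrete, here is a minimal user-space sketch of the share
computation (illustrative only, not kernel code; the weights 1024 and 15
are the kernel's weight values for nice 0 and nice 19, everything else
is made up for the example):

/* Flat WFQ: each entity gets w_i / \Sum w_j of the CPU. */
#include <stdio.h>

static double share(const int *w, int n, int i)
{
	int j, sum = 0;

	for (j = 0; j < n; j++)
		sum += w[j];		/* \Sum w_j */

	return (double)w[i] / sum;	/* w_i / \Sum w_j */
}

int main(void)
{
	int equal[4] = { 1024, 1024, 1024, 1024 };	/* A, B, C, D at nice 0 */
	int mixed[2] = { 1024, 15 };			/* nice 0 vs nice 19 */

	/* Four equal-weight siblings: 25% each. */
	printf("dt_C = %.3f\n", share(equal, 4, 2));

	/* Two siblings in the *same* group: nice matters a lot. */
	printf("nice 0: %.3f, nice 19: %.3f\n",
	       share(mixed, 2, 0), share(mixed, 2, 1));

	return 0;
}

This prints dt_C = 0.250 for the equal case and 0.986 vs 0.014 for the
mixed pair; within a single group the nice value matters a great deal,
which is the flip side of the FIXME observation above.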


Extending this to a hierarchy, we get:


                CPU
          ______/ \______
         /    |     |    \
        A     B     C     D
                   / \
                  E   F

Where C becomes a 'server' for entities {E,F}. The weight of C does not
depend on its child entities; this way the time of {E,F} becomes a
straight product of their ratio with C's time. That is, the whole thing
becomes the following, where l denotes the level in the hierarchy and i
an entity on that level:

                 l      w_g,i
  dt_l,i = dt \Prod  ----------
                g=0  \Sum w_g,j


Or more concretely, for E:

                      w_E
  dt_1,E = dt_0,C -----------
                  (w_E + w_F)

                        w_C               w_E
         = dt ----------------------- -----------
              (w_A + w_B + w_C + w_D) (w_E + w_F)
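
As a sanity check, the same arithmetic for the two-level tree above,
again with made-up weights (1024 being the default nice-0 entity
weight):

/* Hierarchical WFQ: E's share is the product of per-level ratios. */
#include <stdio.h>

int main(void)
{
	int w_A = 1024, w_B = 1024, w_C = 1024, w_D = 1024;
	int w_E = 1024, w_F = 1024;

	/* Level 0: C competes with A, B and D for the CPU. */
	double dt_C = (double)w_C / (w_A + w_B + w_C + w_D);

	/* Level 1: E competes with F for C's time only. */
	double dt_E = dt_C * (double)w_E / (w_E + w_F);

	printf("dt_C = %.3f, dt_E = %.3f\n", dt_C, dt_E);

	return 0;
}

Which gives dt_C = 0.250 and dt_E = 0.125, matching the product formula.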


And this 'trivially' extends to SMP, the tricky bit being that the sums
over all entities end up being machine-wide instead of per-CPU, which is
a real and royal pain for performance.


Note that this property, where the weight of the server entity is
independent of its child entities, is a desired feature. Without it,
it would be impossible to control the relative weights of groups, and
that is the sole parameter of the WFQ model.

It is also why Linus so likes autogroups: each session competes equally
with the others.
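
A quick sketch of that effect, with hypothetical task counts: a session
running a parallel build and a session running a single interactive task
still split the CPU 50/50 at the group level:

/* Each session is one group entity; its share is independent of how
 * many runnable tasks it contains. */
#include <stdio.h>

int main(void)
{
	int w_group = 1024;	/* both sessions at autogroup nice 0 */
	int build_tasks = 16;	/* e.g. a make -j16 in one terminal */
	int desktop_tasks = 1;	/* one busy task in another terminal */

	/* Two equal-weight groups: half the CPU each... */
	double per_group = (double)w_group / (2 * w_group);

	/* ...then split among the tasks inside each group. */
	printf("per build task:   %.4f\n", per_group / build_tasks);
	printf("per desktop task: %.4f\n", per_group / desktop_tasks);

	return 0;
}

Without autogrouping all 17 tasks would compete in one flat group and
the build would get 16/17 of the machine.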
