linux-kernel - Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)

Message-ID: <20160926121025.GC5016@twins.programming.kicks-ass.net>
Date:   Mon, 26 Sep 2016 14:10:25 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Christian Borntraeger <borntraeger@...ibm.com>
Cc:     Yuyang Du <yuyang.du@...el.com>, Ingo Molnar <mingo@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        vincent.guittot@...aro.org, Morten.Rasmussen@....com,
        dietmar.eggemann@....com, pjt@...gle.com, bsegall@...gle.com
Subject: Re: group scheduler regression since 4.3 (bisect 9d89c257d
 sched/fair: Rewrite runnable load and utilization average tracking)

On Mon, Sep 26, 2016 at 02:01:43PM +0200, Christian Borntraeger wrote:
> They applied ok on next from 9/13. Things go even worse.
> With this host configuration:
> 
> CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
> 0   0    0    0      0    0:0:0:0         yes    yes        0
> 1   0    0    0      0    1:1:1:1         yes    yes        1
> 2   0    0    0      1    2:2:2:2         yes    yes        2
> 3   0    0    0      1    3:3:3:3         yes    yes        3
> 4   0    0    1      2    4:4:4:4         yes    yes        4
> 5   0    0    1      2    5:5:5:5         yes    yes        5
> 6   0    0    1      3    6:6:6:6         yes    yes        6
> 7   0    0    1      3    7:7:7:7         yes    yes        7
> 8   0    0    1      4    8:8:8:8         yes    yes        8
> 9   0    0    1      4    9:9:9:9         yes    yes        9
> 10  0    0    1      5    10:10:10:10     yes    yes        10
> 11  0    0    1      5    11:11:11:11     yes    yes        11
> 12  0    0    1      6    12:12:12:12     yes    yes        12
> 13  0    0    1      6    13:13:13:13     yes    yes        13
> 14  0    0    1      7    14:14:14:14     yes    yes        14
> 15  0    0    1      7    15:15:15:15     yes    yes        15
> 
> the guest was running either on 0-3 or on 4-15, but never
> used the full system. With group scheduling disabled everything was good
> again. So it looks like this bug also has some dependency on the
> host topology.

OK, so CPU affinities that unevenly straddle topology boundaries like
that are hard (and are generally not recommended), but it's not
immediately obvious why it would be so much worse with cgroups enabled.
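
For reference, the uneven split in the quoted table can be made explicit
with a small sketch. It only counts CPUs per socket from the CPU/SOCKET
columns above; the names CPU_TO_SOCKET, cpus_per_socket and is_uneven are
made up for illustration, and this is not how the scheduler itself looks
at topology.

```python
from collections import Counter

# CPU -> SOCKET mapping transcribed from the quoted table:
# CPUs 0-3 sit on socket 0, CPUs 4-15 sit on socket 1.
CPU_TO_SOCKET = {cpu: (0 if cpu < 4 else 1) for cpu in range(16)}


def cpus_per_socket(cpus, cpu_to_socket=CPU_TO_SOCKET):
    """Count how many of the given CPUs fall on each socket."""
    return dict(Counter(cpu_to_socket[c] for c in cpus))


def is_uneven(cpus, cpu_to_socket=CPU_TO_SOCKET):
    """True if the CPU set spans several sockets with unequal counts."""
    counts = cpus_per_socket(cpus, cpu_to_socket)
    return len(counts) > 1 and len(set(counts.values())) > 1


if __name__ == "__main__":
    all_cpus = range(16)
    print(cpus_per_socket(all_cpus))  # socket 0 gets 4 CPUs, socket 1 gets 12
    print(is_uneven(all_cpus))
```

A set covering all 16 CPUs spans both sockets with 4 and 12 CPUs
respectively, which matches the 0-3 versus 4-15 split reported above.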