Message-Id: <4c4e8838-9a6a-62b9-a8b7-48e4d375604e@de.ibm.com>
Date:   Mon, 26 Sep 2016 14:01:43 +0200
From:   Christian Borntraeger <borntraeger@...ibm.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Yuyang Du <yuyang.du@...el.com>, Ingo Molnar <mingo@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        vincent.guittot@...aro.org, Morten.Rasmussen@....com,
        dietmar.eggemann@....com, pjt@...gle.com, bsegall@...gle.com
Subject: Re: group scheduler regression since 4.3 (bisect 9d89c257d
 sched/fair: Rewrite runnable load and utilization average tracking)
On 09/26/2016 01:53 PM, Peter Zijlstra wrote:
> On Mon, Sep 26, 2016 at 01:42:05PM +0200, Christian Borntraeger wrote:
>> On 09/26/2016 12:56 PM, Peter Zijlstra wrote:
> 
>>> One of the differences in the old and new thing is being addressed by
>>> these patches:
>>>
>>>   https://lkml.kernel.org/r/1473666472-13749-1-git-send-email-vincent.guittot@linaro.org
>>>
>>> Could you see if those patches make a difference? If not, we'll have to
>>> go poke elsewhere of course ;-)
>>
>> Those patches do not apply cleanly on v4.7, linux/master or next/master.
>> Is there a good branch to test these patches?
> 
> They seemed to apply for me on tip/sched/core, I pushed out a branch for
> you that has them on.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/propagate
> 
> I didn't boot the result though; but they applied without issue.
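
For reference, checking out that branch would be something along these
lines (the remote and local branch names here are only examples):

  git remote add peterz git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git
  git fetch peterz
  git checkout -b sched-propagate peterz/sched/propagate
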
They applied OK on next from 9/13, but things got even worse.
With this host configuration:
CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
0   0    0    0      0    0:0:0:0         yes    yes        0
1   0    0    0      0    1:1:1:1         yes    yes        1
2   0    0    0      1    2:2:2:2         yes    yes        2
3   0    0    0      1    3:3:3:3         yes    yes        3
4   0    0    1      2    4:4:4:4         yes    yes        4
5   0    0    1      2    5:5:5:5         yes    yes        5
6   0    0    1      3    6:6:6:6         yes    yes        6
7   0    0    1      3    7:7:7:7         yes    yes        7
8   0    0    1      4    8:8:8:8         yes    yes        8
9   0    0    1      4    9:9:9:9         yes    yes        9
10  0    0    1      5    10:10:10:10     yes    yes        10
11  0    0    1      5    11:11:11:11     yes    yes        11
12  0    0    1      6    12:12:12:12     yes    yes        12
13  0    0    1      6    13:13:13:13     yes    yes        13
14  0    0    1      7    14:14:14:14     yes    yes        14
15  0    0    1      7    15:15:15:15     yes    yes        15
the guest was running either on CPUs 0-3 or on 4-15, but it never
used the full system. With group scheduling disabled, everything was good
again. So it looks like this bug also has some dependency on the
host topology.
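
For completeness, the kind of commands one would use to watch the vCPU
placement and to turn group scheduling off look roughly like this. This
is only an illustrative sketch; whether the relevant task groups come
from autogrouping or from the cpu cgroup hierarchy set up for the guest
is an assumption, as is the qemu process name:

  # which host CPUs the guest's threads last ran on (psr column)
  ps -eLo pid,tid,psr,comm | grep qemu

  # switch off autogroup task groups at runtime ...
  sysctl kernel.sched_autogroup_enabled=0
  # ... or drop group scheduling entirely by building the kernel
  # without CONFIG_FAIR_GROUP_SCHED
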
Christian