linux-kernel - Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes


Message-ID: <CAKfTPtDLLzt2s0_p2uw3dt8MRH0o8_Tus6P5Uze+6gBHFm-rVg@mail.gmail.com>
Date:   Thu, 13 Oct 2016 12:58:59 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Joseph Salisbury <joseph.salisbury@...onical.com>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Mike Galbraith <efault@....de>
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes

Hi,

On 12 October 2016 at 18:21, Joseph Salisbury
<joseph.salisbury@...onical.com> wrote:
> On 10/12/2016 08:20 AM, Vincent Guittot wrote:
>> On 8 October 2016 at 13:49, Mike Galbraith <efault@....de> wrote:
>>> On Sat, 2016-10-08 at 13:37 +0200, Vincent Guittot wrote:
>>>> On 8 October 2016 at 10:39, Ingo Molnar <mingo@...nel.org> wrote:
>>>>> * Peter Zijlstra <peterz@...radead.org> wrote:
>>>>>
>>>>>> On Fri, Oct 07, 2016 at 03:38:23PM -0400, Joseph Salisbury wrote:
>>>>>>> Hello Peter,
>>>>>>>
>>>>>>> A kernel bug report was opened against Ubuntu [0].  After a
>>>>>>> kernel
>>>>>>> bisect, it was found that reverting the following commit
>>>>>>> resolved this bug:
>>>>>>>
>>>>>>> commit 3d30544f02120b884bba2a9466c87dba980e3be5
>>>>>>> Author: Peter Zijlstra <peterz@...radead.org>
>>>>>>> Date:   Tue Jun 21 14:27:50 2016 +0200
>>>>>>>
>>>>>>>     sched/fair: Apply more PELT fixes
>>>> This patch only speeds up the update of task group load in order to
>>>> reflect the new load balance, but it should not change the final value
>>>> and, as a result, the final behavior. I will try to reproduce it on my
>>>> target later today.
>>> FWIW, I tried and failed w/wo autogroup on 4.8 and master.
>> Me too
>>
>> Is it possible to get a dump of /proc/sched_debug while the problem occurs?
>>
>> Vincent
>>
>>>         -Mike
>
> The output from /proc/sched_debug can be seen here:
> http://paste.ubuntu.com/23312351/

I have looked at the dump and there is something very odd in the
system.slice task group, where the display manager is running.
system.slice->tg_load_avg is around 381697, but tg_load_avg is
normally equal to the sum of system.slice[cpu]->tg_load_avg_contrib,
and that sum is only 1013 in our case. Some difference is expected
because the dump of /proc/sched_debug is not atomic and values can
change while it is taken, but nothing close to a difference of this
size.
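
The consistency check described above can be sketched as a small
script. This is only an illustration: it assumes the dump uses
"cfs_rq[N]:/path" section headers followed by ".tg_load_avg_contrib"
and ".tg_load_avg" fields, and the helper name tg_load_report is
hypothetical.

```python
import re
from collections import defaultdict

# Assumed dump layout: "cfs_rq[3]:/system.slice" headers followed by
# "  .tg_load_avg_contrib : 253" style fields.
HEADER = re.compile(r"^(\w+)\[(\d+)\]:(\S+)")
FIELD = re.compile(r"^\s*\.(tg_load_avg_contrib|tg_load_avg)\s*:\s*(\d+)")

def tg_load_report(sched_debug_text):
    """Map each task group path to (tg_load_avg, sum of per-CPU contribs)."""
    contrib_sum = defaultdict(int)
    tg_load_avg = {}
    group = None
    for line in sched_debug_text.splitlines():
        m = HEADER.match(line)
        if m:
            # Only cfs_rq sections carry the task group load fields.
            group = m.group(3) if m.group(1) == "cfs_rq" else None
            continue
        m = FIELD.match(line)
        if m and group is not None:
            name, value = m.group(1), int(m.group(2))
            if name == "tg_load_avg_contrib":
                contrib_sum[group] += value
            else:
                # tg_load_avg is a per-group value repeated in every
                # per-CPU section; keep the last one seen.
                tg_load_avg[group] = value
    return {g: (tg_load_avg.get(g, 0), contrib_sum[g]) for g in contrib_sum}
```

A group whose first value is far larger than the second, as with
381697 against 1013 here, is the anomaly being discussed. Small gaps
are expected because the dump is not atomic.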

The main effect of this very high value is that the weight/prio of
the sched_entity that represents system.slice in the root cfs_rq is
very low (lower than a task with the smallest nice prio), so the
system.slice task group gets the CPU far less often than the
user.slice task group: less than 1% for system.slice, where lightDM
and xorg are running, compared to 99% for user.slice, where the
stress tasks are running. This is confirmed by the se->avg.util_avg
values of the task groups, which reflect how much time each task
group is effectively running on a CPU:
system.slice[CPU3].se->avg.util_avg = 8 whereas
user.slice[CPU3].se->avg.util_avg = 991
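
To illustrate why an inflated tg_load_avg collapses the group weight,
here is a simplified sketch modeled on the share calculation in
kernel/sched/fair.c of that era (calc_cfs_shares). It is not the
kernel code itself: it ignores the extra load resolution used on
64-bit, and the per-CPU contribution (253), the local cfs_rq load
(1024) and the group shares (1024) are assumed values chosen only for
the example.

```python
MIN_SHARES = 2        # lower clamp the kernel applies to group shares
NICE_19_WEIGHT = 15   # weight of a nice +19 task in the kernel table

def group_entity_weight(tg_shares, tg_load_avg, contrib, cfs_rq_load):
    """Approximate the weight of a group's sched_entity on one CPU.

    tg_shares:   shares configured for the task group (assumed 1024)
    tg_load_avg: global load of the group, as shown in sched_debug
    contrib:     this CPU's tg_load_avg_contrib (hypothetical here)
    cfs_rq_load: current load weight of the group's cfs_rq on this CPU
    """
    # Swap this CPU's stale contribution for its current load.
    tg_weight = tg_load_avg - contrib + cfs_rq_load
    shares = tg_shares * cfs_rq_load
    if tg_weight:
        shares //= tg_weight
    # Clamp to [MIN_SHARES, tg_shares].
    return max(MIN_SHARES, min(shares, tg_shares))

broken = group_entity_weight(1024, 381697, 253, 1024)   # -> 2
healthy = group_entity_weight(1024, 1013, 253, 1024)    # -> 587
```

Under these assumptions the inflated value pins the group entity at
the minimum weight of 2, below the weight of 15 given to a nice +19
task, whereas a tg_load_avg of about 1013 would give it a weight in
the hundreds.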

This difference in weight/priority explains why the system becomes
unresponsive. What I can't explain for now is why
system.slice->tg_load_avg = 381697 when it should be around 1013,
and how the patch can generate this situation.

Is it possible to get a dump of /proc/sched_debug before starting
the stress command? That would show whether the problem is there from
the beginning but goes unnoticed because the system is not overloaded,
or whether it only appears when the user starts to load the system.

Thanks,

>
> Ingo, the latest scheduler bits also still exhibit the bug:
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>
>