linux-kernel - Re: [PATCH 2/2] sched/fair: Always propagate runnable_load

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKfTPtAasz-YAUGRKciY_Pab13n+2QGU6-xC=j7ef+jvqGwFtQ@mail.gmail.com>
Date:   Thu, 27 Apr 2017 10:29:10 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
        Chris Mason <clm@...com>, kernel-team@...com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg

On 27 April 2017 at 00:52, Tejun Heo <tj@...nel.org> wrote:
> Hello,
>
> On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
>> On 24 April 2017 at 22:14, Tejun Heo <tj@...nel.org> wrote:
>> Can the problem be on the load balance side instead ?  and more
>> precisely in the wakeup path ?
>> After looking at the trace, it seems that task placement happens at
>> wake up path and if it fails to select the right idle cpu at wake up,
>> you will have to wait for a load balance which is alreayd too late
>
> Oh, I was tracing most of scheduler activities and the ratios of
> wakeups picking idle CPUs were about the same regardless of cgroup
> membership.  I can confidently say that the latency issue that I'm
> seeing is from load balancer picking the wrong busiest CPU, which is
> not to say that there can be other problems.

ok. Is there any trace that you can share ? your behavior seems
different of mine

>
>> > another queued wouldn't report the correspondingly higher
>>
>> It will as load_avg includes the runnable_load_avg so whatever load is
>> in runnable_load_avg will be in load_avg too. But at the contrary,
>> runnable_load_avg will not have the blocked that is going to wake up
>> soon in the case of schbench
>
> Decaying contribution of blocked tasks don't affect the busiest CPU
> selection.  Without cgroup, runnable_load_avg is immediately increased
> and decreased as tasks enter and leave the queue and otherwise we end
> up with CPUs which are idle when there are threads queued on different
> CPUs accumulating scheduling latencies.
>
> The patch doesn't change how the busiest CPU is picked.  It already
> uses runnable_load_avg.  The change that cgroup causes is that it
> blocks updates to runnable_load_avg from newly scheduled or sleeping
> tasks.
>
> The issue isn't about whether runnable_load_avg or load_avg should be
> used but the unexpected differences in the metrics that the load

I think that's the root of the problem. I explain a bit more my view
on the other thread

> balancer uses depending on whether cgroup is used or not.
>
>> One last thing, the load_avg of an idle CPU can stay blocked for a
>> while (until a load balance happens that will update blocked load) and
>> can be seen has "busy" whereas it is not. Could it be a reason of your
>> problem ?
>
> AFAICS, the load balancer doesn't use load_avg.
>
> Thanks.
>
> --
> tejun