Message-ID: <20170426225202.GC11348@wtj.duckdns.org>
Date:   Wed, 26 Apr 2017 15:52:02 -0700
From:   Tejun Heo <tj@...nel.org>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
        Chris Mason <clm@...com>, kernel-team@...com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg

Hello,

On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
> On 24 April 2017 at 22:14, Tejun Heo <tj@...nel.org> wrote:
> Can the problem be on the load balance side instead?  And more
> precisely in the wakeup path?
> After looking at the trace, it seems that task placement happens in
> the wakeup path, and if it fails to select the right idle cpu at wake
> up, you will have to wait for a load balance, which is already too late.

Oh, I was tracing most scheduler activities, and the ratios of wakeups
picking idle CPUs were about the same regardless of cgroup membership.
I can confidently say that the latency issue I'm seeing comes from the
load balancer picking the wrong busiest CPU, which is not to say that
there can't be other problems.

> > another queued wouldn't report the correspondingly higher
> 
> It will, as load_avg includes runnable_load_avg, so whatever load is
> in runnable_load_avg will be in load_avg too. On the contrary,
> runnable_load_avg will not include the blocked load that is going to
> wake up soon in the case of schbench.

The decaying contribution of blocked tasks doesn't affect the busiest
CPU selection.  Without cgroup, runnable_load_avg is increased and
decreased immediately as tasks enter and leave the queue; otherwise we
end up with CPUs sitting idle while threads queued on other CPUs
accumulate scheduling latencies.
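
To illustrate (a simplified toy model of the accounting, not the
actual cfs_rq code -- the names here are made up for the example):

    /* Toy model: runnable_load_avg tracks enqueue/dequeue instantly,
     * while load_avg also carries a decaying blocked contribution. */
    struct rq_model {
        unsigned long runnable_load_avg;   /* runnable tasks only */
        unsigned long load_avg;            /* runnable + blocked (decaying) */
    };

    static void model_enqueue(struct rq_model *rq, unsigned long load)
    {
        rq->runnable_load_avg += load;     /* reflected immediately */
        rq->load_avg += load;
    }

    static void model_dequeue(struct rq_model *rq, unsigned long load)
    {
        rq->runnable_load_avg -= load;     /* reflected immediately */
        /* load_avg keeps the blocked task's contribution and only
         * sheds it through periodic decay, not here. */
    }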

The patch doesn't change how the busiest CPU is picked.  It already
uses runnable_load_avg.  The change that cgroup causes is that it
blocks updates to runnable_load_avg from newly scheduled or sleeping
tasks.
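
Roughly, the comparison looks like this (a sketch in terms of the same
toy model as above, not the real find_busiest_queue()):

    /* Pick the CPU with the highest runnable_load_avg.  If cgroup
     * keeps a child's runnable_load_avg from propagating up, this
     * comparison runs against stale values. */
    static int model_find_busiest(const struct rq_model *rqs, int nr_cpus)
    {
        int busiest = -1;
        unsigned long max_load = 0;

        for (int cpu = 0; cpu < nr_cpus; cpu++) {
            if (rqs[cpu].runnable_load_avg > max_load) {
                max_load = rqs[cpu].runnable_load_avg;
                busiest = cpu;
            }
        }
        return busiest;
    }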

The issue isn't whether runnable_load_avg or load_avg should be used;
it's the unexpected difference in the metric the load balancer sees
depending on whether cgroups are in use or not.

> One last thing: the load_avg of an idle CPU can stay around for a
> while (until a load balance happens and updates the blocked load),
> so the CPU can be seen as "busy" whereas it is not. Could that be a
> reason for your problem?

AFAICS, the load balancer doesn't use load_avg.

Thanks.

-- 
tejun
