Message-ID: <20161018120744.GZ3142@twins.programming.kicks-ass.net>
Date:   Tue, 18 Oct 2016 14:07:44 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Joseph Salisbury <joseph.salisbury@...onical.com>,
        Ingo Molnar <mingo@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Mike Galbraith <efault@....de>, omer.akram@...onical.com
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes

On Tue, Oct 18, 2016 at 12:15:11PM +0100, Dietmar Eggemann wrote:
> On 18/10/16 10:07, Peter Zijlstra wrote:
> > On Mon, Oct 17, 2016 at 11:52:39PM +0100, Dietmar Eggemann wrote:

> > On IRC you mentioned that adding list_add_leaf_cfs_rq() to
> > online_fair_sched_group() cures this; this would actually match with
> > unregister_fair_sched_group() doing list_del_leaf_cfs_rq() and avoid
> > a few instructions on the enqueue path, so that's all good.
> 
> Yes, I was able to recreate a similar problem (not related to the cpu
> masks) on ARM64 (6 logical cpus). I created 100 second-level tg's but
> only put one task (no cpu affinity, so it could run on multiple cpus)
> in one of these tg's (mainly to see the related cfs_rq's in
> /proc/sched_debug).
> 
> I get a remaining .tg_load_avg : 49898 for cfs_rq[x]:/tg_1

Ah, and since all those CPUs are online, we decay all that load away.
OK, makes sense now.
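
For reference, a minimal sketch of that cure; the locking and the rest
of the per-cpu work are simplified from what fair.c actually does, so
only the list_add_leaf_cfs_rq() placement is the point here:

    void online_fair_sched_group(struct task_group *tg)
    {
        struct rq *rq;
        int i;

        for_each_possible_cpu(i) {
            rq = cpu_rq(i);

            raw_spin_lock_irq(&rq->lock);
            post_init_entity_util_avg(tg->se[i]);
            /*
             * Put the group's cfs_rq on the per-rq leaf list at
             * online time, so update_blocked_averages() visits it
             * and decays its blocked load. This pairs with the
             * list_del_leaf_cfs_rq() in unregister_fair_sched_group()
             * and avoids the list_add on the enqueue path.
             */
            list_add_leaf_cfs_rq(tg->cfs_rq[i]);
            raw_spin_unlock_irq(&rq->lock);
        }
    }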

> > I'm just not immediately seeing how that cures things. The only relevant
> > user of the leaf_cfs_rq list seems to be update_blocked_averages() which
> > is called from the balance code (idle_balance() and
> > rebalance_domains()). But neither should call that for offline (or
> > !present) CPUs.
> 
> Assuming this is load from the 99 second-level tg's which never had a
> task running, putting list_add_leaf_cfs_rq() into
> online_fair_sched_group() for all cpus makes sure that all the
> 'blocked load' gets decayed.
> 
> Doing what Vincent just suggested, initializing tg se's with 0 instead
> of 1024, makes this unnecessary.

Indeed. I just worry about the cases where we do not propagate the load
up, e.g. the stuff fixed by:

  1476695653-12309-5-git-send-email-vincent.guittot@...aro.org

If we hit an intermediate cgroup with 0 load, we might get some
interactivity issues.
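
For concreteness, the 0-init variant would look something like this in
init_entity_runnable_average(); a sketch only, since the exact shape of
Vincent's change is an assumption here and the real function does more
setup than shown:

    void init_entity_runnable_average(struct sched_entity *se)
    {
        struct sched_avg *sa = &se->avg;

        memset(sa, 0, sizeof(*sa));
        /*
         * Tasks still start at full weight, so a new task is seen
         * as heavy until its signal stabilizes; a group entity
         * starts at 0 instead of 1024 and only picks up load once
         * something is enqueued below it.
         */
        if (entity_is_task(se))
            sa->load_avg = scale_load_down(se->load.weight);
    }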

But it could be I got lost again :-)
