linux-kernel - Re: [PATCH v2] sched/fair: fix imbalance due to CPU affinity

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <CAKfTPtDE6OuTcjOROoxd_KSLERsFEowjxGodL9J+64bgUscocA@mail.gmail.com>
Date:   Fri, 5 Jul 2019 14:23:18 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2] sched/fair: fix imbalance due to CPU affinity

On Tue, 2 Jul 2019 at 16:29, Valentin Schneider
<valentin.schneider@....com> wrote:
>
>
>
> On 02/07/2019 11:00, Vincent Guittot wrote:
> >> Does that want a
> >>
> >> Cc: stable@...r.kernel.org
> >> Fixes: afdeee0510db ("sched: Fix imbalance flag reset")
> >
> > I was not sure whether this was introduced by this patch or by
> > later changes. I haven't been able to test such an old kernel
> > on my platform.
> >
>
> Right, seems like
>
>   65a4433aebe3 ("sched/fair: Fix load_balance() affinity redo path")
>
> also played in this area. At surface level it looks like it only reduced
> the number of CPUs the load_balance() redo can use (and interestingly it
> mentions the exact same bug as you observed, though triggered slightly
> differently).
>
> I'd be inclined to say that the issue was introduced by afdeee0510db, since
> from looking at the code from that time I can see the issue happening:

I agree that, from reading the code, this patch seems to be the root cause.
But it also means that the bug has been there for almost 5 years and was
never seen before I ran some functional tests on my rework of the
load balancer.
That's why a real test on an old kernel would be needed to confirm that
nothing else changed in the meantime.

>
> - try to pull from a CPU with only tasks pinned to itself
> - set sgc->imbalance
> - redo with a CPU that sees no big imbalance
> - goto out_balanced
> - LBF_ALL_PINNED is still set in env.flags, but we clear sgc->imbalance
>
> >>
> >> ?
> >>