linux-kernel - Re: [PATCH 1/1] sched/fair: Fix unfairness caused by missing load decay

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAFpoUr1D5URSODXKaOOHaBbmkNiNNbsCL1WCcEAWrdhJ8gXs8w@mail.gmail.com>
Date:   Sat, 1 May 2021 16:33:49 +0200
From:   Odin Ugedal <odin@...dal.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Odin Ugedal <odin@...d.al>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        "open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/1] sched/fair: Fix unfairness caused by missing load decay

ons. 28. apr. 2021 kl. 17:36 skrev Vincent Guittot <vincent.guittot@...aro.org>:
> You can keep both fixes tags

ACK

> If the cfs_rq is already in the list list_add_leaf_cfs_rq() will exit
> early but if it's not, we don't have to make sure that the whole
> branch in the list

Yeah, thats right. Calling list_add_leaf_cfs_rq once "too much" doesnt
hurt after all.

> In fact, we can break as soon as list_add_leaf_cfs_rq() and
> cfs_rq_throttled() return true

ACK, that makes sense.

> When a cfs_rq is throttled, it is not accounted in its parent anymore
> so we don't have to update and propagate the load down.

Okay. Still need to wrap my head around this a bit more I guess. I
have looked a bit around, and there
is actually a similar issue as "this one" for the case when a
throttled cgroup is "moved" via cpuset. It is however waaay
harder to reproduce, but it is doable, and it _will_ happen in real
life systems if the timing is "correct". I will dig deeper
and finish the patch for that case some time next week (hopefully). I
think that however deserve a separate patchset,
so I will come back with that later.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 33b1ee31ae0f..18441ce7316c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11026,10 +11026,10 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>
> -               if (cfs_rq_throttled(cfs_rq))
> -                       break;
> +               if (!cfs_rq_throttled(cfs_rq))
> +                       update_load_avg(cfs_rq, se, UPDATE_TG);
>
> -               update_load_avg(cfs_rq, se, UPDATE_TG);
> +               list_add_leaf_cfs_rq(cfs_rq);
>         }
> }

Sent a v2 with something like this now; that exit if
(list_add_leaf_cfs_rq(cfs_rq) && throttled). Since this loop start at
the parent of
the cfs_rq of the supplied se, I added a list_add_leaf_cfs_rq to the
top in order to insert the leaf cfs_rq as well.

Thanks
Odin