Message-ID: <CAKfTPtC9bkMQJsWw6Z2QD0RrV=qN7yMFviVnSeTpDp=-vLBL0g@mail.gmail.com>
Date:   Thu, 27 Feb 2020 14:10:47 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Ben Segall <bsegall@...gle.com>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mel Gorman <mgorman@...e.de>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Phil Auld <pauld@...hat.com>, Parth Shah <parth@...ux.ibm.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Hillf Danton <hdanton@...a.com>, zhout@...aldi.net
Subject: Re: [PATCH] sched/fair: fix runnable_avg for throttled cfs

On Thu, 27 Feb 2020 at 12:20, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>
> On 26.02.20 21:01, Vincent Guittot wrote:
> > On Wed, 26 Feb 2020 at 20:04, <bsegall@...gle.com> wrote:
> >>
> >> Vincent Guittot <vincent.guittot@...aro.org> writes:
> >>
> >>> When a cfs_rq is throttled, its group entity is dequeued and its running
> >>> tasks are removed. We must update runnable_avg with current h_nr_running
> >>> and update group_se->runnable_weight with new h_nr_running at each level
>
>                                               ^^^
>
> Shouldn't this be 'current' rather than 'new' h_nr_running for
> group_se->runnable_weight? IMHO, you want to cache the current value
> before you add/subtract task_delta.

Hmm... it can't be "current" in both places. In my explanation,
"current" means the situation when we start to throttle the cfs_rq and
"new" means the situation after we have finished throttling it. I
should probably use "old" and "new" to prevent any misunderstanding.

That being said, we need to update runnable_avg with the old
h_nr_running: the one from before we started to throttle the cfs_rq,
which is the value saved in group_se->runnable_weight. Once we have
updated runnable_avg, we save the new h_nr_running in
group_se->runnable_weight, which will be used for the next updates.
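
For reference, se_update_runnable() in tip/sched/core is roughly the
below (a paraphrased sketch, not a verbatim copy): se->runnable_weight
is a cache of the child cfs_rq's h_nr_running, which is why
update_load_avg() has to run before it gets refreshed.

static inline void se_update_runnable(struct sched_entity *se)
{
	/*
	 * For a group entity, runnable_weight caches the h_nr_running
	 * of its child cfs_rq (se->my_q). update_load_avg() consumes
	 * the old value, so it must be called before this refresh.
	 */
	if (!entity_is_task(se))
		se->runnable_weight = se->my_q->h_nr_running;
}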

>
> >>> of the hierarchy.
> >>
> >> You'll also need to do this for task enqueue/dequeue inside of a
> >> throttled hierarchy, I'm pretty sure.
> >
> > AFAICT, this is already done with patch "sched/pelt: Add a new
> > runnable average signal" when task is enqueued/dequeued inside a
> > throttled hierarchy
> >
> >>
> >>>
> >>> Fixes: 9f68395333ad ("sched/pelt: Add a new runnable average signal")
> >>> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> >>> ---
> >>> This patch applies on top of tip/sched/core
> >>>
> >>>  kernel/sched/fair.c | 10 ++++++++++
> >>>  1 file changed, 10 insertions(+)
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index fcc968669aea..6d46974e9be7 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -4703,6 +4703,11 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
> >>>
> >>>               if (dequeue)
> >>>                       dequeue_entity(qcfs_rq, se, DEQUEUE_SLEEP);
> >>> +             else {
> >>> +                     update_load_avg(qcfs_rq, se, 0);
> >>> +                     se_update_runnable(se);
> >>> +             }
> >>> +
> >>>               qcfs_rq->h_nr_running -= task_delta;
> >>>               qcfs_rq->idle_h_nr_running -= idle_task_delta;
> >>>
> >>> @@ -4772,6 +4777,11 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> >>>               cfs_rq = cfs_rq_of(se);
> >>>               if (enqueue)
> >>>                       enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
> >>> +             else {
> >>> +                     update_load_avg(cfs_rq, se, 0);
> >>
> >>
> >>> +                     se_update_runnable(se);
> >>> +             }
> >>> +
> >>>               cfs_rq->h_nr_running += task_delta;
> >>>               cfs_rq->idle_h_nr_running += idle_task_delta;
