Message-ID: <CAKfTPtAMjfnNHu_JdDSi1BSMoXKgGkm2+G_QLsUBDAhq1wFvZQ@mail.gmail.com>
Date: Fri, 28 Dec 2018 18:25:37 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Tejun Heo <tj@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Sargun Dhillon <sargun@...gun.me>,
Xie XiuQi <xiexiuqi@...wei.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, xiezhipeng1@...wei.com,
huawei.libin@...wei.com,
linux-kernel <linux-kernel@...r.kernel.org>,
Dmitry Adamushko <dmitry.adamushko@...il.com>,
Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH] sched: fix infinity loop in update_blocked_averages
On Fri, 28 Dec 2018 at 17:54, Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Fri, Dec 28, 2018 at 10:30:07AM +0100, Vincent Guittot wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index d1907506318a..88b9118b5191 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -7698,7 +7698,8 @@ static void update_blocked_averages(int cpu)
> > > * There can be a lot of idle CPU cgroups. Don't let fully
> > > * decayed cfs_rqs linger on the list.
> > > */
> > > - if (cfs_rq_is_decayed(cfs_rq))
> > > + if (cfs_rq_is_decayed(cfs_rq) &&
> > > + rq->tmp_alone_branch == &rq->leaf_cfs_rq_list)
> > > list_del_leaf_cfs_rq(cfs_rq);
> >
> > This patch reduces the cases but I don't think it's enough because it
> > doesn't cover the case of unregister_fair_sched_group(),
> > and we can still break the ordering of the cfs_rqs.
>
> So, if unregister_fair_sched_group() can corrupt the list, the bug is
> there regardless of a9e7f6544b9ce, right?
I don't think so, because without a9e7f6544b9ce the insertion in the
list is done only once, and we can't call unregister_fair_sched_group()
while an enqueue is ongoing, so tmp_alone_branch always points to
rq->leaf_cfs_rq_list.
a9e7f6544b9ce makes it possible for tmp_alone_branch to not point to
rq->leaf_cfs_rq_list.
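As an aside, that invariant can be written down as a trivial helper
(an untested sketch on my side, not something in the tree, and the
helper name is made up):

/*
 * Untested sketch: when no enqueue is in progress, tmp_alone_branch
 * must point back at the list head; otherwise a partially attached
 * branch exists and deleting entries can break the ordering.
 */
static inline bool leaf_cfs_rq_list_is_settled(struct rq *rq)
{
	return rq->tmp_alone_branch == &rq->leaf_cfs_rq_list;
}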
>
> Is there a reason why we're building a dedicated list for avg
> propagation? AFAICS, it's just doing depth first walk, which can be
> done without extra space as long as each node has the parent pointer,
> which they do. Is the dedicated list an optimization?
We have this list of sched groups of the rq to update the load of each
cfs_rq and, as a result, the load of the task group. This is then used
to compute the share of a task group between CPUs.
This list must be ordered to correctly propagate the updates from the
leaves to the root.
The dedicated list also avoids having to parse and walk every
task_group struct each time; instead, you just have to follow a linked
list.
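For comparison, without the dedicated list we would have to do
something like the following on every update (again an untested
sketch), visiting every task group in the system even when it never
had any load on this CPU:

struct task_group *tg;

rcu_read_lock();
list_for_each_entry_rcu(tg, &task_groups, list) {
	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];

	/* update the load of cfs_rq here ... */
}
rcu_read_unlock();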
Regards,
Vincent
>
> Thanks.
>
> --
> tejun