[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtDkrf6YjrgQVgm52DAB9EWQe3q5BZZCvzm9g4k=BjCEUw@mail.gmail.com>
Date: Fri, 18 Jan 2019 11:16:38 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Sargun Dhillon <sargun@...gun.me>
Cc: LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Tejun Heo <tj@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Gabriel Hartmann <ghartmann@...flix.com>,
gabriel.hartmann@...il.com
Subject: Re: Crash in list_add_leaf_cfs_rq due to bad tmp_alone_branch
On Wed, 9 Jan 2019 at 23:43, Sargun Dhillon <sargun@...gun.me> wrote:
>
> On Wed, Jan 9, 2019 at 2:14 PM Sargun Dhillon <sargun@...gun.me> wrote:
> >
> > I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair: Fix
> > infinite loop in update_blocked_averages() by reverting a9e7f6544b9c
> > and put it on top of 4.19.13. In addition to this, I uninlined
> > list_add_leaf_cfs_rq for debugging.
> >
> > This revealed a new bug that we didn't get to because we kept getting
> > crashes from the previous issue. When we are running with cgroups that
> > are rapidly changing, with CFS bandwidth control, and in addition
> > using the cpusets cgroup, we see this crash. Specifically, it seems to
> > occur with cgroups that are throttled and we change the allowed
> > cpuset.
Thanks for the context, I will try to reproduce the problem and
understand how we can stop in the middle of walking to the
sched_entity branch with a parent not already added
How many cgroup level have you got in you setup ?
> >
>
> This patch from Gabriel should fix the problem:
>
>
> [PATCH] sched/fair: Reset tmp_alone_branch on cfs_rq delete
>
> When a child cfs_rq is added to the leaf cfs_rq list before its parent
> tmp_alone_branch is set to point to the child in preparation for the
> parent being added.
>
> If the child is deleted before the parent is added then tmp_alone_branch
> points to a freed cfs_rq. Any future reference to tmp_alone_branch will
> result in a use after free.
So, the patch below is a temporary fix that helps to recover from the
situation where tmp_alone_branch doesn't finished back to
rq->leaf_cfs_rq_list
But this situation should not happened at the beginning
>
> Signed-off-by: Gabriel Hartmann <gabriel.hartmann@...il.com>
> Reported-by: Sargun Dhillon <sargun@...gun.me>
> ---
> kernel/sched/fair.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7137bc343b4a..0987629cbb76 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -347,6 +347,11 @@ static inline void list_add_leaf_cfs_rq(struct
> cfs_rq *cfs_rq)
> static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> {
> if (cfs_rq->on_list) {
> + struct rq *rq = rq_of(cfs_rq);
> +
> + if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
> + rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
> +
> list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
> cfs_rq->on_list = 0;
> }
Powered by blists - more mailing lists