Message-ID: <FA47D36D6EC9FE4CB463299737C09B9901B6D847@SHSMSX102.ccr.corp.intel.com>
Date: Tue, 12 Nov 2013 06:38:24 +0000
From: "Wang, Xiaoming" <xiaoming.wang@...el.com>
To: Paul Turner <pjt@...gle.com>
CC: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
"Liu, Chuansheng" <chuansheng.liu@...el.com>,
"Zhang, Dongxing" <dongxing.zhang@...el.com>
Subject: RE: [PATCH] [sched]: pick the NULL entity caused the panic.
> -----Original Message-----
> From: Paul Turner [mailto:pjt@...gle.com]
> Sent: Tuesday, November 12, 2013 11:10 AM
> To: Wang, Xiaoming
> Cc: Ingo Molnar; Peter Zijlstra; LKML; Liu, Chuansheng; Zhang, Dongxing
> Subject: Re: [PATCH] [sched]: pick the NULL entity caused the panic.
>
> On Tue, Nov 12, 2013 at 8:29 AM, Wang, Xiaoming
> <xiaoming.wang@...el.com> wrote:
> > cfs_rq is set to its group run queue, but the value of
> > cfs_rq->nr_running may be zero, which will cause
> > a panic in pick_next_task_fair.
> > So cfs_rq->nr_running needs to be checked.
> >
> > Signed-off-by: xiaoming wang <xiaoming.wang@...el.com>
> > Signed-off-by: Zhang Dongxing <dongxing.zhang@...el.com>
> > ---
> > kernel/sched/fair.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 7c70201..2d440f9 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3708,7 +3708,7 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
> >  		se = pick_next_entity(cfs_rq);
> >  		set_next_entity(cfs_rq, se);
> >  		cfs_rq = group_cfs_rq(se);
> > -	} while (cfs_rq);
> > +	} while (cfs_rq && cfs_rq->nr_running);
> >
> >  	p = task_of(se);
> >  	if (hrtick_enabled(rq))
>
> This can only happen when something else has already corrupted the
> rb-tree. Breaking out here is going to cause you to instead try
> treating a group entity as a task, which will crash just as badly.
>
> Could you describe what was being run when this crash occurred?
>
> > --
> > 1.7.1
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
Dear Paul,

How about moving the cfs_rq->nr_running check into the loop? My concern is that
cfs_rq->nr_running may be zero, because cfs_rq is reassigned by
cfs_rq = group_cfs_rq(se) on every iteration. We do not yet know exactly how to
reproduce the problem; the panic happened only intermittently during random testing.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d440f9..7f2f8b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3701,14 +3701,13 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
 	struct cfs_rq *cfs_rq = &rq->cfs;
 	struct sched_entity *se;
 
-	if (!cfs_rq->nr_running)
-		return NULL;
-
 	do {
+		if (!cfs_rq->nr_running)
+			return NULL;
 		se = pick_next_entity(cfs_rq);
 		set_next_entity(cfs_rq, se);
 		cfs_rq = group_cfs_rq(se);
-	} while (cfs_rq && cfs_rq->nr_running);
+	} while (cfs_rq);
 
 	p = task_of(se);
 	if (hrtick_enabled(rq))
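
To make the difference between the two loop shapes concrete, here is a minimal
user-space sketch. It is not kernel code: the structs are simplified mock types
that only borrow the kernel's names, pick_next_entity() is replaced by a
"return the first entity" stand-in instead of an rb-tree walk, and
set_next_entity() is left out entirely. The helper names
pick_check_in_condition() and pick_check_in_loop() are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock types: simplified stand-ins, NOT the kernel definitions. */
struct cfs_rq;

struct sched_entity {
	struct cfs_rq *my_q;	/* non-NULL for a group entity, NULL for a task */
};

struct cfs_rq {
	unsigned int nr_running;	/* number of queued entities */
	struct sched_entity *first;	/* stand-in for the leftmost rb-tree node */
};

/* Stand-in for the real pick_next_entity(), which walks an rb-tree. */
static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
	return cfs_rq->first;
}

/* Run queue owned by a group entity, or NULL if the entity is a task. */
static struct cfs_rq *group_cfs_rq(struct sched_entity *se)
{
	return se->my_q;
}

static bool entity_is_task(const struct sched_entity *se)
{
	return se->my_q == NULL;
}

/* Shape of the first patch: nr_running is tested in the loop condition. */
static struct sched_entity *pick_check_in_condition(struct cfs_rq *cfs_rq)
{
	struct sched_entity *se;

	assert(cfs_rq != NULL);
	if (!cfs_rq->nr_running)
		return NULL;

	do {
		se = pick_next_entity(cfs_rq);
		cfs_rq = group_cfs_rq(se);
	} while (cfs_rq && cfs_rq->nr_running);

	/* If the loop stopped on an empty group queue, se is a group entity. */
	return se;
}

/* Shape of the second patch: nr_running is tested on every iteration. */
static struct sched_entity *pick_check_in_loop(struct cfs_rq *cfs_rq)
{
	struct sched_entity *se;

	assert(cfs_rq != NULL);
	do {
		if (!cfs_rq->nr_running)
			return NULL;
		se = pick_next_entity(cfs_rq);
		cfs_rq = group_cfs_rq(se);
	} while (cfs_rq);

	return se;
}
```

With a root queue whose only entity is a group that owns an empty queue, the
first shape hands back the group entity (which the real code would then pass to
task_of()), while the second shape returns NULL. Note that the sketch does not
model the fact that the real loop has already called set_next_entity() on the
upper levels by the time it returns NULL.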