linux-kernel - RE: [PATCH] [sched]: pick the NULL entity caused the panic.


Message-ID: <FA47D36D6EC9FE4CB463299737C09B9901B6D847@SHSMSX102.ccr.corp.intel.com>
Date:	Tue, 12 Nov 2013 06:38:24 +0000
From:	"Wang, Xiaoming" <xiaoming.wang@...el.com>
To:	Paul Turner <pjt@...gle.com>
CC:	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	"Liu, Chuansheng" <chuansheng.liu@...el.com>,
	"Zhang, Dongxing" <dongxing.zhang@...el.com>
Subject: RE: [PATCH] [sched]: pick the NULL entity caused the panic.



> -----Original Message-----
> From: Paul Turner [mailto:pjt@...gle.com]
> Sent: Tuesday, November 12, 2013 11:10 AM
> To: Wang, Xiaoming
> Cc: Ingo Molnar; Peter Zijlstra; LKML; Liu, Chuansheng; Zhang, Dongxing
> Subject: Re: [PATCH] [sched]: pick the NULL entity caused the panic.
> 
> On Tue, Nov 12, 2013 at 8:29 AM, Wang, Xiaoming
> <xiaoming.wang@...el.com> wrote:
> > A cfs_rq obtained from group_cfs_rq() may have
> > cfs_rq->nr_running equal to zero, which causes
> > a panic in pick_next_task_fair().
> > So cfs_rq->nr_running needs to be checked.
> >
> > Signed-off-by: xiaoming wang <xiaoming.wang@...el.com>
> > Signed-off-by: Zhang Dongxing <dongxing.zhang@...el.com>
> > ---
> >  kernel/sched/fair.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 7c70201..2d440f9 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3708,7 +3708,7 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
> >                 se = pick_next_entity(cfs_rq);
> >                 set_next_entity(cfs_rq, se);
> >                 cfs_rq = group_cfs_rq(se);
> > -       } while (cfs_rq);
> > +       } while (cfs_rq && cfs_rq->nr_running);
> >
> >         p = task_of(se);
> >         if (hrtick_enabled(rq))
> 
> This can only happen when something else has already corrupted the
> rb-tree.  Breaking out here is going to cause you to instead try
> treating a group entity as a task, which will crash just as badly.
> 
> Could you describe what was being run when this crash occurred?
> 
> > --
> > 1.7.1
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
Dear Paul,
	How about moving the cfs_rq->nr_running check into the loop? My concern is that
cfs_rq->nr_running may be zero, because cfs_rq is reassigned by cfs_rq = group_cfs_rq(se)
on every iteration. We do not yet know exactly how to reproduce the problem; the panic
occurred only during random testing and does not reproduce reliably.
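
For illustration only, here is a minimal user-space sketch of the two loop
shapes discussed in this thread. It is not kernel code: the toy_* types and
the walk_* functions are hypothetical stand-ins, the rb-tree is reduced to a
single "first" pointer, and set_next_entity() is left out entirely. It only
shows what each loop returns when a group entity's own queue has
nr_running == 0.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct toy_cfs_rq;

struct toy_se {
	struct toy_cfs_rq *my_q;	/* non-NULL means this entity is a group */
};

struct toy_cfs_rq {
	unsigned int nr_running;	/* number of queued entities */
	struct toy_se *first;		/* stands in for the leftmost rb-tree node */
};

/* Mirrors the role of group_cfs_rq(): a group's own queue, or NULL for a task. */
static struct toy_cfs_rq *toy_group_cfs_rq(struct toy_se *se)
{
	return se->my_q;
}

/* Loop shape of the first patch: stop descending when the child queue is empty. */
static struct toy_se *walk_first_patch(struct toy_cfs_rq *cfs_rq)
{
	struct toy_se *se;

	if (!cfs_rq->nr_running)
		return NULL;
	do {
		se = cfs_rq->first;
		cfs_rq = toy_group_cfs_rq(se);
	} while (cfs_rq && cfs_rq->nr_running);

	return se;	/* may still be a group entity, not a task */
}

/* Loop shape of the second patch: check every level before picking from it. */
static struct toy_se *walk_second_patch(struct toy_cfs_rq *cfs_rq)
{
	struct toy_se *se;

	do {
		if (!cfs_rq->nr_running)
			return NULL;
		se = cfs_rq->first;
		cfs_rq = toy_group_cfs_rq(se);
	} while (cfs_rq);

	return se;	/* only reached with a leaf (task) entity */
}

/*
 * Builds the case from the thread: a root queue holding one group entity
 * whose own queue is empty. Returns 0 if both walks behave as described.
 */
static int toy_check_scenario(void)
{
	struct toy_cfs_rq child = { .nr_running = 0, .first = NULL };
	struct toy_se group = { .my_q = &child };
	struct toy_cfs_rq root = { .nr_running = 1, .first = &group };

	struct toy_se *a = walk_first_patch(&root);
	struct toy_se *b = walk_second_patch(&root);

	/* First patch: the loop exits still holding the group entity. */
	assert(a == &group && a->my_q != NULL);
	/* Second patch: the empty child queue yields NULL instead. */
	assert(b == NULL);
	return 0;
}
```

Calling toy_check_scenario() from a test program exercises both walks. The
sketch says nothing about how the empty child queue arises in the first
place, and it does not model any state that set_next_entity() has already
changed on the parent levels before NULL is returned.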

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d440f9..7f2f8b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3701,14 +3701,13 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
        struct cfs_rq *cfs_rq = &rq->cfs;
        struct sched_entity *se;

-       if (!cfs_rq->nr_running)
-               return NULL;
-
        do {
+               if (!cfs_rq->nr_running)
+                       return NULL;
                se = pick_next_entity(cfs_rq);
                set_next_entity(cfs_rq, se);
                cfs_rq = group_cfs_rq(se);
-       } while (cfs_rq && cfs_rq->nr_running);
+       } while (cfs_rq);

        p = task_of(se);
        if (hrtick_enabled(rq))
