Message-ID: <20071114152930.GA1690@elte.hu>
Date: Wed, 14 Nov 2007 16:29:30 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Oleg Nesterov <oleg@...sign.ru>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Grant Wilson <grant.wilson@....co.uk>,
Peter Zijlstra <peterz@...radead.org>,
"Rafael J. Wysocki" <rjw@...k.pl>,
Srivatsa Vaddagiri <vatsa@...ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: 2.6.24-rc1-gb4f5550 oops
* Oleg Nesterov <oleg@...sign.ru> wrote:
> > [18073.371126] Unable to handle kernel NULL pointer dereference at 0000000000000120 RIP:
> > [18073.371134] [<ffffffff8023572e>] check_preempt_wakeup+0x6e/0x110
> > [18073.371144] PGD 81f9067 PUD 81c8067 PMD 0
> > [18073.371151] Oops: 0000 [1] PREEMPT SMP
> > [18073.371157] CPU 2
> > [18073.371161] Modules linked in: vfat fat
> > [18073.371168] Pid: 4639, comm: kwin Not tainted 2.6.24-rc1 #1
> > [18073.371171] RIP: 0010:[<ffffffff8023572e>] [<ffffffff8023572e>] check_preempt_wakeup+0x6e/0x110
> > [18073.371177] RSP: 0018:ffff810008531a78 EFLAGS: 00010006
> > [18073.371179] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > [18073.371183] RDX: ffff810004441bf0 RSI: ffff81000801e860 RDI: ffff81000444ab80
> > [18073.371186] RBP: ffff810008531aa8 R08: 000000d0d47a4a90 R09: 0000000000000000
> > [18073.371188] R10: ffff810004441bf0 R11: 0000000000000001 R12: ffff810006520400
> > [18073.371190] R13: ffff81000801e860 R14: ffff81000a63a000 R15: ffff81000443d8e0
> > [18073.371193] FS: 00002b7d646a86f0(0000) GS:ffff810004c11780(0000) knlGS:0000000000000000
> > [18073.371196] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [18073.371199] CR2: 0000000000000120 CR3: 0000000008495000 CR4: 00000000000006e0
> > [18073.371202] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [18073.371211] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [18073.371214] Process kwin (pid: 4639, threadinfo ffff810008530000, task ffff81000840a860)
> > [18073.371216] Stack: ffff81000444ab80 0000000000000001 ffff81000801e860 ffff81000444ab80
> > [18073.371231] 0000000000000002 ffff81000443d8e0 ffff810008531b38 ffffffff8023061e
> > [18073.371238] 0000000000000000 ffff810004441b80 0000000000000002 0000000100000000
> > [18073.371245] Call Trace:
> > [18073.371250] [<ffffffff8023061e>] try_to_wake_up+0x2fe/0x3a0
>
> I suspect I see the bug in that area, but I am not sure it can explain
> this trace completely.
There's a fix pending from Dmitry, please see below. It took Grant days
to trigger the crash, so the fix will need some time to be confirmed,
but in theory it could explain the crash.
Ingo
---------------------->
Subject: sched: fix __set_task_cpu() SMP race
From: Dmitry Adamushko <dmitry.adamushko@...il.com>
Grant Wilson has reported rare SCHED_FAIR_USER crashes on his quad-core
system; these crashes can only be explained by runqueue corruption.

There is a narrow SMP race in __set_task_cpu(): once ->cpu is set to a
new value, task_rq_lock(p, ...) can be executed successfully on another
CPU. We must ensure that all updates of per-task data have completed by
that moment.

This bug has been hiding in the Linux scheduler for an eternity (we
never had an explicit barrier for task->cpu in set_task_cpu(), so the
bug was introduced in 2.5.1), but it only became visible when
set_task_cfs_rq() was accidentally placed after the task->cpu update. It
also probably needs a sufficiently out-of-order CPU to trigger.
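
For readers outside the kernel tree, the ordering rule described above can
be sketched in portable C11. This is only an illustration, not the kernel
code: struct demo_task, demo_queues, demo_set_task_cpu() and
demo_task_queue() are hypothetical names, and a C11 release fence is used
as a rough (somewhat stronger) stand-in for smp_wmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

/* Hypothetical stand-in for the per-task scheduler state. */
struct demo_task {
	_Atomic(int *) queue;	/* plays the role of p->se.cfs_rq */
	atomic_uint cpu;	/* plays the role of task_thread_info(p)->cpu */
};

/* One dummy "runqueue" per CPU. */
static int demo_queues[DEMO_NR_CPUS];

/*
 * Writer side: update the per-task data first, then publish the new CPU.
 * The release fence keeps the queue store from being reordered after the
 * cpu store, which is the job smp_wmb() does in the patch.
 */
static void demo_set_task_cpu(struct demo_task *p, unsigned int cpu)
{
	assert(cpu < DEMO_NR_CPUS);
	atomic_store_explicit(&p->queue, &demo_queues[cpu],
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->cpu, cpu, memory_order_relaxed);
}

/*
 * Reader side: read the CPU first, then the per-task data. The acquire
 * fence pairs with the release fence above, so a reader that sees the
 * new cpu value also sees the matching queue pointer.
 */
static int *demo_task_queue(struct demo_task *p, unsigned int *cpu_out)
{
	unsigned int cpu = atomic_load_explicit(&p->cpu,
						memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	*cpu_out = cpu;
	return atomic_load_explicit(&p->queue, memory_order_relaxed);
}
```

If the two stores in demo_set_task_cpu() were swapped, or the fence were
dropped, a reader on another CPU could observe the new cpu value while
still seeing the old queue pointer, which is the kind of mismatch the
changelog blames for the runqueue corruption.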
Reported-by: Grant Wilson <grant.wilson@....co.uk>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@...il.com>
Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
kernel/sched.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -217,15 +217,15 @@ static inline struct task_group *task_gr
}
/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
-static inline void set_task_cfs_rq(struct task_struct *p)
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu)
{
- p->se.cfs_rq = task_group(p)->cfs_rq[task_cpu(p)];
- p->se.parent = task_group(p)->se[task_cpu(p)];
+ p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
+ p->se.parent = task_group(p)->se[cpu];
}
#else
-static inline void set_task_cfs_rq(struct task_struct *p) { }
+static inline void set_task_cfs_rq(struct task_struct *p, unsigned int cpu) { }
#endif /* CONFIG_FAIR_GROUP_SCHED */
@@ -1023,10 +1023,16 @@ unsigned long weighted_cpuload(const int
static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
{
+ set_task_cfs_rq(p, cpu);
#ifdef CONFIG_SMP
+ /*
+ * After ->cpu is set up to a new value, task_rq_lock(p, ...) can be
+ * successfully executed on another CPU. We must ensure that updates of
+ * per-task data have been completed by this moment.
+ */
+ smp_wmb();
task_thread_info(p)->cpu = cpu;
#endif
- set_task_cfs_rq(p);
}
#ifdef CONFIG_SMP
@@ -7111,7 +7117,7 @@ void sched_move_task(struct task_struct
tsk->sched_class->put_prev_task(rq, tsk);
}
- set_task_cfs_rq(tsk);
+ set_task_cfs_rq(tsk, task_cpu(tsk));
if (on_rq) {
if (unlikely(running))