Message-ID: <2375c9f90912212319v63dc692bg13b918fe6ea03299@mail.gmail.com>
Date: Tue, 22 Dec 2009 15:19:44 +0800
From: Américo Wang <xiyou.wangcong@...il.com>
To: Xiaotian Feng <xtfeng@...il.com>
Cc: Eric Paris <eparis@...hat.com>, linux-kernel@...r.kernel.org,
mingo@...e.hu, peterz@...radead.org, efault@....de
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking,
WARNs and BUGs
[Fix top-posting]
On Tue, Dec 22, 2009 at 1:42 PM, Xiaotian Feng <xtfeng@...il.com> wrote:
>
> On Tue, Dec 22, 2009 at 8:17 AM, Eric Paris <eparis@...hat.com> wrote:
>> Trying to build a kernel on a 48-core x86_64 box using make -j 64, and
>> I'm exploding in the scheduler. I'm running (and building) kernel
>> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5. There are three distinct
>> signatures of problems. On some boots I see all 3 of these failures,
>> on others only 1 or 2 of them. That's why they are somewhat split
>> up in dmesg.
>>
>> 1) gcc/3141 is trying to acquire lock:
>> (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>>
>> but task is already holding lock:
>> (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>>
>> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>>
>> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>> kernel/sched_fair.c
>>
>> Full backtraces are in the attached dmesg.
>>
> Does a revert of cd29fe6f2637cc2ccbda5ac65f5332d6bf5fa3c6 fix this problem?
I don't think so...
I think the most suspicious commit here is ab19cb23. It removed the
local_irq_save() call, which means that if the task is selected to run on
another CPU that doesn't have IRQs disabled, we can take a page fault; then
we will try to take mm->mmap_sem while already holding rq->lock.
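Lockdep's report in (1) is a lock-ordering complaint: rq->lock is held while sem->wait_lock is being acquired. The toy C model below only illustrates that kind of check and is not lockdep's actual implementation. It records a "held before acquired" edge for each nested acquisition and rejects any edge that would close a cycle. The lock-class numbers, the helper names (record_order, reachable) and the assumption that some other path (for example an rwsem wakeup) has already taken rq->lock while holding sem->wait_lock are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of lock-order checking, loosely in the spirit of
 * lockdep. Lock classes are small integers; whenever a lock is acquired
 * while another is held, a "held -> acquired" edge is recorded. An edge
 * that would close a cycle is reported as a potential circular deadlock. */
#define MAX_CLASSES 8

/* order[a][b] is true once class a has been held while taking class b. */
static bool order[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (order[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* Record that 'acquired' was taken while 'held' was held. Returns false,
 * without recording the edge, if an acquired -> ... -> held path already
 * exists (so the new edge would complete a cycle). */
static bool record_order(int held, int acquired)
{
    bool seen[MAX_CLASSES] = { false };

    assert(held >= 0 && held < MAX_CLASSES);
    assert(acquired >= 0 && acquired < MAX_CLASSES);
    if (reachable(acquired, held, seen))
        return false;
    order[held][acquired] = true;
    return true;
}

/* Hypothetical class numbers for the two locks named in the report. */
enum { RQ_LOCK, SEM_WAIT_LOCK };
```

For example, record_order(SEM_WAIT_LOCK, RQ_LOCK) succeeds first (the assumed wakeup path), after which record_order(RQ_LOCK, SEM_WAIT_LOCK), the ordering in the report, returns false because it would complete the cycle.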
Does the following untested patch fix the problem?
NOT-signed-off-by: WANG Cong <xiyou.wangcong@...il.com>
------
diff --git a/kernel/sched.c b/kernel/sched.c
index 87f1f47..221ab59 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2408,13 +2408,13 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
if (p->sched_class->task_waking)
p->sched_class->task_waking(rq, p);
- __task_rq_unlock(rq);
+ task_rq_unlock(rq, &flags);
cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
if (cpu != orig_cpu)
set_task_cpu(p, cpu);
- rq = __task_rq_lock(p);
+ rq = task_rq_lock(p, &flags);
update_rq_clock(rq);
WARN_ON(p->state != TASK_WAKING);
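The patch swaps the double-underscore helpers for the IRQ-saving ones. In kernels of this era, task_rq_lock() does local_irq_save() into a caller-supplied flags word before taking rq->lock, and task_rq_unlock() restores that state, while __task_rq_lock()/__task_rq_unlock() only take and drop the lock. The single-CPU C model below uses hypothetical model_* names and a plain bool standing in for the interrupt flag. It shows how the IRQ state differs while select_task_rq() runs under each variant. It says nothing about whether the patch actually fixes the reported bug; that is the untested question being asked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model. A global flag stands in for the CPU's
 * interrupt-enable state and a bool stands in for rq->lock. The model_*
 * names mirror the kernel helpers but this is NOT kernel code. */
static bool irqs_enabled = true;
static bool rq_locked;

static void model_local_irq_save(unsigned long *flags)
{
    *flags = irqs_enabled;  /* remember the previous state */
    irqs_enabled = false;
}

static void model_local_irq_restore(unsigned long flags)
{
    irqs_enabled = flags != 0;
}

/* Like task_rq_lock(): save and disable IRQs, then take the lock. */
static void model_task_rq_lock(unsigned long *flags)
{
    model_local_irq_save(flags);
    assert(!rq_locked);
    rq_locked = true;
}

/* Like task_rq_unlock(): drop the lock, then restore the saved IRQ state. */
static void model_task_rq_unlock(unsigned long *flags)
{
    assert(rq_locked);
    rq_locked = false;
    model_local_irq_restore(*flags);
}

/* Like __task_rq_lock()/__task_rq_unlock(): lock only, IRQ state untouched. */
static void model__task_rq_lock(void)
{
    assert(!rq_locked);
    rq_locked = true;
}

static void model__task_rq_unlock(void)
{
    assert(rq_locked);
    rq_locked = false;
}

/* Returns whether IRQs were enabled during the "select a CPU" step that
 * sits between the unlock and relock, for either version of the hunk. */
static bool irqs_enabled_during_select(bool patched)
{
    unsigned long flags;
    bool during;

    irqs_enabled = true;           /* caller enters with IRQs on */
    model_task_rq_lock(&flags);    /* top of try_to_wake_up() */
    if (patched)
        model_task_rq_unlock(&flags);
    else
        model__task_rq_unlock();
    during = irqs_enabled;         /* select_task_rq() would run here */
    if (patched)
        model_task_rq_lock(&flags);
    else
        model__task_rq_lock();
    model_task_rq_unlock(&flags);  /* end of try_to_wake_up() */
    return during;
}
```

With the original __task_rq_unlock()/__task_rq_lock() pair, IRQs stay disabled across the select step; with the patched pair they are re-enabled there whenever they were enabled on entry.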