Message-ID: <1261471720.4937.9.camel@laptop>
Date:	Tue, 22 Dec 2009 09:48:40 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Eric Paris <eparis@...hat.com>
Cc:	linux-kernel@...r.kernel.org, mingo@...e.hu, efault@....de
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking,
 WARNs and BUGs

On Mon, 2009-12-21 at 19:17 -0500, Eric Paris wrote:
> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
> I'm exploding in the scheduler.  I'm running (and building) kernel
> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5  There are three distinct
> signatures of problems.  Some boots I'll see all 3 of these failures
> sometimes only 1 or 2 of them.  That's the reason they are kinda split
> up in dmesg.
> 
> 1) gcc/3141 is trying to acquire lock:
>  (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
> 
> but task is already holding lock:
>  (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83

This is due to the page fault happening while holding the rq->lock, so
it's an artefact of 3).

> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()

Worrying, but probably due to the same problem as 3).

> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>       kernel/sched_fair.c

Right, hard to tell where exactly it goes bang, but could you please try
reverting the below patch.

What I suspect happens is that we hit the task_cpu(p) == new_cpu case and
return early, so we never do __set_task_cpu()->set_task_rq(), which sets
the group scheduling pointers (you seem to have cgroup scheduling enabled).

If those pointers are wild, all kinds of interesting things can happen,
including 3) and possibly 2).
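
To illustrate the suspicion, here is a minimal user-space model of the
two variants of set_task_cpu(). It is not kernel code: struct task,
struct group_rq, group_rqs[] and the model_* functions are hypothetical
stand-ins, and a NULL pointer stands in for a "wild" one. It also assumes
a task can reach set_task_cpu() with its cpu field already equal to
new_cpu but its group pointers not yet set; how that state arises is not
established here.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 2

/* Hypothetical stand-in for a per-CPU group runqueue. */
struct group_rq {
	int cpu;
};

/* Hypothetical stand-in for the parts of task_struct that matter here. */
struct task {
	int cpu;                     /* what task_cpu(p) would return */
	struct group_rq *rq;         /* models the group scheduling pointers */
	unsigned long nr_migrations;
};

struct group_rq group_rqs[MODEL_NR_CPUS] = { { 0 }, { 1 } };

/* Models __set_task_cpu(): refresh the group pointers, then record the CPU. */
void model_set_task_cpu_raw(struct task *p, int cpu)
{
	p->rq = &group_rqs[cpu];     /* the set_task_rq() step */
	p->cpu = cpu;
}

/* Models the pre-patch logic: the raw setter always runs. */
void model_set_task_cpu_old(struct task *p, int new_cpu)
{
	if (p->cpu != new_cpu)
		p->nr_migrations++;

	model_set_task_cpu_raw(p, new_cpu);
}

/* Models the post-patch logic: early return when the CPU is unchanged. */
void model_set_task_cpu_new(struct task *p, int new_cpu)
{
	if (p->cpu == new_cpu)
		return;              /* group pointers are left untouched */

	p->nr_migrations++;
	model_set_task_cpu_raw(p, new_cpu);
}
```

In this model, calling model_set_task_cpu_new() on a task whose cpu
already equals new_cpu leaves rq as it was, while model_set_task_cpu_old()
refreshes it without counting a migration. That is the behavioural
difference the revert is meant to test.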

If this revert doesn't help, could you please also provide the output of
addr2line -e vmlinux <FAULT_IP> ?

---
commit 738d2be4301007f054541c5c4bf7fb6a361c9b3a
Author: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Date:   Wed Dec 16 18:04:42 2009 +0100

    sched: Simplify set_task_cpu()
    
    Rearrange code a bit now that its a simpler function.
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
    Cc: Mike Galbraith <efault@....de>
    LKML-Reference: <20091216170518.269101883@...llo.nl>
    Signed-off-by: Ingo Molnar <mingo@...e.hu>

diff --git a/kernel/sched.c b/kernel/sched.c
index f92ce63..8a2bfd3 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2034,11 +2034,8 @@ task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
 	return delta < (s64)sysctl_sched_migration_cost;
 }
 
-
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
-	int old_cpu = task_cpu(p);
-
 #ifdef CONFIG_SCHED_DEBUG
 	/*
 	 * We should never call set_task_cpu() on a blocked task,
@@ -2049,11 +2046,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 
 	trace_sched_migrate_task(p, new_cpu);
 
-	if (old_cpu != new_cpu) {
-		p->se.nr_migrations++;
-		perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS,
-				     1, 1, NULL, 0);
-	}
+	if (task_cpu(p) == new_cpu)
+		return;
+
+	p->se.nr_migrations++;
+	perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
 
 	__set_task_cpu(p, new_cpu);
 }

