Date:	Tue, 22 Dec 2009 15:50:31 +0800
From:	Américo Wang <xiyou.wangcong@...il.com>
To:	Xiaotian Feng <xtfeng@...il.com>
Cc:	Eric Paris <eparis@...hat.com>, linux-kernel@...r.kernel.org,
	mingo@...e.hu, peterz@...radead.org, efault@....de
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, 
	WARNs and BUGs

On Tue, Dec 22, 2009 at 3:41 PM, Xiaotian Feng <xtfeng@...il.com> wrote:
> On Tue, Dec 22, 2009 at 3:19 PM, Américo Wang <xiyou.wangcong@...il.com> wrote:
>> [Fix top-posting]
>>
>> On Tue, Dec 22, 2009 at 1:42 PM, Xiaotian Feng <xtfeng@...il.com> wrote:
>>>
>>> On Tue, Dec 22, 2009 at 8:17 AM, Eric Paris <eparis@...hat.com> wrote:
>>>> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
>>>> I'm exploding in the scheduler.  I'm running (and building) kernel
>>>> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5.  There are three distinct
>>>> problem signatures.  On some boots I'll see all 3 of these failures,
>>>> on others only 1 or 2 of them.  That's the reason they are kinda split
>>>> up in dmesg.
>>>>
>>>> 1) gcc/3141 is trying to acquire lock:
>>>>  (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>>>>
>>>> but task is already holding lock:
>>>>  (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>>>>
>>>> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>>>>
>>>> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>>>>      kernel/sched_fair.c
>>>>
>>>> Full backtraces are in the attached dmesg.
>>>>
>>> Does a revert of cd29fe6f2637cc2ccbda5ac65f5332d6bf5fa3c6 fix this problem?
>>
>>
>> I don't think so...
>>
>> I think the most suspicious commit here is ab19cb23. It removed the
>> local_irq_save(), which means that if the task is selected to run on
>> another cpu which doesn't disable irq, we will get a page fault, and then
>> we will try to take mm->mmap_sem while we are already holding rq->lock.
>
> The page fault comes from the kernel NULL pointer deref.  You should connect
> the lockdep warning and the kernel BUG together.
>

Interesting.

1) Doesn't this NULL ptr deref expose that we have a potential problem?

2) For the NULL ptr deref problem, commit 3a7e73a2e2 seems more suspicious.
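Xiaotian's reading ties reports 1 and 3 together: the NULL deref in check_preempt_wakeup faults while rq->lock is held, and the fault handler then tries to take mm->mmap_sem, presumably the semaphore whose wait_lock is named in the lockdep report. Below is a minimal sketch of the pairwise ordering check behind such a report, reduced to two lock classes. It assumes, for illustration only, that the opposite order (sem->wait_lock, then rq->lock) was already seen on some other path, for example a waker holding the semaphore's wait_lock that calls into the scheduler. The enum, the table and record_acquire() are hypothetical; real lockdep tracks full dependency chains, not just pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes named after the two locks in report 1. */
enum lock_class { RQ_LOCK, SEM_WAIT_LOCK, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
	[RQ_LOCK]       = "rq->lock",
	[SEM_WAIT_LOCK] = "sem->wait_lock",
};

/* dep[a][b] becomes true once some path acquires b while holding a. */
static bool dep[NR_CLASSES][NR_CLASSES];

/*
 * Record "acquire next while holding held". If the reverse order was
 * already recorded, the two classes form an ABBA cycle: warn and return
 * false. Only two-lock cycles are caught here; lockdep searches longer
 * chains through its dependency graph.
 */
static bool record_acquire(enum lock_class held, enum lock_class next)
{
	assert(held < NR_CLASSES && next < NR_CLASSES);

	if (dep[next][held]) {
		printf("possible circular locking: %s -> %s, but %s -> %s seen before\n",
		       class_name[held], class_name[next],
		       class_name[next], class_name[held]);
		return false;
	}
	dep[held][next] = true;
	return true;
}
```

Calling record_acquire(SEM_WAIT_LOCK, RQ_LOCK) for the assumed waker path and then record_acquire(RQ_LOCK, SEM_WAIT_LOCK) for the fault taken under rq->lock makes the second call print a warning and return false.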
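On report 3: a fault at 0000000000000168 rather than at 0 is the usual sign of a NULL struct pointer. Reading a member computes base + offset, so with a NULL base the faulting address is just the member's offset. A small illustration with a hypothetical struct; the thread doesn't say which structure or member check_preempt_wakeup() was reading, only that the address was 0x168.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 0x168 offset comes from the report. */
struct example_obj {
	char earlier_fields[0x168];	/* stands in for the members before it */
	void *field_at_0x168;
};

/* C11 static_assert (from <assert.h>): the member really sits at 0x168. */
static_assert(offsetof(struct example_obj, field_at_0x168) == 0x168,
	      "member must sit at offset 0x168");

/*
 * Address a load of p->field_at_0x168 would touch, computed without
 * dereferencing p, so it is safe to call with p == NULL. Assumes a null
 * pointer converts to 0, as it does with gcc on x86_64.
 */
static uintptr_t field_address(const struct example_obj *p)
{
	return (uintptr_t)p + offsetof(struct example_obj, field_at_0x168);
}
```

With p == NULL the function returns 0x168, the same address the oops reports; the real kernel code would have faulted when it actually loaded from that address.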