linux-kernel - Re: v2.6.26-rc9: kernel BUG at kernel/sched.c:5858!


Message-ID: <19f34abd0807100806y3df8a231w78626ad18a910bad@mail.gmail.com>
Date:	Thu, 10 Jul 2008 17:06:38 +0200
From:	"Vegard Nossum" <vegard.nossum@...il.com>
To:	"Dmitry Adamushko" <dmitry.adamushko@...il.com>
Cc:	Yanmin <yanmin_zhang@...ux.intel.com>,
	"Rusty Russell" <rusty@...tcorp.com.au>,
	"Ingo Molnar" <mingo@...e.hu>,
	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>,
	"Dhaval Giani" <dhaval@...ux.vnet.ibm.com>,
	"Gautham R Shenoy" <ego@...ibm.com>,
	"Heiko Carstens" <heiko.carstens@...ibm.com>, miaox@...fujitsu.com,
	"Lai Jiangshan" <laijs@...fujitsu.com>,
	"Avi Kivity" <avi@...ranet.com>, linux-kernel@...r.kernel.org
Subject: Re: v2.6.26-rc9: kernel BUG at kernel/sched.c:5858!

On Thu, Jul 10, 2008 at 4:16 PM, Vegard Nossum <vegard.nossum@...il.com> wrote:
>> Regarding new crashes. Do you get them
>>
>> (1) after a few cpu offline / onlines ?
>> (2) on a freshly booted system?
>> (3) (1) or (2) but only with Miao Xie's patch (should not be (2) then)
>> (4) something else?
>
> Without Miao Xie's patch, I regularly get a crash on the first cpu-up.
> So I am using it all the time. With this patch applied, the new
> crashes can happen from anywhere between 2 minutes to 20 while running
> a few different looping scripts simultaneously:
>
> 1. cpu up/down
> 2. grep -r . /sys
> 3. swapon/swapoff
> 4. cat /dev/cpu/*/msr

Disabling script #1 (cpu up/down) kept the machine alive for at least
25 minutes. When I then started it, the machine hung after 492 rounds
of cpu up/down, with this new report:

list_add corruption. next->prev should be prev (f782d090), but was 00000000. (next=f20b8438).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:27!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3860, comm: bash Not tainted (2.6.26-rc9-00059-gb190333 #5)
EIP: 0060:[<c0294b80>] EFLAGS: 00210086 CPU: 0
EIP is at __list_add+0x40/0x60
EAX: 00000061 EBX: f782d090 ECX: 00000002 EDX: 00000002
ESI: 00200282 EDI: c0a8de8c EBP: e7dd3e84 ESP: e7dd3e6c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3860, ti=e7dd2000 task=e7e0afd0 task.ti=e7dd2000)
Stack: c067fd30 f782d090 00000000 f20b8438 f20b81b0 00200282 e7dd3e8c c0294baa
       e7dd3e98 c019bf5e f782d070 e7dd3ec8 c019c4b1 c0a8deac 000000d0 e7de2a00
       c1da1ddc 00000005 00000000 f20b81b0 fff95000 fff98000 fff96000 e7dd3ed4
Call Trace:
 [<c0294baa>] ? list_add+0xa/0x10
 [<c019bf5e>] ? __mem_cgroup_add_list+0x3e/0x40
 [<c019c4b1>] ? mem_cgroup_charge_common+0x231/0x260
 [<c019c522>] ? mem_cgroup_charge+0x12/0x20
 [<c01843e7>] ? do_wp_page+0x117/0x550
 [<c01864c1>] ? handle_mm_fault+0x1b1/0x770
 [<c01866f1>] ? handle_mm_fault+0x3e1/0x770
 [<c014cc95>] ? down_read_trylock+0x55/0x60
 [<c0120d98>] ? do_page_fault+0x298/0x700
 [<c0584b26>] ? _spin_unlock_irq+0x36/0x60
 [<c01402db>] ? sigprocmask+0x7b/0xf0
 [<c0104df5>] ? restore_nocheck+0x12/0x15
 [<c0120b00>] ? do_page_fault+0x0/0x700
 [<c0584e6a>] ? error_code+0x72/0x78
 =======================
Code: 75 2d 89 08 89 41 04 89 02 89 50 04 83 c4 10 5b 5e 5d c3 89 4c
24 0c 89 54 24 08 89 5c 24 04 c7 04 24 30 fd 67 c0 e8 80 0c ea ff
<0f> 0b eb fe 89 5c 24 0c 89 74 24 08 89 4c 24 04 c7 04 24 80 fd
EIP: [<c0294b80>] __list_add+0x40/0x60 SS:ESP 0068:e7dd3e6c
---[ end trace 89a65901b268513f ]---

The list corruption now has a completely different backtrace, but in
both reports a pointer was 0 instead of the expected value. This fits
the theory that something is being zeroed that shouldn't be.
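
For readers unfamiliar with the check that fired at lib/list_debug.c:27,
here is a minimal userspace sketch of it. It is not the kernel source:
the names checked_list_add(), list_add_head() and demo_zeroed_neighbour()
are hypothetical, and the sketch returns -1 where the kernel calls BUG().
It only illustrates how a list node that has been zeroed behind the
list's back produces a "next->prev ... was 00000000" style report on the
next insertion.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/*
 * Insert 'new' between 'prev' and 'next', first checking that the two
 * neighbours still point at each other. Returns 0 on success and -1 if
 * the list looks corrupted (the kernel would BUG() here instead).
 */
static int checked_list_add(struct list_head *new,
                            struct list_head *prev,
                            struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be "
                "prev (%p), but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return -1;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be "
                "next (%p), but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return -1;
    }
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
    return 0;
}

/* Add 'new' right after 'head', like list_add(). */
static int list_add_head(struct list_head *new, struct list_head *head)
{
    return checked_list_add(new, head, head->next);
}

/*
 * Hypothetical reproduction: link one node, zero it to simulate a stray
 * memset/free elsewhere, then try to add a second node in front of it.
 */
static int demo_zeroed_neighbour(void)
{
    struct list_head head, a, b;

    init_list_head(&head);
    if (list_add_head(&a, &head) != 0)
        return 1;                    /* unexpected: clean add failed */

    memset(&a, 0, sizeof(a));        /* 'a' is zeroed while still linked */
    return list_add_head(&b, &head); /* check sees a.prev == NULL */
}
```

Calling demo_zeroed_neighbour() trips the first check, because head.next
still points at the zeroed node whose prev is now NULL. That is the same
shape as the report above: the node already on the list was zeroed, not
the node being added.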


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036