linux-kernel - Re: recent -git: BUG in free_thread


Message-ID: <4887B3BA.2050602@qualcomm.com>
Date:	Wed, 23 Jul 2008 15:42:02 -0700
From:	Max Krasnyansky <maxk@...lcomm.com>
To:	Vegard Nossum <vegard.nossum@...il.com>
CC:	Suresh Siddha <suresh.b.siddha@...el.com>,
	LKML <linux-kernel@...r.kernel.org>,
	the arch/x86 maintainers <x86@...nel.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Dmitry Adamushko <dmitry.adamushko@...il.com>
Subject: Re: recent -git: BUG in free_thread_xstate

Vegard Nossum wrote:
> On Wed, Jul 23, 2008 at 10:31 PM, Suresh Siddha
> <suresh.b.siddha@...el.com> wrote:
>> On Wed, Jul 23, 2008 at 01:07:04PM -0700, Vegard Nossum wrote:
>>> Hi,
>>>
>>> I just got this on c010b2f76c3032e48097a6eef291d8593d5d79a6 (-git from
>>> yesterday):
>> Do you see this in 2.6.26 as well? I suspect it is coming from post 2.6.26
>> changes.
> 
> Yep. Got this on 2.6.26 now:
> 
> BUG: unable to handle kernel paging request at 00664381
> IP: [<c010b884>] free_thread_xstate+0x4/0x30
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3796, comm: bash Not tainted (2.6.26 #1)
> EIP: 0060:[<c010b884>] EFLAGS: 00210246 CPU: 0
> EIP is at free_thread_xstate+0x4/0x30
> EAX: 00664001 EBX: f3870000 ECX: 00000004 EDX: f4b544e8
> ESI: f4bdef28 EDI: c07feda0 EBP: f5325bd0 ESP: f5325bcc
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process bash (pid: 3796, ti=f5324000 task=f4b53fc0 task.ti=f5324000)
> Stack: f3870000 f5325bdc c010b8bd f4bddfa0 f5325be8 c0132b89 f4bddfa0 f5325bf4
>        c0133fd1 f4b77e00 f5325bfc c01368a7 f5325c14 c0172b8c 00200282 c0752b40
>        00000001 00000009 f5325c30 c0139cd3 c0803d00 c0803d00 c0803d00 00200046
> Call Trace:
>  [<c010b8bd>] ? free_thread_info+0xd/0x20
>  [<c0132b89>] ? free_task+0x19/0x30
>  [<c0133fd1>] ? __put_task_struct+0x51/0xa0
>  [<c01368a7>] ? delayed_put_task_struct+0x27/0x30
>  [<c0172b8c>] ? rcu_process_callbacks+0x6c/0xb0
>  [<c0139cd3>] ? __do_softirq+0x83/0x100
>  [<c0139df5>] ? do_softirq+0xa5/0xb0
>  [<c0139f95>] ? irq_exit+0x95/0xa0
>  [<c0107e4d>] ? do_IRQ+0x4d/0xa0
>  [<c01057b2>] ? common_interrupt+0x2e/0x34
>  [<c013549e>] ? vprintk+0x1be/0x420
>  [<c010aea5>] ? native_sched_clock+0xb5/0x110
>  [<c010aea5>] ? native_sched_clock+0xb5/0x110
>  [<c013571b>] ? printk+0x1b/0x20
>  [<c012cbec>] ? cpu_attach_domain+0x3ec/0x410
>  [<c010aea5>] ? native_sched_clock+0xb5/0x110
>  [<c01979e1>] ? check_bytes_and_report+0x21/0xc0
>  [<c0197d8f>] ? check_object+0xdf/0x1f0
>  [<c0126c17>] ? sd_free_ctl_entry+0x37/0x50
>  [<c0157895>] ? mark_held_locks+0x65/0x80
>  [<c0199055>] ? kfree+0xb5/0x120
>  [<c0157a24>] ? trace_hardirqs_on+0xd4/0x160
>  [<c0126c17>] ? sd_free_ctl_entry+0x37/0x50
>  [<c0126c17>] ? sd_free_ctl_entry+0x37/0x50
>  [<c0126c17>] ? sd_free_ctl_entry+0x37/0x50
>  [<c012cc3e>] ? detach_destroy_domains+0x2e/0x50
>  [<c012cc9b>] ? update_sched_domains+0x3b/0x50
>  [<c014d467>] ? notifier_call_chain+0x37/0x70
>  [<c014d4d9>] ? __raw_notifier_call_chain+0x19/0x20
>  [<c055c858>] ? _cpu_down+0x78/0x240
>  [<c015d92f>] ? cpu_maps_update_begin+0xf/0x20
>  [<c055ca4b>] ? cpu_down+0x2b/0x40
>  [<c055dc69>] ? store_online+0x39/0x80
>  [<c055dc30>] ? store_online+0x0/0x80
>  [<c02faf6b>] ? sysdev_store+0x2b/0x40
>  [<c01dcdd2>] ? sysfs_write_file+0xa2/0x100
>  [<c019eb76>] ? vfs_write+0x96/0x130
>  [<c01dcd30>] ? sysfs_write_file+0x0/0x100
>  [<c019f23d>] ? sys_write+0x3d/0x70
>  [<c0104cdb>] ? sysenter_past_esp+0x78/0xd1
>  =======================
> Code: 04 00 00 00 00 c7 04 24 00 00 04 00 e8 96 f8 08 00 a3 b4 a5 80
> c0 c9 c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 53 <8b>
> 90 80 03 00 00 89 c3 85 d2 74 14 a1 b4 a5 80 c0 e8 d6 e4 08
> EIP: [<c010b884>] free_thread_xstate+0x4/0x30 SS:ESP 0068:f5325bcc
> Kernel panic - not syncing: Fatal exception in interrupt
> 
> I'm not sure what to make of this. It looks related to the rebuilding
> of sched domains that we saw earlier. But this reproduces on both
> v2.6.26 and latest -git (though not with that backtrace).

Based on the trace above, it seems that we panic even before calling into
cpusets (i.e., I do not see rebuild_sched_domains() in there), which means
it must be something different. The problem we had before was that
cpusets were screwing up the domain rebuild sequence during CPU hotplug
handling.
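
The check described above (looking for rebuild_sched_domains() among the
frames of the call trace) could be sketched roughly as follows. This is only
an illustration, not a tool used in this thread: the helper name
trace_symbols, the regular expression, and the shortened SAMPLE excerpt are
all assumptions made for the example.

```python
import re

# Matches frames such as " [<c010b8bd>] ? free_thread_info+0xd/0x20".
# The pattern is an assumption based on the oops format quoted in this thread.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(oops_text):
    """Return the symbol names of all matching frames, in order of appearance."""
    symbols = []
    for line in oops_text.splitlines():
        # Drop leading email quote markers ("> ") before matching.
        line = line.lstrip("> ")
        match = FRAME_RE.search(line)
        if match:
            symbols.append(match.group(2))
    return symbols


# Shortened excerpt of the call trace quoted above (hypothetical test input).
SAMPLE = """\
> Call Trace:
>  [<c010b8bd>] ? free_thread_info+0xd/0x20
>  [<c012cc3e>] ? detach_destroy_domains+0x2e/0x50
>  [<c012cc9b>] ? update_sched_domains+0x3b/0x50
>  [<c014d467>] ? notifier_call_chain+0x37/0x70
>  [<c055c858>] ? _cpu_down+0x78/0x240
"""

symbols = trace_symbols(SAMPLE)
print("rebuild_sched_domains" in symbols)
print("update_sched_domains" in symbols)
```

Note that the pattern also matches "IP:" / "EIP:" lines if the full oops is
passed in, so for a stricter check only the lines after "Call Trace:" should
be fed to it.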

Max