Message-ID: <20101008213429.GB7223@sgi.com>
Date: Fri, 8 Oct 2010 16:34:29 -0500
From: Russ Anderson <rja@....com>
To: linux-kernel <linux-kernel@...r.kernel.org>,
Yinghai Lu <yinghai@...nel.org>, tglx@...utronix.de,
"H. Peter Anvin" <h.peter.anvin@...el.com>
Cc: Russ Anderson <rja@....com>, Jack Steiner <steiner@....com>
Subject: [BUG] x86: bootmem broken on SGI UV
Recent community kernels do not boot on SGI UV x86 hardware with
more than one socket. I suspect the problem is due to recent
bootmem/e820 changes.
What is happening is that the e820 table defines a single usable memory range:
BIOS-e820: 0000000100000000 - 0000001080000000 (usable)
The SRAT table shows that this memory range is spread over two nodes:
SRAT: Node 0 PXM 0 100000000-800000000
SRAT: Node 1 PXM 1 800000000-1000000000
SRAT: Node 0 PXM 0 1000000000-1080000000
Previously, the kernel's early_node_map[] would show three entries,
each with the proper node:
[ 0.000000] 0: 0x00100000 -> 0x00800000
[ 0.000000] 1: 0x00800000 -> 0x01000000
[ 0.000000] 0: 0x01000000 -> 0x01080000
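The early_node_map[] values are page frame numbers, so they can be
cross-checked against the SRAT byte addresses. A minimal sketch, assuming
4 KiB pages (a shift of 12); the names srat and to_pfn_ranges are
illustrative and not kernel symbols:

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86_64

# (node, start_addr, end_addr) copied from the SRAT lines in this report
srat = [
    (0, 0x100000000, 0x800000000),
    (1, 0x800000000, 0x1000000000),
    (0, 0x1000000000, 0x1080000000),
]

def to_pfn_ranges(entries):
    """Convert byte-address ranges to page-frame-number ranges."""
    return [(nid, start >> PAGE_SHIFT, end >> PAGE_SHIFT)
            for nid, start, end in entries]

pfn_ranges = to_pfn_ranges(srat)
for nid, start_pfn, end_pfn in pfn_ranges:
    # Same layout as the early_node_map[] lines: node: start -> end
    print(f"{nid}: {start_pfn:#010x} -> {end_pfn:#010x}")
```

The three printed ranges match the three early_node_map[] entries listed
above, one per SRAT line.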
The problem is that with recent community kernels early_node_map[]
shows only two entries, and the node 0 entry overlaps the node 1
entry:
0: 0x00100000 -> 0x01080000
1: 0x00800000 -> 0x01000000
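The overlap can be checked mechanically. The sketch below is not kernel
code; find_overlaps is a hypothetical helper that intersects every pair
of (node, start_pfn, end_pfn) entries, applied to the two maps quoted in
this report:

```python
def find_overlaps(ranges):
    """Return ((node_a, node_b), start_pfn, end_pfn) for each intersecting pair."""
    overlaps = []
    for i, (nid_a, start_a, end_a) in enumerate(ranges):
        for nid_b, start_b, end_b in ranges[i + 1:]:
            lo, hi = max(start_a, start_b), min(end_a, end_b)
            if lo < hi:  # half-open ranges intersect only if lo < hi
                overlaps.append(((nid_a, nid_b), lo, hi))
    return overlaps

# Map from older kernels: three entries, no overlap
good_map = [(0, 0x100000, 0x800000),
            (1, 0x800000, 0x1000000),
            (0, 0x1000000, 0x1080000)]

# Map from recent kernels: node 0 spans the whole range
bad_map = [(0, 0x100000, 0x1080000),
           (1, 0x800000, 0x1000000)]

print(find_overlaps(good_map))
print([(nodes, hex(lo), hex(hi)) for nodes, lo, hi in find_overlaps(bad_map)])
```

The old map yields no overlaps, while the new one reports the whole of
node 1 (0x800000 -> 0x1000000) as also belonging to node 0.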
As a result, the range 0x800000 -> 0x1000000 gets freed twice
(by free_all_memory_core_early()), producing nasty warnings:
Queued invalidation will be enabled to support x2apic and Intr-remapping.
BUG: Bad page state in process swapper pfn:800000
page:ffffea001c000000 count:0 mapcount:0 mapping:(null) index:0x0
page flags: 0x60000000080000(buddy)
Pid: 0, comm: swapper Not tainted 2.6.36-rc6-next-20101006-medusa #23
Call Trace:
[<ffffffff810a590f>] ? dump_page+0xc7/0xcc
[<ffffffff810a61d8>] bad_page+0xeb/0xfd
[<ffffffff810a6a27>] free_pages_prepare+0x68/0xa3
[<ffffffff810a7a39>] __free_pages_ok+0x1d/0xf6
[<ffffffff810a7cf8>] __free_pages+0x22/0x24
[<ffffffff816c7871>] __free_pages_bootmem+0x55/0x57
[<ffffffff816a7b48>] free_all_memory_core_early+0xeb/0x156
[<ffffffff816a07a7>] numa_free_all_bootmem+0x7f/0x8b
[<ffffffff81398dbc>] ? _etext+0x0/0x24
[<ffffffff8169f5ea>] mem_init+0x1e/0xec
[<ffffffff81689bea>] start_kernel+0x1c8/0x3ea
[<ffffffff81689140>] ? early_idt_handler+0x0/0x71
[<ffffffff816892af>] x86_64_start_reservations+0xb6/0xba
[<ffffffff81689401>] x86_64_start_kernel+0x14e/0x15d
Disabling lock debugging due to kernel taint
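A simplified model shows why the first bad page is pfn:800000. It assumes
only that each early_node_map[] entry has its whole range freed once; it
does not reproduce the actual logic of free_all_memory_core_early(), and
double_covered is a hypothetical name:

```python
def double_covered(ranges):
    """Return (first_pfn, page_count) covered by more than one entry."""
    # Sweep over range boundaries, tracking how many entries cover each span.
    events = sorted([(start, 1) for _nid, start, _end in ranges] +
                    [(end, -1) for _nid, _start, end in ranges])
    depth, prev, first, pages = 0, None, None, 0
    for pfn, delta in events:
        if depth > 1:  # span [prev, pfn) is covered by two or more entries
            if first is None:
                first = prev
            pages += pfn - prev
        depth += delta
        prev = pfn
    return first, pages

# early_node_map[] as shown by recent kernels in this report
bad_map = [(0, 0x100000, 0x1080000),
           (1, 0x800000, 0x1000000)]

first, pages = double_covered(bad_map)
gib = pages * 4096 // 2**30  # assumption: 4 KiB pages
print(f"first doubly covered pfn: {first:#x}, pages: {pages:#x} ({gib} GiB)")
```

Under that assumption the first page freed a second time is pfn 0x800000,
matching the "Bad page state" line, and the affected span is all of
node 1.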
Shortly thereafter, the kernel dies trying to get a page off the
freelist:
Calibrating delay loop (skipped), value calculated using timer frequency.. 5333.99 BogoMIPS (lpj=10667996)
pid_max: default: 32768 minimum: 301
general protection fault: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 0, comm: swapper Tainted: G B 2.6.36-rc6-next-20101006-medusa #23 /Stoutland Platform
RIP: 0010:[<ffffffff810a5557>] [<ffffffff810a5557>] __rmqueue+0x7c/0x311
RSP: 0000:ffffffff816019b8 EFLAGS: 00010002
RAX: dead000000200200 RBX: dead000000200200 RCX: ffffea0037fe4e28
RDX: dead000000100100 RSI: ffff880ffffda010 RDI: 0000000000000006
RBP: ffffffff816019f8 R08: 0000000000000001 R09: ffff880ffffda078
R10: 0000000000000000 R11: 0000000000000000 R12: dead000000100100
R13: ffffea0037fe4e00 R14: 0000000000000286 R15: ffff880ffffd9e00
FS: 0000000000000000(0000) GS:ffff880075c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001604000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81600000, task ffffffff8160c020)
Stack:
0000000000000000 ffff880ffffd9e00 0000000000000000 ffffea0037fe5bf0
<0> ffff880075c16158 0000000000000003 0000000000000286 ffff880ffffd9e00
<0> ffffffff81601b28 ffffffff810a81d4 ffff88107ffd8e00 ffffffff81600000
Call Trace:
[<ffffffff810a81d4>] get_page_from_freelist+0x2f7/0x787
[<ffffffff810a6bd7>] ? zone_watermark_ok+0x25/0xb8
[<ffffffff810a8a3c>] __alloc_pages_nodemask+0x160/0x6ac
[<ffffffff810a8a3c>] ? __alloc_pages_nodemask+0x160/0x6ac
[<ffffffff811d7f2b>] ? cpumask_next_and+0x2d/0x3e
[<ffffffff810d2aa2>] alloc_page_interleave+0x36/0x80
[<ffffffff810d2bcf>] alloc_pages_current+0x7c/0xd1
[<ffffffff8102f1f9>] __change_page_attr_set_clr+0x75f/0xc1f
[<ffffffff810c8635>] ? vm_unmap_aliases+0x179/0x188
[<ffffffff8102f830>] change_page_attr_set_clr+0x177/0x3ed
[<ffffffff8102fe40>] ? set_memory_nx+0x3b/0x3d
[<ffffffff8102fc2a>] set_memory_x+0x3b/0x3d
[<ffffffff8169931b>] efi_enter_virtual_mode+0x22a/0x269
[<ffffffff81689d99>] start_kernel+0x377/0x3ea
[<ffffffff81689140>] ? early_idt_handler+0x0/0x71
[<ffffffff816892af>] x86_64_start_reservations+0xb6/0xba
[<ffffffff81689401>] x86_64_start_kernel+0x14e/0x15d
Code: 0f 84 a9 00 00 00 48 8b 4c 0a 68 49 bc 00 01 10 00 00 00 ad de 48 bb 00 02 20 00 00 00 ad de 4c 8d 69 d8 49 8b 55 28 49 8b 45 30 <48> 89 42 08 48 89 10 4d 89 65 28 49 89 5d 30 0f ba 71 d8 13 b8
RIP [<ffffffff810a5557>] __rmqueue+0x7c/0x311
RSP <ffffffff816019b8>
---[ end trace 4eaa2a86a8e2da22 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
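One reading of the register dump, offered as an interpretation rather
than a confirmed diagnosis: RDX and RAX hold the list poison values that
list_del() writes into an entry it has removed, which would be consistent
with a freelist entry being unlinked a second time. The sketch below
assumes the x86_64 constants as I recall them from include/linux/poison.h
(0x00100100 and 0x00200200 plus a 0xdead000000000000 delta) and 48-bit
canonical addressing; classify and is_canonical are hypothetical helpers:

```python
# Assumed x86_64 values (from memory of include/linux/poison.h)
POISON_POINTER_DELTA = 0xdead000000000000
LIST_POISON1 = 0x00100100 + POISON_POINTER_DELTA  # stored in entry->next
LIST_POISON2 = 0x00200200 + POISON_POINTER_DELTA  # stored in entry->prev

def classify(value):
    """Name a register value if it matches a list poison constant."""
    if value == LIST_POISON1:
        return "LIST_POISON1"
    if value == LIST_POISON2:
        return "LIST_POISON2"
    return "other"

def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit canonical address)."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff

# Register values copied from the oops in this report
regs = {"RAX": 0xdead000000200200,
        "RDX": 0xdead000000100100,
        "RCX": 0xffffea0037fe4e28}

for name, value in regs.items():
    print(name, hex(value), classify(value), "canonical:", is_canonical(value))
```

A store through a non-canonical pointer such as the RDX value raises a
general protection fault rather than a page fault, which fits the oops
type reported above.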
I have not tracked down the exact change that caused the regression,
but I suspect it involves the recent bootmem/e820 changes.
Attached is the full boot output.
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@....com
View attachment "output.2.6.36-rc6.fail" of type "text/plain" (105838 bytes)