Message-Id: <20080812085334.fa1b8f0f.randy.dunlap@oracle.com>
Date: Tue, 12 Aug 2008 08:53:34 -0700
From: Randy Dunlap <randy.dunlap@...cle.com>
To: Hugh Dickins <hugh@...itas.com>
Cc: lkml <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.27-rc2-git5 BUG: unable to handle kernel paging request
On Tue, 12 Aug 2008 15:10:09 +0100 (BST) Hugh Dickins wrote:
> On Mon, 11 Aug 2008, Randy Dunlap wrote:
> > on x86_64, SMP, 8 GB RAM:
> >
> > BUG: unable to handle kernel paging request at ffffe20001d5ae00
> > IP: [<ffffffff8027c08f>] unmap_vmas+0x42d/0x7a0
> > PGD 28102067 PUD 28103067 PMD 0
> > Oops: 0000 [1] SMP
> > CPU 3
> > Modules linked in: lpfc(+) cciss ehci_hcd ohci_hcd uhci_hcd
> > Pid: 1382, comm: udevd Not tainted 2.6.27-rc2-git5 #1
> > RIP: 0010:[<ffffffff8027c08f>] [<ffffffff8027c08f>] unmap_vmas+0x42d/0x7a0
> > RSP: 0018:ffff88027dcffd68 EFLAGS: 00010246
> > RAX: 000000008631b98b RBX: ffffe20001d5ade8 RCX: ffff880183a27500
> > RDX: 0000000001d5ad00 RSI: 000000008631b98b RDI: ffff88027e549840
> > RBP: ffff88027dcffe38 R08: 000000017efc3402 R09: 000000ffffffffff
> > R10: ffff88017e84c9d8 R11: 0000000000000006 R12: 0000000000000020
> > R13: 00007fbedb007000 R14: ffff88027dd81038 R15: 00007fbedb10a000
> > FS: 00007fbedb777710(0000) GS:ffff88027f623c80(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffffe20001d5ae00 CR3: 000000027e12c000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process udevd (pid: 1382, threadinfo ffff88027dcfe000, task ffff88027d959bc0)
> > Stack: 0000000000000000 0000000000000000 ffff88027dcffe50 ffffffffffffffff
> > 0000000000000000 ffff88027e549840 ffff88027dcffe58 00000000003bc9f2
> > 0000000000000000 0000000108b985a8 00007fbedb10a000 ffff88027e12c7f8
> > Call Trace:
> > [<ffffffff8027ff11>] exit_mmap+0x75/0xed
> > [<ffffffff80232f3a>] mmput+0x42/0x98
> > [<ffffffff802369ff>] exit_mm+0xfd/0x108
> > [<ffffffff80237fce>] do_exit+0x272/0x84d
> > [<ffffffff8023861b>] do_group_exit+0x72/0xa2
> > [<ffffffff8023865d>] sys_exit_group+0x12/0x14
> > [<ffffffff8020beeb>] system_call_fastpath+0x16/0x1b
>
> No "Code:" line? Never mind, much more useful would be the
> "objdump -d vmlinux" extract for unmap_vmas() - please send me or
> the list that output if you still have or can reconstruct vmlinux.
I don't have vmlinux -- will see about reconstructing it.
Sorry about the missing Code: etc. lines.
(I blame the 2 blank lines after the Call Trace...)
Here they are:
Code: 48 85 c9 74 26 4c 89 e8 48 2b 41 08 48 8b 53 20 48 c1 e8 0c 48 03 81 88 00 00 00 48 39 d0 74 0b 48 c1 e2 0c 48 83 ca 40 49 89 16 <f6> 43 18 01 74 05 ff 4d a8 eb 25 41 89 f4 41 81 e4 ff 0f 00 00
RIP [<ffffffff8027c08f>] unmap_vmas+0x42d/0x7a0
RSP <ffff88027dcffd68>
CR2: ffffe20001d5ae00
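
For reference, the marked byte in the Code: line can be decoded by hand and
checked against the registers. This is only a sketch, assuming the ordinary
x86-64 encoding with no prefixes (f6 /0 ib is TEST r/m8, imm8; ModRM 0x43
selects disp8(%rbx)). The helper name is made up for illustration.

```python
# Hand-decode the faulting instruction from the oops: <f6> 43 18 01.
# Assumption: no prefixes, so f6 /0 ib = TEST r/m8, imm8 and
# ModRM 0x43 = mod 01 (disp8), reg 0 (/0), rm 3 (%rbx).

def decode_test_rm8_imm8(code):
    """Return (base register index, displacement, immediate)."""
    opcode, modrm, disp8, imm8 = code
    assert opcode == 0xF6
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 1 and reg == 0   # disp8 addressing, /0 selects TEST
    assert rm != 4                 # rm == 4 would mean a SIB byte follows
    # disp8 is sign-extended by the CPU
    disp = disp8 - 0x100 if disp8 & 0x80 else disp8
    return rm, disp, imm8

RBX = 0xffffe20001d5ade8   # from the register dump
CR2 = 0xffffe20001d5ae00   # faulting address

rm, disp, imm = decode_test_rm8_imm8(bytes.fromhex("f6431801"))
fault_addr = RBX + disp
print(f"testb ${imm:#x}, {disp:#x}(reg {rm}) touches {fault_addr:#x}")
```

Register index 3 is %rbx, so this reads as testb $0x1,0x18(%rbx), and
RBX + 0x18 equals CR2.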
> I'm pretty sure it's oopsing on line 755 of mm/memory.c, the PageAnon
> test in zap_pte_range(); but would like to confirm that and see if
> there's any more info to be gleaned from the registers above.
>
> It looks like a case of page table corruption. RAX and RSI appear to
> be holding pte 0x8631b98b, which has several bits wrong for a good pte;
> its pfn 0x8631b matches up with struct page pointer in RBX, and the
> faulting address to access page->mapping.
>
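
The arithmetic behind that paragraph can be reproduced as follows. This is a
sketch under assumptions that are not stated in the thread: 4 KiB pages, a
vmemmap base of 0xffffe20000000000, a 56-byte struct page, and ->mapping at
offset 0x18 (the displacement seen in the faulting instruction). With those
values the numbers line up with RBX and CR2.

```python
PAGE_SHIFT = 12                       # assumed: 4 KiB pages
VMEMMAP_START = 0xffffe20000000000    # assumed vmemmap base for this kernel
SIZEOF_STRUCT_PAGE = 56               # assumed sizeof(struct page)
MAPPING_OFFSET = 0x18                 # assumed offset of page->mapping

pte = 0x8631b98b                      # value seen in RAX and RSI
pfn = pte >> PAGE_SHIFT
low_bits = pte & ((1 << PAGE_SHIFT) - 1)

# With SPARSEMEM_VMEMMAP, pfn_to_page() is plain pointer arithmetic.
page = VMEMMAP_START + pfn * SIZEOF_STRUCT_PAGE
fault = page + MAPPING_OFFSET

# x86 pte flag names for the low bits (bits 9-11 are software-defined).
NAMES = {0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT", 4: "PCD",
         5: "ACCESSED", 6: "DIRTY", 7: "PAT", 8: "GLOBAL"}
set_bits = [b for b in range(PAGE_SHIFT) if low_bits & (1 << b)]

print(f"pfn   = {pfn:#x}")
print(f"page  = {page:#x}")
print(f"fault = {fault:#x}")
print("flags =", [NAMES.get(b, f"SW{b}") for b in set_bits])
```

The decode shows GLOBAL set and USER and ACCESSED clear, which is odd for a
pte found in a user address space.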
> The BIOS-e820 map from the start of dmesg would be useful confirmatory
> information too: that pfn isn't unreasonable itself, but you're using
> CONFIG_SPARSEMEM_VMEMMAP, so I presume it falls in one of the holes.
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 000000000009f400 (usable)
BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fe50000 (usable)
BIOS-e820: 000000007fe50000 - 000000007fe58000 (ACPI data)
BIOS-e820: 000000007fe58000 - 0000000080000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fed00000 (reserved)
BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 000000027ffff000 (usable)
debug: ignoring loglevel setting.
last_pfn = 0x27ffff max_arch_pfn = 0x3ffffffff
last_pfn = 0x7fe50 max_arch_pfn = 0x3ffffffff
init_memory_mapping
0000000000 - 007fe00000 page 2M
007fe00000 - 007fe50000 page 4k
kernel direct mapping tables up to 7fe50000 @ 8000-c000
last_map_addr: 7fe50000 end: 7fe50000
init_memory_mapping
0100000000 - 027fe00000 page 2M
027fe00000 - 027ffff000 page 4k
kernel direct mapping tables up to 27ffff000 @ a000-16000
last_map_addr: 27ffff000 end: 27ffff000
RAMDISK: 7fd05000 - 7fe4f52d
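
A quick way to check where pfn 0x8631b falls in the map above. This is a
sketch, assuming 4 KiB pages; the ranges are copied from the BIOS-e820 lines
and the lookup helper is hypothetical.

```python
# (start, end, type) with end exclusive, copied from the BIOS-e820 output.
E820 = [
    (0x0000000000000100, 0x000000000009f400, "usable"),
    (0x000000000009f400, 0x00000000000a0000, "reserved"),
    (0x00000000000f0000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x000000007fe50000, "usable"),
    (0x000000007fe50000, 0x000000007fe58000, "ACPI data"),
    (0x000000007fe58000, 0x0000000080000000, "reserved"),
    (0x00000000fec00000, 0x00000000fed00000, "reserved"),
    (0x00000000ffc00000, 0x0000000100000000, "reserved"),
    (0x0000000100000000, 0x000000027ffff000, "usable"),
]

def e820_type(phys):
    """Return the e820 type covering phys, or None if it is in a hole."""
    for start, end, kind in E820:
        if start <= phys < end:
            return kind
    return None

pfn = 0x8631b
phys = pfn << 12              # assumed 4 KiB pages
print(f"{phys:#x}: {e820_type(phys)}")
```

The address 0x8631b000 is not covered by any entry: it sits in the gap
between 0x80000000 and 0xfec00000. That fits with the "PMD 0" in the oops,
i.e. no vmemmap mapping for that struct page.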
> Have you been seeing other weirdness on this machine? It'd be great
> if you could try to reproduce this corruption or something like it,
> but not a lot we can tell from one instance. I wonder if it relates
> at all to [Bug 11237] corrupt PMD after resume - probably not but
> maybe - did you do a suspend/resume before getting this?
No, no suspend/resume done.
Full kernel log is attached.
---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
View attachment "netcon-4821.log" of type "text/x-log" (79630 bytes)