[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <495E7FAD.6020305@shaw.ca>
Date: Fri, 02 Jan 2009 14:57:17 -0600
From: Robert Hancock <hancockr@...w.ca>
To: linux-kernel@...r.kernel.org
Cc: david@...g.hm, Andi Kleen <andi@...stfloor.org>
Subject: Re: early exception error
Cyrill Gorcunov wrote:
> [david@...g.hm - Fri, Jan 02, 2009 at 10:21:52AM -0800]
>> On Wed, 31 Dec 2008, david@...g.hm wrote:
>>
>>> On Thu, 1 Jan 2009, Andi Kleen wrote:
>>>
>>>> On Wed, Dec 31, 2008 at 12:59:08PM -0800, david@...g.hm wrote:
>>>>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>>>>
>>>>>>> on the picture you sent me i noticed the message
>>>>>>> "Your memory is not aligned you need to rebuild your
>>>>>>> kernel with bigger NODEMAP SIZE shift=20" and then
>>>>>>> srat code complains about "No NUMA code hash function found"
>>>>>>> which looks a bit scary. Btw, could you post this picture
>>>>>>> on some public resource so NUMA people could check it?
>>>>>> This case used to be handled cleanly (NUMA disabled), but perhaps
>>>>>> that has regressed. But still it sounds like something is going wrong,
>>>>>> unless his machine really has a very weird memory map.
>>>>> it shouldn't, it was one of the high-volume servers 4-5 years ago and only
>>>>> has 4G of ram in it
>>>> From looking at the screenshot Cyrill sent you seem to have a funny
>>>> SRAT with overlapping areas that is rejected in the end. I suspect the
>>>> fallback code doesn't handle this properly.
>>>>
>>>> Does it work when you boot with numa=noacpi ?
>>> it gets past the point where the bootmemory_debug messages flow by, but
>>> I get another oops (snapshot of the screen is at
>>> http://linux.lang.hm/linux/IMG00031.jpg )
>> oops, I misread your mail, IMG00031.jpg was with numa=off
>>
>> I just posted IMG00033.jpg which is with numa=noacpi and earlyprintk=vga
>> but not bootmem_debug
>>
>> David Lang
>>
>
> Thanks, David! Trying to understand what is going on :)
>
> Here is a new picture if someone would like to jump into
> the bug handling
>
> http://linux.lang.hm/linux/IMG00033.jpg
alloc_bootmem_core is a reasonably big function, it would be useful if
we could track down what line it's blowing up on.. Can you try to find
out what line that fault address (ffffffff8096452a in this crash) is on
as described in Documentation/BUG-HUNTING, i.e. build with
CONFIG_DEBUG_INFO enabled, run gdb on vmlinux and do:
l *0xffffffff8096452a
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists