Message-ID: <CADfvbxp6i0usg1XqjCAcTwVzfsF812ue2WcsjtSFZpr8Y2zNtw@mail.gmail.com>
Date: Fri, 25 Jan 2019 11:51:52 -0500
From: robert shteynfeld <robert.shteynfeld@...il.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Mikhail Zaslonko <zaslonko@...ux.ibm.com>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Gerald Schaefer <gerald.schaefer@...ibm.com>,
Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>,
Dave Hansen <dave.hansen@...el.com>,
Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Pavel Tatashin <pasha.tatashin@...cle.com>,
Steven Sistare <steven.sistare@...cle.com>,
Daniel Jordan <daniel.m.jordan@...cle.com>,
Bob Picco <bob.picco@...cle.com>
Subject: Re: kernel panic due to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2830bf6f05fb3e05bc4743274b806c821807a684
Could the unusual memory config be due to one empty DIMM slot on my
motherboard? I have 9 slots, but only 8 x 16G filled. The 6th slot
on the motherboard is empty -- which is a valid config according to
the manual.
On Fri, Jan 25, 2019 at 11:39 AM Michal Hocko <mhocko@...nel.org> wrote:
>
> On Fri 25-01-19 11:16:30, robert shteynfeld wrote:
> > Attached is the dmesg from patched kernel.
>
> Your Node1 physical memory range precedes Node0, which is quite unusual,
> but it shouldn't be a huge problem on its own. But the memory ranges are
> not aligned to the memory section size:
>
> [ 0.286954] Early memory node ranges
> [ 0.286955] node 1: [mem 0x0000000000001000-0x0000000000090fff]
> [ 0.286955] node 1: [mem 0x0000000000100000-0x00000000dbdf8fff]
> [ 0.286956] node 1: [mem 0x0000000100000000-0x0000001423ffffff]
> [ 0.286956] node 0: [mem 0x0000001424000000-0x0000002023ffffff]
>
> As you can see, the last pfn of node1 falls inside a section and
> Node0 starts right after it. This is quite unusual as well. If for no other
> reason than that the memmap of those struct pages will be remote for one
> node or the other. Actually, I am not even sure we can handle that properly,
> because we do expect a 1:1 mapping between sections and nodes.
>
> Now it also makes some sense why 2830bf6f05fb ("mm, memory_hotplug:
> initialize struct pages for the full memory section") made a
> difference. We simply write over a potentially initialized struct page
> and blow up on that. I strongly suspect that the commit just uncovered
> a pre-existing problem. Let me think about what we can do about that.
>
> > I'm not an expert at debugging the kernel, obviously. I tried setting
> > up a serial console before without much luck as part of this debugging
> > session.
>
> Ubuntu has a nice howto for netconsole configuration:
> https://wiki.ubuntu.com/Kernel/Netconsole. It is quite important to
> capture the actual failure.
> --
> Michal Hocko
> SUSE Labs
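
The alignment point in the quoted message can be checked with a little arithmetic on the dmesg ranges. The sketch below is only an illustration, not kernel code: it assumes x86_64 sparsemem with SECTION_SIZE_BITS = 27 (128 MiB sections), which is not stated in the thread, and the helper names are made up for this example. The ranges are copied from the "Early memory node ranges" output above.

```python
# Assumption: x86_64 sparsemem, SECTION_SIZE_BITS = 27 (128 MiB sections).
SECTION_SIZE_BITS = 27
SECTION_SIZE = 1 << SECTION_SIZE_BITS

# (node, start, end inclusive), copied from the quoted dmesg.
RANGES = [
    (1, 0x0000000000001000, 0x0000000000090fff),
    (1, 0x0000000000100000, 0x00000000dbdf8fff),
    (1, 0x0000000100000000, 0x0000001423ffffff),
    (0, 0x0000001424000000, 0x0000002023ffffff),
]


def section_of(addr):
    """Hypothetical helper: section number containing a physical address."""
    return addr >> SECTION_SIZE_BITS


def offset_in_section(addr):
    """Hypothetical helper: byte offset of an address within its section."""
    return addr & (SECTION_SIZE - 1)


def shared_sections(ranges):
    """Return the section numbers that contain memory from more than one node."""
    owners = {}
    for node, start, end in ranges:
        for sec in range(section_of(start), section_of(end) + 1):
            owners.setdefault(sec, set()).add(node)
    return sorted(sec for sec, nodes in owners.items() if len(nodes) > 1)


if __name__ == "__main__":
    node0_start = 0x0000001424000000
    print("shared sections:", shared_sections(RANGES))
    print("node 0 starts %d MiB into section %d"
          % (offset_in_section(node0_start) >> 20, section_of(node0_start)))
```

Under that assumption, node 0 begins 64 MiB into section 644, so that one section holds memory from both nodes, which is the case the 1:1 section-to-node expectation does not cover.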