Message-ID: <20190125173315.GC20411@dhcp22.suse.cz>
Date:   Fri, 25 Jan 2019 18:33:15 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     robert shteynfeld <robert.shteynfeld@...il.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Mikhail Zaslonko <zaslonko@...ux.ibm.com>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Gerald Schaefer <gerald.schaefer@...ibm.com>,
        Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Steven Sistare <steven.sistare@...cle.com>,
        Daniel Jordan <daniel.m.jordan@...cle.com>,
        Bob Picco <bob.picco@...cle.com>
Subject: Re: kernel panic due to
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2830bf6f05fb3e05bc4743274b806c821807a684

On Fri 25-01-19 17:39:38, Michal Hocko wrote:
> On Fri 25-01-19 11:16:30, robert shteynfeld wrote:
> > Attached is the dmesg from patched kernel.
> 
> Your Node1 physical memory range precedes Node0, which is quite unusual,
> but it shouldn't be a huge problem on its own. But the memory ranges are
> not aligned to the memory section:
> 
> [    0.286954] Early memory node ranges
> [    0.286955]   node   1: [mem 0x0000000000001000-0x0000000000090fff]
> [    0.286955]   node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
> [    0.286956]   node   1: [mem 0x0000000100000000-0x0000001423ffffff]
> [    0.286956]   node   0: [mem 0x0000001424000000-0x0000002023ffffff]
> 
> As you can see, the last pfn of node1 is inside the section and
> Node0 starts right after. This is quite unusual as well, if for no other
> reason than that the memmap of those struct pages will be remote for one
> or the other. Actually I am not even sure we can handle that properly
> because we do expect a 1:1 mapping between sections and nodes.
> 
> Now it also makes some sense why 2830bf6f05fb ("mm, memory_hotplug:
> initialize struct pages for the full memory section") made any
> difference. We simply write over a potentially initialized struct page
> and blow up on that. I strongly suspect that the commit just uncovered
> a pre-existing problem. Let me think what we can do about that.
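
To make the misalignment concrete, here is a small sketch that maps the node
ranges from the dmesg excerpt onto memory sections and reports any section
touched by more than one node. It is only arithmetic on the logged addresses,
not kernel code. It assumes x86_64 SPARSEMEM with SECTION_SIZE_BITS = 27
(128 MiB sections); the helper names are made up for illustration.

```python
# Assumption: x86_64 SPARSEMEM, where SECTION_SIZE_BITS is 27 (128 MiB sections).
SECTION_SIZE_BITS = 27
SECTION_SIZE = 1 << SECTION_SIZE_BITS

# (node, start, inclusive end) copied from the "Early memory node ranges" lines.
RANGES = [
    (1, 0x0000000000001000, 0x0000000000090fff),
    (1, 0x0000000000100000, 0x00000000dbdf8fff),
    (1, 0x0000000100000000, 0x0000001423ffffff),
    (0, 0x0000001424000000, 0x0000002023ffffff),
]


def section_nr(addr):
    """Hypothetical helper: index of the memory section containing addr."""
    return addr >> SECTION_SIZE_BITS


def shared_sections(ranges):
    """Return {section number: sorted node ids} for sections spanning several nodes."""
    owners = {}
    for node, start, end in ranges:
        for sec in range(section_nr(start), section_nr(end) + 1):
            owners.setdefault(sec, set()).add(node)
    return {sec: sorted(nodes) for sec, nodes in owners.items() if len(nodes) > 1}


if __name__ == "__main__":
    for sec, nodes in shared_sections(RANGES).items():
        print(f"section {sec} is shared by nodes {nodes}")
    node0_start = 0x0000001424000000
    offset_mib = (node0_start % SECTION_SIZE) >> 20
    print(f"node 0 starts {offset_mib} MiB into its section")
```

Under that section-size assumption, the node1/node0 boundary at 0x1424000000
falls in the middle of a section, so one section holds pages from both nodes.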

Apart from force-aligning the node's start, the only other option is to
revert 2830bf6f05fb and handle the underlying issue in the hotplug
code. I really wanted to avoid that because memory hotplug assumes
sections to be in a single node in way too many places. Maybe somebody
has a cleverer idea, though.
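
For a sense of what force-aligning would mean numerically, the sketch below
rounds node 0's start address from the dmesg to the neighbouring section
boundaries. It is an illustration only: it assumes 128 MiB sections
(SECTION_SIZE_BITS = 27 on x86_64), the helper names are hypothetical, and it
does not say which direction, if any, the kernel would pick.

```python
# Assumption: 128 MiB memory sections (SECTION_SIZE_BITS = 27 on x86_64).
SECTION_SIZE = 1 << 27

# Start of node 0 as reported in the dmesg excerpt.
NODE0_START = 0x0000001424000000


def align_down(addr, size=SECTION_SIZE):
    """Round addr down to a multiple of size (size must be a power of two)."""
    return addr & ~(size - 1)


def align_up(addr, size=SECTION_SIZE):
    """Round addr up to a multiple of size (size must be a power of two)."""
    return (addr + size - 1) & ~(size - 1)


if __name__ == "__main__":
    lower = align_down(NODE0_START)
    upper = align_up(NODE0_START)
    print(f"previous section boundary: {lower:#x} "
          f"({(NODE0_START - lower) >> 20} MiB below the current start)")
    print(f"next section boundary:     {upper:#x} "
          f"({(upper - NODE0_START) >> 20} MiB above the current start)")
```

Either choice moves the boundary by 64 MiB, so that much memory would have to
be accounted to the other node or left out.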
-- 
Michal Hocko
SUSE Labs
