linux-kernel - Re: [PATCH 2/2] x86/numa: instance all parsed numa node

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.21.1907090810490.1961@nanos.tec.linutronix.de>
Date:   Tue, 9 Jul 2019 08:12:54 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Pingfan Liu <kernelfans@...il.com>
cc:     x86@...nel.org, Michal Hocko <mhocko@...e.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Tony Luck <tony.luck@...el.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Oscar Salvador <osalvador@...e.de>,
        Pavel Tatashin <pavel.tatashin@...rosoft.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Stephen Rothwell <sfr@...b.auug.org.au>, Qian Cai <cai@....pw>,
        Barret Rhoden <brho@...gle.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] x86/numa: instance all parsed numa node

On Tue, 9 Jul 2019, Pingfan Liu wrote:
> On Mon, Jul 8, 2019 at 5:35 PM Thomas Gleixner <tglx@...utronix.de> wrote:
> > It can and it does.
> >
> > That's the whole point why we bring up all CPUs in the 'nosmt' case and
> > shut the siblings down again after setting CR4.MCE. Actually that's in fact
> > a 'let's hope no MCE hits before that happened' approach, but that's all we
> > can do.
> >
> > If we don't do that then the MCE broadcast can hit a CPU which has some
> > firmware initialized state. The result can be a full system lockup, triple
> > fault etc.
> >
> > So when the MCE hits a CPU which is still in the crashed kernel lala state,
> > then all hell breaks lose.
> Thank you for the comprehensive explain. With your guide, now, I have
> a full understanding of the issue.
> 
> But when I tried to add something to enable CR4.MCE in
> crash_nmi_callback(), I realized that it is undo-able in some case (if
> crashed, we will not ask an offline smt cpu to online), also it is
> needless. "kexec -l/-p" takes the advantage of the cpu state in the
> first kernel, where all logical cpu has CR4.MCE=1.
> 
> So kexec is exempt from this bug if the first kernel already do it.

No. If the MCE broadcast is handled by a CPU which is stuck in the old
kernel stop loop, then it will execute on the old kernel and eventually run
into the memory corruption which crashed the old one.

Thanks,

	tglx