linux-kernel - Re: [patch 0/6] Cure kexec() vs. mwait_play

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZH4eNL4Bf7yPItee@google.com>
Date:   Mon, 5 Jun 2023 10:41:08 -0700
From:   Sean Christopherson <seanjc@...gle.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Ashok Raj <ashok.raj@...ux.intel.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Tony Luck <tony.luck@...el.com>,
        Arjan van de Veen <arjan@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Eric Biederman <ebiederm@...ssion.com>
Subject: Re: [patch 0/6] Cure kexec() vs. mwait_play_dead() troubles

On Sat, Jun 03, 2023, Thomas Gleixner wrote:
> Hi!
> 
> Ashok observed triple faults when executing kexec() on a kernel which has
> 'nosmt' on the kernel commandline and HT enabled in the BIOS.
> 
> 'nosmt' brings up the HT siblings to the point where they initiliazed the
> CPU and then rolls the bringup back which parks them in mwait_play_dead().
> The reason is that all CPUs should have CR4.MCE set. Otherwise a broadcast
> MCE will immediately shut down the machine.

...

> This is only half safe because HLT can resume execution due to NMI, SMI and
> MCE. Unfortunately there is no real safe mechanism to "park" a CPU reliably,

On Intel.  On AMD, enabling EFER.SVME and doing CLGI will block everything except
single-step #DB (lol) and RESET.  #MC handling is implementation-dependent and
*might* cause shutdown, but at least there's a chance it will work.  And presumably
modern CPUs do pend the #MC until GIF=1.

> but there is at least one which prevents the NMI and SMI cause: INIT.
> 
>   3) If the system uses the regular INIT/STARTUP sequence to wake up
>      secondary CPUS, then "park" all CPUs including the "offline" ones
>      by sending them INIT IPIs.
> 
> The INIT IPI brings the CPU into a wait for wakeup state which is not
> affected by NMI and SMI, but INIT also clears CR4.MCE, so the broadcast MCE
> problem comes back.
> 
> But that's not really any different from a CPU sitting in the HLT loop on
> the previous kernel. If a broadcast MCE arrives, HLT resumes execution and
> the CPU tries to handle the MCE on overwritten text, pagetables etc.
> 
> So parking them via INIT is not completely solving the problem, but it
> takes at least NMI and SMI out of the picture.

Don't most SMM handlers rendezvous all CPUs?  I.e. won't blocking SMIs indefinitely
potentially cause problems too?

Why not carve out a page that's hidden across kexec() to hold whatever code+data
is needed to safely execute a HLT loop indefinitely?  E.g. doesn't the original
kernel provide the e820 tables for the post-kexec() kernel?  To avoid OOM after
many kexec(), reserving a page could be done iff the current kernel wasn't itself
kexec()'d.